NCSU Libraries
Eleanor Smith
Life Sciences Librarian
919-513-3969

Protein Family and Domain Information Sources

Featured Articles:

Evolution of protein domain promiscuity in eukaryotes

Basu MK, Carmel L, Rogozin IB, and Koonin EV.
Genome Research 18(3): 449-461, March 2008

The folding and evolution of multidomain proteins

Han JH, Batey S, Nickson AA, Teichmann SA, and Clarke J.
Nature Reviews Molecular Cell Biology 8(4): 319-330, 2007

Conserved Domain Database (CDD; National Center for Biotechnology Information)
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
NCBI’s Conserved Domain Database includes multiple sequence alignments for domains and full-length proteins. One section of the database is curated at NCBI and attempts to group ancient domains related by common descent into family hierarchies. The un-curated section includes domains imported from SMART, Pfam and COGs. These databases also provide descriptions and links to citations.

SCOP: Structural Classification of Proteins
http://scop.mrc-lmb.cam.ac.uk/scop/
The purpose of this database is to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the Protein Data Bank. The SCOP classification of proteins has been constructed manually by visual inspection and comparison of structures.

InterPro
http://www.ebi.ac.uk/interpro/
InterPro is a database of protein families, domains, and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences. InterPro provides an integrated view of the commonly used signature databases. While each database has its own valuable and specific features, all are searchable together through InterPro.

The databases listed below are part of InterPro and are also maintained as independent databases of protein families, domains, and signatures (descriptions taken from InterPro or the database web sites).

  • UniProt
    http://www.ebi.ac.uk/uniprot/
    UniProt (Universal Protein Resource) is a comprehensive catalog of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR.
  • BLOCKS
    http://blocks.fhcrc.org/
    Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks for the Blocks Database are made automatically by looking for the most highly conserved regions in groups of proteins documented in the ProSite database. These blocks are then calibrated against the Swiss-Prot database to obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the Blocks Database.
  • Gene3D
    http://cathwww.biochem.ucl.ac.uk:8080/Gene3D/
    The Gene3D database describes protein families and domain architectures in complete genomes. Protein families are formed using a Markov clustering algorithm, followed by multi-linkage clustering according to sequence identity. Predicted structure and sequence domains are mapped using libraries of hidden Markov models representing CATH and Pfam domains. Functional annotation of proteins is drawn from multiple resources.
  • PANTHER: Protein Analysis through Evolutionary Relationships
    http://www.pantherdb.org/
    PANTHER classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence. Proteins are classified by expert biologists into families and subfamilies of shared function, which are then categorized by molecular function and biological process ontology terms.
  • Pfam
    http://pfam.sanger.ac.uk/
    Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains.
  • PIRSF: Protein Information Resource (PIR) Superfamily Classification
    http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
    The PIRSF protein classification system is a network with multiple levels of sequence diversity from superfamilies to subfamilies that reflects the evolutionary relationship of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture).
  • PRINTS
    http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
    PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than can single motifs, their full diagnostic potency deriving from the mutual context afforded by motif neighbors.
  • ProSite
    http://ca.expasy.org/prosite/
    ProSite is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
  • SMART
    http://smart.embl-heidelberg.de/
    SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues.
  • SUPERFAMILY
    http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
    SUPERFAMILY is a library of profile hidden Markov models that represent all proteins of known structure. SUPERFAMILY provides structural (and hence implied functional) assignments to protein sequences at the superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor.
  • TIGRFAMs
    http://www.tigr.org/TIGRFAMs/index.shtml
    TIGRFAMs is a collection of protein families, featuring curated multiple sequence alignments, hidden Markov models (HMMs) and annotation, which provides a tool for identifying functionally related proteins based on sequence homology.
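
To make the idea of a sequence "pattern" (as used by ProSite above) more concrete, the following is a minimal sketch in Python that translates a pattern written in the ProSite style into a regular expression and searches a sequence with it. The function name prosite_to_regex, the example pattern, and the test sequence are illustrative assumptions, not taken from any of the databases listed here, and only a subset of the pattern syntax is handled (x, [..], {..}, repeat counts, and the < and > anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a ProSite-style pattern into a Python regular expression.

    Hypothetical helper for illustration; it covers only a subset of the syntax.
    """
    pattern = pattern.strip().rstrip(".")   # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):      # elements are separated by hyphens
        prefix = suffix = ""
        if element.startswith("<"):         # '<' anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):           # '>' anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = m.group(1), m.group(2)
        if core == "x":                     # 'x' stands for any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        # [..] groups and single letters are already valid regex syntax.
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Illustrative pattern and made-up sequence (assumptions, not database entries).
example_pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
regex = prosite_to_regex(example_pattern)
hit = re.search(regex, "MKCAACAAALAAAAAAAAHAAAH")
print(regex)
print(hit.start() if hit else "no match")
```

A fingerprint in the PRINTS sense could be approximated along the same lines by requiring several such motifs to match the same sequence in a fixed order, though the databases themselves use more elaborate scoring than a plain regular expression.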