Protein Family and Domain Information Sources
Conserved Domain Database (CDD; National Center for Biotechnology Information)
http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
NCBI’s Conserved Domain Database includes multiple sequence alignments
for domains and full-length proteins. One section of the database is curated
at NCBI and attempts to group ancient domains related by common descent into
family hierarchies. The uncurated section includes domains imported from
SMART, Pfam, and COGs. Entries in both sections also provide descriptions
and links to citations.
SCOP: Structural Classification of Proteins
http://scop.mrc-lmb.cam.ac.uk/scop/
The purpose of this database is to provide a detailed and comprehensive
description of the structural and evolutionary relationships between all
proteins whose structure is known, including all entries in the Protein Data
Bank. The SCOP classification of proteins has been constructed manually by
visual inspection and comparison of structures.
InterPro
http://www.ebi.ac.uk/interpro/
InterPro is a database of protein families, domains, and functional sites
in which identifiable features found in known proteins can be applied to
unknown protein sequences. InterPro provides an integrated view of the commonly
used signature databases. While each database has its own valuable and specific
features, all are searchable together through InterPro.
The databases listed below are part of InterPro and are also available as independent
resources on protein families, domains, and signatures (descriptions are taken from
InterPro or the database web sites).
- UniProt
http://www.ebi.ac.uk/uniprot/
UniProt (Universal Protein Resource) is a comprehensive catalog
of information on proteins. It is a central repository of protein sequence
and function data, created by joining the information contained in Swiss-Prot,
TrEMBL, and PIR.
- Gene3D
http://cathwww.biochem.ucl.ac.uk:8080/Gene3D/
The Gene3D database describes protein families and domain architectures
in complete genomes. Protein families are formed using a Markov clustering
algorithm, followed by multi-linkage clustering according to sequence identity.
Predicted structural and sequence domains are mapped using hidden Markov
model libraries representing CATH and Pfam domains. Functional annotation
for these proteins is drawn from multiple resources.
- PANTHER: Protein Analysis through Evolutionary Relationships
http://www.pantherdb.org/
PANTHER classifies genes by their functions, using published scientific
experimental evidence and evolutionary relationships to predict function
even in the absence of direct experimental evidence. Proteins are classified
by expert biologists into families and subfamilies of shared function, which
are then categorized by molecular function and biological process ontology
terms.
- Pfam
http://pfam.sanger.ac.uk/
Pfam is a large collection of multiple sequence alignments and hidden
Markov models covering many common protein domains.
- PIRSF: Protein Information Resource (PIR) Superfamily Classification
http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
The PIRSF protein classification system is a network with multiple levels
of sequence diversity from superfamilies to subfamilies that reflects the
evolutionary relationship of full-length proteins and domains. The primary
PIRSF classification unit is the homeomorphic family, whose members are both
homologous (evolved from a common ancestor) and homeomorphic (sharing full-length
sequence similarity and a common domain architecture).
- PRINTS
http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
PRINTS is a compendium of protein fingerprints. A fingerprint is a
group of conserved motifs used to characterize a protein family. Fingerprints
can encode protein folds and functionalities more flexibly and powerfully
than can single motifs, their full diagnostic potency deriving from the
mutual context afforded by motif neighbors.
- ProSite
http://ca.expasy.org/prosite/
ProSite is a database of protein families and domains. It consists
of biologically significant sites, patterns and profiles that help to reliably
identify to which known protein family (if any) a new sequence belongs.
- SMART
http://smart.embl-heidelberg.de/
SMART (a Simple Modular Architecture Research Tool) allows the identification
and annotation of genetically mobile domains and the analysis of domain
architectures. More than 500 domain families found in signalling, extracellular
and chromatin-associated proteins are detectable. These domains are extensively
annotated with respect to phyletic distributions, functional class, tertiary
structures and functionally important residues.
- SUPERFAMILY
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
SUPERFAMILY is a library of profile hidden Markov models that represent
all proteins of known structure. SUPERFAMILY provides structural (and hence
implied functional) assignments to protein sequences at the superfamily
level. A superfamily contains all proteins for which there is structural
evidence of a common evolutionary ancestor.
- TIGRFAMs
http://www.tigr.org/TIGRFAMs/index.shtml
TIGRFAMs is a collection of protein families, featuring curated multiple
sequence alignments, hidden Markov models (HMMs) and annotation, which
provides a tool for identifying functionally related proteins based on
sequence homology.
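The Gene3D entry says protein families are formed with a Markov clustering algorithm but gives no details. As a rough illustration of the general Markov clustering (MCL) idea, alternating "expansion" (a random-walk step) with "inflation" (sharpening each column), here is a minimal pure-Python sketch. It is not Gene3D's pipeline: the function name `mcl`, the inflation value of 2.0, and the pruning and convergence thresholds are illustrative assumptions.

```python
def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-9):
    """Sketch of Markov clustering on an undirected graph (0/1 adjacency matrix).

    Returns a list of clusters, each a sorted list of node indices.
    Parameter values are illustrative, not taken from any database pipeline.
    """
    n = len(adjacency)

    def normalize_columns(m):
        # Rescale every column to sum to 1 (a column-stochastic matrix).
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            if total > 0:
                for i in range(n):
                    m[i][j] /= total
        return m

    # Add self-loops, then turn the graph into a column-stochastic matrix.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        previous = m
        # Expansion: square the matrix (two steps of the random walk).
        m = [[sum(previous[i][k] * previous[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: raise entries to a power, drop negligible ones, renormalize.
        m = [[x if x > prune else 0.0 for x in (v ** inflation for v in row)]
             for row in m]
        normalize_columns(m)
        if max(abs(m[i][j] - previous[i][j]) for i in range(n) for j in range(n)) < tol:
            break

    # Each "attractor" row with nonzero entries collects one cluster of columns.
    clusters = []
    for i in range(n):
        members = sorted(j for j in range(n) if m[i][j] > tol)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical toy graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
    bridged = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(bridged))
```

Raising the inflation exponent generally yields more, smaller clusters. Real pipelines typically run MCL on weighted similarity graphs (for example, from all-against-all sequence comparisons) rather than on a 0/1 matrix.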
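ProSite patterns use a compact syntax. For example, the N-glycosylation site pattern is written `N-{P}-[ST]-{P}`: an N, then any residue except P, then S or T, then any residue except P. The sketch below translates a subset of that syntax into a Python regular expression and scans a sequence for matches. It assumes only `x`, `[..]`, `{..}`, repeat counts such as `x(2)` or `x(2,4)`, and the `<`/`>` terminal anchors; forms like `>` inside brackets are not handled. The helper names `prosite_to_regex` and `find_sites` and the example sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Translate a subset of ProSite pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        # Separate the residue spec from an optional repeat count like (2) or (2,4).
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            token = "."                          # any amino acid
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"      # any amino acid except those listed
        else:
            token = core                         # a single residue or a [..] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + token + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))                 # N[^P][ST][^P]
    print(find_sites("N-{P}-[ST]-{P}.", "MKNGSATNPSV"))        # [(2, 'NGSA')]
```

A regular expression only captures ProSite's patterns; its profiles are scored position-specific matrices and would need a different approach.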