NCSU Libraries
Eleanor Smith
Life Sciences Librarian
919-513-3969

Protein sequence & structure resources

Featured Article:

Beyond the best match: machine learning annotation of protein sequences by integration of different sources of information

Tetko IV, Rodchenkov IV, Walter MC, Rattei T, Mewes HW
BIOINFORMATICS 24(5):621-628, MARCH 2008


Protein Sequence Databases

Entrez Protein (NCBI)
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
The content of the Entrez Protein database is derived from several protein databases (Swiss-Prot, PIR, PDB) and translations of the GenBank nucleotide sequence records. Records link to related files for nucleotide and genome sequences and, where available, structural information.

IPI International Protein Index (EMBL-EBI, European Bioinformatics Institute)
http://www.ebi.ac.uk/IPI/IPIhelp.html
IPI provides access to the major databases covering the proteomes of higher eukaryotic organisms, along with a database of cross-references between the primary data sources. IPI is assembled from protein sequence information taken from UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, RefSeq, ENSEMBL, TAIR, Vega, and H-invDB. IPI creates a complete, minimally redundant set of proteins for each featured species.

PIR-PSD (Protein Information Resource-International Protein Sequence Database)
http://pir.georgetown.edu/pirwww/dbinfo/pir_psd.shtml
PIR-PSD was the world's first database of classified and functionally annotated protein sequences. The database is non-redundant and annotated by experts. The final version of PIR-PSD was released in December 2004. PIR is one of the co-founders of the UniProt consortium. PIR-PSD sequences and annotations have been integrated into the UniProt Knowledgebase, with bi-directional cross-references between the two databases. PIR also provides several other protein-related databases.

RefSeq (The Reference Sequence Collection; NCBI)
http://www.ncbi.nlm.nih.gov/RefSeq/
RefSeq aims to provide a comprehensive, non-redundant set of protein sequences for major research organisms; genomic DNA and transcript (RNA) sequences are also included. RefSeq standards provide a stable reference for expression studies and comparative analyses.

Swiss-Prot (Swiss Institute of Bioinformatics and the European Bioinformatics Institute)
http://www.ebi.ac.uk/swissprot/
Swiss-Prot is a manually curated and extensively annotated protein sequence database with a minimal level of redundancy. Annotations may include descriptions of protein function, domains, structure, and post-translational modifications. Swiss-Prot is also integrated with other databases. Swiss-Prot is part of the UniProt consortium and its contents have been incorporated into the UniProt Knowledgebase.

TrEMBL (Swiss Institute of Bioinformatics and the European Bioinformatics Institute)
http://www.ebi.ac.uk/trembl/
TrEMBL (Translated EMBL) is an automatically annotated supplement to Swiss-Prot. The database contains the translations of EMBL/GenBank/DDBJ nucleotide sequences. TrEMBL is part of the UniProt consortium and its contents have been incorporated into the UniProt Knowledgebase.

UniProt (Universal Protein Resource)
http://www.ebi.uniprot.org
UniProt provides access to extensive and expertly curated protein information, including function, classification, and cross-references. UniProt was created by joining the information contained in UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, and PIR. UniProt has three sections:

  • The UniProt Knowledgebase (UniProtKB) is the central database containing extensive and expertly annotated protein sequences from Swiss-Prot, TrEMBL, and PIR-PSD.
  • The UniProt Reference Clusters (UniRef) are based on the UniProt Knowledgebase. UniRef provides non-redundant reference data by combining closely related sequences into a single record.
  • UniProt Archive (UniParc) is a comprehensive repository that stores the complete body of publicly available protein sequence data. Each unique sequence has a single record with cross-references to the data source. Source databases include, in addition to the UniProt databases, translations from the EMBL-Bank/DDBJ/GenBank nucleotide sequence databases, the International Protein Index (IPI), the Protein Data Bank (PDB), NCBI's Reference Sequence Collection (RefSeq), and several other databases.
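
The "one record per unique sequence, with cross-references to each source" idea behind UniParc can be illustrated with a minimal Python sketch. This is not UniParc's actual schema or API: the function name build_archive, the source names, the accessions, and the sequences are all made up for illustration.

```python
from collections import defaultdict


def build_archive(records):
    """Group (source_db, accession, sequence) records by identical sequence.

    Returns a dict mapping each unique sequence to a sorted list of
    (source_db, accession) cross-references. Hypothetical helper, not a
    UniParc interface.
    """
    archive = defaultdict(set)
    for source_db, accession, sequence in records:
        # Normalise case and whitespace so trivially different copies match.
        key = "".join(sequence.split()).upper()
        archive[key].add((source_db, accession))
    return {seq: sorted(xrefs) for seq, xrefs in archive.items()}


# Made-up records for illustration only.
records = [
    ("SourceA", "A0001", "MKTAYIAKQR"),
    ("SourceB", "B0042", "mktayiakqr"),
    ("SourceA", "A0002", "MSLLTEVETY"),
]
archive = build_archive(records)

for sequence, xrefs in archive.items():
    print(sequence, xrefs)
```

The first two records carry the same sequence from two different sources, so they collapse into a single entry with two cross-references. Note that this merges only identical sequences; combining closely related (non-identical) sequences, as UniRef does, would need a similarity-based clustering step that is not shown here.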

Protein Structure Databases

Entrez Structures (The Molecular Modeling Database (MMDB) at NCBI)
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure
MMDB contains 3D protein structures obtained primarily from X-ray crystallography and NMR spectroscopy. The data is a subset of the PDB database. Structure information is linked to the rest of the NCBI databases, including sequences, citations, and taxonomic classifications.

Enzyme Structures Database (EC-PDB)
http://www.ebi.ac.uk/thornton-srv/databases/enzymes/
The database contains known enzyme structures that have been deposited in the Protein Data Bank (PDB). Enzyme structures are classified by their E.C. number and the database is searchable by E.C. number or keywords.
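
An E.C. number has four dot-separated levels (for example 3.4.21.4), so a search by E.C. number amounts to matching on a prefix of that hierarchy. The Python sketch below shows one way such a lookup could work; it is an assumption for illustration and not the EC-PDB implementation. The function names and the entry identifiers in the sample index are hypothetical.

```python
def parse_ec(ec):
    """Split an E.C. number such as '3.4.21.4' into its four levels.

    Levels are kept as strings; '-' marks an unspecified level,
    as in '3.4.-.-'.
    """
    parts = ec.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    return tuple(parts)


def matches(query, ec):
    """Return True if ec falls under query, treating '-' in query as a wildcard."""
    return all(q == "-" or q == e for q, e in zip(parse_ec(query), parse_ec(ec)))


# Hypothetical index of structure entries and their E.C. numbers.
index = {
    "entry_1": "3.4.21.4",
    "entry_2": "3.4.21.5",
    "entry_3": "1.1.1.1",
}

# Find every entry in sub-subclass 3.4.21.
hits = sorted(name for name, ec in index.items() if matches("3.4.21.-", ec))
print(hits)
```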

Macromolecular Structure Database (MSD)
http://www.ebi.ac.uk/msd
MSD is a member of the wwPDB consortium. The wwPDB is entrusted with the creation of a central repository for the collection, management, and distribution of information about macromolecular structures. MSD is involved in several projects in which it helps develop tools and systems for ensuring the quality of structure data deposited in databases. Examples include work with the Electron Microscopy Database (EMDB) and the Collaborative Computing Project for the NMR community (CCPN).

ModBase (UCSF)
http://modbase.compbio.ucsf.edu/modbase-cgi-new/search_form.cgi
ModBase is a database of computationally derived protein structure models calculated by comparative modeling; structure-related information is included for all sequences related to a known structure. In addition to protein structure models, ModBase contains information about putative ligand binding sites, protein-protein interactions, and SNP annotation.

Protein Data Bank
http://www.rcsb.org/pdb/Welcome.do
The primary archive of experimentally determined three-dimensional structures of proteins and protein complexes. Three-dimensional structure is shown in two types of display: atomic coordinates and annotations. Includes links to sequence databases. PDB is a member of the wwPDB, whose mission is to maintain a single, publicly available Protein Data Bank archive of macromolecular structural data.
