NCSU Libraries
Home: BIT 410/510

BIT 410: Manipulation of Recombinant DNA

Sequence Information and Analysis

National Center for Biotechnology Information (NCBI)

NCBI is part of the National Library of Medicine at the National Institutes of Health. It offers many sequence and other databases and tools, such as GenBank (DNA sequences), BLAST for sequence similarity searching and comparison, and maps of the human and bacterial genomes. PubMed is linked with NCBI, and many of the databases and tools (covering genes to proteins for hundreds of organisms) are interlinked with journal articles as well.

Recently released: tip sheets for searching PubMed and GenBank and for running a BLAST search. The files are best viewed in Internet Explorer or Mozilla.

Descriptions and Relationships between NCBI Databases

Descriptions of and links to the NCBI databases are available from the Site Map on the NCBI home page, and the Quick Links Table lets you link directly to the database or tool of your choice.

Searching NCBI: Entrez

Entrez is the basic tool for searching all NCBI databases. You can use Entrez to search an individual database, such as PubMed for literature or GenBank for gene sequences, or to search several databases at once. The NCBI Data Model shows the relationships among the NCBI databases.
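Entrez can also be queried from a script through NCBI's E-utilities web service. The Python sketch below only builds an ESearch request URL and does not send it; it assumes the base address `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`, and the helper name `esearch_url` and the example query are illustrative. Check the E-utilities documentation for current usage guidelines (request limits, API keys) before sending requests.

```python
from urllib.parse import urlencode

# Assumed base address of NCBI's E-utilities (check the current documentation).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build (but do not send) an ESearch request for one Entrez database.

    esearch_url is a hypothetical helper written for this guide.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces as '+' and square brackets as %5B / %5D.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Example query: lacZ records from E. coli in the nucleotide database.
    print(esearch_url("nucleotide", "lacZ AND Escherichia coli[orgn]"))
```

Opening the printed URL in a browser returns an XML list of matching record IDs, which can then be passed to EFetch to retrieve the records themselves.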


Nucleotide Databases

GenBank (NCBI)

The GenBank database is an annotated collection of all publicly available nucleotide sequences and their protein translations. The database includes sequences from >130,000 organisms. Records that are annotated with coding region (CDS) features also include amino acid translations. GenBank is updated daily.
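The amino acid translations attached to CDS features come from reading the coding sequence three bases (one codon) at a time. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a CDS that starts in the first reading frame; real GenBank records may specify another code (/transl_table) or frame (/codon_start), which this sketch ignores. `translate_cds` is a hypothetical helper name.

```python
# NCBI translation table 1 (standard genetic code) in TCAG order:
# codon index = 16*first + 4*second + third, with T=0, C=1, A=2, G=3.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds: str, to_stop: bool = True) -> str:
    """Translate a coding sequence codon by codon (hypothetical helper)."""
    seq = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    # Step through complete codons only; a trailing partial codon is skipped.
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = ambiguous codon
        if aa == "*" and to_stop:
            break  # stop codon ends the translation
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCTAA"))
```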

GenBank is part of an international collaboration of sequence databases which also includes DDBJ (DNA Data Bank of Japan) and EMBL (European Molecular Biology Laboratory). The three databases share data daily. Record formats and search systems differ among the databases, but the accession numbers, sequence data, and annotations are the same in all of them.

Note: Use Entrez Nucleotide to search GenBank. More about GenBank.

RefSeq Database (NCBI)

The Reference Sequence (RefSeq) database provides a biologically non-redundant collection of DNA, RNA, and protein sequences. Each RefSeq represents a single, naturally occurring molecule from a particular organism. RefSeqs are frequently based on GenBank records but differ in that each RefSeq is a synthesis of information, not a piece of primary research data in itself. Similar to a review article in the literature, a RefSeq is an interpretation by a particular group at a particular time. RefSeqs can be retrieved in several different ways: by searching the Entrez Nucleotide or Protein database, by BLAST searching, by FTP, or through links from other NCBI resources. (Description from the NCBI Handbook.)

Note: To limit your search to RefSeq, begin your search in Entrez Nucleotide and use Limits to restrict the results to RefSeq records only.
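RefSeq records can be told apart from other GenBank records by their accession format: a two-letter prefix, an underscore, and a number, with an optional version suffix such as `.1`. The prefix indicates the molecule type, and the prefixes XM_, XR_, and XP_ mark computationally predicted models. The Python sketch below covers only a subset of the prefixes listed in NCBI's RefSeq documentation; `describe_refseq` and the example accessions are hypothetical.

```python
import re

# Subset of RefSeq accession prefixes (see NCBI's RefSeq documentation for the full list).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA (model)",
    "XR": "predicted non-coding RNA (model)",
    "XP": "predicted protein (model)",
    "WP": "non-redundant protein",
}

# Two capital letters, underscore, digits, optional ".version".
ACCESSION = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def describe_refseq(accession: str):
    """Classify a RefSeq-style accession (hypothetical helper)."""
    match = ACCESSION.match(accession.strip())
    if match is None:
        return None  # not RefSeq-style; e.g. GenBank accessions have no underscore
    prefix, _number, version = match.groups()
    kind = REFSEQ_PREFIXES.get(prefix, "other RefSeq type")
    return {"prefix": prefix, "kind": kind,
            "version": int(version) if version else None}


if __name__ == "__main__":
    print(describe_refseq("NM_123456.1"))  # made-up accession
```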


Protein Databases

Entrez Protein (NCBI)

The database includes protein sequences from a variety of sources, including Swiss-Prot, the Protein Information Resource (PIR), the Protein Data Bank (PDB), and the Protein Research Foundation (PRF). It also includes some protein sequences predicted from coding regions in GenBank and RefSeq DNA sequences.

Swiss-Prot (EMBL)

Protein sequences in Swiss-Prot are reviewed and extensively annotated by experts, and the database is minimally redundant. Because every entry goes through this level of review and annotation, the database does not contain all currently available protein sequences; the remaining sequences are stored in a companion database called TrEMBL (translations of EMBL).
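Swiss-Prot and TrEMBL are now distributed together as UniProtKB, and the first field of a UniProtKB FASTA header records which part an entry comes from: `sp` for reviewed Swiss-Prot entries and `tr` for unreviewed TrEMBL entries. A minimal Python sketch that splits such a header into its parts; the function name `parse_uniprot_header` and the example headers are made up for illustration.

```python
def parse_uniprot_header(header: str) -> dict:
    """Split '>db|Accession|EntryName Description' (hypothetical helper)."""
    if not header.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    # The identifier runs up to the first space; the rest is the description.
    identifier, _, description = header[1:].partition(" ")
    # Raises ValueError if the identifier does not have exactly three fields.
    db, accession, entry_name = identifier.split("|")
    source = {"sp": "Swiss-Prot (reviewed)",
              "tr": "TrEMBL (unreviewed)"}.get(db, "unknown")
    return {"db": db, "source": source, "accession": accession,
            "entry_name": entry_name, "description": description}


if __name__ == "__main__":
    # Invented header, for illustration only.
    print(parse_uniprot_header(">sp|Q00001|EXAMPLE_HUMAN Example protein OS=Homo sapiens"))
```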

PIR: Protein Information Resource

A database formed by an international collaboration. It contains highly annotated and reviewed protein sequences and links to related databases.

Note: You can also search Swiss-Prot and PIR from within Entrez Protein by limiting your search to one of these databases.

Sequence Similarity Searching and Comparisons

BLAST (NCBI)

Basic Local Alignment Search Tool (BLAST) is a search algorithm that looks for local regions of sequence similarity, in contrast to programs that perform global sequence alignments. BLAST performs only pairwise sequence alignments. Many different types of sequences can be compared (e.g., nucleotides to nucleotides, amino acids to amino acids, and translated nucleotide sequences to amino acids).
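BLAST itself is a heuristic: it looks for short matching words between the query and each database sequence and extends those hits, rather than computing every possible alignment. The exact local alignment that BLAST approximates is the Smith-Waterman dynamic-programming algorithm, sketched below in Python to show what "local" means: cell scores are never allowed to fall below zero, so unrelated flanking sequence does not penalize a strongly matching region. The scoring values (match +2, mismatch -1, gap -2) are arbitrary illustrative choices, not BLAST defaults, and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b, with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Flooring at 0 lets an alignment start anywhere: this is what makes it local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTTACGTTTTT", "GGGACGTGGG"))
```

The two example sequences share only the core ACGT, and the local score equals that of aligning ACGT with itself; a global alignment would also have to score the mismatched flanks.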

ClustalW (EMBL)

"Clustal W is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen via viewing Cladograms or Phylograms." (Description from the ClustalW home page.)

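Below an alignment, ClustalW prints a line that marks fully conserved columns with `*` and columns of strongly or weakly similar residues with `:` or `.`. The Python sketch below reproduces only the `*` (identical residue) marks, for sequences that are already aligned and padded with `-` gaps; it is an illustration, not ClustalW's own code, and `conservation_line` is a hypothetical name.

```python
def conservation_line(aligned: list[str]) -> str:
    """Mark columns where every sequence has the same residue (and no gap) with '*'."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same, non-zero count")
    marks = []
    for column in zip(*aligned):  # walk the alignment column by column
        residues = set(column)
        identical = len(residues) == 1 and "-" not in residues
        marks.append("*" if identical else " ")
    return "".join(marks)


if __name__ == "__main__":
    seqs = ["ACGT-A", "ACTT-A", "ACGTTA"]
    for s in seqs:
        print(s)
    print(conservation_line(seqs))
```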

Other Sequence Analysis Tools

123Genomics: Sequence, Structure & Function Analysis

An extensive collection of links to all types of sequence analysis tools that are free on the web. Other parts of the site contain links to related information such as sequence databases, methods and protocols, etc.

Other Resources at NCBI

NCBI also includes several resources for sequence, homology, and conserved domain searching; protein structure analysis and comparisons; and genomic maps and analyses. Most of these resources can be accessed from the NCBI home page or the Site Map.

Some Other Useful Links

Gene and Protein Database Guide

Prepared by Oak Ridge National Laboratory (ORNL), the guide provides a detailed overview of major gene and protein databases related to the Human Genome Project and to human genes and diseases. It includes links to search tips, reviews, tip sheets, and more.

Nucleic Acids Research

The journal Nucleic Acids Research publishes two special issues each year on molecular biology resources:

Database Issue (January 1, open access)
The 2005 Database Issue of Nucleic Acids Research is the 12th in a series dedicated to factual biological databases (719 are described). Such databases are an essential resource for working biologists and this compilation provides descriptions of the most important of these databases and serves to introduce newly compiled databases that provide specialist information in the biological area.

The Database Issue is open to submissions describing commercial databases that are not freely available. These articles undergo exactly the same peer-review process as normal articles, but the commercial databases pay commercial rates instead of the usual author charge.

Web Server Issue (July 1, open access)
The Web Server Issue highlights the many servers that are available on the web to perform useful computations on DNA, RNA and protein sequences and structures. Between them, the two issues provide an unparalleled array of useful computational services. The new Web Server Issue aims to provide a repository in which authors of web servers can highlight their offerings and readers can find out what is available.

In the 2005 issue there are reports of 166 web servers that run the gamut from BLAST services to three-dimensional protein structure prediction. The servers described have all been subjected to rigorous peer review, are available free of charge and provide invaluable resources to the scientific community.


NCSU Libraries, NC State University