BIT 410: Manipulation of Recombinant DNA
Sequence Information and Analysis
National Center for Biotechnology Information (NCBI)
NCBI is part of the National Library of Medicine at the National
Institutes of Health. It features many sequence and other types of
databases and tools, such as GenBank (DNA sequences), BLAST for sequence
similarity searching and comparisons, and maps of the human and bacterial
genomes. PubMed is linked with NCBI, and many of the databases and
tools (covering genes to proteins for hundreds of organisms) are interlinked
with journal articles as well.
RECENTLY RELEASED! Tip sheets for searching PubMed and GenBank, and how
to do a BLAST search. Files are best viewed in Internet Explorer or Mozilla.
Descriptions and Relationships between NCBI Databases
Descriptions and links to NCBI databases are available from the Site
Map on the NCBI home page, and the Quick Links Table lets you link
directly to the database or tool of your choice.
Searching NCBI: Entrez
Entrez is the basic tool for searching all NCBI databases. Entrez
can be used to search individual databases, such as PubMed for literature
or GenBank for gene sequences. You can also use Entrez to search several
databases at once. This Data Model demonstrates the relationships between
the NCBI databases.
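Entrez searches can also be run from a script rather than the web page. The sketch below assumes NCBI's E-utilities ESearch service, which this guide does not otherwise cover, and only builds the search URL without sending it. The function name build_esearch_url is hypothetical, and the database names and search term are illustrative.

```python
from urllib.parse import urlencode

# ESearch endpoint of NCBI's E-utilities (an assumption of this sketch;
# the guide itself describes the Entrez web interface).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL that searches one Entrez database.

    db     -- Entrez database name, e.g. "pubmed", "nucleotide", "protein"
    term   -- search term, optionally with a field tag such as [Title]
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and other reserved characters.
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # The same term pointed at two different databases.
    for database in ("pubmed", "nucleotide"):
        print(build_esearch_url(database, "insulin[Title]", retmax=5))
```

Pasting one of the printed URLs into a browser should return an XML list of matching record IDs, which can then be looked up in the corresponding database.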
Nucleotide Databases
GenBank (NCBI)
The GenBank database is an annotated collection of all publicly available
nucleotide sequences and their protein translations. The database
includes sequences from >130,000 organisms. Records that are annotated
with coding region (CDS) features also include amino acid translations.
GenBank is updated daily.
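To illustrate what the amino acid translation of a CDS feature involves, here is a minimal sketch that translates a coding sequence with the standard genetic code. It is not how GenBank itself produces translations. The function name translate_cds and the example sequence are hypothetical, and the sketch assumes the sequence starts in frame and uses the standard code (some organisms and organelles use other codes).

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(cds):
    """Translate an in-frame DNA coding sequence into one-letter amino acids.

    Translation stops at the first stop codon. Codons containing
    characters other than A, C, G, T are reported as "X". Any trailing
    bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate_cds("ATGGCCTAA"))  # Met, Ala, then a stop codon
```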
GenBank is part of an international collaboration of sequence databases
which also includes DDBJ (DNA Data Bank of Japan) and EMBL (European
Molecular Biology Laboratory). The three databases share data daily.
Record formats and search systems differ among the databases, but
the accession numbers, sequence data, and annotations are the same
in all of them.
Note: Use Entrez Nucleotide to search GenBank. More about GenBank.
RefSeq Database (NCBI)
The Reference Sequence (RefSeq) database provides a biologically
non-redundant collection of DNA, RNA, and protein sequences. Each
RefSeq represents a single, naturally occurring molecule from a
particular organism. RefSeqs are frequently based on GenBank records
but differ in that each RefSeq is a synthesis of information, not
a piece of primary research data in itself. Similar to a review
article in the literature, a RefSeq is an interpretation by a particular
group at a particular time. RefSeqs can be retrieved in several
different ways: by searching the Entrez Nucleotide or Protein database,
by BLAST searching, by FTP, or through links from other NCBI resources.
(Description from the NCBI Handbook.)
Note: To limit your search to RefSeq, begin your search in Entrez
Nucleotide and use Limits to restrict your search to RefSeq only.
Protein Databases
Entrez Protein (NCBI)
This database includes protein sequences from a variety of sources,
including Swiss-Prot, Protein Information Resource (PIR), Protein
Data Bank (PDB), and Protein Research Foundation (PRF). It also
includes some sequences predicted from coding regions in DNA sequences
from GenBank and RefSeq.
Swiss-Prot (EMBL)
Protein sequences in Swiss-Prot are reviewed and highly annotated
by experts, and the database is minimally redundant. Because of
this high degree of review and annotation, the database does not
contain all currently available protein sequences. Sequences that
have not yet been reviewed are stored in another database called
TrEMBL (translations of EMBL).
PIR: Protein Information Resource
A database formed by an international collaboration. It contains
highly annotated and reviewed protein sequences and links to related
databases.
Note: You can also search Swiss-Prot and PIR by limiting your search
to one of these databases from within Entrez Protein.
Sequence Similarity Searching and Comparisons
BLAST (NCBI)
Basic Local Alignment Search Tool (BLAST) is a search algorithm
that looks for local areas of sequence similarity (versus programs
that perform global sequence alignments). BLAST performs only pairwise
sequence alignments. Many different types of sequences can be compared
(e.g., nucleotides to nucleotides, amino acids to amino acids, and
translated nucleotide sequences to amino acids).
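The difference between local and global alignment can be seen in a small example. The sketch below scores the best local alignment between two sequences using the Smith-Waterman dynamic programming method. This is not the BLAST algorithm, which uses a faster heuristic, but it shows the same idea of rewarding the best-matching region and ignoring the rest. The function name and the scoring values (match, mismatch, gap) are arbitrary choices for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score for a and b.

    Scores are never allowed to drop below zero, so an alignment can
    start and end anywhere inside the two sequences. A global alignment
    would instead be forced to span both sequences end to end.
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair_score = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                              # start a new local alignment
                previous[j - 1] + pair_score,   # align a[i-1] with b[j-1]
                previous[j] + gap,              # gap in sequence b
                current[j - 1] + gap,           # gap in sequence a
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # Only the shared ACGT region contributes to the score.
    print(local_alignment_score("TTTACGTAAA", "GGGACGTCCC"))
```

The two example sequences differ everywhere except a shared four-base region, so a local method still finds that region while a global alignment of the full sequences would be dominated by mismatches.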
ClustalW (EMBL)
"Clustal W is a general purpose multiple sequence alignment
program for DNA or proteins. It produces biologically meaningful
multiple sequence alignments of divergent sequences. It calculates
the best match for the selected sequences, and lines them up so
that the identities, similarities and differences can be seen.
Evolutionary relationships can be seen via viewing Cladograms
or Phylograms." (Description from the ClustalW home page.)
Other Sequence Analysis Tools
123Genomics: Sequence, Structure & Function Analysis
An extensive collection of links to all types of sequence analysis
tools that are free on the web. Other parts of the site contain links
to related information such as sequence databases, methods and protocols,
etc.
Other Resources at NCBI
NCBI also includes several resources for sequence, homology, and conserved
domain searching; protein structure analysis and comparisons; and genomic
maps and analyses. Most of these resources can be accessed from the
NCBI home page or the Site Map.
Some Other Useful Links
Gene and Protein Database Guide
Prepared by Oak Ridge National Laboratory (ORNL), the guide provides
a detailed overview of major gene and protein databases related to
the Human Genome Project and human genes and diseases. It also includes
links to search tips, reviews, tip sheets, etc.
Nucleic Acids Research
The journal Nucleic Acids Research publishes two special issues each
year on molecular biology resources:
Database Issue (January 1, open access)
The 2005 Database Issue of Nucleic Acids Research is the 12th
in a series dedicated to factual biological databases (719 are
described). Such databases are an essential resource for working
biologists and this compilation provides descriptions of the
most important of these databases and serves to introduce newly
compiled databases that provide specialist information in the
biological area.
The Database Issue is open to submissions from commercial databases
that are not freely available. These articles undergo exactly
the same peer-review process as normal articles but the commercial
databases will have paid commercial rates instead of the usual
author charge.
Web Server Issue (July 1, open access)
The Web Server Issue highlights the many servers that are available
on the web to perform useful computations on DNA, RNA and protein
sequences and structures. Between them, the two issues provide an
unparalleled array of useful computational services. The new Web
Server Issue aims to provide a repository in which authors of web
servers can highlight their offerings and readers can find out what
is available.
In the 2005 issue there are reports of 166 web servers that run
the gamut from BLAST services to three-dimensional protein structure
prediction. The servers described have all been subjected to rigorous
peer review, are available free of charge and provide invaluable
resources to the scientific community.