BIT 410: Manipulation of Recombinant DNA
Searching GenBank and PubMed
Both PubMed and GenBank are searched using Entrez, the National
Library of Medicine’s integrated cross-database search system. Because
searching these two databases is so similar, this guide addresses them
together. Where they differ, a table compares the two systems.
1. Find the Databases:
Start all your searches at the NCBI website:
http://www.ncbi.nlm.nih.gov/
| GenBank | PubMed |
| Choose Nucleotide from the drop-down Search menu. | Choose PubMed from the Search menu (this is the default). |
- Using Entrez Nucleotide limits your search to several sequence
databases, including GenBank, its international partners EMBL and
DDBJ, and a few other sources, such as patents.
- Generally, for this assignment, searching Entrez Nucleotide is fine,
but you can use limits (described below) so that you are ONLY
searching GenBank, or an even more focused database, such as RefSeq.
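The same database choice can also be made programmatically. The sketch below is not part of the assignment; it assumes NCBI's E-utilities ESearch service as an alternative to the web form, and only builds the request address without sending it. The function name `esearch_url` and the `max_results` default are illustrative choices, and `nuccore` is assumed here as the E-utilities name for the Nucleotide database.

```python
from urllib.parse import urlencode

# Address of the E-utilities ESearch service (an assumption of this
# sketch; the guide itself only describes the web search form).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, query, max_results=20):
    """Build an ESearch address for one Entrez database."""
    params = {"db": database, "term": query, "retmax": max_results}
    return ESEARCH_URL + "?" + urlencode(params)


# "nuccore" stands in for the Nucleotide menu choice, "pubmed" for PubMed.
nucleotide_url = esearch_url("nuccore", "clock gene")
pubmed_url = esearch_url("pubmed", "clock gene")
print(nucleotide_url)
print(pubmed_url)
```

Only the `db` value changes between the two addresses, which mirrors choosing a different entry in the Search menu.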
Note: You can also search all the Entrez databases at once, and then just
look at the results in your databases of interest. From the PubMed home
page, click on “all databases” on the left side of the black bar
near the top of the page. Enter your search in the box, and the number
of hits will be shown next to each database. Click on the name of a
database to see its results.
2. Ways to Search the Databases
To start searching, simply type in a term or combination of terms,
such as clock gene or ovarian cancer.
Entrez automatically inserts AND between your search terms unless you
use quotes to designate an exact phrase, e.g., “ovarian cancer”.
A very specific way to search is to use the gene symbol, if you know
it.
Numerous databases are available online for looking up gene symbols.
These are usually arranged by organism, e.g., human, rat, mouse,
etc.
To locate a database, just go to Google (www.google.com)
and type in: gene symbol database
HUGO is a human gene symbol database: http://www.gene.ucl.ac.uk/nomenclature/
Other Important Search Information:
- Boolean operators (AND, OR, NOT)—must be in uppercase
- The truncation symbol is an asterisk (*)—used for variant
word endings, and singulars and plurals
- Use “quotes” for an exact phrase
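The three rules above can be combined in one search string. The following is a minimal sketch, assuming you want to assemble the string in Python before pasting it into the search box; the helper names `phrase`, `truncate`, and `combine` are hypothetical, and the word stem `mutat` is just an illustration.

```python
def phrase(text):
    """Wrap text in double quotes so it is searched as an exact phrase."""
    return '"' + text + '"'


def truncate(stem):
    """Append the asterisk truncation symbol to a word stem."""
    return stem + "*"


def combine(operator, *terms):
    """Join terms with a Boolean operator, forcing it to uppercase."""
    operator = operator.upper()
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT")
    return "(" + (" " + operator + " ").join(terms) + ")"


# Exact phrase combined with a truncated stem (mutation, mutations, ...).
query = combine("and", phrase("ovarian cancer"), truncate("mutat"))
print(query)
```

The parentheses are added so that a combined expression can itself be passed to `combine` as one term of a larger search.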
3. Looking at your Results I: Special Features
Tabs: If you look at the gray bar immediately above your search
results, you will see some tabs. This is a new Entrez feature that lets
you select a predefined category of results to view.
| Tabs in Entrez Nucleotide | Tabs in PubMed |
| All (all sequences in your results) | All (all articles in your results) |
| Bacteria | Review (for review articles) |
| mRNA | |
| RefSeq | |
Links: There is also a “Links” hyperlink associated with each record.
You can access it from the results list or from the page with the
individual record. Links is on the right side of the page and lets you
link to related information within the Entrez system.
- Click on Links to see what related information is available for
your record. The number of available links varies quite a bit.
- Some potentially useful links include OMIM (Online Mendelian Inheritance
in Man), PubMed (articles relating to the gene), and several others.
| Links in Entrez Nucleotide | Links in PubMed |
| Gene | Nucleotide |
| Protein | Protein |
| PubMed | OMIM |
| OMIM | Others when available |
| Related Sequences | |
| SNPs | |
| Taxonomy | |
| Many others when available | |
Note: There seem to be far fewer links from PubMed to Nucleotide
than the reverse.
We highly recommend starting your search in Nucleotide and using the
Links feature to go to PubMed records.
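Following a link from a Nucleotide record to its PubMed articles can also be expressed as a request to NCBI's E-utilities ELink service. This is a sketch under that assumption, not something the assignment requires: it only builds the address, the function name `elink_url` is illustrative, `nuccore` is assumed as the E-utilities name for the Nucleotide database, and the record ID `12345` is a made-up placeholder.

```python
from urllib.parse import urlencode

# Address of the E-utilities ELink service (an assumption of this
# sketch; the guide itself uses the Links hyperlink on the web page).
ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(from_db, to_db, record_id):
    """Build an ELink address asking for records in to_db linked to one record in from_db."""
    params = {"dbfrom": from_db, "db": to_db, "id": record_id}
    return ELINK_URL + "?" + urlencode(params)


# Start in Nucleotide and link to PubMed, as recommended above.
# "12345" is a placeholder, not a real record ID.
url = elink_url("nuccore", "pubmed", "12345")
print(url)
```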
4. Looking at your Results II: Focusing your Search
If you have too many results, or too many irrelevant results, you
can focus your search in several ways. Begin by clicking on the first
tab, “Limits”, located just below the
search box.
Field Searching
Each GenBank and PubMed record has many sections that contain specific
types of information. These are called fields (examples of fields
are author and abstract in PubMed; gene name and sequence in GenBank).
One way to focus your search is to look for relevant terms only in
certain fields. On the Entrez limits page is a drop-down menu for
you to select a field. You can also put the field you want to search
in square brackets following your search term, e.g., Marriott[AUTH]
for all research authored by Marriott. This is especially useful if
you want to search several fields.
Fields available in both GenBank and PubMed:
- accession number[ACCN], gene name[GENE], organism[ORGN], properties[PROP],
protein name[PROT]
- fields related to a publication, including author[AUTH], journal
name[JOUR], and publication date[PDAT]
Fields in GenBank: the title[TITL] field of a record
can be a very useful field to search, as it includes the organism,
product, gene symbol, and molecule type, and indicates whether the
record contains a partial or complete CDS (coding sequence).
Fields in PubMed: the specialized subject terminology
that PubMed uses (MeSH, for Medical Subject Headings) can also help
you make your search very specific. For more information about MeSH,
see: http://www.nlm.nih.gov/mesh/meshhome.html
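Searching several fields at once is easiest to see in a worked example. The sketch below uses only field tags listed in this guide; the helper name `tagged` is hypothetical, and the search terms are illustrations rather than a recommended search.

```python
def tagged(term, field):
    """Return a search term followed by its field tag in square brackets."""
    return term + "[" + field.upper() + "]"


# Restrict each term to one field, then require all of them with AND.
query = " AND ".join([
    tagged("Marriott", "auth"),      # author
    tagged("human", "orgn"),         # organism
    tagged("complete cds", "titl"),  # record title mentions a complete CDS
])
print(query)
```

The resulting string can be pasted into the search box in place of choosing fields from the drop-down menu.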
Exclusions (in GenBank only)
Just below the Fields dropdown menu is a selection of data types.
By checking the appropriate box(es), you can exclude any or all of
these data types, such as patents or ESTs, from your GenBank search.
Search Limits
Several dropdown menus are next on the limits page. They let you
limit your search, for example, to a specific type of molecule (genomic
DNA, mRNA, protein) or a specific type of document (clinical trial,
review, etc.).
Many of the limits differ between GenBank and PubMed. Some of
the more useful limits for each database are listed in this table:
| Limits in Entrez Nucleotide | Limits in PubMed |
| Type of molecule: genomic DNA, mRNA, protein | Publication type: review, meta-analysis, clinical trial |
| Gene location: nucleus, mitochondria, chloroplast | Human or animal studies: either one or both |
| Database you want to search (‘only from’): GenBank, RefSeq, etc. | Language |
| | Gender/age group (for human studies) |
| | Publication date |
For more information on Fields and Limits in GenBank and PubMed,
see this Entrez Help Document: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html
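Some limits can also be typed directly into the search box as extra terms joined with AND. The sketch below is hedged accordingly: the [PDAT] and [PROP] tags come from the field list in this guide, but the property values `biomol_mrna` and `srcdb_refseq` and the colon-separated date range are assumptions that should be checked against the Entrez help document before you rely on them. The helper names are hypothetical.

```python
def date_range(start, end):
    """Build a publication-date range term (assumed start:end[PDAT] syntax)."""
    return start + ":" + end + "[PDAT]"


def with_limits(query, *limits):
    """Require the base query and every limit term by joining them with AND."""
    return " AND ".join(["(" + query + ")"] + list(limits))


# PubMed: an exact phrase limited to a range of publication years.
pubmed_query = with_limits('"ovarian cancer"', date_range("2000", "2005"))

# Nucleotide: property values below are assumed, not taken from the guide.
nucleotide_query = with_limits(
    "clock gene",
    "biomol_mrna[PROP]",   # assumed value for the mRNA molecule type
    "srcdb_refseq[PROP]",  # assumed value for "only from" RefSeq
)
print(pubmed_query)
print(nucleotide_query)
```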
5. Interpreting a GenBank Record
Sometimes a GenBank record can be confusing. The NCBI web site has
a sample GenBank record with hyperlinks that take you to a description
of each element. Find it at: http://www.ncbi.nih.gov/Sitemap/samplerecord.html
6. Resources and Assistance
This guide has only introduced the basics of Entrez searching. There
are numerous other useful features available. Many helpful guides
and resources are available on the NCBI web site; these links are
good places to start: