Matrix Science

Sequence Databases

Information on relevant sequence databases can be found by following the links below. Additionally, the first issue of Nucleic Acids Research each year contains status reports from the curators of the major databases.


dbEST is the division of GenBank that contains "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms.


Entries from the DNA Databank of Japan (DDBJ) are wholly incorporated into GenBank.


The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences collected from the scientific literature and patent applications, and submitted directly by researchers and sequencing groups. Data collection is done in collaboration with GenBank (USA) and the DNA Databank of Japan (DDBJ).


GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. As of June 1998, it held approximately 1,622,000,000 bases in 2,356,000 sequence records. The complete release notes for the current version of GenBank are available by FTP. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Databank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.


MSDB has not been updated since 2006 and should be considered obsolete.


NCBI maintains composite, non-identical protein and nucleic acid databases for its search tools BLAST and Entrez. The entries in the protein database, nr, have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB. NCBI has made strong efforts to cross-reference the sequences in these databases in order to avoid duplication.
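To illustrate what "composite, non-identical" means in practice, here is a minimal Python sketch that merges entries from several source databases and keeps one record per distinct sequence, with cross-references back to every source entry that shares it. It is not NCBI's actual procedure for building nr. The function names `parse_fasta` and `build_non_identical`, the database names `db_a` and `db_b`, and the toy sequences are all hypothetical, and the sketch assumes the sources are available as FASTA text and that "identical" means an exact, case-insensitive match over the full sequence.

```python
def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            fields = line[1:].split()
            # The identifier is the first word of the header line.
            identifier, chunks = (fields[0] if fields else ""), []
        elif identifier is not None:
            # Normalise case so identical sequences compare equal.
            chunks.append(line.upper())
    if identifier is not None:
        yield identifier, "".join(chunks)


def build_non_identical(sources):
    """Merge databases, keeping one entry per distinct sequence.

    `sources` maps a (hypothetical) database name to FASTA text. The result
    maps each distinct sequence to the list of "database:identifier"
    cross-references that share it, in first-seen order.
    """
    merged = {}
    for db_name, fasta_text in sources.items():
        for identifier, sequence in parse_fasta(fasta_text):
            merged.setdefault(sequence, []).append(f"{db_name}:{identifier}")
    return merged


# Toy input: entry B7 duplicates A1, so the composite holds three sequences.
sources = {
    "db_a": ">A1 first entry\nMKTAYIAK\nQRQISFVK\n>A2\nGSHMLEDP\n",
    "db_b": ">B7\nmktayiakqrqisfvk\n>B9\nAAAA\n",
}
composite = build_non_identical(sources)
for sequence, xrefs in composite.items():
    print(sequence, xrefs)
```

Keying the composite on the sequence itself makes the cross-referencing explicit: a duplicate entry adds a reference to an existing record rather than a new record.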


OWL is a non-identical composite of four publicly available protein databases: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. OWL has not been updated since May 1999 and should be considered obsolete.


The Brookhaven Protein Data Bank (PDB) is a database of three-dimensional structures. This means that entries are invariably well characterised, with reliable sequence data which can also be found in the other databases. Entries which are unique to PDB tend to be variant proteins, with distorted structures, which were used to refine a structural determination.


The PIR (Protein Information Resource) database was initiated at the NBRF in the early 1960s by the late Margaret O. Dayhoff as a collection of sequences for the study of evolutionary relationships among proteins. The database is now an international collaboration of three data centers: the NBRF, the Munich Information Center for Protein Sequences (MIPS), and the Japan International Protein Information Database (JIPID). The three centers cooperate to produce and distribute a single database of 'wild-type' protein sequences.


The Protein Research Foundation (PRF) of Japan database contains protein sequences abstracted from scientific publications.


Swiss-Prot is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases. It was established in 1986 and has been maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EMBL Outstation of the European Bioinformatics Institute, EBI).

Copyright © 2010 Matrix Science Ltd. All Rights Reserved.