The qcGene Database


The qcGene database was constructed by extracting and curating protein-coding sequences from GenBank, producing a non-redundant set of transcript sequences. Each transcript nucleotide sequence has an associated protein sequence, and each nucleotide-protein pair was assigned a unique identifier of the form "NP######" (where NP stands for nucleotide-protein).
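As a minimal sketch, assuming "NP######" means the letters NP followed by exactly six zero-padded digits (consistent with the example identifier NP334521 shown later), identifiers could be generated and validated as follows. The helper names are hypothetical. The pattern deliberately has no underscore, so it does not match RefSeq protein accessions such as NP_000198, which share the NP prefix but belong to a different namespace.

```python
import re

# Assumption: "NP" followed by exactly six ASCII digits, zero-padded.
QCGENE_ID = re.compile(r"NP[0-9]{6}")


def format_qcgene_id(number: int) -> str:
    """Build a qcGene-style identifier from a serial number (hypothetical helper)."""
    if not 0 <= number <= 999_999:
        raise ValueError("qcGene identifiers hold at most six digits")
    return f"NP{number:06d}"


def is_qcgene_id(text: str) -> bool:
    """Return True if text matches the assumed NP###### pattern exactly."""
    return QCGENE_ID.fullmatch(text) is not None


print(format_qcgene_id(334521))   # NP334521
print(is_qcgene_id("NP334521"))   # True
print(is_qcgene_id("NP_000198"))  # False: RefSeq-style accession, not a qcGene ID
```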

In some cases, a qcGene entry was created by splicing together distinct GenBank accessions, one for each exon of the transcript. In other cases, multiple qcGene entries were created from a single GenBank entry (for example, mitochondrial nucleotide sequences, which code for multiple proteins).
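The two construction cases, several accessions merged into one entry and one accession split into several, can be sketched in Python as below. The `CodingRegion` record and the helper names are hypothetical, and the sketch assumes exons arrive already ordered 5' to 3' and that coordinates are 1-based and inclusive. Strand is ignored, so a gene on the reverse strand (as occurs in mitochondrial genomes) would additionally need reverse-complementing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CodingRegion:
    """A hypothetical record of one protein-coding interval (1-based, inclusive)."""
    start: int
    end: int
    protein_accession: str


def splice_exons(exon_sequences: list[str]) -> str:
    """Merge per-exon sequences, already ordered 5' to 3', into one transcript."""
    return "".join(seq.strip().upper() for seq in exon_sequences)


def split_by_coding_region(sequence: str, regions: list[CodingRegion]) -> dict[str, str]:
    """Cut one multi-protein record into one coding sequence per protein.

    Strand is ignored: every region is read from the given (forward) sequence.
    """
    entries = {}
    for region in regions:
        if not 1 <= region.start <= region.end <= len(sequence):
            raise ValueError(f"region for {region.protein_accession} is out of range")
        entries[region.protein_accession] = sequence[region.start - 1:region.end]
    return entries


# Toy data, not real GenBank records.
print(splice_exons(["atgGCC", "CTGtag"]))  # ATGGCCCTGTAG
print(split_by_coding_region(
    "ATGAAATAGCCATGCCCTGA",
    [CodingRegion(1, 9, "P1"), CodingRegion(12, 20, "P2")],
))  # {'P1': 'ATGAAATAG', 'P2': 'ATGCCCTGA'}
```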


These are the GenBank divisions and RefSeq categories that were included in qcGene:

  • Primates: 'gbdiv_pri'
  • Mammals: 'gbdiv_mam'
  • Rodents: 'gbdiv_rod'
  • Vertebrates: 'gbdiv_vrt'
  • Invertebrates: 'gbdiv_inv'
  • Plants: 'gbdiv_pln'

  • RefSeq-known: 'srcdb_refseq_known', including the sub-divisions:
    • validated
    • reviewed
    • provisional
    • predicted
    • inferred
  • RefSeq-model: 'srcdb_refseq_model'
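The included sources can be kept as a simple lookup table. The Python sketch below assumes the codes are NCBI Entrez property names that can be combined with the `[PROP]` field tag into a single OR query; that usage and the helper names are illustrative assumptions, not a documented part of how qcGene was built.

```python
# Hypothetical lookup of the sources included in qcGene, keyed by label.
INCLUDED_SOURCES = {
    "Primates": "gbdiv_pri",
    "Mammals": "gbdiv_mam",
    "Rodents": "gbdiv_rod",
    "Vertebrates": "gbdiv_vrt",
    "Invertebrates": "gbdiv_inv",
    "Plants": "gbdiv_pln",
    "RefSeq-known": "srcdb_refseq_known",
    "RefSeq-model": "srcdb_refseq_model",
}


def is_included(code: str) -> bool:
    """True if a division/source code is one of those listed for qcGene."""
    return code in INCLUDED_SOURCES.values()


def entrez_property_query(codes) -> str:
    """OR together property codes (assumes Entrez '[PROP]' field syntax)."""
    return " OR ".join(f"{code}[PROP]" for code in codes)


print(entrez_property_query(["gbdiv_pri", "gbdiv_rod"]))
# gbdiv_pri[PROP] OR gbdiv_rod[PROP]
```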

Here is an example of one entry in qcGene, for human insulin:


NP334521 Report


RECORD INFORMATION

Gene ID:  NP334521
Nucleotide Accession:  BC005255.1
Nucleotide gi:  13528923
Protein Accession:  AAH05255.1
Protein gi:  13528924
 
Protein name:  insulin [Homo sapiens]
Predicted:  NO
Genome:  Homo sapiens (Human)
Common gene name: 
MedLine ID:  12477932
 
Coding sequence length:  333
Transcript sequence length:  495
Assembly status:  THC2465636

ACCESSION DATA

NP334521 is derived from GenBank accessions:
BC005255.1 with gi: 13528923
Definition: Homo sapiens, insulin, clone MGC:12292, mRNA, complete cds

cDNA FEATURES

Feature          5' end   3' end

coding region    60       392

SEQUENCE

Nucleotide:
AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGATCACTGTCCTTCTGCCA
TGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACC
CAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACC
TAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACC
TGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGG
CCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCT
CCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCCCACCCG
CCGCCTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAACAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAA

Protein:

MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
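The fields of a report can be cross-checked against its sequences. In this entry, the coding region 60 to 392 spans 392 - 60 + 1 = 333 nucleotides, matching the reported coding sequence length, and 333 nucleotides are 110 codons plus a stop codon, matching the 110-residue protein. The Python sketch below automates such a check under stated assumptions: coordinates are 1-based and inclusive (as that arithmetic suggests), translation uses the standard genetic code, and the coding region includes the terminal stop codon. `parse_fields`, `join_wrapped`, `extract_cds`, `translate` and `check_entry` are hypothetical helpers, not part of any qcGene tool.

```python
from __future__ import annotations

from itertools import product

# NCBI standard genetic code (translation table 1); codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def parse_fields(lines: list[str]) -> dict[str, str]:
    """Parse 'Label:  value' report lines, splitting on the first colon only."""
    fields = {}
    for line in lines:
        if ":" in line:
            label, value = line.split(":", 1)
            fields[label.strip()] = value.strip()  # empty values map to ''
    return fields


def join_wrapped(lines: list[str]) -> str:
    """Join a sequence that the report wraps across several lines."""
    return "".join(line.strip() for line in lines)


def extract_cds(transcript: str, end5: int, end3: int) -> str:
    """Slice the coding region, assuming 1-based inclusive coordinates."""
    if not 1 <= end5 <= end3 <= len(transcript):
        raise ValueError("coordinates fall outside the transcript")
    return transcript[end5 - 1:end3]


def translate(cds: str) -> str:
    """Translate codon by codon; drop a terminal stop, mark unknown codons 'X'."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding length is not a multiple of 3")
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))
    return protein[:-1] if protein.endswith("*") else protein


def check_entry(fields: dict[str, str], transcript: str, end5: int, end3: int,
                protein: str) -> list[str]:
    """Return a list of inconsistencies between a report's fields and sequences."""
    problems = []
    if int(fields["Transcript sequence length"]) != len(transcript):
        problems.append("transcript length mismatch")
    cds = extract_cds(transcript, end5, end3)
    if int(fields["Coding sequence length"]) != len(cds):
        problems.append("coding sequence length mismatch")
    if translate(cds) != protein:
        problems.append("translation does not match protein")
    return problems
```

For the insulin entry, one would pass the RECORD INFORMATION lines to `parse_fields`, the joined nucleotide and protein lines as `transcript` and `protein`, and the coordinates 60 and 392; an empty list means the entry is internally consistent.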