The qcGene database was constructed by extracting and curating protein-coding sequences from GenBank to create a non-redundant set of transcript sequences. Each transcript nucleotide sequence has an associated protein sequence. Each nucleotide-protein tuple was assigned a unique identifier of the form "NP######" (where NP stands for nucleotide-protein). In some cases, a qcGene entry was created by splicing together the distinct GenBank accessions that hold the individual exons of the transcript. In other cases, multiple qcGene entries were created from a single GenBank entry (for example, mitochondrial nucleotide sequences that code for multiple proteins).
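The nucleotide-protein tuple described above can be modeled as a small record type. This is only an illustrative sketch, not qcGene's actual schema: the class name `NPEntry`, its field names, and the reading of "NP######" as the prefix "NP" followed by exactly six digits are all assumptions.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumption: "NP######" means the literal prefix "NP" plus exactly six digits.
NP_ID_PATTERN = re.compile(r"NP\d{6}")


@dataclass(frozen=True)
class NPEntry:
    """Hypothetical in-memory model of one nucleotide-protein entry."""

    np_id: str
    # One accession for a simple entry, several when exons were spliced
    # together from distinct GenBank records.
    genbank_accessions: Tuple[str, ...]
    nucleotide: str
    protein: str

    def __post_init__(self):
        # Reject identifiers that do not match the assumed NP###### form.
        if not NP_ID_PATTERN.fullmatch(self.np_id):
            raise ValueError(f"not an NP###### identifier: {self.np_id!r}")
        if not self.genbank_accessions:
            raise ValueError("an entry needs at least one source accession")
```

Holding the source accessions as a tuple covers the entries spliced from several GenBank records; the opposite case, several entries derived from one GenBank record, would simply be several `NPEntry` objects sharing the same accession.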
These are the divisions of GenBank that were included in qcGene:
Here is an example of one entry in qcGene, for human insulin:
NP334521 Report

RECORD INFORMATION

ACCESSION DATA
NP334521 is derived from GenBank accessions: BC005255.1 with gi: 13528923
Definition: Homo sapiens, insulin, clone MGC:12292, mRNA, complete cds

cDNA FEATURES

SEQUENCE
Nucleotide:
AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGATCACTGTCCTTCTGCCA
TGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACC
CAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACC
TAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACC
TGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGG
CCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCT
CCCTCTACCAGCTGGAGAACTACTGCAACTAGACGCAGCCCGCAGGCAGCCCCCCACCCG
CCGCCTCCTGCACCGAGAGAGATGGAATAAAGCCCTTGAACCAACAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAA
Protein:
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
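To see how the nucleotide and protein sequences of an entry relate, the coding region can be translated with the standard genetic code and compared with the stored protein. The sketch below is an illustration only. The source does not say how qcGene delimited coding regions, so starting at the first ATG and stopping at the first in-frame stop codon is an assumption made here for simplicity, and `translate_first_orf` is a hypothetical helper name.

```python
# Standard genetic code (NCBI translation table 1), with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_first_orf(nucleotide: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Assumption: the coding region begins at the first ATG in the sequence.
    Whitespace is ignored, so sequences pasted in 60-column blocks work.
    """
    seq = "".join(nucleotide.split()).upper()
    start = seq.find("ATG")
    if start == -1:
        return ""  # no start codon found
    residues = []
    for pos in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        residues.append(amino_acid)
    return "".join(residues)


# Fragment spanning the start codon of the NP334521 nucleotide sequence.
print(translate_first_orf("CTTCTGCCA TGGCCCTGTGGATGCGCCTCCTG"))
```

The fragment spans the junction between the first two lines of the nucleotide sequence, and its translation can be checked against the first residues of the listed protein. The first-ATG rule is a simplification; it would not be reliable for transcripts with upstream ATGs or for mitochondrial sequences, which use a different genetic code.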