DFCI Gene Indices Information Page
Definitions:
Protocol for Assembly of ESTs and Transcripts
Preparation of EST data
- Sequences were extracted from dbEST and were subjected to quality control screening (vector, E. coli, polyA, T, or CT removal, minimum length = 100 bp, < 3% N).
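The length and ambiguity thresholds above can be sketched as a simple filter. This is a minimal illustration, not the project's actual pipeline: the function name and parameters are hypothetical, and the vector, E. coli, and polyA/T/CT trimming steps are assumed to have been applied already.

```python
def passes_length_and_n_filter(seq, min_length=100, max_n_fraction=0.03):
    """Return True if a trimmed EST meets the stated thresholds.

    Hypothetical helper: keeps sequences of at least `min_length` bases
    whose fraction of ambiguous 'N' bases is strictly below `max_n_fraction`.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    return seq.count("N") / len(seq) < max_n_fraction


if __name__ == "__main__":
    ests = {
        "clean_100bp": "ACGT" * 25,
        "too_short": "ACGT" * 24,
        "too_many_n": "N" * 3 + "A" * 97,
    }
    for name, seq in ests.items():
        print(name, passes_length_and_n_filter(seq))
```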
Preparation of transcript (ET) database
- All sequences from the appropriate divisions of GenBank (including RefSeq) were extracted.
- Non-coding sequences were discarded and cDNAs and coding sequences from genomic entries were saved.
- Sequences and related information (e.g. PubMed links) are stored in the qcGene database.
Assembly
- Cleaned EST sequences and non-redundant transcript (ET) sequences were combined.
- Using the Paracel Transcript Assembler Program, sequences were assembled into contigs.
TCs are consensus sequences based on two or more ESTs (and possibly an ET) that overlap for
at least 40 bases with at least 94% sequence identity. These strict criteria help minimize
the creation of chimeric contigs. These contigs are assigned a TC (Tentative Consensus) number.
TCs may comprise ESTs derived from different tissues.
- The best hits for TCs were assigned by searching the TC set against a
non-redundant amino acid database (nraa) using BLAT.
The top five hits, ranked by score,
were selected and displayed for each TC.
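The overlap criteria and the top-five selection described above can be sketched as follows. This is an illustration only and does not reproduce the Paracel Transcript Assembler or BLAT. The function names are hypothetical, and identity is assumed to be the number of matching columns divided by the aligned overlap length, which the page does not specify.

```python
def overlap_qualifies(aligned_a, aligned_b, min_overlap=40, min_identity=0.94):
    """Check an aligned overlap against the stated 40-base / 94% criteria.

    `aligned_a` and `aligned_b` are the two rows of a pairwise alignment of
    the overlapping region, with '-' marking gaps (an assumed input format).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have equal length")
    length = len(aligned_a)
    if length < min_overlap:
        return False
    # Count columns where both rows carry the same base (gaps never match).
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return matches / length >= min_identity


def top_hits(hits, n=5):
    """Return the `n` highest-scoring (subject_id, score) pairs."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]


if __name__ == "__main__":
    print(overlap_qualifies("A" * 100, "C" * 5 + "A" * 95))
    print(top_hits([("p1", 10), ("p2", 50), ("p3", 30)], n=2))
```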
Caveats
- TCs are only as good as the ESTs underlying them; ESTs may be unspliced or chimeric, and so may the TCs built from them
- There is still redundancy in the TC set because sequences must match end
to end and at a certain percent identity to be combined
- Directionality of the TCs should not be assumed
- Not all TCs contain protein-coding regions
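Because the orientation of a TC is not guaranteed, downstream analyses often consider both strands. A minimal sketch of that idea follows; the helper names are hypothetical and are not part of the Gene Index tools.

```python
# Translation table for complementing DNA bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_orientations(seq):
    """Return the sequence as given and its reverse complement."""
    return (seq, reverse_complement(seq))


if __name__ == "__main__":
    for strand in both_orientations("ATGCCN"):
        print(strand)
```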
References related to EST strategy [Entrez links]
Adams, MD et al., "Complementary DNA sequencing: expressed sequence tags and human genome project", Science 252: 1651-6 (1991) [91262645]
Adams, MD et al., "Sequence identification of 2,375 human brain genes", Nature 355: 632-4 (1992) [92168112]
Adams, MD et al., "3,400 new expressed sequence tags identify diversity of transcripts in human brain", Nat Genet 4: 256-67 (1993) [93364420]
Adams, MD et al., "Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library", Nat Genet 4: 373-80 (1993) [94004965]
Adams, MD et al., "Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence", Nature 377(Suppl.): 3-174 (1995) [96026280]
Lee, N.H., Weinstock, K.G., Kirkness, E.F., Earle-Hughes, J.A., Fuldner, R.A., Marmaros, S., Glodek, A., Gocayne, J.D., Adams, M.D., Kerlavage, A.R., Fraser, C.M., and Venter, J.C., "Comparative EST analysis of differential gene expression profiles in PC12 cells before and after nerve growth factor treatment", Proceedings of the National Academy of Sciences, U.S.A. 92: 8303-8307 (1995) [95396786]
The Gene Index Publications

Acknowledgements
The Gene Index Project is supported in part by funding from the US National Science Foundation through grant #DBI-0552416.