The DFCI Gene Indices

Frequently Asked Questions About the DFCI Gene Indices


The purpose of this page is to provide answers for commonly asked questions about the DFCI's Gene Indices.
Before you begin, please note:
  • All of the Gene Indices are built with the same structure, so the answers to these questions are valid for ANY of the gene indices.
  • All answers that require navigating TGI web pages begin from the main index page, where each species is linked through its picture icon or organism name.

Category: Search
  1. I have a sequence I want to search against TGI databases or EGO, how do I begin? Can I do batch searches?
  2. I want to search the reverse complement of a sequence, do I need to change the input sequence to do this?
  3. How are the results of a sequence search formatted?
  4. I did a sequence search which returned several sequences, how do I find out more information about these matches?
  5. How do I search the Gene Indices for tissue expression information?
  6. Can I do searches using GenBank accessions, clone names or EST ID? Can I do batch searches based on these accessions?
  7. Is it possible to get sequence information about the regions flanking a given EST?
  8. How do I find orthologs for my sequence?

1. I have a sequence I want to search against TGI databases or EGO, how do I begin? Can I do batch searches?

DFCI offers sequence searches at both the nucleotide and peptide levels using WU-BLAST 2.0. A nucleotide or peptide query sequence must be between 30 and 2000 bases or amino acids long, respectively.

To search your sequence:

  • First, link to the TGI BLAST page in one of two ways: click on the "BLAST" tab at the main index page, or go to any gene index and, under the section heading "Search the Index by", click on "Nucleotide or Protein Sequence".
  • A search form will appear. Select the program you wish to run: blastn for a nucleotide query against a nucleotide database, or tblastn for a protein query against a six-frame translation of a nucleotide database. Next, specify the database you wish to search against and the maximum number of hits/alignments to return (the default is 20).
  • Type or paste a FASTA-format sequence into the text box provided and click the "Submit BLAST Job" button to start the search.
  • Your search results will be returned to the browser page.

No, you cannot do batch searches via the TGI BLAST page at this time. You can, however, download the TGI databases you are interested in and run batch searches locally. (See question 2 in the "Availability" category for download instructions.)
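
Before submitting a query, the FASTA format and the 30 to 2000 length limit described above can be checked locally. The following is a minimal Python sketch, not part of the TGI site; the function names are hypothetical, and it assumes the limits are inclusive and that the form takes a single FASTA record.

```python
MIN_QUERY_LEN = 30    # lower limit stated in this FAQ (assumed inclusive)
MAX_QUERY_LEN = 2000  # upper limit stated in this FAQ (assumed inclusive)


def parse_single_fasta(text):
    """Return (header, sequence) for a FASTA string holding one record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("expected a single FASTA record")
    # Sequence lines are joined so that line wrapping does not affect length.
    return lines[0][1:], "".join(lines[1:])


def query_length_ok(sequence):
    """True if the sequence length falls within the stated limits."""
    return MIN_QUERY_LEN <= len(sequence) <= MAX_QUERY_LEN
```

A query that fails query_length_ok would need to be trimmed or split before it is pasted into the search form.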


2. I want to search the reverse complement of a sequence, do I need to change the input sequence to do this?

No, DFCI's nucleotide sequence searches are automatically done for both strands. You do not need to do a separate search with the complementary strand.
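
For readers who want to see what the opposite strand of a query looks like, the reverse complement can be computed locally. This is an illustrative Python sketch only; it is not needed to use the search page, the function name is hypothetical, and it handles only A, C, G, T and N.

```python
# Translation table mapping each base to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    # Complement every base, then reverse the string.
    return seq.translate(_COMPLEMENT)[::-1]
```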


3. How are the results of a sequence search formatted?

Information about the search

This section includes a reference to the paper explaining BLAST, a notice about the parameter settings, a "Query" line showing the name of the sequence searched, a "Database" line showing the database searched and a "Searching" line showing the time-wise progression of the search.

Search Results

Under the heading "Sequences producing High-scoring Segment Pairs" are the significant matches to the query. These are listed in rows containing the TGI (DFCI Gene Index) number, the putative identification of the TGI number (if assigned), the Score and the Smallest Sum Probability. Beneath this is information about each alignment: the TGI number, the putative identification of the TGI number (if assigned), the length of the sequence, and the strand (Plus/Minus) that was searched. A summary of each alignment follows, giving the Score, the P-value, the identities as a ratio and a percentage, the positives (or number of matches) as a ratio and a percentage, and the strands of the query and of the match (or subject). Following this is a pair-wise display of the alignment.

The end of the report contains statistics about the database searched, thread and/or processors used, and CPU time.

For more information about WU-BLAST and the results it returns please see the WU-BLAST web page at http://blast.wustl.edu/.
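
If a report is saved as text, the summary rows described above can be pulled out with a short script. The Python sketch below is a rough illustration under stated assumptions: it assumes each summary row ends with three columns (Score, P(N) and N), as in typical WU-BLAST 2.0 reports, and that the alignment section begins with a line starting with ">". The function name and the sample report text are made up for illustration.

```python
def parse_hit_rows(report_text):
    """Extract summary rows listed under the High-scoring Segment Pairs heading."""
    hits = []
    in_table = False
    for line in report_text.splitlines():
        if line.startswith("Sequences producing High-scoring Segment Pairs"):
            in_table = True
            continue
        if not in_table:
            continue
        if line.startswith(">"):
            break  # assumed start of the per-alignment section
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or unrelated line
        try:
            score, p_value, n = int(fields[-3]), float(fields[-2]), int(fields[-1])
        except ValueError:
            continue  # row does not end in the assumed numeric columns
        hits.append({
            "id": fields[0],
            "description": " ".join(fields[1:-3]),  # empty if none assigned
            "score": score,
            "p_value": p_value,
            "n": n,
        })
    return hits


# Hypothetical report fragment, invented for this example.
SAMPLE = """\
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

TC00001 hypothetical example protein                          1234  1.2e-50   1
TC00002                                                        310  4.0e-09   2

>TC00001 hypothetical example protein
"""

hits = parse_hit_rows(SAMPLE)
```

Real reports may differ in layout, so the column assumption should be checked against an actual report before relying on this.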


4. I did a sequence search which returned several sequences, how do I find out more information about these matches?

Under the heading "Sequences producing High-scoring Segment Pairs" are the significant matches to the query. The identifiers or accessions are hyperlinked to reports that provide more detailed information about the sequences.

Alternatively, you can search for these identifiers on the gene index search page. See question 6 for how to search by accession.


5. How do I search the Gene Indices for tissue expression information?

To query for tissue expression information:

  • Go to the Gene Index you wish to search through the main index page.
  • Under the section titled "Search the Index by", there are several search options. Click on the link marked "tissue, cDNA library name or cDNA library identifier(cat#)". This will take you to the Expression Page for the index.
  • This page presents four search options for querying tissue information:
    1. Search for Library specific Assemblies
    2. Search cDNA libraries by keyword
    3. Search cDNA libraries by Library Identifier
    4. Scan a List of TC by Library Expression Data
The last option lets you compare two tissue types or libraries and calculates R statistics for the comparison.
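
This page does not say which R statistic is meant. One statistic commonly used for comparing EST counts across cDNA libraries is the log-likelihood ratio of Stekel, Git and Falciani (2000), R = sum over libraries of x_i * ln(x_i / (N_i * f)), where x_i is the EST count for the assembly in library i, N_i is the library size and f is the pooled frequency. Assuming that is the statistic intended, a minimal Python sketch with a hypothetical function name is:

```python
import math


def r_statistic(counts, library_sizes):
    """Log-likelihood ratio R for one assembly across several libraries.

    counts[i] is the number of ESTs from library i in the assembly;
    library_sizes[i] is the total number of ESTs in library i.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have equal length")
    if any(n <= 0 for n in library_sizes):
        raise ValueError("library sizes must be positive")
    total = sum(counts)
    if total == 0:
        return 0.0
    # Pooled frequency of the assembly across all libraries.
    f = total / sum(library_sizes)
    # Libraries with zero counts contribute nothing to the sum.
    return sum(x * math.log(x / (n * f))
               for x, n in zip(counts, library_sizes) if x > 0)
```

Equal counts in equally sized libraries give R close to 0, and R grows as the counts become more uneven.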

6. Can I do searches using GenBank accessions, clone names or EST ID? Can I do batch searches based on these accessions?

Yes, you can search the gene index of interest using these identifiers.

  • First, link to the page for the DFCI Gene Index you want to search via the main index page.
  • The page displayed on your browser will contain a section titled "Search the Index by" containing several search options. Click on the "identifier (TC/THC, ET/HT, EST, GB)" link.
  • A form will be displayed giving you the option to search by one of several different identifiers. Enter the identifier in the corresponding search box and click on the "Submit" button. A report for the identifier will be returned on your browser.

Unfortunately, the batch search option is not available at this time. If you are interested in searching HGI, MGI, or RGI based on GenBank accessions, you can use the Resourcerer GB Search Tool. Otherwise, you may submit a request to TGI for a batch search.


7. Is it possible to get sequence information about the regions flanking a given EST?

Yes, if the EST has been assembled into a THC/TC.

  • Follow the instructions outlined in the answer to question 6 to obtain an EST Report for your sequence.
  • The EST report will contain a table with a column marked "THC" (or "TC"). If the EST has been included in an assembly, this box will contain the hyperlinked name of the assembly. Clicking on this will return the THC/TC Report for this assembly.
  • This report will show you the relative locations of the ESTs within the assembly and the sequences flanking your EST.
Note: If your EST is at the 5'-most or 3'-most end of a THC/TC, or is not included in a THC/TC, information for the region flanking that end is not available. You may want to try searching the EST against one of GenBank's genomic databases. THC/TCs are theoretical assemblies, so flanking information derived from a THC/TC is also theoretical.

8. How do I find orthologs for my sequence?

  • You can search your sequence against the Eukaryotic Gene Orthologs (EGO) database by following the instructions described in question 1 and choosing EGO in the database selection box.
  • If sequence information is not available, you can search the gene index using the identifier for your sequence (question 6) and obtain the assembly ID (THC/TC) or an ET accession (NP) from the Report. You can then search EGO using the DFCI accessions (THC/TC, NP) at the EGO search page.
  • Individual TC Reports also show hyperlinks to the Tentative Ortholog Groups (TOG) if available.

Return to TGI main page.
Comments and suggestions: contact us.


Acknowledgements
   The Gene Index Project is supported in part by funding from the US National Science Foundation through grant #DBI-0552416.