Bio292: Introductory Genomics & Bioinformatics for Health Research
taught by Dr. John Quackenbush
Harvard School of Public Health
Spring Term 1

Introduction to Bioconductor and R

May 23 2011 (9:30am - 5:00pm) and May 24 2011 (9:30am - 12:30pm), Countway 403

This course is an introduction to R, a powerful and flexible statistical language, and to Bioconductor, its collection of packages for the analysis of genetic and genomic data. The course will introduce attendees to the basics of using R for statistical programming, computation, graphics, and modeling, especially for analyzing high-throughput genomic data. We will start with a basic introduction to the R language, reading and writing data, and plotting data. Case studies and data will all be based on real gene expression and genomics data. We will introduce the main classes and packages in Bioconductor. Our goal is to get attendees up and running with R and Bioconductor so that they can use them in their research and are in a good position to expand their knowledge of R and Bioconductor on their own. Course notes will be written so that they provide students with a useful reference manual on R and Bioconductor.

Reproducible Research - Using Sweave and R

June 15 2011 (9:30am - 12:30pm), Countway 403

Have you struggled to replicate an analysis or reproduce a figure? Did you ever wish it were easier to piece together your analysis code, results and data files to revise a paper six months after the initial submission? It is not uncommon for a researcher to be unable to reproduce results. The data analyses in most publications are not easily reproduced (Ioannidis et al. 2009), even when the original data is available. Reproducible research refers to the ability to independently and accurately reproduce, or replicate, an analysis or experiment.

This tutorial will cover the basics of how R code can be embedded in text documents using Sweave and LaTeX to create statistical reports that can be reproduced by issuing a single operating system command. We will describe how to better organize your R and Bioconductor analyses. We will also introduce some improvements (pgfSweave) and graphical user interfaces (LyX, RStudio) that make creating documents in Sweave/R easier.

Mouse Genome Informatics

September 20 2011: 3 class options (9:30am - 11:00am; 1:00pm - 2:30pm; 3:00pm - 4:30pm), Countway 403

Mouse Genome Informatics provides free, publicly available access to integrated data on the genetics, genomics and biology of the laboratory mouse. In this self-guided tour you will explore the MGI database in depth. You will use MGI to:

• find mouse models of human disease
• locate genes and alleles associated with a specific phenotype
• identify suppliers of mice carrying a mutation in a gene of interest
• find gene expression assays and images for specific anatomical structures
• retrieve the upstream regulatory sequence of a gene
• view terms describing the molecular function, biological process and cellular component of a gene, and retrieve a list of genes annotated with a specific Gene Ontology (GO) term
• identify single nucleotide polymorphisms (SNPs) and PCR polymorphisms specific to selected mouse strains
• find primer sequences
• locate suppliers of BAC or cDNA clones
• download MGI data or perform batch queries

