Gene Expression and Cell State Data



1. mRNA levels
1.1. cDNA microarrays
1.2. Oligonucleotide chips
1.3. RT-PCR
1.4. Serial Analysis of Gene Expression
2. Protein levels


1. mRNA levels

1.1. cDNA microarrays

Developed at Stanford University, cDNA microarrays are glass slides on which cDNA has been deposited by high-speed robotic printing. They are ideally suited for expression analysis of up to 10,000 cDNA clones per array, drawn from EST (expressed sequence tag) sequencing projects such as the private effort at Incyte Pharmaceuticals and the public Washington University project.

Microarray measurements are carried out as differential hybridizations to minimize errors originating from cDNA spotting variability: mRNA from two different sources (e.g., control and drug-treated), labeled with two different fluorescent dyes, is passed over the array at the same time. The fluorescence signal from each mRNA population is evaluated independently, and the two signals are then used to calculate the treated/control expression ratio.
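The ratio calculation can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used by any particular lab: the function name `expression_ratios`, the `floor` parameter that guards against division by zero, and the use of a log2 transform (a common convention, so that induction and repression are symmetric around zero) are all assumptions added here.

```python
import math

def expression_ratios(treated, control, floor=1.0):
    """Return (ratio, log2 ratio) for each spot on the array.

    `treated` and `control` hold the fluorescence intensities of the two
    dye channels, spot by spot. Intensities below `floor` are raised to
    `floor` (an assumed safeguard) so the ratio is always defined.
    """
    results = []
    for t, c in zip(treated, control):
        t = max(t, floor)
        c = max(c, floor)
        ratio = t / c
        results.append((ratio, math.log2(ratio)))
    return results

# Hypothetical intensities for three spots.
treated = [1200.0, 300.0, 800.0]
control = [300.0, 1200.0, 800.0]
for ratio, log_ratio in expression_ratios(treated, control):
    print(ratio, log_ratio)
```

Because both channels are read from the same spot, differences in the amount of cDNA deposited affect numerator and denominator alike and cancel in the ratio.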

Patrick Brown's lab at Stanford has used microarrays to measure gene expression levels for the entire yeast genome (approximately 6,400 distinct cDNA sequences) during the diauxic shift (the transition from sugar metabolism to ethanol metabolism), sporulation and the entire cell cycle. These data sets are publicly available. The Brown Lab also has an online guide to building your own arrayer and scanner. These microarrays have been commercialized by Incyte Pharmaceuticals' Microarray Division (formerly Synteni). Incyte Gene Expression Microarrays are available with templates from human, rat, mouse, plant and microbial genomes.

1.2. Oligonucleotide chips

These chips, produced by Affymetrix, consist of small glass plates with thousands of short 20-mer oligonucleotide probes attached to their surface. The oligonucleotides are synthesized directly onto the surface using a combination of semiconductor-based photolithography and light-directed chemical synthesis. Due to the combinatorial nature of the process, very large numbers of mRNAs can be probed at the same time. However, manufacturing and reading the chips requires expensive equipment. Current chips have over 65,000 different probes, with typically several probes for each mRNA.

Affymetrix currently manufactures GeneChips for 42,000 human genes and ESTs, 30,000 murine genes and ESTs, and 6,100 yeast ORFs (whole genome). Little data is publicly available, with the exception of a S. cerevisiae expression database generated in collaboration with Ron Davis' lab.

1.3. RT-PCR

To measure gene expression using RT-PCR (Reverse Transcriptase Polymerase Chain Reaction), the mRNA is first reverse-transcribed into cDNA, and the cDNA is then amplified to measurable levels using PCR. Using built-in calibration techniques, RT-PCR can achieve high accuracy coupled with an exceptional sensitivity of 10 molecules per 10 ml assay volume and a dynamic range covering 6-8 orders of magnitude. The method does require PCR primers for all the genes of interest, and unlike the other three methods in this section it is not inherently parallel, so automation is crucial to scale up.
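The amplification step can be made concrete with a little arithmetic. The sketch below assumes idealised PCR in which every cycle multiplies the copy number by (1 + efficiency). The detection threshold of 1e10 copies, the function name `cycles_to_reach` and the `efficiency` parameter are illustrative assumptions, not values taken from the assay described above.

```python
def cycles_to_reach(start_copies, threshold, efficiency=1.0):
    """Count PCR cycles until the copy number reaches `threshold`.

    Assumes every cycle multiplies the copy number by (1 + efficiency);
    efficiency=1.0 is perfect doubling, which real reactions only
    approximate.
    """
    if start_copies <= 0 or efficiency <= 0:
        raise ValueError("start_copies and efficiency must be positive")
    copies = float(start_copies)
    cycles = 0
    while copies < threshold:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles

# Hypothetical detection threshold of 1e10 copies.
print(cycles_to_reach(10, 1e10))    # rare transcript: 10 starting copies
print(cycles_to_reach(1e7, 1e10))   # abundant transcript: 10 million copies
```

Under these assumptions, starting amounts that differ by six orders of magnitude reach the threshold only about twenty cycles apart, which gives some intuition for how a single assay can span such a wide dynamic range.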

Roland Somogyi has used this method to measure the expression levels of 112 genes at nine different time points during the development of rat cervical spinal cord, and of 70 genes during development and following injury of the hippocampus. The former data set is publicly available; the latter should be available soon.

1.4. Serial Analysis of Gene Expression

SAGE uses a very different technique for measuring mRNA levels. First, double-stranded cDNA is created from the mRNA. A single 10 base pair "sequence tag", long enough to uniquely identify each gene, is cut from a specific location in each cDNA. The sequence tags are then concatenated into a long double-stranded DNA molecule, which can be amplified and sequenced. This method has two advantages: the mRNA sequence does not need to be known a priori, so it will also detect previously unknown genes, and it uses sequencing technology that many labs already have. The method is rather complex, though, and requires a large amount of sequencing.
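The tag-extraction and counting idea can be sketched as follows. This is a simplified illustration, not the laboratory protocol: it assumes the "specific location" is the position immediately after the 3'-most `CATG` site (the recognition site of the anchoring enzyme NlaIII commonly used in SAGE), it works on hypothetical cDNA strings rather than on sequenced concatemers, and the names `extract_tag` and `count_tags` are made up for this example.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring-enzyme recognition site
TAG_LEN = 10     # tag length in base pairs, as in the text

def extract_tag(cdna):
    """Return the TAG_LEN bases following the 3'-most anchor site.

    Returns None if the cDNA has no anchor site or if fewer than
    TAG_LEN bases follow it.
    """
    pos = cdna.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = cdna[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None

def count_tags(cdnas):
    """Count how often each tag occurs; the count estimates mRNA abundance."""
    tags = (extract_tag(seq) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)

# Hypothetical cDNAs: the first two yield the same tag.
cdnas = [
    "GGCATGAAACATGACGTACGTACTTTT",
    "TTCATGACGTACGTACAAAA",
    "CCCATGGGGGTTTTTAA",
    "AAAAAAAA",  # no anchor site, so no tag
]
print(count_tags(cdnas))
```

Because tags are counted rather than hybridized to known probes, a tag that matches no known gene still shows up in the tally, which is how the method can reveal previously unknown transcripts.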

SAGE has been used to analyze the set of genes expressed during three different phases of the yeast cell cycle. SAGE has also been used to monitor the expression of at least 45,000 human genes in normal colon cells, colon tumors, colon cell lines, pancreatic tumors and pancreatic cell lines. Some of this data is available upon request.


2. Protein levels

Protein levels are much harder to quantify than mRNA levels. 2D-PAGE separates proteins on a two-dimensional sheet of gel, first in one direction based on their isoelectric point, and then in the other direction based on their molecular weight. The result is a two-dimensional image with a large number of protein "spots". The intensity of each spot is proportional to the amount of the specific protein present.
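Since spot intensity is proportional to protein amount, relative abundances follow from simple normalization. The sketch below assumes that spots have already been detected and quantified by some image-analysis step; the `Spot` record, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    pI: float         # isoelectric point (first dimension)
    mw_kda: float     # molecular weight in kDa (second dimension)
    intensity: float  # integrated spot intensity, arbitrary units

def relative_abundance(spots):
    """Return each spot's share of the total intensity on the gel."""
    total = sum(s.intensity for s in spots)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [s.intensity / total for s in spots]

# Hypothetical gel with three quantified spots.
gel = [
    Spot(pI=5.2, mw_kda=42.0, intensity=600.0),
    Spot(pI=6.8, mw_kda=18.5, intensity=300.0),
    Spot(pI=4.9, mw_kda=66.0, intensity=100.0),
]
print(relative_abundance(gel))
```

Normalizing to the total makes gels with different overall staining or loading roughly comparable, although it only yields relative amounts, not absolute ones.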

It is not known a priori which protein each spot represents, although the position of known proteins can be estimated. In addition, new microsequencing and mass spectrometry techniques allow spots to be matched to proteins whose sequence is known. The resolution of the spots may not be high enough to separate all proteins, and 2D gel results have been hard to reproduce because of their sensitivity to operating parameters and a host of possible artifacts. These problems have been somewhat alleviated lately by the use of highly standardized protocols and higher-accuracy techniques.

There are several 2D gel databases for E. coli, yeast, Drosophila, rat, mouse, human, etc. One of the most important ones is the SWISS-2DPAGE database, containing a total of 518 entries from human, yeast, E. coli and Dictyostelium. 2D-PAGE proteomics is currently being commercialized in a partnership between Incyte Pharmaceuticals and Oxford Glycosciences.


© Copyright 1997 by Patrik D'haeseleer,
patrik at cs dot unm dot edu
c/o Computer Science Department, University of New Mexico, Albuquerque, NM, 87131

(505) 277-9428 (office)
(505) 277-6927 (fax)