Gene Expression and Cell State Data
1. mRNA levels
1.1. cDNA microarrays
1.2. Oligonucleotide chips
1.3. RT-PCR
1.4. Serial Analysis of Gene Expression
2. Protein levels
1. mRNA levels
1.1. cDNA microarrays
Developed at Stanford University, cDNA microarrays are glass slides on
which cDNA has been deposited by high-speed robotic printing. They are
ideally suited for expression analysis of up to 10,000 cDNA clones per
array from EST (expressed sequence tag) sequencing projects (such as
the private effort at Incyte Pharmaceuticals and the public Washington
University project).
Microarray measurements are carried out as differential hybridizations
to minimize errors originating from cDNA spotting variability: mRNA from
two different sources (e.g., control and drug-treated), labeled with two
different fluorescent dyes, is passed over the array at the same time. The
fluorescence signal from each mRNA population is evaluated independently,
and then used to calculate the treated/control expression ratio.
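The ratio calculation described above can be sketched as follows. This is a minimal illustration only, not the analysis actually used at Stanford: the function name, the optional background subtraction, the intensity floor, and the example intensities are all assumptions introduced here. The ratio is reported on a log2 scale so that induction and repression are symmetric around zero.

```python
import math

def log2_expression_ratio(treated, control, treated_bg=0.0, control_bg=0.0, floor=1.0):
    """Return log2(treated/control) for one spot.

    Parameter names, the background terms, and the floor are illustrative
    assumptions, not part of any published protocol.
    """
    # Subtract the (assumed) local background from each channel and clamp at
    # a small floor so the ratio and its logarithm stay defined.
    t = max(treated - treated_bg, floor)
    c = max(control - control_bg, floor)
    return math.log2(t / c)

# Hypothetical spot intensities in arbitrary fluorescence units:
# (treated channel, control channel)
spots = {
    "geneA": (4000.0, 1000.0),
    "geneB": (500.0, 2000.0),
    "geneC": (1500.0, 1500.0),
}

ratios = {name: log2_expression_ratio(t, c) for name, (t, c) in spots.items()}
```

Because both samples hybridize to the same spot, the amount of cDNA deposited affects both channels alike and largely cancels in the ratio, which is the motivation for the differential design.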
Patrick Brown's lab at Stanford has used microarrays to measure gene
expression levels for the entire yeast genome (approximately 6400 distinct
cDNA sequences) during the diauxic shift (transition from sugar metabolism
to ethanol metabolism), sporulation and the entire cell cycle. These
data sets are publicly available. The Brown Lab also has an online
guide to building your own arrayer and scanner. These microarrays have been
commercialized by Incyte Pharmaceuticals' Microarray Division (formerly
Synteni). Incyte Gene Expression Microarrays are available with
templates from human, rat, mouse, plant and microbial genomes.
1.2. Oligonucleotide chips
These chips, produced by Affymetrix, consist of small glass plates
with thousands of short 20-mer oligonucleotide probes attached to
their surface. The oligonucleotides are synthesized directly onto the
surface using a combination of semiconductor-based photolithography
and light-directed chemical synthesis. Due to the combinatorial nature
of the process, very large numbers of mRNAs can be probed at the
same time. However, manufacturing and reading the chips require
expensive equipment. Current chips have over 65,000 different probes,
with typically several probes for each mRNA.
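Since several probes typically target each mRNA, their readings have to be combined into one expression value per mRNA. The sketch below uses a plain median for this purpose. It is an assumption chosen for illustration and is not Affymetrix's actual algorithm; the function name, the probe-set labels, and the intensities are hypothetical.

```python
from statistics import median

def summarize_probe_set(intensities):
    """Combine the intensities of all probes for one mRNA into one value.

    The median is an illustrative choice: it limits the influence of a
    single aberrant probe. It is not the vendor's method.
    """
    if not intensities:
        raise ValueError("a probe set needs at least one intensity")
    return median(intensities)

# Hypothetical probe intensities, grouped by the mRNA they target.
probe_sets = {
    "mRNA_1": [820.0, 790.0, 3100.0, 805.0],  # one probe reads far too high
    "mRNA_2": [120.0, 135.0, 110.0],
}

expression = {name: summarize_probe_set(values) for name, values in probe_sets.items()}
```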
Affymetrix currently manufactures GeneChips for 42,000 human genes
and ESTs, 30,000 murine genes and ESTs, and 6,100 yeast ORFs (whole
genome). Little data is publicly available, with the exception of an
S. cerevisiae expression database generated in collaboration with Ron
Davis' lab.
1.3. RT-PCR
To measure gene expression using RT-PCR (Reverse Transcriptase Polymerase
Chain Reaction), the mRNA is first reverse-transcribed into cDNA, and
the cDNA is then amplified to measurable levels using PCR. Using built-in
calibration techniques, RT-PCR can achieve high accuracy coupled with an
exceptional sensitivity of 10 molecules per 10 ml assay volume and a dynamic
range covering 6-8 orders of magnitude. The method does require PCR
primers for all the genes of interest, and is not inherently parallel
like the other three methods in this section, so automation is crucial
to scale up.
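To see why PCR amplification gives both high sensitivity and a wide dynamic range, consider an idealized model in which the number of cDNA copies is multiplied by (1 + efficiency) in every cycle. The sketch below is only that idealized model: it does not reproduce the calibration techniques mentioned above, and the function names, the efficiency parameter, and the detection threshold are assumptions made for illustration.

```python
import math

def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR: each cycle multiplies the copy number by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles

def cycles_to_threshold(initial_copies, threshold_copies, efficiency=1.0):
    """Smallest whole number of cycles needed to reach the detection threshold."""
    growth = math.log(threshold_copies / initial_copies)
    return math.ceil(growth / math.log(1.0 + efficiency))

# Hypothetical detection threshold of 1e10 copies.
THRESHOLD = 1e10

rare_transcript = cycles_to_threshold(10, THRESHOLD)       # 10 starting copies
abundant_transcript = cycles_to_threshold(1e7, THRESHOLD)  # 10 million starting copies
```

Under these assumptions a transcript starting at 10 copies and one starting at 10 million copies both reach the threshold; they differ only in the number of cycles needed, which is what makes the starting amount recoverable over many orders of magnitude.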
Roland Somogyi has used this method to measure the expression levels
of 112 genes at nine different time points during the development of
rat cervical spinal cord, and 70 genes during development and following
injury of the hippocampus. The former data set is publicly available;
the latter should be available soon.
1.4. Serial Analysis of Gene Expression
SAGE uses a very different technique for measuring mRNA levels. First,
double-stranded cDNA is created from the mRNA. A single "sequence tag"
of 10 base pairs (long enough to uniquely identify each gene) is cut
from a specific location in each cDNA. The sequence tags are then
concatenated into a long double-stranded DNA molecule, which can be
amplified and sequenced. The number of times a tag appears in the
sequence reflects the abundance of the corresponding mRNA. This method
has two advantages: the mRNA sequence does not need to be known a priori
(so it will also detect previously unknown genes), and it uses sequencing
technology that many labs already have. The method is rather complex
though, and requires a large amount of sequencing.
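The counting step behind SAGE can be sketched as follows. This is a simplified illustration, not the published protocol: it assumes the 10 base pair tags are laid end to end with nothing between them, and the function name and the example sequence are hypothetical. It also shows how many distinct tags a 10 base pair sequence can encode.

```python
from collections import Counter

TAG_LENGTH = 10  # base pairs per tag, as described in the text

def count_tags(concatemer, tag_length=TAG_LENGTH):
    """Split a sequenced concatemer into fixed-length tags and count each tag.

    Assumes tags are adjacent with no intervening sequence (a simplification).
    Any incomplete tag at the end of the read is ignored.
    """
    usable = len(concatemer) - len(concatemer) % tag_length
    tags = [concatemer[i:i + tag_length] for i in range(0, usable, tag_length)]
    return Counter(tags)

# Hypothetical concatemer made of four tags; the first tag occurs twice.
concatemer = "AAAAACCCCC" + "GGGGGTTTTT" + "AAAAACCCCC" + "ACGTACGTAC"
counts = count_tags(concatemer)

# Number of distinct sequences a 10 bp tag can take: 4 ** 10
possible_tags = 4 ** TAG_LENGTH
```

The tag counts serve as the expression measurement, so the precision for rare transcripts depends directly on how many tags are sequenced.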
SAGE has been used to analyze the set of genes expressed during three
different phases of the yeast cell cycle. SAGE has also been used to
monitor the expression of at least 45,000 human genes in normal colon
cells, colon tumors, colon cell lines, pancreatic tumors and pancreatic
cell lines. Some of this data is available upon request.
2. Protein levels
Protein levels are much harder to quantify than mRNA levels. 2D-PAGE
(two-dimensional polyacrylamide gel electrophoresis)
separates proteins on a two-dimensional sheet of gel, first in one
direction based on their isoelectric point, and then in the other
direction based on their molecular weight. The result is a two-dimensional
image with a large number of protein "spots". The intensity of each spot
is proportional to the amount of the specific protein present.
It is not known a priori which protein each spot represents, although the
position of known proteins can be estimated. In addition, new microsequencing
and mass spectrometry techniques allow spots to be matched to proteins
whose sequence is known. The resolution of the spots may not be
high enough to separate all proteins, and 2D gel results have been hard
to reproduce because of sensitivity to operating parameters and a host of
possible artifacts. These problems have been somewhat alleviated lately by
the use of highly standardized protocols and higher-accuracy techniques.
There are several 2D gel databases for E. coli, yeast, Drosophila, rat,
mouse, human, etc. One of the most important ones is the SWISS-2DPAGE
database, containing a total of 518 entries from human, yeast, E. coli
and Dictyostelium. 2D-PAGE proteomics is currently being commercialized
in a partnership between Incyte Pharmaceuticals and Oxford Glycosciences.
© Copyright 1997 by Patrik D'haeseleer, patrik
at cs dot unm dot edu
c/o Computer Science Department, University of New Mexico,
Albuquerque, NM, 87131
(505) 277-9428 (office)
(505) 277-6927 (fax)