— Overview: Why FLcDNAs Matter —
In elementary biology we all learn the basic rule: Genes code for proteins. An organism’s genetic code (DNA) is transcribed into mRNA which is then translated into a sequence of amino acids – a protein.
What’s less commonly understood is that, in higher organisms (eukaryotes), portions of the transcribed gene sequence (introns) are spliced out of the RNA transcript before it becomes mature mRNA. So a protein product’s amino acid sequence reflects the mRNA sequence rather than the full gene sequence.
Whole genome sequencing projects do not tell us what portions of the genome are spliced out during transcription. And mRNA itself is too unstable to sequence. By sequencing full-length complementary DNA (FLcDNA) made from mRNA and comparing back to genomic sequence, researchers can identify the introns that have been spliced out of a gene. From there, it’s a simple logical step to identifying the amino acid sequence of the protein produced by that gene.
However, nature is not content with such simplicity. In reality, a single gene can code for more than one protein through a process known as alternative splicing (see below). If researchers had the resources to sequence several examples of FLcDNAs for an individual gene, they would often find the transcripts for the same genetic sequence spliced in multiple ways. Such an approach would allow researchers to determine many of the possible proteins produced by a single gene. Although the current project will only sequence one copy of a particular FLcDNA, it will nevertheless identify some alternative splicings by sequencing multiple ESTs and by quantitative real-time PCR (qRT-PCR) (see below).
— The Gap Between Gene and Function —
According to what’s known as the Central Dogma, genes are transcribed into mRNAs which are translated into proteins. But different genes are being transcribed and translated at different times in different tissues of an organism. For example, in maize, the roots, leaves, flowers and tassels result from differences in gene expression, and the genes active in each tissue vary as the plant develops.
To understand the function of different gene products, one must know what genes are turned on or off in different parts of the plant at different stages of development.
— Using cDNA ESTs to Find Genes and Determine Gene Function —
One approach to addressing the question of gene function involves a bit of reverse engineering. Scientists extract relatively unstable mRNA molecules from tissues and reverse-transcribe them into stable DNA molecules lacking introns – called complementary DNA (cDNA). They insert the cDNAs into plasmid libraries, and sequence each end of the cDNAs. These short stretches of DNA code are called Expressed Sequence Tags, or ESTs. Because they reflect mRNA sequences, ESTs represent active genes.
Recent projects have generated large numbers of ESTs for maize. For example, the Maize Gene Discovery Project produced upwards of 200,000 such tags from 13 tissue types at six stages of maize development. Such information is useful for finding genes (MGD detected approximately 30,000 genes using ESTs) and determining which genes are turned on in particular tissues and at particular developmental stages.
The quantity of a gene-specific mRNA present in a tissue can also be determined from cDNA (using quantitative real-time PCR assays and microarrays) and compared to the quantity present in other tissues. Such quantitative information is useful in understanding gene function as well.
— The FLcDNA Advantage —
In addition to EST sequencing, researchers can sequence the full length of a cDNA. If the gene sequence and the FLcDNA sequence are both known, then the intron locations will also be known – at least for that particular transcript (see alternative splicing below). That means that genome sequencing and FLcDNA projects are both needed.
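The comparison described above can be sketched in a few lines of Python. This is only a toy illustration, not the project's actual pipeline: the sequences are invented, the function names `find_exon_blocks` and `introns_between` are hypothetical, and the greedy exact-match search assumes an error-free cDNA, whereas real spliced alignment must tolerate sequencing errors and ambiguous splice boundaries.

```python
def find_exon_blocks(genomic, cdna, min_exon=4):
    """Greedily place successive pieces of the cDNA on the genomic sequence.

    Returns (start, end) genomic coordinates, 0-based and end-exclusive.
    Assumes exact matches only (a simplification for illustration).
    """
    blocks = []
    g_pos = 0  # search the genome only to the right of the previous exon
    c_pos = 0  # how much of the cDNA has been placed so far
    while c_pos < len(cdna):
        placed = False
        # Try the longest remaining cDNA prefix first, then shorter ones.
        for length in range(len(cdna) - c_pos, min_exon - 1, -1):
            start = genomic.find(cdna[c_pos:c_pos + length], g_pos)
            if start != -1:
                blocks.append((start, start + length))
                g_pos = start + length
                c_pos += length
                placed = True
                break
        if not placed:
            raise ValueError("cDNA could not be placed on the genomic sequence")
    return blocks


def introns_between(blocks):
    """Genomic gaps between consecutive exon blocks are the inferred introns."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])]


# Invented toy gene: three exons separated by two introns.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "CTGAAC" + "GTGAGTCCCTAG" + "TTCTAA"
cdna = "ATGGCC" + "CTGAAC" + "TTCTAA"  # the spliced transcript, as cDNA

exons = find_exon_blocks(genomic, cdna)
print(exons)                   # [(0, 6), (18, 24), (36, 42)]
print(introns_between(exons))  # [(6, 18), (24, 36)]
```

The stretches of genomic sequence that the cDNA skips over are reported as introns, which is the information neither the genome sequence nor the cDNA sequence provides on its own.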
For humans and rice, whole genome projects were completed before any FLcDNA projects began. For maize, the genome sequencing and FLcDNA projects are happening simultaneously. At the completion of both, researchers will be able to bridge the gap from gene to spliced gene sequence (FLcDNA) and from there to the amino acid sequence of proteins.
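The last step, from spliced sequence to amino acid sequence, is mechanical once the reading frame is known. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, takes the first ATG as the start codon, expects upper-case A/C/G/T only, and the function name `translate` and the example sequence are invented for this page.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate from the first ATG to the first stop codon (one-letter codes)."""
    start = cdna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy FLcDNA: ATG GCC CTG AAC TTC TAA
print(translate("ATGGCCCTGAACTTCTAA"))  # MALNF
```

Running the same function on the unspliced gene sequence would generally give a different and meaningless result, because intron bases would be read as codons.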
— How Alternative Splicing Complicates Things —
For many genes in higher organisms, the RNA transcript of the gene can be spliced in several alternative ways. Thus, a single gene can produce more than one mRNA sequence, which in turn can be translated into more than one protein. In this manner, the approximately 50,000 genes in the maize plant might produce a significantly greater number of proteins.
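To see how quickly the number of possible transcripts can grow, consider a deliberately simplified model in which the only alternative is to keep or skip each internal exon. The function name `exon_skipping_isoforms` and the exon sequences are invented for illustration. Real genes also use other mechanisms (alternative splice sites, retained introns), and only some of the theoretically possible combinations are actually produced.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List the transcripts that keep the first and last exon and any subset
    of the internal exons, in their original order (toy model)."""
    first, *internal, last = exons
    isoforms = []
    # Go from "keep every internal exon" down to "skip every internal exon".
    for kept in range(len(internal), -1, -1):
        for subset in combinations(internal, kept):
            isoforms.append("".join([first, *subset, last]))
    return isoforms


# Invented three-exon gene: one internal exon gives two possible transcripts.
for transcript in exon_skipping_isoforms(["ATGGCC", "CTGAAC", "TTCTAA"]):
    print(transcript)
# ATGGCCCTGAACTTCTAA
# ATGGCCTTCTAA
```

In this model a gene with n internal exons has up to 2^n possible transcripts, so even a modest rate of alternative splicing could raise the protein count well above the gene count.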
— How Common is Alternative Splicing in Maize? —
Researchers don’t yet know how often the maize plant resorts to alternative splicing. In humans, such events are extremely common – between 40 and 60 percent of human genes produce more than one protein. In rice, researchers found approximately 5 percent alternative splicing when sequencing only relatively short EST segments. The Walbot lab has worked for many years with ten maize genes, four of which have multiple alternative transcripts. Thus, it is quite possible that alternative splicing plays a significant role in maize.
— Using ESTs to Discover Alternatively Spliced Transcripts —
Figuring out which proteins are being produced in different tissues requires knowing not only which gene is being transcribed, but also which alternative splicing is being used for that gene in that tissue.
To get at that information, one would ideally sequence multiple examples of FLcDNAs for each gene. Unfortunately, such an approach is impractical, because each completed FLcDNA costs about $150. ESTs, however, offer opportunities to identify alternative splicings – at least to the extent that they are exhibited within the relatively short length of the EST.
The current project will sequence approximately 360,000 ESTs, a certain portion of which will be products of the same gene, and some of which will represent alternative splicings. A similar approach found 5 percent alternatively spliced transcripts in rice. Here, because the combined 5' and 3' EST length from each cDNA clone is longer than that used in the rice project (nearly 1400 bases rather than 800 bases), project researchers hope for greater success.
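One way to picture how ESTs from the same gene can reveal alternative splicing is to compare the introns each read implies once it has been placed on the genomic sequence. The sketch below is a hypothetical simplification, not the project's method: each EST is represented as a list of (start, end) exon blocks in genomic coordinates, the coordinates are invented, and only introns lying entirely inside the region covered by both reads are compared.

```python
def introns_of(blocks):
    """Gaps between consecutive exon blocks of one aligned EST."""
    return [(left_end, right_start)
            for (_, left_end), (right_start, _) in zip(blocks, blocks[1:])]


def splicing_differs(blocks_a, blocks_b):
    """True if two ESTs imply different introns within their shared region."""
    lo = max(blocks_a[0][0], blocks_b[0][0])
    hi = min(blocks_a[-1][1], blocks_b[-1][1])
    if lo >= hi:
        return False  # the reads do not overlap, so nothing can be compared
    shared_a = {i for i in introns_of(blocks_a) if lo <= i[0] and i[1] <= hi}
    shared_b = {i for i in introns_of(blocks_b) if lo <= i[0] and i[1] <= hi}
    return shared_a != shared_b


# Invented alignments for three ESTs from one gene.
all_exons = [(0, 6), (18, 24), (36, 42)]   # keeps the middle exon
skipped = [(0, 6), (36, 42)]               # skips the middle exon
short_read = [(0, 5)]                      # ends before the first splice site

print(splicing_differs(all_exons, skipped))   # True
print(splicing_differs(short_read, skipped))  # False
```

The second comparison shows why read length matters: a read that stops before reaching a splice junction carries no evidence about how that junction was spliced.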