What is the Experimental Approach of the Maize FLcDNA Project?
This project will span three years and involve two academic institutions: the University of Arizona and Stanford. The overall goal is to sequence 30,000 full-length cDNA (FLcDNA) clones from two cDNA libraries representing varied tissues and stress treatments. Both clone libraries use the maize inbred line B73, the same inbred line being used for full genome sequencing. Specifically, the supporting aims of this project are:
- Sequence 5' and 3' ESTs from 130,000 random cDNA clones in library #1.
- Sequence 5' and 3' ESTs from 50,000 random cDNA clones in library #2.
- Select ~30,000 unique clones with both a 5' and 3' EST for full-length sequencing from libraries #1 and #2.
- Annotate the expression of these FLcDNAs using microarray hybridizations, locate FLcDNAs on the physical map of maize chromosomes, and display results using a web-based genome browser.
- Distribute clones and amplified FLcDNA libraries to the research community.
- Involve high school teachers and undergraduates in genomics projects and analysis; develop classroom exercises using maize genomics resources.
Overview of the Experimental Approach
Use two clone libraries that together cover a wide variety of maize tissues, so as to include genes that are active in different parts of the plant (leaf vs. flower vs. stem, etc.) and genes that respond to environmental changes.
Sequence ESTs (sequences derived from messenger RNA) at both the 5' and 3' end of each clone.
Assemble EST sequences to determine the number of distinct transcripts found. Combine the assemblies with other known data, such as previous ESTs and genome sequence, to confirm the unique identity of each assembly.
Choose 30,000 different clones to sequence completely.
- Choose only clones that have both a 5' and a 3' EST
- Choose one clone from each assembly
- Organize clones for sequencing by length, estimated by comparison to homologous rice gene models
Sequence FLcDNAs. For clones that are too long to sequence in one pass, utilize two methods for finish sequencing (see Figure):
- Iterative primer walking
- Transposon techniques
Assemble FLcDNAs and annotate their expression patterns using information from microarrays, full genome sequencing, and qRT-PCR for alternatively spliced transcripts.
Make all data publicly available.
What is Involved in Generating FLcDNA Clones?
Individual plasmids from two cDNA libraries will be randomly sequenced to recover maize transcripts expressed in diverse tissues (see Table).
After assembly of all the ESTs, ~30,000 unique and representative clones from both libraries will be chosen for complete, full-length sequencing. This number of FLcDNAs will comprise about 60% of the expected number of maize genes. cDNA clones chosen for full-length sequencing will contain complete 5' and 3' ends with untranslated regions in order to retain the entire translated region of the transcribed gene. Clones that are sequenced fully the first time will be marked as finished. Those that are incomplete will be slated for finish sequencing (see Figure).
To streamline this process, these clones will be sorted by length into separate sequencing plates prior to finishing, for example into plates of clones expected to require one, two, or three rounds of primer walking. Finishing is accomplished by a technique called primer walking. Primers are designed to the first-run sequences at the 5' and 3' ends of a clone, and then each end is resequenced from the new starting point(s). This process is repeated until the clone is sequenced completely through. When only a few difficult clones in a plate remain unfinished after repeated primer walking steps, these are handled individually using a transposon tagging technique.
The problem clone(s) will be randomly disrupted at different sites using transposon insertion. Primers specific to the transposon ends will then be used to read a portion of the clone sequence. This will be repeated multiple times to generate many overlapping sequences that can then be assembled into the FLcDNA.
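As a rough illustration of why sorting clones by length helps, the number of primer-walking rounds a clone needs can be estimated from its length. The sketch below assumes hypothetical values for the usable read length (`READ_LEN`) and for the new sequence gained per walking read (`STEP`); neither number comes from the project, and the model ignores the overlap needed to join reads.

```python
import math
from collections import defaultdict

READ_LEN = 700  # assumed usable bases per read (hypothetical value)
STEP = 600      # assumed new bases gained per walking read (hypothetical value)


def walking_rounds(clone_len, read_len=READ_LEN, step=STEP):
    """Estimate primer-walking rounds needed after the initial 5' and 3' reads."""
    gap = clone_len - 2 * read_len  # bases not reached by the two end reads
    if gap <= 0:
        return 0  # the end reads already meet: finished in the first pass
    # each round extends the sequence inward from both ends
    return math.ceil(gap / (2 * step))


def sort_into_plates(clone_lengths):
    """Group clone IDs by the estimated number of walking rounds."""
    plates = defaultdict(list)
    for clone_id, length in clone_lengths.items():
        plates[walking_rounds(length)].append(clone_id)
    return dict(plates)


plates = sort_into_plates({"a": 1000, "b": 2000, "c": 2500, "d": 3500})
```

With these assumed numbers, clone `a` needs no walking, `b` and `c` share a one-round plate, and `d` goes to a two-round plate, so every clone on a plate moves through finishing in step.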
All finished FLcDNA clones must have two reads at a high quality threshold across the entire assembled sequence, with consistent 5' and 3' end pairs. ESTs already sequenced in the public databases will also be used to check the new sequences for accuracy.
Alternatively spliced variants will be tested by real-time qRT-PCR.
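The "two reads across the entire assembled sequence" criterion amounts to a depth-of-coverage check. The following sketch is one way to express it, under simplifying assumptions: each read carries a single quality score rather than per-base scores, and the threshold of 20 is a placeholder because the text does not state the actual value. The function name and read representation are hypothetical.

```python
def is_finished(clone_len, reads, min_reads=2, min_quality=20):
    """Check that every base is covered by at least `min_reads` good reads.

    reads: iterable of (start, end, quality) tuples using 0-based,
    end-exclusive coordinates on the assembled clone sequence.
    """
    depth = [0] * clone_len
    for start, end, quality in reads:
        if quality < min_quality:
            continue  # low-quality reads do not count toward coverage
        for pos in range(max(start, 0), min(end, clone_len)):
            depth[pos] += 1
    return all(d >= min_reads for d in depth)


good = is_finished(10, [(0, 10, 30), (0, 6, 30), (4, 10, 30)])
bad = is_finished(10, [(0, 10, 30), (0, 6, 30), (4, 10, 10)])
```

In the second call the last read falls below the quality threshold, leaving positions 6 through 9 with a single read, so the clone would go back for more finishing.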
WHY IS THIS PROJECT IMPORTANT?
MANY SEQUENCING TECHNIQUES HAVE BEEN VALUABLE FOR OTHER ORGANISMS
The most direct method for sequencing an entire genome is shotgun sequencing, in which the genome is broken into very small pieces for sequencing. After all of the small bits are finished, the major task is assembling these into the correct linear order.
This method has worked well for small genomes and genomes with a low amount of repetitive sequence, such as yeast and Arabidopsis (a plant with a very small genome, about 5% the size of maize, and only 19% repetitive DNA). By contrast, at least two-thirds of the maize genome is repeated DNA, reflecting recent gene duplications and the duplication of the entire genome, a process called polyploidization, that occurred about 10 - 15 million years ago. Imagine trying to assemble an entire book from a list of the words in random order; some words such as "the" will be repeated many times. In the 20th century, because of the enormous cost of "finishing" a genome with highly repetitive parts, a complete sequence of maize seemed out of reach.
Instead, knowledge about maize genes was built using a combination of techniques: Expressed Sequence Tags (ESTs), which are sequences derived from messenger RNA (the transcribed portion of the genome); sequencing next to recent transposon insertion sites (most of which are in genes); methyl filtration (which enriches for transcriptionally active genomic pieces); and High Cot sequencing, which enriches for low copy number genomic segments.
There are more than 700,000 maize ESTs representing information from more than 50,000 likely genes.
The other three methods generate GSSs (Genome Survey Sequences), of which there are about two million for maize. The GSSs provide intron sequences and sequence upstream and downstream of maize genes. When the GSSs and ESTs are combined, a robust picture of the maize genes is possible.
The FLcDNA advantage: gene models predicted from the current information will be verified or corrected with the addition of full-length maize cDNAs from this project, because these are complete copies of individual messenger RNAs rather than the short pieces found in ESTs.
THE EST ASSEMBLY IS ERROR-PRONE
The co-assembly of GSSs and ESTs is error-prone: even though sequencing errors are relatively low in frequency, they present a problem for assembly. To tolerate these errors, scientists must use criteria such as 95% similarity to allow the pieces to be put into the same gene models. This can cause two distinct genes that are highly related to each other to be merged into one assembly. This type of error is corrected by completely sequencing a single molecule, which provides a continuous, unbroken representation of the transcribed region of a gene. Despite the large number of ESTs and GSSs, however, the data cannot be assembled into the full genome, because all of the methods bias against recovery of the highly repetitive sequences that constitute most of the genome.
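The effect of a similarity cutoff can be seen in a toy example. The sketch below is not the assembly software used for maize; it assumes sequences that are already aligned to equal length, and the `percent_identity`, `cluster`, and `substitute` helpers are hypothetical names introduced only for illustration.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster(seqs, threshold=95.0):
    """Greedy clustering: join the first group holding any member at or above threshold."""
    clusters = []
    for name, seq in seqs.items():
        for group in clusters:
            if any(percent_identity(seq, seqs[other]) >= threshold for other in group):
                group.append(name)
                break
        else:
            clusters.append([name])  # no similar group found: start a new one
    return clusters


def substitute(seq, positions):
    """Return seq with the base at each listed position swapped (A<->C, G<->T)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    return "".join(swap[b] if i in positions else b for i, b in enumerate(seq))


gene_a = "ACGT" * 25                              # 100 bp toy transcript
seqs = {
    "est_a1": gene_a,                             # error-free read of gene A
    "est_a2": substitute(gene_a, {10}),           # same gene, one sequencing error
    "paralog_b": substitute(gene_a, {5, 30, 55, 80}),  # related gene, 96% identical
}
merged = cluster(seqs, threshold=95.0)   # all three end up in one assembly
split = cluster(seqs, threshold=99.0)    # the paralog is separated
```

Raising the cutoff to 99% separates the paralog here, but a read of gene A with two sequencing errors would then also be split off, which is the trade-off that makes a full-length sequence of a single molecule so useful.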
HOW FULL-LENGTH CDNAS, PAST TECHNIQUES, AND FULL GENOME SEQUENCING WILL COMPLEMENT EACH OTHER
Sequencing of the maize genome began in December 2005, using pieces of about 100 - 200 kb each. These pieces have been put into linear order by previous projects and represent a "tiling path" for each chromosome, in which the BACs (Bacterial Artificial Chromosomes) with minimum overlap along the chromosome were chosen for sequencing. Each ~100 kb BAC clone is sequenced as a small "genome" project and then assembled to generate the complete sequence of that step in the tiling path. Thus, BAC by BAC, the entire genome is described. This project should be finished in 2008. The full-length cDNAs from our project will provide invaluable information for the correct identification of introns and exons in the majority of maize genes and will be added as annotation to the final genome sequence. Additionally, our project will provide information about where and when many of these genes are expressed, an important step in understanding the function of each maize gene.
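Choosing a tiling path from an ordered set of overlapping BACs can be viewed as an interval-covering problem. The sketch below uses a standard greedy approach that picks the fewest clones needed to span a region, which also keeps overlap low; it is an illustration only and not necessarily how the genome project selected its BACs. The function name and the (name, start, end) representation are assumptions.

```python
def minimum_tiling_path(bacs, region_start, region_end):
    """Greedy interval cover over [region_start, region_end).

    bacs: list of (name, start, end) with end-exclusive coordinates.
    Returns the names of the chosen BACs in order along the chromosome,
    or raises ValueError if some position is not covered by any BAC.
    """
    ordered = sorted(bacs, key=lambda b: b[1])
    path, covered, i = [], region_start, 0
    while covered < region_end:
        best = None
        # among BACs that start at or before the covered point,
        # take the one that reaches furthest along the chromosome
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"no BAC covers position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# toy coordinates in kb
bacs = [("b1", 0, 150), ("b2", 50, 180), ("b3", 140, 300),
        ("b4", 170, 320), ("b5", 290, 450)]
path = minimum_tiling_path(bacs, 0, 450)
```

Here three of the five toy BACs are enough to span the region, and the two that add no new sequence are left out of the path.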
Constituent Tissues In The Two Clone Libraries And Planned Microarray Experiments
|Library #1||Library #2||Array||Status|
|Aleurone (21 dap)||7 day seedling||Tassel branch -- anther sizes from 0.5 - 1 mm|| |
|Embryo 10 dap||7 day seedling with 2 day salt treatment||1 mm dissected anthers|| |
|Embryo 15 dap||7 day seedling with 4 hour 45 C heat shock||1.5 mm pre-meiotic anthers|| |
|Endosperm 10 dap||7 day seedling with 24 hour 10 C treatment||Anthers covering stages of meiosis|| |
|Endosperm 15 dap||7 day seedling with 4 hour UV-B treatment|| || |
|Silks at time of anthesis||7 day etiolated seedling|| || |
|Immature ear = 2 mm||7 day seedling with 2 day osmotic stress|| || |
|Adult leaf blade + sheath|| || || |
|Stem of adult plant|| || || |
|Shoot apex (meristem plus P0 through ~P4)|| || || |
|10 day seedling shoot + root|| || || |
|5 cm Tassel branch|| || || |
Schematic Diagram of the Finishing Strategy for FLcDNA Sequencing