The Maize Full Length cDNA Project
Arizona Genomics Institute   ~   Arizona Genomics Computational Laboratory   ~   Stanford University

mRNA Structure


Description of mRNA Structure

Answers to questions about multiple mRNA types from a single gene


      Why do some genes produce more than one type of mRNA?

      Why are the 5' and 3' untranslated regions of variable length in the population of mRNAs for a particular gene?

      How does alternative splicing increase the number of protein products potentially encoded by one gene?



Diagram of a DNA segment (a gene) with the start and stop sites of its transcription into RNA.





Diagram of an RNA (in blue) synthesized by RNA polymerase II using the DNA template.


[Diagram labels: 5' end of the mRNA; 3' end of the mRNA]


Making mRNA


The DNA genome is the template for transcription of genetic (inherited) information into RNA. RNAs are the working copies of genetic information in the individual cells.


Messenger RNA (mRNA) is transcribed by the enzyme RNA polymerase II from DNA segments designated as genes, the information-coding sections of the genome. In nearly all cases, the genes encode protein products. Maize has about 50,000 distinct genes. To synthesize a protein, the mRNA corresponding to the sense strand of the DNA of a gene is first transcribed from that DNA, and then the mRNA is "decoded" by translation into a protein. The decoding uses a triplet code of nucleotides to specify individual amino acids. DNA is composed of A, T, G, and C nucleotides, and 3 such nucleotides can specify one of the 20 amino acids that are the constituents of proteins. For example, ATG = methionine. Each nucleotide in the code is part of just one triplet (a codon), hence the linear order of the nucleotides specifies the linear order of amino acids in the protein product. This fact is called colinearity.
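The triplet code and colinearity can be sketched in a few lines of Python. This is only an illustration: the function name `translate`, the example sequence, and the small codon table (a subset of the standard genetic code, written as sense-strand DNA codons) are assumptions made for this example, not project data.

```python
# Partial standard genetic code, written as DNA sense-strand codons.
# Only the codons needed for the example are listed (illustrative subset).
CODON_TABLE = {
    "ATG": "Met",
    "CCC": "Pro",
    "AAA": "Lys",
    "GTT": "Val",
    "CAC": "His",
    "TAA": "Stop",
}

def translate(sense_dna):
    """Read non-overlapping triplets in order and return the amino acids."""
    protein = []
    for i in range(0, len(sense_dna) - 2, 3):
        codon = sense_dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break  # a stop codon ends the protein and adds no amino acid
        protein.append(amino_acid)
    return protein

# Hypothetical gene fragment: start codon, four more codons, stop codon.
print(translate("ATGCCCAAAGTTCACTAA"))
```

Because each nucleotide belongs to exactly one codon, the order of amino acids in the returned list follows the order of triplets in the input string.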


The mRNA product is called the primary transcript, and it is a precise representation of one strand of the DNA, generated using the rules of complementarity, the same rules used to ensure faithful copying of DNA during DNA replication. The rule is a simple one: nucleotide A pairs only with T, and G only with C. To make mRNA, the template strand of the DNA double helix (the strand complementary to the sense strand) is used to polymerize the complementary bases into mRNA: if there is a C in the template there will be a G in the mRNA, if a T in the template then an A in the mRNA, and so on. There is one composition difference between RNA and DNA: in RNA a U nucleotide is used instead of a T, so an A in the template is copied as a U. A template segment of TAC is copied into mRNA as AUG, and this codon specifies the amino acid methionine at the start of translation into the protein chain.
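The complementarity rule can be written as a simple lookup. The sketch below assumes the template strand is supplied in the 3'-to-5' direction, so that the mRNA comes out 5' to 3'; the name `transcribe` and the longer example sequence are hypothetical.

```python
# Base pairing during transcription: template DNA base -> RNA base.
# RNA uses U where DNA would use T, so a template A is copied as U.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Build the mRNA (5'->3') from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

print(transcribe("TAC"))        # the example from the text
print(transcribe("TACGGGTTT"))  # a hypothetical longer template
```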


5' End of the mRNA is Modified after Transcription


Nucleotides start with 3 phosphate groups (high energy moieties) prior to polymerization. The first base of an mRNA is the 5' nucleotide; it retains all 3 phosphate groups while the subsequent nucleotides retain only a single phosphate. The other 2 phosphate groups are removed to permit linking each nucleotide to the one preceding it. The 5' nucleotide is further distinguished by the addition of a cap nucleotide (an inverted G residue) that blocks the attack of ribonucleases (enzymes that destroy mRNA) and serves as a binding site for cap binding proteins that further protect the mRNA.


Exploiting the cap to recover Full Length mRNA (FLmRNA). Antibodies specific for the 5' cap complex -- either the 5' cap nucleotide or the 5' cap binding proteins -- are used in the laboratory to purify FLmRNA. Pieces of mRNA from a broken molecule will lack the cap. Recovery of capped mRNA is a key step in purification of FLmRNA for construction of FLcDNAs that are used in sequencing.


3' End of the mRNA is Modified after Transcription


When RNA Polymerase II is programmed to cease transcription at the end of a gene, the last base transcribed is the 3' end. This end is recognized by a special RNA polymerase that adds approximately 50 - 200 A nucleotides to the 3' end; this step does not require a DNA template. The string of A residues, called the poly(A) tail, is recognized by poly(A) binding proteins. Consequently both the 5' and 3' ends of mRNA are bound by proteins, and these interactions stabilize mRNA.


Exploiting the poly(A) tail to recover FLmRNA. A complete mRNA will have a poly(A) tail, and these molecules can be retrieved by virtue of the affinity of the A tail for a string of T residues, based on the rule of complementarity. Oligomeric T segments are attached to small beads, the beads are mixed with RNA, and only mRNAs carrying a poly(A) tail stick. By sequentially selecting mRNAs that have both a cap and a poly(A) tail, intact FLmRNAs are purified from a larger pool that includes incomplete mRNAs.
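The logic of the two sequential selections can be mimicked in code. This is a toy model, not a laboratory protocol: the `RnaMolecule` record, the `has_cap` flag, the example sequences, and the `min_tail` threshold of 20 residues are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class RnaMolecule:
    name: str
    has_cap: bool   # stands in for the outcome of the cap-based selection
    sequence: str   # written 5'->3'

def poly_a_length(sequence):
    """Count the uninterrupted run of A residues at the 3' end."""
    return len(sequence) - len(sequence.rstrip("A"))

def select_full_length(molecules, min_tail=20):
    """Keep only molecules that pass both the cap and poly(A) selections."""
    return [m for m in molecules
            if m.has_cap and poly_a_length(m.sequence) >= min_tail]

# Hypothetical pool: one intact mRNA and two broken pieces.
pool = [
    RnaMolecule("intact", True, "GAUGCC" + "A" * 60),
    RnaMolecule("five_prime_piece", True, "GAUGCC"),             # no tail
    RnaMolecule("three_prime_piece", False, "CC" + "A" * 60),    # no cap
]

print([m.name for m in select_full_length(pool)])
```

Only the molecule with both ends intact survives, which is the point of applying the two selections one after the other.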


FLmRNAs from One Gene Can Be Heterogeneous


5' Heterogeneity. RNA polymerase II starts transcription from DNA by positioning on nucleotide motifs of 6 - 10 bases in the promoter of a gene. The promoter is the regulatory region of the gene and is usually not transcribed. The choice of which motifs to use, and hence where to start transcription, is strongly influenced by other DNA binding proteins called transcription factors. Think of a series of traffic lights on a highway. At any green light, your car can "go." Similarly, for most genes there are multiple RNA polymerase II "start" sites where transcription can initiate. Consequently, the FLmRNAs in the population for a specific gene often have different 5' nucleotides. Aligning the population of molecules against the DNA sequence shows a nested set of start sites, often spread over several hundred bases. Each of these represents a discrete start site for one RNA polymerase II transcription event of that gene. Typically, one site is used most often, while the other sites are restricted to specific circumstances, such as an environmental condition (for example, cold or heat stress) or a specific cell type. In the vicinity of each start site are the DNA motifs that position RNA polymerase II and associated transcription factors for transcriptional initiation. Hence knowledge of the various 5' start sites for FLmRNAs is a first step in identifying the DNA motifs important in the expression of that particular gene.
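Once FLmRNAs have been aligned to the gene, tallying their 5' ends is straightforward. The sketch below assumes the alignment step has already produced one genomic coordinate per molecule; the coordinates and the function name `summarize_start_sites` are invented for illustration.

```python
from collections import Counter

def summarize_start_sites(five_prime_ends):
    """Return (most-used site, its count, spread in bases) for a gene."""
    usage = Counter(five_prime_ends)
    major_site, major_count = usage.most_common(1)[0]
    spread = max(usage) - min(usage)
    return major_site, major_count, spread

# Hypothetical coordinates of the first base of each aligned FLmRNA.
ends = [1040, 1040, 1012, 1040, 1195, 1040, 1012, 1040]

major_site, major_count, spread = summarize_start_sites(ends)
print(major_site, major_count, spread)
```

In this made-up data set one site dominates and two minor sites appear, spread over fewer than 200 bases, which mirrors the pattern described above.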


3' Heterogeneity. The stop signs for RNA polymerase II at the end of genes are A+T rich motifs. These stop motifs are found in multiple locations, hence the 3' nucleotide position for a particular gene can vary in the population of FLmRNAs from that gene. In general, the longest FLmRNA is chosen for complete sequencing: its 5' and 3' ends extend farther than those of the other FLmRNAs available for that gene.


Introns: Internal Heterogeneity in mRNA Composition is Also Possible



Exons of mRNAs contain the triplet codons translated into proteins. In many genes the exon regions are separated by non-coding regions called introns (yellow boxes in the diagram above). These intron regions are spliced out of a primary mRNA transcript, and the flanking exons are joined together to make a continuous set of coding exons, as shown by the solid blue line in the diagram below. This processed mRNA is now ready for translation into protein. The codons start with the AUG at the green dot and continue through the red dot where the stop codon is located.
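Splicing can be pictured as cutting the intron intervals out of the primary transcript and joining what remains. In this minimal sketch the exon and intron sequences are hypothetical, and intron positions are given as 0-based, half-open (start, end) intervals, a convention chosen here for convenience.

```python
def splice(primary_transcript, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(primary_transcript[position:start])  # keep exon
        position = end                                    # skip intron
    exons.append(primary_transcript[position:])           # last exon
    return "".join(exons)

# Hypothetical three-exon gene with two introns.
exon1, intron1 = "AUGCCC", "GUAAGUUUUCAG"
exon2, intron2 = "AAAGUU", "GUACGAAUAG"
exon3 = "CACUAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3

# Intron coordinates derived from the segment lengths above.
i1_start = len(exon1)
i1_end = i1_start + len(intron1)
i2_start = i1_end + len(exon2)
i2_end = i2_start + len(intron2)

mrna = splice(pre_mrna, [(i1_start, i1_end), (i2_start, i2_end)])
print(mrna)
```

The result is the three exons joined end to end, a continuous run of codons from the start codon to the stop codon.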



Although a small fraction of genes lack introns, a typical maize or plant gene has several. A few genes have more than 20 introns. Most introns are within the coding region as described above, and their removal is crucial to establishing the "open reading frame of correct codons" of the encoded protein. The importance of this point is illustrated in the next section. Introns can also be found in the 5' untranslated region (5' UTR), which is the stretch of RNA bases between the 5' end of the mRNA and the start codon (AUG) for protein translation initiation. Similarly, introns are occasionally found in the 3' UTR, which is the stretch of bases between the triplet stop codon that ends the protein (the stop codon itself does not specify an amino acid) and the poly(A) tail.
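The boundaries of the two untranslated regions can be located from the start and stop codons. The sketch below makes two simplifying assumptions that should be kept in mind: translation is taken to begin at the first AUG in the mRNA, and the poly(A) tail is left off the example sequence. The function name `split_mrna` and the sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Return (5' UTR, coding region including the stop codon, 3' UTR)."""
    start = mrna.find("AUG")  # assumption: the first AUG is the start codon
    if start == -1:
        raise ValueError("no start codon found")
    # Walk codon by codon from the start codon until a stop codon appears.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    raise ValueError("no in-frame stop codon found")

# Hypothetical processed mRNA: 5' UTR + coding region + 3' UTR.
five_utr, coding, three_utr = split_mrna("GGACUC" + "AUGCCCAAAGUUCACUAA" + "GCUUGC")
print(five_utr, coding, three_utr)
```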


Alternative Intron Processing Can Result in Multiple Proteins from One Gene


      Intron retention can add amino acids to a protein. In some cases an intron is not removed from the transcript. If this intron contains nucleotides in a multiple of 3 (such as 90 nucleotides) and no stop codon in the reading frame, it will specify a set of amino acids (90 nucleotides = 30 codons = 30 amino acids) added to the protein; after the intron, the exon codons are read in the normal way.


      Intron retention can result in a highly altered protein. If the length of the retained intron is not a multiple of 3, the triplet reading frame of the following exon will be "misread" because of frameshifting. If the original codons were CCC AAA GUU CAC, but an unspliced intron causes the groups of 3 to be formed using a "G" contributed by the intron as GCC CAA AGU UCA, an entirely different set of four amino acids is encoded in this region, and the misreading continues throughout the exon until a stop codon is encountered. In many cases a stop codon is encountered in the alternative "reading frame (alternative triplets)" and a truncated, shorter protein is produced.


      Exon skipping deletes some information from the expected protein product. If there are several introns in a gene, each should be removed discretely, because the 5' and 3' edges of an intron are recognized as a unit; the mRNA is cut, the intron is removed, and the two flanking exons are joined together. Sometimes, however, the 5' edge of intron 1 is joined to the 3' edge of intron 2 (or intron 3 or an even more distant intron), removing both introns and the intervening exon 2. The codons encoded by exon 2 are missing from the protein.
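The three outcomes above can be compared by splitting each transcript into triplets. This sketch reuses the CCC AAA GUU CAC example from the text; the 6-nucleotide retained intron, the three-exon gene, and the helper name `codons` are hypothetical.

```python
def codons(rna):
    """Split an RNA string into complete triplets, left to right."""
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]

exon = "CCCAAAGUUCAC"  # codons from the text: CCC AAA GUU CAC

# Normal splicing: the intron is gone and the exon is read in frame.
normal = codons(exon)

# Retained intron with a length that is a multiple of 3 (here 6 nt):
# extra codons are added, and the exon codons are unchanged.
in_frame = codons("GCUGCU" + exon)

# Retained sequence that is not a multiple of 3 (here a single G):
# every following triplet is shifted.
frameshift = codons("G" + exon)

# Exon skipping: exon 2 is removed together with its flanking introns.
exons = ["AUGGCU", "CCCAAA", "GUUCAC"]
full_length = codons("".join(exons))
skipped = codons(exons[0] + exons[2])

print(normal)
print(in_frame)
print(frameshift)
print(full_length)
print(skipped)
```

The frameshifted list reproduces the GCC CAA AGU UCA reading given in the text, while the skipped transcript stays in frame here only because the hypothetical exon 2 is 6 nucleotides long.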


Alternative intron processing events such as these are readily recognized when multiple FLmRNAs of the same gene are characterized. The standard splicing pattern yields the "predicted" mRNA, while the alternative products are either missing some expected RNA segments or retain segments that should have been removed. Current estimates suggest that 10% or more of plant mRNAs exhibit alternative processing events and hence will encode two or more different protein products. These protein products are identical up to the point where the alternative splicing event occurs.


Some alternative splicing is highly regulated; that is, it occurs only under specific conditions or in specific cell types. For these cases, scientists infer that the alternative proteins play some crucial role in the places where they are synthesized. In many cases of alternative splicing, however, there is no current rationale for why the events occur. There may be an intrinsic error rate in intron removal, resulting in the diverse transcript types that can be observed for the mRNA molecules of some genes. Additional evidence for the function of the various proteins is required to establish that alternative splicing is a meaningful event.