Description of mRNA Structure
Answers to questions about multiple mRNA types from a single gene
• Why do some genes produce more than one type of mRNA?
• Why are the 5' and 3' untranslated regions of variable length in the population of mRNAs for a particular gene?
• How does alternative splicing increase the number of protein products potentially encoded by one gene?
Diagram of a DNA segment (a gene) with the start and stop sites of its transcription into RNA.
Diagram of an RNA (in blue) synthesized by RNA polymerase II using the DNA template, with the 5' end and the 3' end of the mRNA labeled.
Making mRNA
The DNA genome is the template for transcription of genetic (inherited) information into RNA. RNAs are the working copies of genetic information in the individual cells. Messenger RNA (mRNA) is transcribed by the enzyme RNA polymerase II from DNA segments designated as genes, the information-coding sections of the genome.
In nearly all cases, the genes encode protein products. Maize has about 50,000 distinct genes. To synthesize a protein, the mRNA corresponding to the sense strand of the DNA of a gene is first transcribed from that DNA, and then the mRNA is "decoded" by translation into a protein. The decoding uses a triplet code of nucleotides to specify individual amino acids. DNA is composed of A, T, G, and C nucleotides, and a group of 3 such nucleotides can specify one of the 20 amino acids that are the constituents of proteins. For example, ATG = methionine. Each nucleotide in the code is part of just one triplet (a codon); hence the linear order of the nucleotides specifies the linear order of amino acids in the protein product. This property is called colinearity.
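The triplet code and colinearity can be illustrated with a minimal Python sketch. The function names `split_codons` and `decode` are hypothetical, and the short DNA sequence is invented for illustration; the table is the standard genetic code written in compact form, with one-letter amino acid symbols (M = methionine) and "*" marking a stop codon.

```python
# Standard genetic code in compact form: amino acids (one-letter symbols,
# "*" = stop) listed in the order of the codons generated from T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def split_codons(sequence):
    """Cut a sequence into consecutive, non-overlapping triplets."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]

def decode(dna):
    """Translate sense-strand DNA codon by codon, stopping at a stop codon."""
    protein = []
    for codon in split_codons(dna.upper()):
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Invented example: four codons, the last one a stop codon.
example = "ATGCCCAAATAA"
print(split_codons(example))
print(decode(example))
```

Because every nucleotide belongs to exactly one codon, the order of the triplets in the list is the order of the amino acids in the decoded string.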
The initial mRNA product is called the primary transcript, and it is a precise complementary copy of one strand of the DNA template, generated using the rules of complementarity -- the same rules used to ensure faithful copying of DNA during DNA replication. The rules are simple: nucleotide A pairs only with T, and G only with C. To make mRNA, the template strand of the DNA double helix (the strand complementary to the sense strand) is used to polymerize the complementary bases into mRNA: if there is a C in the DNA there will be a G in the mRNA, if a T in the DNA then an A in the RNA, etc. There is one composition difference between RNA and DNA: in RNA a U nucleotide is used instead of a T. A template DNA segment of TAC is copied into mRNA as AUG, and this codon specifies the amino acid methionine at the beginning of translation into the protein chain.
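The pairing rules can be written out as a small lookup, shown here as a sketch rather than a model of the enzyme. The function name `transcribe` is hypothetical, and the input is assumed to be the DNA strand that is being copied, written in the order in which the polymerase reads it.

```python
# Pairing rules for copying DNA into RNA: each DNA base on the strand being
# copied selects its complementary RNA base, with U taking the place of T.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(copied_strand):
    """Return the RNA made by pairing each DNA base with its complement."""
    return "".join(PAIRING[base] for base in copied_strand.upper())

print(transcribe("TAC"))        # the example from the text
print(transcribe("TACGGGTTT"))  # a longer, invented segment
```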
5' End of the mRNA is Modified after Transcription
Nucleotides start with 3 phosphate groups (high energy
moieties) prior to polymerization. The
first base of an mRNA is the 5' nucleotide; it retains all 3 phosphate groups
while the subsequent nucleotides retain only a single phosphate. The other 2 phosphate groups are removed to
permit linking each nucleotide to the one preceding it. The 5' nucleotide is further distinguished
by the addition of a cap nucleotide (an inverted G residue) that blocks the
attack of ribonucleases (enzymes that destroy mRNA) and serves as a binding
site for cap binding proteins that further protect the mRNA.
Exploiting the cap to recover Full Length mRNA (FLmRNA). Antibodies specific for the 5' cap complex -- either the 5' cap nucleotide or the 5' cap binding proteins -- are used in the laboratory to purify FLmRNA. Pieces of a broken mRNA molecule that do not include the 5' end will lack the cap. Recovery of capped mRNA is a key step in purification of FLmRNA for construction of the FLcDNAs that are used in sequencing.
3' End of the mRNA is Modified after Transcription
When RNA polymerase II is programmed to cease transcription at the end of a gene, the last base transcribed is the 3' end. This end is recognized by a special RNA polymerase, poly(A) polymerase, that adds approximately 50 to 200 A nucleotides to the 3' end; this step does not require a DNA template. The string of A residues, called the poly(A) tail, is recognized by poly(A) binding proteins. Consequently, both the 5' and 3' ends of mRNA are bound by proteins, and these interactions stabilize mRNA.
Exploiting the poly(A) tail to recover FLmRNA. A complete mRNA will have a poly(A) tail, and these molecules can be retrieved by virtue of the affinity of the A tail for a string of T residues, based on the rule of complementarity. Oligomeric T segments are attached to small beads, the beads are mixed with RNA, and only mRNAs with poly(A) tails stick. By sequentially selecting mRNAs that have both a cap and a poly(A) tail, intact FLmRNAs are purified from a larger pool that includes incomplete mRNAs.
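The logic of the two sequential selections can be sketched in Python. This is only an analogy to the laboratory procedure: the `RnaMolecule` record, the `has_cap` flag, and the minimum tail length of 20 are assumptions made for illustration, not values from the protocol.

```python
from dataclasses import dataclass

@dataclass
class RnaMolecule:
    sequence: str   # RNA bases written 5' to 3'
    has_cap: bool   # hypothetical flag: True if the 5' cap is present

MIN_TAIL = 20  # assumed minimum run of A residues needed to stick to the beads

def poly_a_tail_length(sequence):
    """Count the uninterrupted run of A residues at the 3' end."""
    return len(sequence) - len(sequence.rstrip("A"))

def select_full_length(pool, min_tail=MIN_TAIL):
    """Apply cap selection first, then poly(A) selection."""
    capped = [m for m in pool if m.has_cap]
    return [m for m in capped if poly_a_tail_length(m.sequence) >= min_tail]

# Invented pool: one intact molecule, one piece lacking the cap, and one
# piece lacking the poly(A) tail.
pool = [
    RnaMolecule("AUGCCCAAAGUUCACUAAGC" + "A" * 50, has_cap=True),
    RnaMolecule("GUUCACUAAGC" + "A" * 50, has_cap=False),
    RnaMolecule("AUGCCCAAAGUU", has_cap=True),
]
full_length = select_full_length(pool)
print(len(full_length), "of", len(pool), "molecules pass both selections")
```

Only the molecule that carries both end modifications survives both filters, which mirrors why the combined selection enriches for intact FLmRNA.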
FLmRNAs from One Gene Can Be Heterogeneous
5' Heterogeneity. RNA polymerase II starts transcription from DNA by positioning on nucleotide motifs of 6 to 10 bases in the promoter of a gene. The promoter is the regulatory region of the gene and is usually not transcribed. The choice of which motifs to use, and hence where to start transcription, is strongly influenced by other DNA binding proteins called transcription factors. Think of a series of traffic lights on a highway. At any green light, your car can "go." Similarly, for most genes there are multiple RNA polymerase II "start" sites where transcription can initiate. Consequently, the FLmRNAs in the population for a specific gene often have different 5' nucleotides. Aligning the population of molecules against the DNA sequence shows a nested set of start sites often spread over several hundred bases.
Each of these represents a discrete start site for one RNA polymerase II
transcription event of that gene.
Typically, one site is used most often while the other sites are restricted to specific circumstances, such as an environmental condition (for example, cold or heat stress) or a specific cell type. In the vicinity of each start site are the
DNA motifs that position RNA polymerase II and associated transcription factors
for transcriptional initiation. Hence
knowledge of the various 5' start sites for FLmRNAs is a first step in
identifying the relevant DNA motifs important in the expression of that
particular gene.
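Once the 5' ends of a set of FLmRNAs have been located on the DNA sequence, tallying them shows which start site is used most often and how widely the sites are spread. The sketch below assumes the alignment has already been done; the positions are invented, and the function name `summarize_start_sites` is hypothetical.

```python
from collections import Counter

# Hypothetical 5' end positions (in bases, relative to an arbitrary reference
# point in the gene) for seven FLmRNAs aligned to the DNA sequence.
start_positions = [120, 120, 87, 120, 301, 120, 87]

def summarize_start_sites(positions):
    """Return usage counts per start site, the most used site, and the spread."""
    counts = Counter(positions)
    major_site, _ = counts.most_common(1)[0]
    spread = max(positions) - min(positions)
    return counts, major_site, spread

counts, major_site, spread = summarize_start_sites(start_positions)
for site in sorted(counts):
    print(f"start site at base {site}: used by {counts[site]} FLmRNA(s)")
print(f"most used start site: {major_site}; sites span {spread} bases")
```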
3' Heterogeneity. The stop signs for RNA polymerase II at the end of genes are A+T rich motifs. These stop motifs are found in multiple locations; hence the 3' nucleotide position for a particular gene can vary in the population of FLmRNAs from that gene. In general, the longest FLmRNA is chosen for complete sequencing -- its 5' and 3' ends extend further than those of the other FLmRNAs available for that gene.
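Choosing the longest FLmRNA amounts to comparing how far each molecule extends along the gene. In this sketch the clone names and the (5' end, 3' end) coordinates are invented, and `longest_clone` is a hypothetical helper; real coordinates would come from aligning each molecule to the DNA sequence.

```python
# Hypothetical extents of four FLmRNAs from one gene, given as
# (5' end position, 3' end position) on the DNA sequence.
extents = {
    "clone_a": (120, 2450),
    "clone_b": (87, 2450),
    "clone_c": (87, 2510),
    "clone_d": (301, 2390),
}

def longest_clone(extents):
    """Return the name of the molecule covering the most bases of the gene."""
    return max(extents, key=lambda name: extents[name][1] - extents[name][0])

chosen = longest_clone(extents)
print("chosen for complete sequencing:", chosen)
```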
Introns: Internal Heterogeneity in mRNA Composition is Also Possible
Exons of mRNAs contain the
triplet codons translated into proteins.
In many genes the exon regions are separated by non-coding regions
called introns (yellow boxes in the diagram above). These intron regions are spliced out of a
primary mRNA transcript, and the flanking exons are joined together to make a
continuous set of coding exons, as shown by the solid blue line in the diagram
below. This processed mRNA is now ready
for translation into protein. The codons
start with the AUG at the green dot and continue through the red dot where the
stop codon is located.
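Splicing can be pictured as cutting the intron intervals out of the primary transcript and joining what remains. The following sketch assumes the intron boundaries are already known; the sequences are invented for illustration and the function name `splice` is hypothetical.

```python
def splice(primary_transcript, introns):
    """Remove introns given as (start, end) index pairs and join the exons.

    Positions are 0-based and end-exclusive, and introns must not overlap.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(primary_transcript[cursor:start])  # exon before this intron
        cursor = end                                    # skip over the intron
    exons.append(primary_transcript[cursor:])           # final exon
    return "".join(exons)

# Invented three-exon transcript with two introns.
exon1, intron1, exon2 = "AUGCCC", "GUAAGUUUUCAG", "AAAGUU"
intron2, exon3 = "GUACGUCCCAG", "CACUAA"
primary = exon1 + intron1 + exon2 + intron2 + exon3

# Work out the intron coordinates from the lengths of the pieces.
i1_start = len(exon1)
i1_end = i1_start + len(intron1)
i2_start = i1_end + len(exon2)
i2_end = i2_start + len(intron2)

processed = splice(primary, [(i1_start, i1_end), (i2_start, i2_end)])
print(processed)
```

The processed sequence begins with the AUG start codon and ends with a stop codon, as in the diagram described above.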
Although a small fraction of genes lack introns, a
typical maize or plant gene has several.
A few genes have more than 20 introns.
Most introns are within the coding region as described above, and their removal
is crucial to establishing the "open reading frame of correct codons"
of the encoded protein. The importance
of this point is illustrated in the next section. Introns can also be found in the 5'
untranslated region (5' UTR), which is the stretch of RNA bases between the 5'
end of the mRNA and the start codon (AUG) for protein translation
initiation. Similarly, introns are occasionally found in the 3' UTR, which is the stretch of bases between the stop codon that ends the protein-coding sequence and the poly(A) tail.
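The three regions of a processed mRNA can be located with a short sketch. It makes two simplifying assumptions that are not stated in the text: the first AUG in the molecule is taken as the start codon, and the length of the poly(A) tail is supplied by the caller. The function name `split_regions` and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_regions(mrna, tail_length):
    """Return (5' UTR, coding region, 3' UTR) of a processed mRNA.

    Assumes the first AUG is the start codon and that the last
    `tail_length` bases are the poly(A) tail.
    """
    body = mrna[:len(mrna) - tail_length]   # drop the poly(A) tail
    start = body.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Read triplets in frame with the start codon until a stop codon appears.
    for i in range(start, len(body) - 2, 3):
        if body[i:i + 3] in STOP_CODONS:
            return body[:start], body[start:i + 3], body[i + 3:]
    raise ValueError("no in-frame stop codon found")

# Invented mRNA: 6-base 5' UTR, 4-codon coding region, 6-base 3' UTR, 30 A's.
mrna = "GGCACC" + "AUGCCCAAAUAA" + "CUUGUC" + "A" * 30
five_utr, coding, three_utr = split_regions(mrna, tail_length=30)
print("5' UTR:", five_utr)
print("coding:", coding)
print("3' UTR:", three_utr)
```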
Alternative Intron Processing Can Result in Multiple Proteins from One Gene
• Intron retention can add amino acids to a protein. In some cases an intron is not removed from the transcript. If the number of nucleotides in this intron is a multiple of 3 (such as 90 nucleotides) and the intron contains no stop codon in the reading frame, it will specify a set of amino acids (90 nucleotides = 30 codons = 30 amino acids) added to the protein; after the intron, the exon codons are read in the normal way.
• Intron retention can result in a highly altered protein. If the length of the retained intron is not a multiple of 3, the triplet reading frame of the following exon will be "misread" because of frameshifting. If the original codons were CCC AAA GUU CAC, but an unspliced intron causes the groups of 3 to be formed using a "G" contributed by the intron, as in GCC CAA AGU UCA, an entirely different set of four amino acids is encoded in this region, and the altered reading continues throughout the exon until a stop codon is encountered. In many cases a stop codon is encountered in the alternative reading frame (the alternative triplets), and a truncated, shorter protein is produced.
• Exon skipping deletes some information from the expected protein product. If there are several introns in a gene, each should be removed discretely, because the 5' and 3' edges of an intron are recognized as a unit; the mRNA is cut, the intron is removed, and the two flanking exons are joined together. Sometimes, however, the 5' edge of intron 1 is joined to the 3' edge of intron 2 (or intron 3 or an even more distant intron), removing both introns and the intervening exon 2. The codons encoded by exon 2 are missing from the protein.
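The three outcomes listed above can be checked by translating small example sequences. The sketch below uses the standard genetic code with one-letter amino acid symbols ("*" = stop). The frameshift case uses the codons from the text; the exon and intron sequences in the other two cases are invented, and a 6-base intron stands in for the 90-base example to keep the output short. The function name `translate` is hypothetical.

```python
# Standard genetic code for RNA, in compact form ("*" = stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna):
    """Read triplets from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Frameshift: the codons from the text, read with and without the extra "G".
original = translate("CCCAAAGUUCAC")
shifted = translate("G" + "CCCAAAGUUCAC")

# Invented three-exon gene for the other two cases.
exon1, exon2, exon3 = "AUGCCC", "AAAGUU", "CACUAA"
normal = translate(exon1 + exon2 + exon3)
retained = translate(exon1 + "GUGGCA" + exon2 + exon3)  # 6-base intron kept
skipped = translate(exon1 + exon3)                      # exon 2 missing

print("original frame: ", original)
print("shifted frame:  ", shifted)
print("normal protein: ", normal)
print("intron retained:", retained)
print("exon 2 skipped: ", skipped)
```

In each alternative product the amino acids match the normal protein up to the point of the splicing difference, and only the sequence after that point changes.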
Alternative intron processing events such as these are readily
recognized when multiple FLmRNAs of the same gene are characterized. The standard
splicing pattern yields the "predicted" mRNA, while the alternative
products are either missing some expected RNA segments or retain segments that
should have been removed. Current
estimates suggest that 10% or more of plant mRNAs exhibit alternative
processing events and hence will encode two or more different protein
products. These protein products are
identical up to the point where the alternative splicing event occurs.
Some alternative splicing is highly regulated; that is, it occurs only under specific conditions or in specific cell types. For these cases, scientists infer that the
alternative proteins play some crucial role in the places where they are
synthesized. In many cases of
alternative splicing, however, there is no current rationale for why the events
occur. There may be an intrinsic error rate in intron removal, resulting in the diverse transcript types that can be observed for the mRNA molecules of some genes. Additional evidence for the function of the various proteins is required to establish that alternative splicing is a meaningful event.