molecular mechanisms – bioliteracy / biofundamentals

Molecular bumper cars (RNA polymerase-ribosomal interactions): their (unexpected) functional effects and how to control them

Cells are extremely complex.¹ Much of their “core” complexity appears to have been present in their last (universal) common ancestor, known as LUCA. We find it in the “conserved” molecular mechanisms and machines active in modern cells. LUCA and its offspring are membrane-bounded, non-equilibrium systems that import free energy and export entropy to maintain and repair themselves, to grow, behave, and reproduce (and all the other things living things do). One problem with LUCA, however, is that the steps that preceded it cannot be known with certainty, only speculated upon. Notwithstanding claims of breakthroughs (e.g. “‘Monumental’ experiment suggests how life on Earth may have started”), it is likely that we will never know the actual steps involved; after all, the origin of life occurred billions of years ago and under rather different conditions than exist today.

Living systems “work” based on inherited, pre-existing molecular machines and mechanisms (1). The actions of these machines are fueled through coupling to thermodynamically favorable reactions taking place under non-equilibrium conditions, i.e. the living state. Looking at the details of these interactions reveals interesting and unexpected behaviors. Unfortunately, the “simple” physical-chemical underpinnings of these processes, key to understanding them, are not always presented to students effectively (2). At the same time, the complexity of cellular systems means that, in practice, the link between “simple” molecular mechanisms and the behavior of a biological system can be obscure (see 3). That said, key insights are illuminated when molecular mechanisms are examined, as illustrated by Wee et al. (2023) (4).

Emerging from LUCA, biological populations have diverged into distinct “prokaryotic” lineages: the bacteria and archaea.² Both are defined by a protein-lipid boundary layer, the plasma membrane. Within this membrane is a single internal compartment, the cytoplasm. Information is stored in cells in two forms: first, in the on-going LUCA-derived living system itself, and second, in the sequence of double-stranded DNA molecules. These two types of information are interdependent: the information in DNA makes sense only within a cell, and the on-going cellular processes depend upon and utilize the information in DNA. In bacteria and archaea, these DNA molecules are circular. Here we restrict our discussion to the common unicellular bacterium Escherichia coli (E. coli), one of the workhorse systems that led to an understanding of core molecular mechanisms.

E. coli has a single circular genomic DNA molecule of ~5 million nucleotide base pairs in length; it contains about 5000 distinct genes that encode polypeptides and functional “non-coding” RNA molecules (if you are interested in numbers, check out bionumbers). An E. coli cell is rod-shaped and ~1 micrometer (10^-6 meters) long. Its genomic DNA molecule is ~1000 times longer than the cell that contains it, and a rapidly dividing cell can contain multiple copies of the genome. Genes typically contain two distinct functional regions. Regulatory regions interact with various proteins that determine whether a gene is “expressed” or not. Coding regions specify what is expressed. The first step is the synthesis of an RNA molecule; such a molecule can encode a polypeptide or a non-coding RNA.³ Non-coding RNAs can have structural, catalytic, or regulatory functions.
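The “~1000 times longer” figure can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the genome size and cell length come from the text, while the rise of 0.34 nm per base pair (the usual value for B-form DNA) and the names used are assumptions introduced for illustration.

```python
# Back-of-the-envelope check of the genome-length-to-cell-length ratio.
GENOME_BP = 5_000_000     # ~5 million base pairs (from the text)
RISE_PER_BP_NM = 0.34     # assumed: rise per base pair of B-form DNA, in nanometers
CELL_LENGTH_UM = 1.0      # ~1 micrometer (from the text)


def genome_contour_length_um(n_bp: int, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Contour (stretched-out) length of a DNA molecule, in micrometers."""
    return n_bp * rise_nm / 1000.0  # convert nanometers to micrometers


length_um = genome_contour_length_um(GENOME_BP)
ratio = length_um / CELL_LENGTH_UM

print(f"genome length: {length_um:.0f} um (~{length_um / 1000:.1f} mm)")
print(f"genome / cell length: ~{ratio:.0f}x")
```

Under these assumptions the genome comes out at roughly 1.7 mm, on the order of a thousand cell lengths, which is consistent with the rough figure given above.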

The first step in gene expression in all cell types is the binding of proteins to a gene’s regulatory sequences. Typically a complex of proteins leads to the binding and activation of a DNA-dependent RNA polymerase. The RNA polymerase complex unwinds a specific region of the DNA and uses the complementary nature of nucleotide base pairing (A pairing with T in DNA and U in RNA, and C pairing with G in both) to synthesize an RNA molecule based on the DNA sequence. Synthesis of RNAs that encode polypeptides, known as messenger RNAs (mRNAs), starts with the 5′ end of the molecule and moves toward the 3′ end.
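The base-pairing rules described above can be written down as a small lookup table. This is a minimal sketch, not a model of the polymerase: the function name and the example template sequence are hypothetical, and the template strand is written 3′→5′ so that the RNA reads 5′→3′ position by position.

```python
# Pairing rules when RNA is synthesized on a DNA template strand:
# template A -> U, T -> A, C -> G, G -> C.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') templated by a DNA strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGATTC"        # hypothetical template strand, written 3'->5'
rna = transcribe(template)
print(rna)                    # AUGCCUAAG
```

Note that the RNA has the same sequence as the non-template (“coding”) DNA strand, with U in place of T.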

In prokaryotic cells, both DNA and mRNA synthesis reactions occur in the cytoplasm. A ribosome, a molecular machine composed of multiple proteins and RNAs, can engage the 5′ end region of an mRNA as soon as it appears – before the synthesis of the mRNA is complete. The cytoplasm of a cell contains lots of ribosomes; in E. coli there are ~70,000 ribosomes per cell (more or less). This leads to some interesting and functionally significant interactions. One thing to consider, not always stressed, is that these synthetic processes are not error proof. DNA replication (DNA-directed, DNA synthesis), transcription (DNA-directed, RNA synthesis), and polypeptide synthesis (RNA-directed, polypeptide synthesis) all have an error rate, typically 1 error per ~10⁶ addition events for DNA replication and transcription. Errors can lead to mutations in the DNA, RNAs that encode abnormal proteins, or abnormal and potentially toxic polypeptides.

To deal with physical realities, these synthetic processes employ various “error correction” strategies. In the case of DNA and RNA synthesis, the polymerases involved have what is known as “proof-reading” activities. If the incorrect nucleotide is inserted into a growing DNA or RNA chain, it can be recognized; the polymerase can then “reverse” (move backward along the DNA), remove the mistakenly inserted nucleotide, and then move forward again, adding the correct nucleotide. Key here is that the polymerase is moving back and forth along the DNA strand. The result of proof-reading is to reduce the error rate of DNA-dependent DNA and RNA synthesis substantially, down to 10^-8 to 10^-10 per base pair in the case of DNA synthesis.
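To get a feel for what these rates mean for a genome the size of E. coli’s, the sketch below multiplies them out. It assumes, for illustration only, that errors occur independently at each position and that the ~10^-6 figure can be treated as the rate without proofreading; the function names are hypothetical.

```python
import math

GENOME_BP = 5_000_000  # approximate E. coli genome size, from the text


def expected_errors(n_bp: int, error_rate: float) -> float:
    """Mean number of misincorporations when copying n_bp positions."""
    return n_bp * error_rate


def prob_error_free(n_bp: int, error_rate: float) -> float:
    """Probability of a perfect copy, assuming independent errors per position."""
    return math.exp(n_bp * math.log1p(-error_rate))


RATES = {
    "without proofreading (assumed 1e-6)": 1e-6,
    "with proofreading, 1e-8": 1e-8,
    "with proofreading, 1e-10": 1e-10,
}

for label, rate in RATES.items():
    print(f"{label}: {expected_errors(GENOME_BP, rate):.4f} expected errors, "
          f"P(error-free copy) = {prob_error_free(GENOME_BP, rate):.4f}")
```

Under these assumptions an uncorrected rate of 10^-6 would give about five errors per genome copy, whereas at 10^-8 to 10^-10 most or nearly all copies would be error-free.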

In the case of the RNA polymerase, the newly synthesized RNA can fold back on itself, forming what is known as a “hairpin”. This hairpin “can stabilize an elemental pause (in RNA synthesis) [through] an allosteric interaction with the β-flap tip helix of RNAP”. What Wee et al. (4) report is that as the mRNA-associated ribosome moves along the RNA, it unfolds the hairpin and “bumps” into the polymerase, inhibiting this “pause”, which increases the rate of mRNA synthesis and inhibits the polymerase’s error correction function. The resulting mRNA population has more frequent base changes, errors that can influence the polypeptides synthesized. While cells of all types have various “chaperone” systems that can deal with misfolded proteins that arise in response to various stresses or errors, these can be overwhelmed. The resulting misfolded (damaged) proteins can lead to cellular defects and long-term effects on viability (discussed in 5).

About 1.5 billion years later (give or take), a new type of cell appeared, the result (apparently) of a symbiotic interaction between an archaeal-like “host” and an O₂-utilizing bacterium. This synthetic organism, the progenitor of the eukaryotes, differed from either type of prokaryote in that it sequestered its genome, now composed of linear DNA molecules, within a double-membrane-bounded “nuclear” compartment. In this hybrid cell type, DNA and RNA synthesis was confined to the nucleus, while ribosomes and polypeptide synthesis were confined to the cytoplasm. Eukaryotic cells are typically much larger than prokaryotic cells, reproduce more slowly, and are more complex in terms of the number of genes and the amount of genomic DNA they contain. It is tempting to speculate that while rapidly dividing, relatively simple prokaryotic cells may be able to tolerate more mistakes in the synthesis of their polypeptides, larger, more complex eukaryotic cells would be more vulnerable. A plausible result would be a selection pressure to separate RNA synthesis from polypeptide synthesis.

literature cited

1. Alberts, B., 1998. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell, 92, pp.291-294.
2. de Lorenzo, V., 2024. The principle of uncertainty in biology: Will machine learning/artificial intelligence lead to the end of mechanistic studies? PLoS Biology, 22, p.e3002495.
3. Klymkowsky, M.W., 2021. Making mechanistic sense: are we teaching students what they need to know? Developmental Biology, 476, pp.308-313.
4. Wee et al., 2023. A trailing ribosome speeds up RNA polymerase at the expense of transcript fidelity via force and allostery. Cell, 186, pp.1244-1262.
5. Klymkowsky, M.W., 2019. Filaments and phenotypes: cellular roles and orphan effects associated with mutations in cytoplasmic intermediate filament proteins. F1000Research, 8.

Footnotes

1. If you want to brush up on your molecular biology, check out chapter 7 of biofundamentals.
2. Image from Govindjee – doi:10.3389/fpls.2011.00028, CC BY 3.0. Given the diversity of biological systems, these are general descriptions – often there are exceptions, but recognizing them all makes generating a coherent narrative difficult (and beyond me). Mea culpa.
3. bioliteracy link: When is a gene product a protein when is it a polypeptide?

Genes – way weirder than you thought

Pretty much everyone, at least in societies with access to public education or exposure to media in its various forms, has been introduced to the idea of the gene, but “exposure does not equate to understanding” (see Lanie et al., 2004). Here I will argue that part of the problem is that instruction in genetics (or in more modern terms, the molecular biology of the gene and its role in biological processes) has not kept up with the advances in our understanding of the molecular mechanisms underlying biological processes (Gayon, 2016).

Let us reflect (for a moment) on the development of the concept of a gene: Over the course of human history, those who have been paying attention to such things have noticed that organisms appear to come in “types”, what biologists refer to as species. At the same time, individual organisms of the same type are not identical to one another; they vary in various ways. Moreover, these differences can be passed from generation to generation, and when people controlled which organisms were bred together, some of the resulting offspring often displayed more extreme versions of the “selected” traits. By strictly controlling which individuals were bred together, over a number of generations, people were able to select for the specific traits they desired (dog breeds are a familiar example). As an interesting aside, as people domesticated animals, such as cows and goats, the availability of associated resources (e.g. milk) led to reciprocal effects – resulting in traits such as adult lactose tolerance (see Evolution of (adult) lactose tolerance & Gerbault et al., 2011). Overall, the process of plant and animal breeding is generally rather harsh (something that the fanciers of strange breeds who object to GMOs might reflect upon), in that individuals that did not display the desired trait(s) were generally destroyed (or at best, not allowed to breed).

Charles Darwin took inspiration from this process, substituting “natural” for artificial (human-determined) selection to shape populations, eventually generating new species (Darwin, 1859). Underlying such evolutionary processes was the presumption that traits, and their variation, were “encoded” in some type of “factors”, eventually known as genes and their variants, alleles. Genes influenced the organism’s molecular, cellular, and developmental systems, but the nature of these inheritable factors and the molecular trait-building machines active in living systems was more or less completely obscure.

Through his studies on peas, Gregor Mendel was the first to clearly identify some of the rules for the behavior of these inheritable factors, using highly stereotyped and essentially discontinuous traits – a pea was either yellow or green, wrinkled or smooth. Such traits, while they exist in other organisms, are in fact rare, an example of how the scientific exploration of exceptional situations can help us understand general processes. The downside is the promulgation of the idea that genes and traits are somehow discontinuous (that a trait is yes/no, displayed by an organism or not), in contrast to the reality that the link between the two is complex, a reality rarely directly addressed (apparently) in most introductory genetics courses. Understanding such processes is critical to appreciating the fact that genetics is often not destiny, but rather a matter of altered probabilities (see Cooper et al., 2013). Without such a more nuanced and realistic understanding, it can be difficult to make sense of genetic information.

A gene is part of a molecular machine: A number of observations transformed the abstraction of Darwin’s and Mendel’s hereditary factors into physical entities and molecular mechanisms (1). In 1928 Fred Griffith demonstrated that a genetic trait could be transferred from dead to living organisms – implying a degree of physical/chemical stability; subsequent observations implied that the genetic information transferred involved DNA molecules. The determination of the structure of double-stranded DNA immediately suggested how information could be stored in DNA (in variations of bases along the length of the molecule) and how this information could be duplicated (based on the specificity of base pairing). Mutations could be understood as changes in the sequence of bases along a DNA molecule (introduced by chemicals, radiation, mistakes during replication, or molecular reorganizations associated with DNA repair mechanisms and selfish genetic elements).

But on their own, DNA molecules are inert – they have functions only within the context of a living organism (or highly artificial, that is man-made, experimental systems). The next critical step was to understand how a gene works within a biological system, that is, within an organism. This involved appreciating the molecular mechanisms (primarily proteins) involved in identifying which stretches of a particular DNA molecule were used as templates for the synthesis of RNA molecules, which in turn could be used to direct the synthesis of polypeptides (see previous post on polypeptides and proteins). In the context of the introductory biology courses I am familiar with (please let me know if I am wrong), these processes are presented in a rather deterministic context; a gene is either on or off in a particular cell type, leading to the presence or absence of a trait. Such a deterministic presentation ignores the stochastic nature of molecular level processes (see past post: Biology education in the light of single cell/molecule studies) and the dynamic interaction networks that underlie cellular behaviors.

But our level of resolution is changing rapidly (2). For a number of practical reasons, when the human genome was first sequenced, the identification of polypeptide-encoding genes was based on recognizing “open-reading frames” (ORFs) encoding polypeptides of > 100 amino acids in length (a coding sequence > 300 bases long). The increasing sensitivity of mass spectrometry-based proteomic studies reveals that smaller ORFs (smORFs) are present and can lead to the synthesis of short (< 50 amino acids long) polypeptides (Chugunova et al., 2017; Couso, 2015). Typically an ORF was considered a single entity – basically one gene, one ORF, one polypeptide (3). A recent, rather surprising discovery is what are known as “alternative ORFs” or altORFs; these are regions of RNA molecules in which alternative reading frames are used to encode small polypeptides. Such altORFs can be located upstream, downstream, or within the previously identified conventional ORF (see Samandi et al., 2017). The implication, particularly for the analysis of how variations in genes link to traits, is that a change, a mutation or even the experimental deletion of a gene, a common approach in a range of experimental studies, can do much more than previously presumed – not only is the targeted ORF affected, but various altORFs can also be modified.
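How a length cutoff can hide an overlapping ORF in another reading frame can be illustrated with a toy scan. This is a minimal sketch, not the annotation method used in any of the cited studies: it considers only the three forward frames, requires an AUG start and a stop codon, and uses a short made-up sequence; the function name and `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 2):
    """Return (frame, start, end, n_codons) for AUG...stop ORFs in the 3 forward frames.

    n_codons counts the sense codons (including AUG, excluding the stop codon);
    start and end are 0-based, with end pointing just past the stop codon.
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i                       # first AUG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((frame, start, i + 3, n_codons))
                start = None                    # look for the next ORF in this frame
    return orfs


# Made-up sequence: a 5-codon ORF in frame 0 with a 2-codon ORF nested in frame 1.
toy_rna = "AUGAAUGGCCUGACCUAA"

print(find_orfs(toy_rna, min_codons=2))  # [(0, 0, 18, 5), (1, 4, 13, 2)]
print(find_orfs(toy_rna, min_codons=3))  # [(0, 0, 18, 5)]
```

Raising the minimum length makes the nested frame-1 ORF disappear from the output, much as a > 100 amino acid cutoff would overlook smORFs and altORFs; a change within positions 4 to 12 of the toy sequence would alter both ORFs at once.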

The situation is further complicated when the established rules of using RNAs to direct polypeptide synthesis, via the process of translation, are violated, as occurs in what is known as “repeat-associated non-ATG (RAN)” polypeptide synthesis (see Cleary and Ranum, 2017). In this situation, the normal signal for the start of RNA-directed polypeptide synthesis, an AUG codon, is subverted – other sites in the RNA are used to start polypeptide synthesis, leading to underlying or embedded gene expression. This process has been found associated with a class of human genetic diseases, such as amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), characterized by the expansion of simple (repeated) DNA sequences (see Pattamatta et al., 2018). Once they exceed a certain length, such “repeat” regions have been found to be associated with the (apparently) inappropriate transcription of RNA in both directions, that is, using both DNA strands as templates (figure: A, normal situation; B, upon expansion of the repeat domain). These abnormal repeat region RNAs are translated via the RAN process to generate six different types of toxic polypeptides.
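To see why transcription in both directions matters, one can read a repeat in all six reading frames (three per strand). The sketch below does this for the GGGGCC repeat named in the title of Cheng et al. (2018). It is a simplification under stated assumptions: it uses the standard genetic code, works on the DNA-level sequence, and simply translates each frame from its first position rather than modeling how RAN initiation actually occurs; the function names and the choice of four repeat copies are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate every complete codon of the given frame (no start codon required)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def ran_products(repeat_unit: str = "GGGGCC", copies: int = 4) -> dict:
    """Map (strand, frame) to the peptide read from an expanded repeat."""
    sense = repeat_unit * copies
    strands = (("sense", sense), ("antisense", reverse_complement(sense)))
    return {(name, f): translate(seq, f) for name, seq in strands for f in range(3)}


products = ran_products()
for (strand, frame), peptide in products.items():
    print(f"{strand:9s} frame {frame}: {peptide}")
# sense     frame 0: GAGAGAGA
# sense     frame 1: GPGPGPG
# sense     frame 2: GRGRGRG
# antisense frame 0: GPGPGPGP
# antisense frame 1: APAPAPA
# antisense frame 2: PRPRPRP
```

Because the repeat unit is six bases long, every frame on either strand reads as a repeating pair of amino acids, which is why a single expanded repeat can give rise to several different repetitive polypeptides.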

So what are the molecular factors that control the various types of altORF transcription and translation? In the case of ALS and FTD, it appears that other genes, and the polypeptides and proteins they encode, are involved in regulating the expression of repeat-associated RNAs (Kramer et al., 2016; Cheng et al., 2018). Similar or distinct mechanisms may be involved in other neurodegenerative diseases (Cavallieri et al., 2017).

So how should all of these molecular details (and it is likely that there are more to be discovered) influence how genes are presented to students? I would argue that DNA should be presented as a substrate upon which various molecular mechanisms act; these include transcription in its various forms (directed and noisy), as well as DNA synthesis, modification, and repair mechanisms. Genes are not static objects, but key parts of dynamic systems. This may be one reason that classical genetics, that is, genes presented within a simple Mendelian (gene to trait) framework, should be moved deeper into the curriculum, where students have the background in molecular mechanisms needed to appreciate its complexities. These complexities arise from the multiple molecular machines acting to access, modify, and use the information captured in DNA (through evolutionary processes); appreciating them places the gene in a more realistic cellular perspective (4).

Footnotes:

1. Described in greater detail in biofundamentals™

2. For this discussion, I am completely ignoring the roles of genes that encode RNAs that, as far as is currently known, do not encode polypeptides. That said, as we go on, you will see that it is possible that some such non-coding RNAs may encode small polypeptides.

3. I am ignoring the complexities associated with alternative promoter elements, introns, and the alternative and often cell-type specific regulated splicing of RNAs, to create multiple ORFs from a single gene.

4. With respects to Norm Pace – assuming that I have the handedness of the DNA molecules wrong or have exchanged Z for A or B.

literature cited:

Cavallieri et al, 2017. C9ORF72 and parkinsonism: Weak link, innocent bystander, or central player in neurodegeneration? Journal of the neurological sciences 378, 49.
Cheng et al, 2018. C9ORF72 GGGGCC repeat-associated non-AUG translation is upregulated by stress through eIF2α phosphorylation. Nature communications 9, 51.
Chugunova et al, 2017. Mining for small translated ORFs. Journal of proteome research 17, 1-11.
Cleary & Ranum, 2017. New developments in RAN translation: insights from multiple diseases. Current opinion in genetics & development 44, 125-134.
Cooper et al, 2013. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Human genetics 132, 1077-1130.
Couso, 2015. Finding smORFs: getting closer. Genome biology 16, 189.
Darwin, 1859. On the origin of species. London: John Murray.
Gayon, 2016. From Mendel to epigenetics: History of genetics. Comptes rendus biologies 339, 225-230.
Gerbault et al, 2011. Evolution of lactase persistence: an example of human niche construction. Philosophical Transactions of the Royal Society of London B: Biological Sciences 366, 863-877.
Kramer et al, 2016. Spt4 selectively regulates the expression of C9orf72 sense and antisense mutant transcripts. Science 353, 708-712.
Lanie et al, 2004. Exploring the public understanding of basic genetic concepts. Journal of genetic counseling 13, 305-320.
Pattamatta et al, 2018. All in the Family: Repeats and ALS/FTD. Trends in neurosciences 41, 247-250.
Samandi et al, 2017. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. Elife 6.