Genes – way weirder than you thought

Pretty much everyone, at least in societies with access to public education or exposure to media in its various forms, has been introduced to the idea of the gene, but “exposure does not equate to understanding” (see Lanie et al., 2004). Here I will argue that part of the problem is that instruction in genetics (or in more modern terms, the molecular biology of the gene and its role in biological processes) has not kept up with the advances in our understanding of the molecular mechanisms underlying biological processes (Gayon, 2016).

Let us reflect (for a moment) on the development of the concept of a gene: Over the course of human history, those who have been paying attention to such things have noticed that organisms appear to come in “types”, what biologists refer to as species. At the same time, individual organisms of the same type are not identical to one another; they vary in various ways. Moreover, these differences can be passed from generation to generation, and when people controlled which organisms were bred together, some of the resulting offspring displayed more extreme versions of the “selected” traits. By strictly controlling which individuals were bred together, over a number of generations, people were able to select for the specific traits they desired. As an interesting aside, as people domesticated animals, such as cows and goats, the availability of associated resources (e.g. milk) led to reciprocal effects – resulting in traits such as adult lactose tolerance (see Evolution of (adult) lactose tolerance & Gerbault et al., 2011). Overall, the process of plant and animal breeding is generally rather harsh (something that the fanciers of strange breeds who object to GMOs might reflect upon), in that individuals that did not display the desired trait(s) were generally destroyed (or at best, not allowed to breed).

Charles Darwin took inspiration from this process, substituting “natural” for artificial (human-determined) selection to shape populations, eventually generating new species (Darwin, 1859). Underlying such evolutionary processes was the presumption that traits, and their variation, were “encoded” in some type of “factors”, eventually known as genes and their variants, alleles. Genes influenced the organism’s molecular, cellular, and developmental systems, but the nature of these inheritable factors and the molecular trait-building machines active in living systems was more or less completely obscure.

Through his studies on peas, Gregor Mendel was the first to clearly identify some of the rules for the behavior of these inheritable factors using highly stereotyped, and essentially discontinuous traits – a pea was either yellow or green, wrinkled or smooth. Such traits, while they exist in other organisms, are in fact rare – an example of how the scientific exploration of exceptional situations can help us understand general processes. The downside is the promulgation of the idea that genes and traits are somehow discontinuous, that a trait is yes/no, displayed by an organism or not. In reality the link between gene and trait is complex, a reality rarely directly addressed (apparently) in most introductory genetics courses. Understanding such processes is critical to appreciating the fact that genetics is often not destiny, but rather alterations in probabilities (see Cooper et al., 2013). Without such a more nuanced and realistic understanding, it can be difficult to make sense of genetic information.

A gene is part of a molecular machine: A number of observations transformed the abstraction of Darwin’s and Mendel’s hereditary factors into physical entities and molecular mechanisms (1). In 1928 Fred Griffith demonstrated that a genetic trait could be transferred from dead to living organisms – implying a degree of physical / chemical stability; subsequent observations implied that the genetic information transferred involved DNA molecules. The determination of the structure of double-stranded DNA immediately suggested how information could be stored in DNA (in variations of bases along the length of the molecule) and how this information could be duplicated (based on the specificity of base pairing). Mutations could be understood as changes in the sequence of bases along a DNA molecule (introduced by chemicals, radiation, mistakes during replication, or molecular reorganizations associated with DNA repair mechanisms and selfish genetic elements).

But on their own, DNA molecules are inert – they have functions only within the context of a living organism (or highly artificial, that is man-made, experimental systems). The next critical step was to understand how a gene works within a biological system, that is, within an organism. This involved appreciating the molecular mechanisms (primarily proteins) involved in identifying which stretches of a particular DNA molecule were used as templates for the synthesis of RNA molecules, which in turn could be used to direct the synthesis of polypeptides (see previous post on polypeptides and proteins). In the introductory biology courses I am familiar with (please let me know if I am wrong), these processes are presented in a rather deterministic way; a gene is either on or off in a particular cell type, leading to the presence or absence of a trait. Such a deterministic presentation ignores the stochastic nature of molecular level processes (see past post: Biology education in the light of single cell/molecule studies) and the dynamic interaction networks that underlie cellular behaviors.
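To make the contrast between an on/off gene and a stochastic one concrete, here is a toy sketch (not a biological model). It assumes that every cell in a genetically identical population has the same small, fixed chance of producing one transcript in each time step; the function name `simulate_transcript_counts` and all parameter values are invented for illustration.

```python
import random


def simulate_transcript_counts(n_cells, n_steps, p_event, seed=0):
    """Count transcription events per cell when each time step is an
    independent coin flip with probability p_event (a toy assumption)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [
        sum(rng.random() < p_event for _ in range(n_steps))
        for _ in range(n_cells)
    ]


# Hypothetical numbers: 1000 identical cells, 100 time steps, 5% chance per step.
counts = simulate_transcript_counts(n_cells=1000, n_steps=100, p_event=0.05)
mean_count = sum(counts) / len(counts)

print("mean transcripts per cell:", round(mean_count, 2))
print("range across identical cells:", min(counts), "to", max(counts))
```

Even though every simulated cell follows exactly the same rule, the cells end up with different numbers of transcripts; a strictly "on or off" description hides that spread.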

But our level of resolution is changing rapidly (2). For a number of practical reasons, when the human genome was first sequenced, the identification of polypeptide-encoding genes was based on recognizing “open-reading frames” (ORFs) encoding polypeptides of > 100 amino acids in length (> 300 base long coding sequence). The increasing sensitivity of mass spectrometry-based proteomic studies reveals that smaller ORFs (smORFs) are present and can lead to the synthesis of short (< 50 amino acid long) polypeptides (Chugunova et al., 2017; Couso, 2015). Typically an ORF was considered a single entity – basically one gene one ORF one polypeptide (3). A recent, rather surprising discovery is what are known as “alternative ORFs” or altORFs; these use alternative reading frames within an RNA molecule to encode small polypeptides. Such altORFs can be located upstream, downstream, or within the previously identified conventional ORF (see Samandi et al., 2017). The implication, particularly for the analysis of how variations in genes link to traits, is that a change, a mutation or even the experimental deletion of a gene, a common approach in a range of experimental studies, can do much more than previously presumed – not only is the targeted ORF affected, but various altORFs can also be modified.
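The effect of a length cutoff on which ORFs get noticed can be sketched in a few lines. This is a toy illustration only: the 21-base sequence is made up, the cutoffs of 5 and 2 codons stand in for the 100-amino-acid threshold mentioned above, and `find_orfs` is a hypothetical helper that scans only the three forward reading frames using the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(dna, min_codons):
    """Return (frame, start_index, peptide) for every ATG-to-stop ORF in the
    three forward frames whose peptide has at least min_codons residues."""
    orfs = []
    for frame in range(3):
        start, peptide = None, []
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE[dna[i:i + 3]]
            if start is None:
                if aa == "M":            # ATG opens an ORF
                    start, peptide = i, ["M"]
            elif aa == "*":              # stop codon closes it
                if len(peptide) >= min_codons:
                    orfs.append((frame, start, "".join(peptide)))
                start, peptide = None, []
            else:
                peptide.append(aa)
    return orfs


# Made-up sequence: a "conventional" ORF in frame 0 with a short ORF
# embedded in frame 1.
toy_rna_as_dna = "ATGGATGAAACCCTGAGCTAA"

strict = find_orfs(toy_rna_as_dna, min_codons=5)   # long-ORF-only annotation
relaxed = find_orfs(toy_rna_as_dna, min_codons=2)  # short ORFs allowed

print(strict)   # [(0, 0, 'MDETLS')]
print(relaxed)  # [(0, 0, 'MDETLS'), (1, 4, 'MKP')]
```

With the strict cutoff only the frame-0 ORF is reported; relaxing it reveals a second, overlapping ORF in a different frame, so a change to those bases would alter both encoded polypeptides.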

The situation is further complicated when the established rules of using RNAs to direct polypeptide synthesis, via the process of translation, are violated, as occurs in what is known as “repeat-associated non-ATG (RAN)” polypeptide synthesis (see Cleary and Ranum, 2017). In this situation, the normal signal for the start of RNA-directed polypeptide synthesis, an AUG codon, is subverted – other polypeptide synthesis start sites are used, leading to underlying or embedded gene expression. This process has been found associated with a class of human genetic diseases, such as amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), characterized by the expansion of simple (repeated) DNA sequences (see Pattamatta et al., 2018). Once they exceed a certain length, such “repeat” regions have been found to be associated with the (apparently) inappropriate transcription of RNA in both directions, that is using both DNA strands as templates. These abnormal repeat region RNAs are translated via the RAN process, in as many as six reading frames (three on each strand), to generate toxic polypeptides.
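What translating a repeat RNA from both strands looks like can be worked through with the GGGGCC repeat named in the title of Cheng et al. (2018). The sketch below is only an illustration under simplifying assumptions: four repeat units, the standard genetic code, and translation that simply starts at the first base of each frame. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame):
    """Translate every complete codon starting at offset `frame`."""
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate three frames on the sense and three on the antisense strand."""
    strands = {"sense": dna, "antisense": reverse_complement(dna)}
    return {
        (name, frame): translate(seq, frame)
        for name, seq in strands.items()
        for frame in range(3)
    }


repeat_dna = "GGGGCC" * 4  # four repeat units, an arbitrary toy length
frames = six_frame_translation(repeat_dna)

for (strand, frame), peptide in frames.items():
    print(strand, frame, peptide)

# Dipeptide compositions seen across the six frames (order-independent).
dipeptide_types = {frozenset(peptide[:2]) for peptide in frames.values()}
print(len(dipeptide_types), "distinct dipeptide repeats")
```

For this particular repeat, every frame yields a two-residue repeating unit, and the glycine-proline repeat arises from both strands, so the six reading frames give five distinct dipeptide compositions.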

So what are the molecular factors that control the various types of altORF transcription and translation? In the case of ALS and FTD, it appears that other genes, and the polypeptides and proteins they encode, are involved in regulating the expression of repeat-associated RNAs (Kramer et al., 2016; Cheng et al., 2018). Similar or distinct mechanisms may be involved in other neurodegenerative diseases (Cavallieri et al., 2017).

So how should all of these molecular details (and it is likely that there are more to be discovered) influence how genes are presented to students? I would argue that DNA should be presented as a substrate upon which various molecular mechanisms act; these include transcription in its various forms (directed and noisy), as well as DNA synthesis, modification, and repair mechanisms. Genes are not static objects, but key parts of dynamic systems. This may be one reason that classical genetics, that is genes presented within a simple Mendelian (gene to trait) framework, should be moved deeper into the curriculum, where students have the background in molecular mechanisms needed to appreciate its complexities. Those complexities arise from the multiple molecular machines acting to access, modify, and use the information captured in DNA (through evolutionary processes); recognizing them places the gene in a more realistic cellular perspective (4).

Footnotes:

1. Described in greater detail in biofundamentals™

2. For this discussion, I am completely ignoring the roles of genes that encode RNAs that, as far as is currently known, do not encode polypeptides. That said, as we go on, you will see that it is possible that some such non-coding RNAs may encode small polypeptides.

3. I am ignoring the complexities associated with alternative promoter elements, introns, and the alternative and often cell-type specific regulated splicing of RNAs, to create multiple ORFs from a single gene.  

4. With respects to Norm Pace – assuming that I have the handedness of the DNA molecules wrong or have exchanged Z for A or B. 

literature cited: 

  • Cavallieri et al, 2017. C9ORF72 and parkinsonism: Weak link, innocent bystander, or central player in neurodegeneration? Journal of the neurological sciences 378, 49.
  • Cheng et al, 2018. C9ORF72 GGGGCC repeat-associated non-AUG translation is upregulated by stress through eIF2α phosphorylation. Nature communications 9, 51.
  • Chugunova et al, 2017. Mining for small translated ORFs. Journal of proteome research 17, 1-11.
  • Cleary & Ranum, 2017. New developments in RAN translation: insights from multiple diseases. Current opinion in genetics & development 44, 125-134.
  • Cooper et al, 2013. Where genotype is not predictive of phenotype: towards an understanding of the molecular basis of reduced penetrance in human inherited disease. Human genetics 132, 1077-1130.
  • Couso, 2015. Finding smORFs: getting closer. Genome biology 16, 189.
  • Darwin, 1859. On the origin of species. London: John Murray.
  • Gayon, 2016. From Mendel to epigenetics: History of genetics. Comptes rendus biologies 339, 225-230.
  • Gerbault et al, 2011. Evolution of lactase persistence: an example of human niche construction. Philosophical Transactions of the Royal Society of London B: Biological Sciences 366, 863-877.
  • Kramer et al, 2016. Spt4 selectively regulates the expression of C9orf72 sense and antisense mutant transcripts. Science 353, 708-712.
  • Lanie et al, 2004. Exploring the public understanding of basic genetic concepts. Journal of genetic counseling 13, 305-320.
  • Pattamatta et al, 2018. All in the Family: Repeats and ALS/FTD. Trends in neurosciences 41, 247-250.
  • Samandi et al, 2017. Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. Elife 6.

Ideas are cheap, theories are hard

In the context of public discourse, there are times when one is driven to simple, reflexive and often disproportionate (exasperated) responses. That happens to me whenever people talk about the various theories that they apply to a process or event. I respond by saying (increasingly silently to myself) that what they really mean is that they have an idea, a model, a guess, a speculation, or a comforting “just-so” story. All too often such competing “theories” are flexible enough to explain (or explain away) anything, depending upon one’s predilections. So why a post on theories? Certainly the point has been made before (see Ghose. 2013. “‘Just a Theory’: 7 Misused Science Words”). Basically because the misuse of the term theory, whether by non-scientists, scientists, or science popularizers, undermines understanding of, and respect for, the products of the scientific enterprise. It confuses hard-won knowledge with what are often superficial (or self-serving) opinions. When professors, politicians, pundits, PR flacks, or regular people use the word theory, they are all too often, whether consciously or not, seeking to elevate their ideas through the authority of science.

So what is the big deal anyway, why be an annoying pain in the ass (see Christopher DiCarlo’s video), challenging people, making them uncomfortable, and making a big deal about something so trivial? But is it really trivial? I think not, although it may well be futile or quixotic. The inappropriate use of the word theory, particularly by academics, is an implicit attempt to gain credibility. It is also an attack on the integrity of science. Why? Because like it or not, science is the most powerful method we have to understand how the world works, as opposed to what the world or our existence within the world means. The scientific enterprise, abiding as it does by explicit rules of integrity, objective evidence, logical and quantifiable implications, and their testing, has been a progressive social activity, leading to useful knowledge – knowledge that has eradicated smallpox and polio (almost) and produced iPhones, genetically modified organisms, and nuclear weapons. That is not to say that the authority of science has not repeatedly been used to justify horrific sociopolitical ideas, but those ideas have not been based on critically evaluated and tested scientific theories, but on variously baked ideas that claim the support of science (both the eugenics and anti-vaccination movements are examples).

Modern science is based on theories, ideas about the universe that explain and predict what we will find when we look (smell, hear, touch) carefully at the world around us. And these theories are rigorously and continually tested, quantitatively – in fact one might say that the ability to translate a theory into a quantitative prediction is one critical hallmark of a real versus an ersatz (non-scientific) theory [here is a really clever approach to teaching students about facts and theories, from David Westmoreland].

So where do (scientific) theories come from? Initially they are guesses about how the world works, as stated by Richard Feynman, who also pointed to the non-scientific nature of vague “theories”. These guesses have evolved through testing, confirmation, and, where wrong, replacement with more accurate, logically well-constructed, and more widely applicable constructs – an example of the evolution of scientific knowledge. That is why ideas are cheap: they never had, or never develop, the disciplinary rigor necessary to become a theory. In fact, it often does not even matter, not really, to the people propounding these ideas whether they correspond to reality at all, as witnessed by the stream of tweets from various politicians or the ease with which many apocalyptic predictions are replaced when they turn out to be incorrect. But how is the average person to identify the difference between a (more or less half-baked) idea and a scientific theory? Probably the easiest way is to ask: is the idea constantly being challenged, validated, and where necessary refined by both its proponents and its detractors? One of the most impressive aspects of Einstein’s theory of general relativity is the accuracy of its predictions (the orbit of Mercury, time dilation, and gravitational waves), predictions that if not confirmed would have forced its abandonment – or at the very least serious revision. It is this constant application of a theory, and the rigorous testing of its predictions (if this, then that), that proves its worth.

Another aspect of a scientific theory is whether it is fecund or sterile. Does its application lead to new observations that it can explain? In contrast, most ideas are dead ends. Consider the recent paper on the possibility that life arose outside of the Earth, a proposal known as panspermia (1) – “a very plausible conclusion – life may have been seeded here on Earth by life-bearing comets” – and recently tunneling into the web’s consciousness in stories implying the extra-terrestrial origins of cephalopods (see “no, octopuses don’t come from outer space.”) Unfortunately, no actual biological insights emerge from this idea (wild speculation), since it simply displaces the problem: if life did not arise here, how did it arise elsewhere? If such ideas are embraced, as is the case with many religious ideas, their alteration often leads to violent schism rather than peaceful refinement. Consider, as an example, an idea had by an archaic Greek or two that the world was made of atoms. These speculations were not theories, since their implications were not rigorously tested. The modern atomic theory has been evolving since its introduction by Dalton, and displays the diagnostic traits of a scientific theory. Once introduced to explain the physical properties of matter, it led to new discoveries and explanations for the composition and structure of atoms themselves (electrons, neutrons, and protons), and then to the composition and properties of these objects, quarks and such.

Scientific theories are, by necessity, tentative (again, as noted by Feynman) – they are constrained and propelled by new and more accurate observations.  A new observation can break a theory, leading it to be fixed or discarded.  When that happens, the new theory explains (predicts) all that the old theory did and more.  This is where discipline comes in; theories must meet strict standards – the result is that generally there cannot be two equivalent theories that explain the same phenomena – one (or both) must be wrong in some important ways.  There is no alternative, non-atomic theory that explains the properties of matter.  

The assumption is that two “competing” theories will make distinctly different predictions, if we look (and measure) carefully enough. There are rare cases where two “theories” make the same predictions; the classic example is the Ptolemaic Earth-centered and the Copernican Sun-centered models of the solar system. Both explained the appearances of planetary motion more or less equally well, and so on that basis there was really no objective reason to choose between them. In part, this situation arose from an unnecessary assumption underlying both models, namely that celestial objects moved in perfect circular orbits – this assumption necessitated the presence of multiple “epicycles” in both models. The real advance came with Kepler’s recognition that celestial objects need not travel in perfect circular orbits, but rather in elliptical orbits; this liberated models of the solar system from the need for epicycles. The result was the replacement of “theories of solar system movement” with a “theory of planetary/solar/galactic motions”.

Whether, at the end of the day, scientific theories are comforting or upsetting, beautiful or ugly remains to be seen, but what is critical is that we defend the integrity of science and call out the non-scientific use of the word theory, or blame ourselves for the further decay of civilization (perhaps I am being somewhat hyperbolic – sorry).

notes: 

1. Although really, pan-oogenia would be better.  Sperm can do nothing without an egg, but an unfertilized egg can develop into an organism, as occurs with bees.  

When is a gene product a protein, and when is it a polypeptide?

On the left is a negatively-stained electron micrograph of a membrane vesicle isolated from the electric ray Torpedo californica, with a single muscle-type nicotinic acetylcholine receptor (AcChR) pointed out. To the right is the structure of the AcChR determined to NN resolution using cryoelectron microscopy by Rahman, Teng, Worrell, Noviello, Lee, Karlin, Stowell & Hibbs (2020). “Structure of the native muscle-type nicotinic receptor and inhibition by snake venom toxins.”

As a new assistant professor (1), I was called upon to teach my department’s “Cell Biology” course. I found, and still find, the prospect challenging, in part because I am not exactly sure which aspects of cell biology are important for students to know, both in the context of the major and in their lives and subsequent careers. While it seems possible (at least to me) to lay out a coherent conceptual foundation for biology as a whole [see 1], cell biology can often appear to students as an un-unified hodge-podge of terms and disconnected cellular systems, topics too often experienced as a vocabulary lesson rather than as a compelling narrative. As such, I am afraid that the typical cell biology course often reinforces an all too common view of biology as a discipline, a view that, while wrong in most possible ways, was summarized by the 19th/early 20th century physicist Ernest Rutherford as “All science is either physics or stamp collecting.” A key motivator for the biofundamentals project [2] has been to explore how best to dispel this prejudice, and how to more effectively present to students a coherent narrative and the key foundational observations and ideas by which to scientifically consider living systems. These are by any measure the most complex systems in the Universe, systems shaped, but not determined, by physical and chemical properties and constraints, together with the historical vagaries of evolutionary processes on an ever-changing Earth.

Two types of information: There is an underlying dichotomy within biological systems: there is the hereditary information encoded in the sequence of nucleotides along double-stranded DNA molecules (genes and chromosomes), and there is the information inherent in the living system. The information in DNA is meaningful only in the context of the living cell, a reaction system that has been running without interruption since the origin of life. While these two systems are inextricably interconnected, there is a basic difference between them. Cellular systems are fragile; once dead there is no coming back. In contrast, the information in DNA can survive death – it can move from cell to cell in the process of horizontal gene transfer. The Venter group has replaced the DNA of bacterial cells with synthetic genomes in an effort to define the minimal number of genes needed to support life, at least in a laboratory setting [see 3, 4]. In eukaryotes, cloning is carried out by replacing a cell’s DNA with that of another cell (reference).

Conflating protein synthesis and folding with assembly and function: Much of the information stored in a cell’s DNA is used to encode the sequence of various amino acid polymers (polypeptides).  While over-simplified [see 5], students are generally presented with the view that each gene encodes a particular protein through DNA-directed RNA synthesis (transcription) and RNA-directed polypeptide synthesis (translation).  As the newly synthesized polypeptide emerges from the ribosomal tunnel, it begins to fold, and is released into the cytoplasm or inserted into or through a cellular membrane, where it often interacts with one or more other polypeptides to form a protein  [see 6].  The assembled protein is either functional or becomes functional after association with various non-polypeptide co-factors or post-translational modifications.  It is the functional aspect of proteins that is critical, but too often their assembly dynamics are overlooked in the presentation of gene expression/protein synthesis, which is really a combination of distinct processes. 

Students are generally introduced to protein synthesis through the terms primary, secondary, tertiary, and quaternary structure, an approach that can be confusing since many (most) polypeptides are not proteins and many proteins are parts of complex molecular machines [here is the original biofundamentals web page on proteins + a short video][see Teaching without a Textbook]. Consider the nuclear pore complex, a molecular machine that mediates the movement of molecules into and out of the nucleus. A nuclear pore is “composed of ∼500, mainly evolutionarily conserved, individual protein molecules that are collectively known as nucleoporins (Nups)” [7]. But what is the function of a particular Nup, particularly if it does not exist in significant numbers outside of a nuclear pore? Is a nuclear pore one protein? In contrast, the membrane-bound ATP synthase found in aerobic bacteria and eukaryotic mitochondria is described as composed “of two functional domains, F1 and Fo. F1 comprises 5 different subunits (three α, three β, and one γ, δ and ε)” while “Fo contains subunits c, a, b, d, F6, OSCP and the accessory subunits e, f, g and A6L” [8]. Are these proteins or subunits? Is the ATP synthase a protein or a protein complex?

Such confusions arise, at least in part, from the primary-quaternary view of protein structure, since the same terms are applied, generally without clarifying distinction, to both polypeptides and proteins. These terms emerged historically. The purification of a protein was based on its activity, which can only be measured for an intact protein. The primary structure of a polypeptide was based on the recognition that DNA-encoded amino acid polymers are unbranched, with a defined sequence of amino acid residues (see Sanger. The chemistry of insulin). The idea of a polypeptide’s secondary structure was based on the “important constraint that all six atoms of the amide (or peptide) group, which joins each amino acid residue to the next in the protein chain, lie in a single plane” [9], which led Pauling, Corey and Branson [10] to recognize the α-helix and β-sheet as common structural motifs. When a protein is composed of a single polypeptide, the final folding pattern of the polypeptide is referred to as its tertiary structure and is apparent in the first protein structure solved, that of myoglobin, by John Kendrew and colleagues.

Myoglobin’s role in O2 transport depends upon a non-polypeptide (prosthetic) heme group. So far so good, a gene encodes a polypeptide and as it folds a polypeptide becomes a protein – nice and simple (2). Complications arise from the observations that 1) many proteins are composed of multiple polypeptides, encoded by one or more genes, and 2) some polypeptides are a part of different proteins. Hemoglobin, the second protein whose structure was determined, illustrates the point. Hemoglobin is composed of four polypeptides, two α-globin and two β-globin, encoded by distinct genes. These polypeptides are related in structure, function, and evolutionary origins to myoglobin, as well as the cytoglobin and neuroglobin proteins. In humans, there are a number of distinct α-like globin and β-like globin genes that are expressed in different hematopoietic tissues during development, so functional hemoglobin proteins can have a number of distinct (albeit similar) subunit compositions and distinct properties, such as their affinities for O2 [see 11].

But the situation often gets more complicated. Consider centrin-2, a eukaryotic Ca2+ binding polypeptide that plays roles in organizing microtubules, building cilia, DNA repair, and gene expression [see 12 and references therein]. So, is the centrin-2 polypeptide just a polypeptide, a protein, or a part of a number of other proteins? As another example, consider the basic-helix-loop-helix (bHLH) family of transcription factors; these transcription factor proteins are typically homo- or hetero-dimeric; are these polypeptides proteins in their own right? The activity of these transcription factors is regulated in part by which binding partners they contain. bHLH polypeptides also interact with the Id polypeptide (or is it a protein?); Id lacks a DNA binding domain, so when it forms a dimer with a bHLH polypeptide it inhibits DNA binding. So is a single bHLH polypeptide a protein or is the protein necessarily a dimer? More to the point, does the current primary→quaternary view of protein structure help or hinder student understanding of the realities of biological systems? A potentially interesting bio-education research question.

A recommendation or two: While I am under no illusion that the complexities of polypeptide synthesis and protein assembly can be easily resolved, it is surely possible to present them in a more coherent, consistent, and accessible manner. Here are a few suggestions that might provoke discussion. Let us first recognize that those genes that encode polypeptides do just that: they encode polypeptides rather than functional proteins (a reality confused by the term “quaternary structure”). We might well distinguish a polypeptide from a protein based on the concentration of free monomeric polypeptide (gene product) within the cell. Then we need to convey to students the reality that the assembly of a protein is no simple process, particularly within the crowded cytoplasm [13]; the misconception that it is simple is supported by the secondary-tertiary structure perspective. While some proteins assemble on their own, many (most?) cannot.


As an example, consider the protein tubulin. As noted by Nithianantham et al [14], “Five conserved tubulin cofactors/chaperones and the Arl2 GTPase regulate α- and β-tubulin assembly into heterodimers” and the “tubulin cofactors TBCD, TBCE, and Arl2, which together assemble a GTP-hydrolyzing tubulin chaperone critical for the biogenesis, maintenance, and degradation of soluble αβ-tubulin.” Without these various chaperones the tubulin protein cannot be formed. Here the distinction between protein and multiprotein complex is clear, since tubulin protein exists in readily detectable levels within the cell, in contrast to the α- and β-tubulin polypeptides, which are found complexed to the TBCB and TBCA chaperone polypeptides. Of course the balance between tubulin and tubulin polymers (microtubules) is itself regulated by a number of factors.

The situation is even more complex when we come to the ribosome and other structures, such as the nuclear pore. Woolford [15] estimates that “more than 350 protein and RNA molecules participate in yeast ribosome assembly, and many more in metazoa”; in addition to four ribosomal RNAs and ~80 polypeptides (often referred to as ribosomal proteins) that are synthesized in the cytoplasm and transported into the nucleus in association with various transport factors, there are “assembly factors, including diverse RNA-binding proteins, endo- and exonucleases, RNA helicases, GTPases and ATPases. These assembly factors promote pre-rRNA folding and processing, remodeling of protein–protein and RNA–protein networks, nuclear export and quality control” [16]. While I suspect that some structural components of the ribosome and the nuclear pore may have functions as monomeric polypeptides, and so could be considered as proteins, at this point it is best (most accurate) to assume that they are polypeptides, components of proteins and larger molecular machines (past post).

We can, of course, continue to consider the roles of common folding motifs, arising from the chemistry of the peptide bond and the environment within and around the assembling protein, in the context of protein structure [17, 18]. The knottier problem is how to help students recognize how functional entities, proteins and molecular machines, together with the coupled reaction systems that drive them and the molecular interactions that regulate them, function; how mutations, allelic variations, and various environmentally induced perturbations influence the behaviors of cells and organisms; and how they generate normal and pathogenic phenotypes. Such a view emphasizes the dynamics of the living state, and the complex flow of information out of DNA into networks of molecular machines and reaction systems.


Acknowledgements: Thanks to Michael Stowell for feedback and suggestions and Jon Van Blerkom for encouragement.  All remaining errors are mine. Post updated to include images in the right places (and to include the cryoEM structure of the AcChR) + minor edits – 16 December 2020.

Footnotes:

  1. Recently emerged from the labs of Martin Raff and Lee Rubin – Martin is one of the founding authors of the transformative “molecular biology of the cell” textbook. 
  2. Or rather quite over-simplistic, as it ignores complexities arising from differential splicing, alternative promoters, and genes that encode RNAs that do not encode polypeptides.

Literature cited (please excuse excessive self-citation – trying to avoid self-plagiarism)

1. Klymkowsky, M.W., Thinking about the conceptual foundations of the biological sciences. CBE Life Science Education, 2010. 9: p. 405-7.

2. Klymkowsky, M.W., J.D. Rentsch, E. Begovic, and M.M. Cooper, The design and transformation of Biofundamentals: a non-survey introductory evolutionary and molecular biology course. LSE Cell Biol Edu, in press., 2016. pii: ar70.

3. Gibson, D.G., J.I. Glass, C. Lartigue, V.N. Noskov, R.-Y. Chuang, M.A. Algire, G.A. Benders, M.G. Montague, L. Ma, and M.M. Moodie, Creation of a bacterial cell controlled by a chemically synthesized genome. Science, 2010. 329(5987): p. 52-56.

4. Hutchison, C.A., R.-Y. Chuang, V.N. Noskov, N. Assad-Garcia, T.J. Deerinck, M.H. Ellisman, J. Gill, K. Kannan, B.J. Karas, and L. Ma, Design and synthesis of a minimal bacterial genome. Science, 2016. 351(6280): p. aad6253.

5. Samandi, S., A.V. Roy, V. Delcourt, J.-F. Lucier, J. Gagnon, M.C. Beaudoin, B. Vanderperre, M.-A. Breton, J. Motard, and J.-F. Jacques, Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins. Elife, 2017. 6.

6. Hartl, F.U., A. Bracher, and M. Hayer-Hartl, Molecular chaperones in protein folding and proteostasis. Nature, 2011. 475(7356): p. 324.

7. Kabachinski, G. and T.U. Schwartz, The nuclear pore complex–structure and function at a glance. J Cell Sci, 2015. 128(3): p. 423-429.

8. Jonckheere, A.I., J.A. Smeitink, and R.J. Rodenburg, Mitochondrial ATP synthase: architecture, function and pathology. Journal of inherited metabolic disease, 2012. 35(2): p. 211-225.

9. Eisenberg, D., The discovery of the α-helix and β-sheet, the principal structural features of proteins. Proceedings of the National Academy of Sciences, 2003. 100(20): p. 11207-11210.

10. Pauling, L., R.B. Corey, and H.R. Branson, The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proceedings of the National Academy of Sciences, 1951. 37(4): p. 205-211.

11. Hardison, R.C., Evolution of hemoglobin and its genes. Cold Spring Harbor perspectives in medicine, 2012. 2(12): p. a011627.

12. Shi, J., Y. Zhou, T. Vonderfecht, M. Winey, and M.W. Klymkowsky, Centrin-2 (Cetn2) mediated regulation of FGF/FGFR gene expression in Xenopus. Scientific Reports, 2015. 5:10283.

13. Luby-Phelps, K., The physical chemistry of cytoplasm and its influence on cell function: an update. Molecular biology of the cell, 2013. 24(17): p. 2593-2596.

14. Nithianantham, S., S. Le, E. Seto, W. Jia, J. Leary, K.D. Corbett, J.K. Moore, and J. Al-Bassam, Tubulin cofactors and Arl2 are cage-like chaperones that regulate the soluble αβ-tubulin pool for microtubule dynamics. Elife, 2015. 4.

15. Woolford, J., Assembly of ribosomes in eukaryotes. RNA, 2015. 21(4): p. 766-768.

16. Peña, C., E. Hurt, and V.G. Panse, Eukaryotic ribosome assembly, transport and quality control. Nature Structural and Molecular Biology, 2017. 24(9): p. 689.

17. Dobson, C.M., Protein folding and misfolding. Nature, 2003. 426(6968): p. 884.

18. Schaeffer, R.D. and V. Daggett, Protein folds and protein folding. Protein Engineering, Design & Selection, 2010. 24(1-2): p. 11-19.

Molecular machines and the place of physics in the biology curriculum

The other day, through no fault of my own, I found myself looking at the courses required by our molecular biology undergraduate degree program. I discovered a requirement for a 5 credit hour physics course, and a recommendation that this course be taken in the students’ senior year – a point in their studies when most have already completed their required biology courses.  Befuddlement struck me: what was the point of requiring an introductory physics course in the context of a molecular biology major?  Was this an example of time-travel (via wormholes or some other esoteric imagining) in which a physics course in the future impacts a student’s understanding of molecular biology in the past?  I was also struck by the possibility that requiring such a course in the students’ senior year would measurably impact their time to degree.

In a search for clarity and possible enlightenment, I reflected back on my own experiences in an undergraduate biophysics degree program – as a practicing cell and molecular biologist, I was somewhat confused. I could not put my finger on the purpose of our physics requirement, except perhaps the admirable goal of supporting physics graduate students. But then, after feverish reflections on the responsibilities of faculty in the design of the courses and curricula they prescribe for their students and the more general concepts of instructional (best) practice and malpractice, my mind calmed, perhaps because I was distracted by an article on Oxford Nanopore’s MinION (↓), a “portable real-time device for DNA and RNA sequencing”, a device that plugs into the USB port on one’s laptop!

Distracted from the potentially quixotic problem of how to achieve effective educational reform at the undergraduate level, I found myself driven on by an insatiable curiosity (or a deep-seated insecurity) to ensure that I actually understood how this latest generation of DNA sequencers worked. This led me to a paper by Meni Wanunu (2012. Nanopores: A journey towards DNA sequencing)[1].  On reading the paper, I found myself returning to my original belief: yes, understanding physics is critical to developing a molecular-level understanding of how biological systems work, BUT it was just not the physics normally inflicted upon (required of) students [2]. Certainly this was no new idea.  Bruce Alberts had written on this topic a number of times, most dramatically in his 1998 paper “The cell as a collection of protein machines” [3].  Rather sadly, and notwithstanding much handwringing about the importance of expanding student interest in, and understanding of, STEM disciplines, not much of substance in this area has occurred. While (some minority of) physics courses may have adopted active engagement pedagogies (in the meaning of Hake [4]), most insist on teaching macroscopic physics, rather than focusing on, or even considering, the molecular level physics relevant to biological systems, explicitly the physics of protein machines in a cellular (biological) context. Why sadly? Because conventional (that is, non-biologically relevant) introductory physics and chemistry courses all too often serve the role of a hazing ritual, driving many students out of biology-based careers [5], in part, I suspect, because they often seem irrelevant to students’ interests in the workings of biological systems. (footnote 1)

Nanopore’s sequencer and Wanunu’s article (footnote 2) got me thinking again about biological machines, of which there are a great number, ranging from pumps, propellers, and oars to  various types of transporters, molecular truckers that move chromosomes, membrane vesicles, and parts of cells with respect to one another, to DNA detanglers, protein unfolders, and molecular recyclers (↓). 

Nanopore’s sequencer works based on the fact that as a single strand of DNA (or RNA) moves through a narrow pore, the different bases (A,C,T,G) occlude the pore to different extents, allowing different numbers of ions, different amounts of current, to pass through the pore. These current differences can be detected, and allow a nucleotide sequence to be “read” as the nucleic acid strand moves through the pore. Understanding the process involves understanding how molecules move, that is the physics of molecular collisions and energy transfer, how proteins and membranes allow and restrict ion movement, and the impact of chemical gradients and electrical fields across a membrane on molecular movements – all physical concepts of widespread significance in biological systems (here is an example of where a better understanding of physics could be useful to biologists).  Such ideas can be extended to the more general questions of how molecules move within the cell, and the effects of molecular size and inter-molecular interactions within a concentrated solution of proteins, protein polymers, lipid membranes, and nucleic acids, such as described in Oliveira et al. (2016. Increased cytoplasm viscosity hampers aggregate polar segregation in Escherichia coli)[6].  At the molecular level, the processes, while biased by electric fields (potentials) and concentration gradients, are stochastic (noisy). Understanding of stochastic processes is difficult for students [7], but critical to developing an appreciation of how such processes can lead to phenotypic differences between cells with the same genotypes (previous post) and how such noisy processes are managed by the cell and within a multicellular organism.
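For readers who like to see an idea in runnable form, here is a toy sketch of the "reading" step described above. Everything in it is an assumption made for illustration: the current levels in LEVELS are invented numbers, each base is assumed to sit in the pore for a known, fixed number of samples, and the noise is bounded. Real base callers are considerably more elaborate, so treat this as a cartoon of the principle (different bases, different currents) rather than a description of how any actual device works.

```python
import random

# Hypothetical current levels (arbitrary units), one per base.
# These values are invented for illustration, not measured.
LEVELS = {"A": 10.0, "C": 20.0, "G": 30.0, "T": 40.0}


def simulate_trace(sequence, samples_per_base=5, noise=2.0, seed=0):
    """Return a noisy current trace for a strand moving through the pore.

    Each base contributes `samples_per_base` readings centered on its
    level, perturbed by bounded uniform noise (a toy stand-in for the
    stochastic fluctuations of a real measurement).
    """
    rng = random.Random(seed)
    trace = []
    for base in sequence:
        for _ in range(samples_per_base):
            trace.append(LEVELS[base] + rng.uniform(-noise, noise))
    return trace


def call_bases(trace, samples_per_base=5):
    """Average each block of samples and pick the base with the nearest level."""
    called = []
    for start in range(0, len(trace), samples_per_base):
        block = trace[start:start + samples_per_base]
        mean = sum(block) / len(block)
        called.append(min(LEVELS, key=lambda b: abs(LEVELS[b] - mean)))
    return "".join(called)


if __name__ == "__main__":
    strand = "GATTACA"
    print(call_bases(simulate_trace(strand)))
```

Because the invented levels are 10 units apart and the noise never exceeds 2 units, the toy reader recovers the strand exactly; widening the noise or narrowing the gap between levels is a quick way to see why noisy molecular processes make the real problem hard.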

As path leads on to path, I found myself considering the (←) spear-chucking protein machine present in the pathogenic bacterium Vibrio cholerae; this molecular machine is used to inject toxins into neighbors that the bacterium happens to bump into (see Joshi et al., 2017. Rules of Engagement: The Type VI Secretion System in Vibrio cholerae)[8].  The system is complex and acts much like a spring-loaded and rather “inhumane” mouse trap.  This is one of a number of bacterial type VI systems, and “has structural and functional homology to the T4 bacteriophage tail spike and tube” – the molecular machine that injects bacterial cells with the virus’s genetic material, its DNA.

Building the bacterium’s spear-based injection system is controlled by a social (quorum sensing) system, a way that unicellular organisms can monitor whether they are alone or living in an environment crowded with other organisms. During the process of assembly, potential energy, derived from various chemically coupled, thermodynamically favorable reactions, is stored in both type VI “spears” and the contractile (nucleic acid injecting) tails of the bacterial viruses (phage). Understanding the energetics of this process, exactly how coupling thermodynamically favorable chemical reactions, such as ATP hydrolysis, or physico-chemical reactions, such as the diffusion of ions down an electrochemical gradient, can be used to set these “mouse traps”, and where the energy goes when the traps are sprung, is central to students’ understanding of these and a wide range of other molecular machines.
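The bookkeeping behind "coupling" can be sketched in a few lines. The sketch below assumes the textbook relation ΔG = ΔG°′ + RT ln Q and the commonly quoted standard free energy of ATP hydrolysis (about −30.5 kJ/mol); the concentrations and the cost of the "trap-setting" step are hypothetical numbers chosen only to illustrate the arithmetic, not measurements from any of the systems discussed here.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)


def delta_g(delta_g_standard, products, reactants, temperature=310.0):
    """Free energy change: dG = dG_standard + R*T*ln(Q).

    Q is the product of product concentrations divided by the product
    of reactant concentrations (mol/L; water is omitted by convention).
    """
    q = math.prod(products) / math.prod(reactants)
    return delta_g_standard + R * temperature * math.log(q)


def coupled_is_favorable(*delta_gs):
    """Coupled reactions can proceed if their free energy changes sum to < 0."""
    return sum(delta_gs) < 0


if __name__ == "__main__":
    # Hypothetical cellular concentrations: ATP 5 mM, ADP 0.5 mM, Pi 5 mM.
    atp = delta_g(-30.5, products=[0.0005, 0.005], reactants=[0.005])
    print(f"ATP hydrolysis under these assumptions: {atp:.1f} kJ/mol")

    # Hypothetical cost of "setting the trap" (an invented number).
    trap_cost = 35.0
    print("Coupled process favorable?", coupled_is_favorable(atp, trap_cost))
```

The point of the exercise is that an unfavorable step (a positive ΔG, such as loading a spring) can proceed when it is mechanistically coupled to a sufficiently favorable one, and that how favorable ATP hydrolysis is depends on concentrations inside the cell, not just on the standard value.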

Energy stored in such molecular machines during their assembly can be used to move the cell. As an example, another bacterial system generates contractile (type IV pili) filaments; the contraction of such a filament can allow “the bacterium to move 10,000 times its own body weight, which results in rapid movement” (see Berry & Pelicic 2015. Exceptionally widespread nanomachines composed of type IV pilins: the prokaryotic Swiss Army knives)[9].  The contraction of such a filament has also been found to be used to import DNA into the cell, an early step in the process of horizontal gene transfer.  In other situations (other molecular machines) such protein filaments access thermodynamically favorable processes to rotate, acting like a propeller, driving cellular movement.

During my biased random walk through the literature, I came across another, but molecularly distinct, machine used to import DNA into Vibrio (see Matthey & Blokesch 2016. The DNA-Uptake Process of Naturally Competent Vibrio cholerae)[10].

This molecular machine enables the bacterium to import DNA from the environment, released, perhaps, from a neighbor killed by its spear.  In this system (←), the double stranded DNA molecule is first transported through the bacterium’s outer membrane; the DNA’s two strands are then separated, and one strand passes through a channel protein through the inner (plasma) membrane, and into the cytoplasm, where it can interact with the bacterium’s  genomic DNA.

The value of introducing students to the idea of molecular machines is that it helps to demystify how biological systems work, how such machines carry out specific functions, whether moving the cell or recognizing and repairing damaged DNA.  If physics matters in a biological curriculum, it matters for this reason – it establishes a core premise of biology, namely that organisms are not driven by “vital” forces, but by prosaic physicochemical ones.  At the same time, the molecular mechanisms behind evolution, such as mutation, gene duplication, and genomic reorganization, provide the means by which new structures emerge from pre-existing ones, yet many is the molecular biology degree program that does not include an introduction to evolutionary mechanisms in its required course sequence – imagine that, requiring physics but not evolution? (see [11]).

One final point regarding requiring students to take a biologically relevant physics course early in their degree program is that it can be used to reinforce what I think is a critical and often misunderstood point. While biological systems rely on molecular machines, we (and by we I mean all organisms) are NOT machines, no matter what physicists might postulate (see We Are All Machines That Think).  We are something different and distinct. Our behaviors and our feelings, whether ultimately understandable or not, emerge from the interaction of genetically encoded, stochastically driven non-equilibrium systems, modified through evolutionary, environmental, social, and a range of unpredictable events occurring in an uninterrupted, and basically undirected fashion for ~3.5 billion years.  While we are constrained, we are more, in some weird and probably ultimately incomprehensible way.

Footnotes:

[1]  A discussion with Melanie Cooper on what chemistry is relevant to a life science major was a critical driver in our collaboration to develop the chemistry, life, the universe, and everything (CLUE) chemistry curriculum.  

[2]  Together with my own efforts in designing the biofundamentals introductory biology curriculum. 

Literature cited

1. Wanunu, M., Nanopores: A journey towards DNA sequencing. Physics of life reviews, 2012. 9(2): p. 125-158.

2. Klymkowsky, M.W. Physics for (molecular) biology students. 2014  [cited 2014]; Available from: http://www.aps.org/units/fed/newsletters/fall2014/molecular.cfm.

3. Alberts, B., The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell, 1998. 92(3): p. 291-294.

4. Hake, R.R., Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Physics, 1998. 66: p. 64-74.

5. Mervis, J., Weed-out courses hamper diversity. Science, 2011. 334(6061): p. 1333-1333.

6. Oliveira, S., R. Neeli‐Venkata, N.S. Goncalves, J.A. Santinha, L. Martins, H. Tran, J. Mäkelä, A. Gupta, M. Barandas, and A. Häkkinen, Increased cytoplasm viscosity hampers aggregate polar segregation in Escherichia coli. Molecular microbiology, 2016. 99(4): p. 686-699.

7. Garvin-Doxas, K. and M.W. Klymkowsky, Understanding Randomness and its impact on Student Learning: Lessons from the Biology Concept Inventory (BCI). Life Science Education, 2008. 7: p. 227-233.

8. Joshi, A., B. Kostiuk, A. Rogers, J. Teschler, S. Pukatzki, and F.H. Yildiz, Rules of engagement: the type VI secretion system in Vibrio cholerae. Trends in microbiology, 2017. 25(4): p. 267-279.

9. Berry, J.-L. and V. Pelicic, Exceptionally widespread nanomachines composed of type IV pilins: the prokaryotic Swiss Army knives. FEMS microbiology reviews, 2014. 39(1): p. 134-154.

10. Matthey, N. and M. Blokesch, The DNA-uptake process of naturally competent Vibrio cholerae. Trends in microbiology, 2016. 24(2): p. 98-110.

11. Pallen, M.J. and N.J. Matzke, From The Origin of Species to the origin of bacterial flagella. Nat Rev Microbiol, 2006. 4(10): p. 784-90.

Is a little science a dangerous thing?

Is the popularization of science encouraging a growing disrespect for scientific expertise? 
Do we need to reform science education so that students are better able to detect scientific BS? 

It is common wisdom that popularizing science by exposing the public to scientific ideas is an unalloyed good, bringing benefits to both those exposed and to society at large. Many such efforts are engaging and entertaining, often taking the form of compelling images with quick cuts between excited sound bites from a range of “experts.” A number of science-centered programs, such as PBS’s NOVA series, are particularly adept at and/or addicted to this style. Such presentations introduce viewers to natural wonders, and often provide scientific-sounding, albeit often superficial and incomplete, explanations – they appeal to the gee-whiz and inspirational, with “mind-blowing” descriptions of how old, large, and weird the natural world appears to be. But there are darker sides to such efforts. Here I focus on one, the idea that a rigorous, realistic understanding of the scientific enterprise and its conclusions is easy to achieve, a presumption that leads to unrealistic science education standards, and the inability to judge when scientific pronouncements are distorted or unsupported, as well as anti-scientific personal and public policy positions.

That accurate thinking about scientific topics is easy to achieve is an unspoken assumption that informs much of our educational, entertainment, and scientific research system. This idea is captured in the recent NYT best seller “Astrophysics for people in a hurry” – an oxymoronic presumption. Is it possible for people “in a hurry” to seriously consider the observations and logic behind the conclusions of modern astrophysics? Can they understand the strengths and weaknesses of those conclusions? Is a superficial familiarity with the words used the same as understanding their meaning and possible significance? Is acceptance understanding?  Does such a cavalier attitude to science encourage unrealistic conclusions about how science works and what is known with certainty versus what remains speculation?
Are the conclusions of modern science actually easy to grasp?
The idea that introducing children to science will lead to an accurate grasp of the underlying concepts involved, their appropriate application, and their limitations is not well supported [1]; often students leave formal education with a fragile and inaccurate understanding – a lesson made explicit in Matt Schneps and Phil Sadler’s Private Universe videos. The feeling that one understands a topic, that science is in some sense easy, undermines respect for those who actually do understand a topic, a situation discussed in detail in Tom Nichols’ “The Death of Expertise.” Under-estimating how hard it can be to accurately understand a scientific topic can lead to unrealistic science standards in schools, and often the trivialization of science education into recognizing words rather than understanding the concepts they are meant to convey.

The fact is, scientific thinking about most topics is difficult to achieve and maintain – that is what editors, reviewers, and other scientists, who attempt to test and extend the observations of others, are for – together they keep science real and honest. Until an observation has been repeated or confirmed by others, it can best be regarded as an interesting possibility, rather than a scientifically established fact.  Moreover, until a plausible mechanism explaining the observation has been established, it remains a serious possibility that the entire phenomenon will vanish, more or less quietly (think cold fusion). The disappearing physiological effects of “power posing” come to mind. Nevertheless the incentives to support even disproven results can be formidable, particularly when there is money to be made and egos on the line.

While power-posing might be helpful to some, even though physiologically useless, there are more dangerous pseudo-scientific scams out there. The gullible may buy into “raw water” (see: Raw water: promises health, delivers diarrhea) but the persistent, and in some groups growing, anti-vaccination movement continues to cause real damage to children (see Thousands of cheerleaders exposed to mumps).  One can ask oneself, why have professional science groups, such as the American Association for the Advancement of Science (AAAS), not called for a boycott of NETFLIX, given that NETFLIX continues to distribute the anti-scientific, anti-vaccination program VAXXED [2]?  And how do Oprah Winfrey and Donald Trump  [link: Oprah Spreads Pseudoscience and Trump and the anti-vaccine movement] avoid universal ridicule for giving credence to ignorant nonsense, and for disparaging the hard-fought expertise of the biomedical community?  A failure to accept well-established expertise goes a long way toward explaining the situation. Instead of an appreciation for what we do and do not know about the causes of autism (see: Genetics and Autism Risk & Autism and infection), there are desperate parents who turn to a range of “therapies” promoted by anti-experts. The tragic case of parents trying to cure autism by forcing children to drink bleach (see link) illustrates the seriousness of the situation.

So why does a large percentage of the public ignore the conclusions of disciplinary experts?  I would argue that an important driver is the way that science is taught and popularized [3]. Beyond the obvious fact that a range of politicians and capitalists (in both the West and the East) actively disdain expertise that does not support their ideological or pecuniary positions [4], I would claim that the way we teach science, often focussing on facts rather than processes, largely ignoring the historical progression by which knowledge is established and the various forms of critical analysis to which scientific conclusions are subjected, combined with the way science is popularized, erodes respect for disciplinary expertise. Often our education systems fail to convey how difficult it is to attain real disciplinary expertise, in particular the ability to clearly articulate where ideas and conclusions come from and what they do and do not imply. Such expertise is more than a degree, it is a record of rigorous and productive study and useful contributions, and a critical and objective state of mind. Science standards are often heavy on facts, and weak on critical analyses of those ideas and observations that are relevant to a particular process. As Carl Sagan might say, we have failed to train students on how to critically evaluate claims, how to detect baloney (or BS in less polite terms)[5].

In the area of popularizing scientific ideas, we have allowed hype and over-simplification to capture the flag. To quote from an article by David Berlinski [link: Godzooks], we are continuously bombarded with a range of pronouncements about new scientific observations or conclusions and there is often a “willingness to believe what some scientists say without wondering whether what they say is true”, or even what it actually means.  No longer is the in-depth, and often difficult and tentative, explanation conveyed; rather the focus is on the flashy conclusion (independent of its plausibility). Self-proclaimed experts pontificate on topics that are often well beyond their areas of training and demonstrated proficiency – many is the physicist who speaks not only about the completely speculative multiverse, but on free will and ethical beliefs. Complex and often irreconcilable conflicts between organisms, such as those between mother and fetus (see: War in the womb), male and female (in sexually dimorphic species), and individual liberties and social order, are ignored instead of explicitly recognized, and their origins understood. At the same time, there are real pressures acting on scientific researchers (and the institutions they work for) and the purveyors of news to exaggerate the significance and broader implications of their “stories” so as to acquire grants, academic and personal prestige, and clicks.  Such distortions serve to erode respect for scientific expertise (and objectivity).

So where are the scientific referees, the individuals who are tasked to enforce the rules of the game, to call a player out of bounds when they leave the playing field (their area of expertise) or to call a foul when rules are broken or bent, such as the fabrication, misreporting, suppression, or over-interpretation of data, as in the case of the anti-vaccinator Wakefield? Who is responsible for maintaining the integrity of the game?  Who points out the fact that many alternative medicine advocates are talking meaningless blather (see: On skepticism & pseudo-profundity)? Where are the referees who can show these charlatans the “red card” and eject them from the game?

Clearly there are no such referees. Instead it is necessary to train as large a percentage of the population as possible to be their own science referees – that is, to understand how science works, and to identify baloney when it is slung at them. When a science popularizer, whether for well-meaning or self-serving reasons, steps beyond their expertise, we need to call them out of bounds!  And when scientists run up against the constraints of the scientific process, as appears to occur periodically with theoretical physicists, and the occasional neuroscientist (see: Feuding physicists and The Soul of Science), we need to recognize the foul committed.  If our educational system could help develop in students a better understanding of the rules of the scientific game, and why these rules are essential to scientific progress, perhaps we can help re-establish both an appreciation of rigorous scientific expertise and a respect for what it is that scientists struggle to do.



Footnotes and references:

  1. And is it clearly understood that they have nothing to say as to what is right or wrong.
  2.  Similarly, many PBS stations broadcast pseudoscientific infomercials: for example see Shame on PBS, Brain Scam, and Deepak Chopra’s anti-scientific Brain, Mind, Body, Connection, currently playing on my local PBS station. Holocaust deniers and slavery apologists are confronted much more aggressively.
  3.  As an example, the idea that new neurons are “born” in the adult hippocampus, up to now established orthodoxy, has recently been called into question: see Study Finds No Neurogenesis in Adult Humans’ Hippocampi
  4.  Here is a particularly disturbing example: By rewriting history, Hindu nationalists lay claim to India
  5. Pennycook, G., J. A. Cheyne, N. Barr, D. J. Koehler and J. A. Fugelsang (2015). “On the reception and detection of pseudo-profound bullshit.” Judgment and Decision Making 10(6): 549.