And other Forms of 'Junk' DNA
The Fractal Complexity of Life
Sean D. Pitman M.D.
© January 2001
Latest Update: May 2013 (Functionality of the eta-globin pseudogene)
Pseudogenes are DNA sequences that resemble functional genes but are generally thought to have no purpose. In fact many scientists think that pseudogenes are nothing more than discarded genetic fossils of a bygone era when they did have some sort of important function. Of course, it logically follows that similar pseudogenes that are shared by different species give evidence of common ancestry and even potential times of divergence.11 For example, the eta-globin pseudogene, which is found in both humans and chimps, has been used as an argument for the common ancestry of the two species.
The first pseudogene was reported in 1977.1 Since that time, a large number of these genes have been reported and described in humans and many other species.
There are two types of pseudogenes known as "processed" and "unprocessed" pseudogenes.2,11
Processed pseudogenes are found on different chromosomes from their functional counterparts. They lack introns and certain regulatory sequences, often terminate in a series of adenines, and are flanked by direct repeats (which are associated with movable genetic elements). They may be complete or incomplete copies of genes or mixtures of several genes. They are believed to have arisen through a three-step process: copying DNA into RNA, editing out the introns to make mRNA, and then turning the code in the mRNA back into DNA through reverse transcription. This process is thought to have created the "L1 family of pseudogenes."2 Other theories include retroviruses as a means of pseudogene transport between different organisms.
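The three-step route described above can be sketched with a toy example. The gene layout, the sequences, and the function names here are all invented for illustration, and the poly-A tail length is an arbitrary assumption; this is only a cartoon of the idea, not a model of any real gene.

```python
# Toy model of the three-step route to a processed pseudogene.
# The gene layout and sequences are invented for illustration only.

def transcribe(segments):
    """Step 1: copy DNA into pre-mRNA (T becomes U), keeping exon/intron labels."""
    return [(kind, seq.replace("T", "U")) for kind, seq in segments]

def splice(pre_mrna, poly_a_length=8):
    """Step 2: drop the introns, join the exons, and add a run of adenines."""
    exons = "".join(seq for kind, seq in pre_mrna if kind == "exon")
    return exons + "A" * poly_a_length

def reverse_transcribe(mrna):
    """Step 3: turn the mRNA code back into DNA (U becomes T)."""
    return mrna.replace("U", "T")

# Hypothetical gene: two exons separated by one intron.
gene = [("exon", "ATGGCC"), ("intron", "GTAAGTTTTCAG"), ("exon", "TTCGAATAA")]

processed_copy = reverse_transcribe(splice(transcribe(gene)))
print(processed_copy)
```

The resulting copy has lost its intron and ends in a series of adenines, which are two of the hallmarks listed above for processed pseudogenes.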
Unprocessed pseudogenes are usually found in clusters of similar functional sequences on the same chromosome. They usually have introns and associated regulatory sequences. Their expression is usually prevented by a "misplaced" stop codon or codons. There may be other changes from the "original" as the result of deletions, insertions, and point mutations. Some form of mRNA may or may not be produced depending on the damage to the gene. Many of these are believed to have arisen by gene duplication, which produced an extra copy of the gene. The extra copy could then accumulate mutations without harming the organism since it would still have a completely functional original copy.2 (The evolutionary gene duplication hypothesis suggests that over time, random mutations may produce a new gene with new functions by using this gene duplicate while maintaining the original gene function.5)
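To make the idea of a "misplaced" stop codon concrete, here is a minimal sketch that scans a reading frame for its first stop codon. Both sequences are invented, and the function name is hypothetical; real pseudogene annotation relies on far more evidence than this.

```python
# Toy check for a premature ("misplaced") stop codon in a reading frame.
# Sequences are invented for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_index(coding_dna):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(coding_dna) - 2, 3):
        if coding_dna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None

original = "ATGGCCCAGTTCGAATAA"  # stop codon only at the final position
damaged = "ATGGCCTAGTTCGAATAA"   # one point mutation (C to T) creates an early TAG

print(first_stop_index(original))
print(first_stop_index(damaged))
```

In the damaged copy a single base change moves the first stop from the last codon to the third, so everything downstream would go untranslated.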
It is felt by many, especially evolutionary biologists, that shared pseudogenes, which have no function in any form in different species, are examples of common ancestry. Comparison of DNA sequences from humans, chimps, and other mammals shows a great number of shared pseudogenes. Perhaps the best-known example of a shared pseudogene is the eta-globin gene.
The eta gene is located on chromosome 11 in humans and is fourth in a series of 6 beta globin genes (five are functional).4 It has no start codon (AUG) and it has several stop codons. So obviously, no mRNA is made and therefore no protein. Humans, chimps, and gorillas have the same number of beta-globin genes arranged in the same sequence. The exon sequences within these genes are also similar - as are the exons of the eta gene.4 It is thought that the eta-globin gene originated by a duplication of the gamma-A-globin gene because of the high similarity of the sequences. Also, both genes are present in primates.
The history of the eta-globin pseudogene is thought to have originated some 140 million years ago in marsupials and placental mammals. After the "evolutionary divergence" of marsupials, the gamma-globin gene formed by duplication of an existing gene in the beta-globin family. Later, but before radiation of the orders of placental mammals, the eta-globin gene formed from a duplication of the gamma-globin gene. Gamma and eta genes must therefore have been present in ancestral placentals, but presumably gamma was lost by goats (which do not have gamma) and eta was lost by rabbits (which do not have eta).
According to this scenario, the eta gene must have been functional at first, because it is functional in goats today.2 It is non-functional in all primates, which is interpreted to mean it was already non-functional in ancestral primates some 70-80 million years ago. This interpretation implies that the eta-globin gene has been maintained for more than 70 million years without being converted to a useful new gene and without being eliminated through random mutations.
Signs of Function?
So, the persistence of a non-functional DNA sequence in an entire lineage for such a supposedly long period of time seems remarkable in the context of the gene duplication hypothesis. The very fact that pseudogenes are still present and recognizable after tens of millions of years without any beneficial function just doesn't seem to make sense. Certainly, without some beneficial function, natural selection would not have maintained their sequences for such long periods of time. There is in fact a cost to maintaining non-functional DNA. It takes energy to replicate and maintain DNA that doesn't pay for its keep. Although this cost might seem small over the short term, even an extremely small cost compounded over the course of millions of generations starts to turn into a significant disadvantage. So, the fact that pseudogenes have any recognizable gene-like structure at all suggests that they do in fact serve some kind of purpose.
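The compounding argument can be put into rough numbers. The sketch below assumes a fixed fractional cost paid every generation and nothing else (no genetic drift, no change in the cost over time); the cost value and the function name are hypothetical, chosen only to show the arithmetic.

```python
def relative_share(cost_per_generation, generations):
    """Relative representation of a lineage that pays a fixed fractional
    cost each generation, compared with one that pays none."""
    return (1.0 - cost_per_generation) ** generations

# Hypothetical cost of one part in a million per generation.
cost = 1e-6
for generations in (1_000, 1_000_000, 10_000_000):
    print(generations, relative_share(cost, generations))
```

Under these assumptions the cost is negligible over a thousand generations but leaves the lineage at roughly a third of its relative share after a million.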
The persistence of pseudogenes is in itself evidence for their activity. This is a serious problem for evolution, as it is expected that natural selection would remove this type of DNA if it were useless, since DNA manufactured by the cell is energetically costly. Because of the lack of selective pressure on this neutral DNA, one would expect that ‘old’ pseudogenes would be scrambled beyond recognition as a result of accumulated random mutations. Moreover, a removal mechanism for neutral DNA is now known.6
“Typically when people say that the human genome contains 27,000 genes or so, they are referring to genes that code for proteins,” points out Michel Georges, a geneticist at the University of Liège in Belgium. But even though that number is still tentative—estimates range from 20,000 to 40,000—it seems to confirm that there is no clear correspondence between the complexity of a species and the number of genes in its genome. “Fruit flies have fewer coding genes than roundworms, and rice plants have more than humans,” notes John S. Mattick, director of the Institute for Molecular Bioscience at the University of Queensland in Brisbane, Australia. “The amount of noncoding DNA, however, does seem to scale with complexity.". . .
"Increasingly we are realizing that there is a large collection of ‘genes’ that are clearly functional even though they do not code for any protein” but produce only RNA, Georges remarks. The term “gene” has always been somewhat loosely defined; these RNA-only genes muddle its meaning further. To avoid confusion, says Claes Wahlestedt of the Karolinska Institute in Sweden, “we tend not to talk about ‘genes’ anymore; we just refer to any segment that is transcribed [to RNA] as a ‘transcriptional unit.’” Based on detailed scans of the mouse genome for all such elements, “we estimate that there will be 70,000 to 100,000,” Wahlestedt announced at the International Congress of Genetics, held this past July in Melbourne. “Easily half of these could be noncoding.” If that is right, then for every DNA sequence that generates a protein, another works solely through active forms of RNA—forms that are not simply intermediate blueprints for proteins but, rather, directly alter the behavior of cells.” . . .
“I think this will come to be a classic story of orthodoxy derailing objective analysis of the facts, in this case for a quarter of a century,” Mattick says. “The failure to recognize the full implications of this particularly the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules—may well go down as one of the biggest mistakes in the history of molecular biology." [emphasis added] 16
Given this, it is not known if all of what are currently thought of as pseudogenes have absolutely no function. In fact, some pseudogenes are believed to function as sources of information for producing genetic diversity. It is thought that partial pseudogenes are copied into functional genes during genetic recombination, producing variants of the functional gene. This phenomenon has been reported many times to include various immunoglobulins within mice and birds, mouse histone genes, horse globin genes, and human beta-globin genes. It is not known if this could be a possible role for the eta-globin gene as well. However, the fact that the eta-globin pseudogene is located between the fetal and adult genes suggests that it might play a role in gene switching (there seems to be some preliminary evidence to this effect although the eta gene sequence’s part in this is still unknown).
It all seems like the protein coding genes are actually rather informationally simplistic (on the level of bricks and mortar for building a house) - that the real informational complexity and functionality lies in the non-coding portion of the genome (the blueprint for directing where to put the bricks and mortar for building the house). This portion of the genome directs when and where the protein building blocks are placed and therefore is vitally important to the overall structure and ultimate function of the resulting creature. It was because of the evolutionary bias that these non-coding regions of DNA were assumed to be junk for so long - and therefore overlooked and unrecognized as key informational components in the genome. Interestingly enough, such findings actually support the predictions of intelligent design theory while countering long-held evolutionary assumptions. Of course, there are always ad hoc modifications to explain such failed predictions resulting from an evolutionary bias.
A Functional Eta-globin Pseudogene After All?
And, as it turns out, recent studies have shown that the eta-globin gene is actually functional after all - just like design theorists have been arguing for a long time.
What is interesting about the original neo-Darwinian prediction that the eta-globin pseudogene was a clear example of a true non-functional "shared mistake" is that the eta-globin pseudogene, in particular, had long been known to be mutating at one-fifth the expected neutral mutation rate (Link). Given that more and more “pseudogenes” and non-coding regions of DNA previously thought to be “junk DNA” are now being found to be functional to one degree or another, such a reduced mutation rate makes it very hard to definitively argue that the eta-globin sequence is a clear example of “shared mistakes” that have been passed on over the past 85 million years. It is also rather hard to imagine why or how a truly non-functional sequence would be maintained in the genome for such a long period of time without having been eliminated by random mutations (especially considering the cost required to maintain truly non-functional DNA).
And, ironically, as of January of 2013, a paper was published by Moleirinho, A. et al. demonstrating that the eta-globin pseudogene is in fact functional, playing a regulatory role and assisting in “gene switching” between fetal and adult forms of hemoglobin. Consider a portion of their argument as follows:
Several decades ago, a hypothesis was formulated holding an important regulatory role of HBD and HBBP1 in the Hb fetal-to-adult switch that matches quite well the assumption of strong negative selective forces acting on these sequences (Ottolenghi et al. 1979; Bank et al. 1980; Chang and Slightom 1984; Goodman et al. 1984). Over the past years, the β-globin cluster has been regarded as a complex genetic system and a paradigm of gene expression regulation. More recently, a boost of studies on the β-globin cluster have contributed to a better understanding of the mechanisms underlying the regulation of each gene in the cluster (Harju et al. 2002; Chakalova et al. 2005; Noordermeer and de Laat 2008; Sankaran et al. 2010). Remarkably, chromosome conformation (3C and 5C) analyses for the β-globin locus disclosed strong interactions between the LCR and the region encompassing both HBD and HBBP1 (Dostie et al. 2006; Sanyal et al. 2012). Furthermore, distinct spatial interactions of the LCR in fetal and adult stages were uncovered by another study based only in 3C assay in which HBD sequence was proposed to be enrolled in the maintenance of a transcriptionally competent structure at the adult stage (Beauchemin and Trudel 2009). These recent findings suggest that HBD and HBBP1 might be involved in chromatin looping in the human β-globin cluster, a crucial mechanism for temporal coordination of gene expression (Holwerda and De Laat 2012). Importantly, one SNP (rs10128556) in HBBP1 has been also identified as a modulator of HbF levels reinforcing the idea that this genomic region is indeed involved in the Hb fetal-to-adult switch (Galarneau et al. 2010).
Moleirinho, A., et al. 2013. Evolutionary Constraints in the β-Globin Cluster: The Signature of Purifying Selection at the δ-Globin (HBD) Locus and its Role in Developmental Gene Regulation. Genome Biology and Evolution. 5 (3):559–571.
So, I guess we just have to chalk up another one for the creationists, don’t we? Yet another “pseudogene” bites the dust and isn’t so “pseudo” any more…
Human-like Sea Anemone?
Supporting this bricks and mortar concept is a 2007 study published in Science by Putnam et al. on the genome of an interesting sea anemone.44 In this paper the authors note that the individual genes within the overall genome of this sea anemone look very much like human genes - that's right, human genes.
"One of the big surprises of the anemone genome, says Swalla, is the discovery of blocks of DNA that have the same complement of genes as in the human genome. Individual genes may have swapped places, but often they have remained linked together despite hundreds of millions of years of evolution. . . Moreover, the anemone genes look vertebratelike. They often are full of noncoding regions called introns, which are much less common in nematodes and fruit flies than in vertebrates. And more than 80% of the anemone introns are in the same places in humans. . .
Finnerty and his graduate student James Sullivan also looked in the anemone genome for 283 human genes involved in a wide range of diseases. They will report in the July issue of Genome that they found 226. Moreover, in a few cases, such as the breast cancer gene BRCA2, the anemone’s version is more similar to the human’s than to the fruit fly’s or to the nematode’s. . .
This implies that even very ancient genomes were quite complex and contained most of the genes necessary to build today’s most sophisticated multicellular creatures. . .
We cannot rule out the possibility, however, that such apparently animal-specific introns were indeed present in the last common ancestor of plants, fungi, and animals, but were convergently lost in both plants and fungi. . .
Where did the eumetazoan gene repertoire come from? Nearly 80% (6182 out of 7766) of the ancestral eumetazoan genes have clearly identifiable relatives (i.e., proteins with significant sequence homology and conserved domain architecture) outside of the animals, including fungi, plants, slimemolds, ciliates, or other species available from public data sets (32). These are evidently members of ancient eukaryotic gene families that were already established in the unicellular ancestors of the Metazoa and are involved in core eukaryotic cellular functions." 44
As noted above, this paper seems to support the idea that the basic genes within a genome are very simple building blocks that can be and are in fact used to build many different types of creatures - from humans to anemones to plants, fungi, and even single-celled organisms like ciliates. In short, it is not the protein-coding genes that are primarily responsible for producing the phenotype of the organism. Rather, most of the structural and assembly information for the organism as a whole is found in the form of non-coding DNA. The same building blocks can be used to build a one-room house or a skyscraper. All it takes is different architectural plans to order the same building blocks in very different ways.
The Putnam paper also counters the notion that consistent phylogenetic trees can always be built using sequence analysis of the basic genetic building blocks. After all, the anemone shows significant genetic homology to humans without showing the same homology to the fruit fly or nematode worms. How is this explained without a host of ad hoc props from the Darwinian perspective? The argument of numerous convergent losses of the same types of genes in very different families while retaining these genes in other families over billions of years seems to be just a bit strained - or so it seems to me.
One Man's Junk . . .
Other pseudogenes and so-called transposons, such as the “Alu element” (once thought to be completely useless), are being found to have important functions.
There is a growing body of evidence that Alu (a SINE – Short Interspersed Nuclear Element) sequences are involved in gene regulation, such as in enhancing and silencing gene activity, or can act as a receptor-binding site… This is surely a precedent for the functionality of other types of pseudogenes. 6, 7
In 1997 Flam et al. published an article in the journal Science suggesting that "junk-DNA" seemed to be set up very similar to a language system - like a human language system. "The authors of the paper employed linguistic tests to analyze junk DNA and discovered striking similarities to ordinary language. The scientists interpret those similarities as suggestions that there might be messages in the junk sequences, although it's anyone's guess as to how the language might work." 31 This is especially interesting because this same sort of argument would be used as evidence of extraterrestrial intelligence (like "ET") if such a language-like pattern were found in any other media - like radio waves or etchings on Martian rocks.
Around 1998 Carl Schmid, a molecular biologist at the University of California at Davis, started advancing what seemed like a nutty idea to explain Alu’s unusual affinity for genes. Schmid suggested Alu sequences resided near genes because they are not really “junk” sequences, but are rather useful sequences involved with a mechanism that helps cells repair themselves. With the entire genome map in front of them, showing so many instances of Alu sequences around genes, scientists are beginning to take Schmid seriously. “It looks pretty convincing,” Francis Collins said. Others such as M.I.T. geneticist Eric Lander agree.8
More recently in 2001, a team of molecular geneticists discovered two “hot spots” where the same SINEs inserted independently:
Vertebrate retrotransposons have been used extensively for phylogenetic analyses and studies of molecular evolution. Information can be obtained from specific inserts either by comparing sequence differences that have accumulated over time in orthologous copies of that insert or by determining the presence or absence of that specific element at a particular site. The presence of specific copies has been deemed to be an essentially homoplasy-free phylogenetic character because the probability of multiple independent insertions into any one site has been believed to be nil. . . . We have identified two hot spots for SINE insertion within mys-9 and at each hot spot have found that two independent SINE insertions have occurred at identical sites. These results have major repercussions for phylogenetic analyses based on SINE insertions, indicating the need for caution when one concludes that the existence of a SINE at a specific locus in multiple individuals is indicative of common ancestry. Although independent insertions at the same locus may be rare, SINE insertions are not homoplasy-free phylogenetic markers.9
Even more recently, in the May 2003 issue of Nature, Jeannie Lee published an article entitled, "Complicity of Gene and Pseudogene" in which some interesting findings from work done by Hirotsune et al.13 were presented:
Dysfunctional in the sense that they cannot be used as a template for producing a protein, pseudogenes are in fact nearly as abundant as functional genes. Why have mammals allowed their accumulation on so large a scale? One proposed answer is that, although pseudogenes are often cast as evolutionary relics and a nuisance to genomic analysis, the processes by which they arise are needed to create whole gene families, such as those involved in immunity and smell. But, are pseudogenes themselves merely byproducts of this process? Or do apparent evolutionary pressures to retain them [natural selection] hint at some hidden biological function? For one particular pseudogene, the latter seems to be true . . . Hirotsune and colleagues report the unprecedented finding that the Makorin1-p1 pseudogene [located on chromosome 5 in mice] performs a specific biological task [it regulates the expression of the Makorin1 gene which is located on a completely different chromosome - chromosome 6 in mice].
The work of Hirotsune et al. is provocative for revealing the first biological function of any pseudogene. It challenges the popular belief that pseudogenes are simply molecular fossils -- the evidence of Mother Nature's experiments gone awry." 12,13
In yet another recent Science article by Wojciech Makalowski, the following comments are made that seem to echo what design theorists have been saying for a very long time:
Although catchy, the term "junk DNA" for many years repelled mainstream researchers from studying noncoding DNA. Who, except a small number of genomic clochards, would like to dig through genomic garbage? However, in science as in normal life, there are some clochards who, at the risk of being ridiculed, explore unpopular territories. Because of them, the view of junk DNA, especially repetitive elements, began to change in the early 1990s. Now, more and more biologists regard repetitive elements as genomic treasure." 14
Then, in the December 2003 issue of Annual Review of Genetics, Balakirev and Ayala published a paper entitled, "Pseudogenes: Are They 'Junk' or Functional DNA?" Consider just a few of their conclusions and see if they do not again remind you of what design theorists have been claiming for a long time: that pseudogenes surely have important functions and therefore are not really "pseudo" after all:
Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes. It is therefore assumed that all pseudogene mutations are selectively neutral and have equal probability to become fixed in the population. Rather, pseudogenes that have been suitably investigated often exhibit functional roles, such as gene expression, gene regulation, generation of genetic (antibody, antigenic, and other) diversity. Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over nonsynonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles. . .
An extensive and fast-increasing literature does not justify a sharp division between genes and pseudogenes that would place pseudogenes in the class of genomic "junk" DNA that lacks function and is not subject to natural selection. Pseudogenes are often extremely conserved and transcriptionally active. . .
There seems to be the case that some functionality has been discovered in all cases, or nearly, whenever this possibility has been pursued with suitable investigations. One may well conclude that most pseudogenes retain or acquire some functionality and, thus, that it may not be appropriate to define pseudogenes as nonfunctional sequences of genomic DNA originally derived from functional genes, or as "genes that are no longer expressed but bear sequence similarity to active genes". Rather, pseudogenes might be defined as DNA sequences derived by duplication or retroposition from functional genes that are often subject to natural selection and therefore retain much of the original sequence and structure because they have acquired new regulatory or other functions, or may serve as reservoirs of genetic variability.15
Identical Human-Mouse Junk DNA? Lots of It?
Then, in May of 2004 Haussler and Bejerano used computers to compare the human genome with the mouse and the rat genomes. They assumed that because humans, mice, and rats look so different, there would be differences in the genome. They did see the expected differences in the shared genes from the assumed 'common ancestor', but they were surprised to find long stretches of shared non-coding "junk" DNA that were exactly the same in humans and rodents.
"There were about five hundred stretches of DNA in the human genome that hadn't changed at all in the millions and millions of years that separated the human from the mouse and the rat," says Haussler. "I about fell off my chair. It's very unusual to have such an amount of conservation continually over such a long stretch of DNA."32
Many of these stretches of DNA, called "ultraconserved" regions, don't appear to code for protein, so they might have been dismissed as junk if they hadn't shown up in so many different species. Haussler "confirmed that negative selection is three times stronger in these regions than it is for nonsynonymous changes in coding regions." As far as Haussler is concerned, "It is a mystery what molecular mechanisms would place virtually every base in a segment of size up to 1 kilobase [i.e., 1000 bp] under this level of negative selection" ( Link ). That's 500 regions of DNA up to 1000 bp that are identical between rats and humans - up to 500,000 identical genetic sites in DNA?! What is also surprising is that these same regions largely matched up with chicken, dog and fish sequences as well; but are absent from sea squirt and fruit flies. Note that the last supposed common ancestor for all of these creatures was thought to live some 400 million years ago ( Link ). Of course, it is only logical to assume that if nature has gone to so much trouble to preserve these ultraconserved regions over all these years, then they must be more important than just 'junk.' Haussler thinks the most likely scenario is that they control the activity of indispensable genes and embryo development.
"From what we know about the rate at which DNA changes from generation to generation, the chance of finding even one stretch of DNA in the human genome that is unchanged between humans and mice and rats over these hundred million years is less than one divided by ten followed by 22 zeros. It's a tiny, tiny fraction. It's virtually impossible that this would happen by chance." 32
Of course, this is in light of an interesting experiment described by Nóbrega et al. in a 2004 issue of the journal Nature where the authors demonstrated that large-scale deletions (two large non-coding intervals: 1,511 kilobases and 845 kilobases in length for a total of 2,356,000 bp) of non-coding DNA could be tolerated by mice without any detectable functional effect. "Viable mice homozygous for the deletions were generated and were indistinguishable from wild-type littermates with regard to morphology, reproductive fitness, growth, longevity and a variety of parameters assaying general homeostasis." What is especially interesting here is that these particular non-coding sequences are conserved between humans and rodents. The authors argue that, "Some of the deleted sequences might encode for functions unidentified in our screen; nonetheless, these studies further support the existence of potentially 'disposable DNA' in the genomes of mammals." 34
The problem here is the question of why such disposable DNA would be so conserved over many tens of millions of years? Functional or non-functional, it still poses a problem for standard evolutionary theory. If functional, it isn't non-functional remnants of evolutionary trial and error history. If non-functional its high degree of conservation doesn't seem to fit with the idea that many tens of millions of years have actually passed since the ancestral origin of either humans or rodents (or chickens, dogs, and fish).
Even so, those like Haussler and Nóbrega still believe very strongly that humans and rats do in fact share a common ancestor that lived a hundred million years ago or so. The idea that perhaps humans and rats might have actually been individually created, deliberately, does not even cross their minds.
In any case, such discoveries do at least seem to suggest that the genomes of both humans and rats must be doing something other than coding for proteins, but the purpose of these ultraconserved regions remains a mystery. Haussler thinks that solving this mystery might unlock the secrets of diseases like autism and epilepsy. "There are many cases that are unexplained by any changes in the genes," says Haussler. "This is a new area to look. Doctors who have patients where they have collected DNA samples can look for something common in all of those DNA samples that might explain what is going wrong with their patients: how does the DNA from their patients differ from the DNA of other people who don't have the disease? You look for the consistent difference. These places are a great place to look for some of the diseases that we are still mystified about." 32
Haussler concludes with the following understatement: "I think other bits of 'junk' DNA will turn out not to be junk. I think this is the tip of the iceberg, and that there will be many more similar findings." ( Link ) By 2007 Haussler and Bejerano found 10,402 sequences or transposons that showed signs of function. "We used to think they were mostly messing things up. Here is a case where they are actually useful," Bejerano said. 33
And the count of functional genetic elements once thought to be 'junk' continues to expand at an almost exponential rate . . .
Consider also what picking one of these identical sequences for phylogenetic analysis would do to phylogenetic tree building. There wouldn't be much of a tree. I mean, with identical sequences shared between humans, rats, chickens, dogs, and fish, there wouldn't be much in these sequences to distinguish humans from apes as being any more "related". It seems to be turning out that phylogenetic relationships might not be any more than functionally maintained genetic similarities rather than clear evolutionary relationships via the process of common descent.
Another interesting argument is that various pseudogenes in different species often have certain shared "mistakes" - that "must have originated in a common ancestor." 11 However, there is some evidence that nucleotide changes may not be completely random in certain gene locations. Mutational "hotspots" have been identified in many genes as well as pseudogenes. In these locations, point mutations, even specific types of point mutations, are much more common than elsewhere in the gene.
Consider the GULOP (or GULO) pseudogene for example. In most mammals this is an active gene encoding the enzyme L-glucono-γ-lactone oxidase (LGGLO). GULO is located on chromosome 8 at p21.1 in a region that is rich in genes (see figure). This is the enzyme that catalyzes the last step in the synthesis of ascorbic acid (vitamin C). As it turns out, this particular gene is defective in humans and other primates as well as several other creatures, including guinea pigs, bats and certain kinds of fish. Compared to the rat GULO gene, the human version, as well as the great ape version, has large or clearly functional deletions involving exons I-III, V-VI, VIII, and XI (see figure above).18-21 Compare this with the significant deletions of the guinea pig GULO sequence that involve exons I, V, and VI, all of which match losses seen in the primate sequence. In addition to this, all four functionally detrimental stop codons (three TGA and one TAA) that are identified in the guinea pig are shared at the same sites in the primate GULO pseudogene.
Of course, it seems that we humans are able to get along just fine without this gene because we eat a lot of foods that are rich in vitamin C, like citrus fruits. So, what's the big deal? Well, the argument goes something like this (as per a popular Talk.Origins essay by Edward E. Max, Ph.D.):
In most mammals functional GLO genes are present, inherited - according to the evolutionary hypothesis - from a functional GLO gene in a common ancestor of mammals. According to this view, GLO gene copies in the human and guinea pig lineages were inactivated by mutations. Presumably this occurred separately in guinea pig and primate ancestors whose natural diets were so rich in ascorbic acid that the absence of GLO enzyme activity was not a disadvantage--it did not cause selective pressure against the defective gene.
Molecular geneticists who examine DNA sequences from an evolutionary perspective know that large gene deletions are rare, so scientists expected that non-functional mutant GLO gene copies--known as "pseudogenes"--might still be present in primates and guinea pigs as relics of the functional ancestral gene. . . [Beyond this], the theory of evolution would make the strong prediction that primates [like apes and monkeys] would carry similar crippling mutations to the ones found in the human pseudogene. A test of this prediction has recently been reported. A small section of the GLO pseudogene sequence was recently compared from human, chimpanzee, macaque and orangutan; all four pseudogenes were found to share a common crippling single nucleotide deletion that would cause the remainder of the protein to be translated in the wrong triplet reading frame (Ohta and Nishikimi BBA 1472:408, 1999). 11,20
Now, it is interesting that among the many various substitution mutations in the "GLO" pseudogene that many, though not all, would be shared, to include a single deletion mutation that is shared by all primates (when compared to the rat of course). If not for common descent why would the sequences of human, chimpanzee, gorilla and orangutan reveal a single nucleotide deletion at position 97 in the coding region of Exon X? What are the odds that out of 165 base pairs the same one would be mutated in all these primates by random chance? Pretty slim - right? Is this not then overwhelming evidence of common evolutionary ancestry?
This would indeed seem to be the case at first approximation. However, in 2003, the same Japanese group published the complete sequence of the guinea pig GLO pseudogene, which is thought to have evolved independently, and compared it to that of humans [Inai et al, 2003]. 21 Surprisingly, they reported many shared mutations (deletions and substitutions) present in both humans and guinea pigs. Remember now that humans and guinea pigs are thought to have diverged at the time of the common ancestor with rodents. Therefore, a mutational difference between a guinea pig and a rat should not be shared by humans with better than random odds. But, this was not what was observed. Many mutational differences were shared by humans, including the one at position 97. According to Inai et al, this indicated some form of non-random bias that was independent of common descent or evolutionary ancestry. The probability of the same substitutions in both humans and guinea pigs occurring at the observed number of positions was calculated, by Inai et al, to be 1.84 x 10^-12 - consistent with mutational hotspots.
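The "what are the odds" reasoning about position 97, and the way a mutational hotspot would change it, can be sketched with a short calculation. This is only an illustrative model, not the method used by Inai et al: it assumes each of four lineages independently suffers exactly one deletion somewhere in a 165 base pair stretch, and the hotspot weight of 100 is a purely hypothetical value chosen for illustration.

```python
def prob_all_hit_same_site(site_probs, n_lineages):
    """Probability that n independent lineages all mutate at the same site,
    given the per-site probability of where a single mutation lands."""
    return sum(p ** n_lineages for p in site_probs)


def uniform_sites(n_sites):
    """Every site is equally likely to be hit."""
    return [1.0 / n_sites] * n_sites


def hotspot_sites(n_sites, weight):
    """One site is `weight` times more likely to be hit than any other.
    The weight is a hypothetical parameter, not a measured value."""
    total = (n_sites - 1) + weight
    return [weight / total] + [1.0 / total] * (n_sites - 1)


N_SITES = 165      # base pairs considered in the essay's question
N_LINEAGES = 4     # human, chimpanzee, gorilla, orangutan

p_uniform = prob_all_hit_same_site(uniform_sites(N_SITES), N_LINEAGES)
p_hotspot = prob_all_hit_same_site(hotspot_sites(N_SITES, 100.0), N_LINEAGES)

print(f"uniform model:  {p_uniform:.3e}")
print(f"hotspot model:  {p_hotspot:.3e}")
```

Under the uniform assumption the chance is (1/165)^3, which is the "pretty slim" figure implied by the question; allowing even one strongly favored site raises it by several orders of magnitude.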
What is interesting here is that the mutational hot spots found in guinea pigs and humans exactly match the mutations that set humans and primates apart from the rat (see figure below). 21,22 This particular feature has given rise to the obvious argument that Inai et al got it wrong. Reed Cartwright, a population geneticist, has noted a methodological flaw in the Inai et al. paper:
"However, the sections quoted from Inai et al. (2003) suffer from a major methodological error; they failed to consider that substitutions could have occurred in the rat lineage after the splits from the other two. The researchers actually clustered substitutions that are specific to the rat lineage with separate substitutions shared by guinea pigs and humans. . .
If I performed the same analysis as Inai et al. (2003), I would conclude that there are ten positions where humans and guinea pigs experienced separate substitutions of the same nucleotide, otherwise known as shared, derived traits. These positions are 1, 22, 31, 58, 79, 81, 97, 100, 109, 157. However, most of these are shown to be substitutions in the rat lineage when we look at larger samples of species.
When we look at this larger data table, only one position of the ten, 81, stands out as a possible case of a shared derived trait, one position, 97, is inconclusive, and the other eight positions are more than likely shared ancestral sites. With this additional phylogenetic information, I have shown that the "hot spots" Inai et al. (2003) found are not well supported." (see Link)
It does indeed seem like a number of the sequence differences noted by Cartwright are fairly unique to the rat - especially when one includes several other species in the comparison.
However, I do have a question regarding this point. It seems to me that there simply are too many loci where the rat is the only odd sequence out in Exon X (i.e., there are seven and arguably eight of these loci). Given the published estimate of mutation rates (Drake) of about 2 x 10^-10 per locus per generation, one should expect to see only 1 or 2 mutations in the 164 nucleotide exon in question (Exon X) over the course of the assumed time of some 30 Ma (million years). Therefore, the argument of the mutational differences being due to mutations in the rat lineage pre-supposes a much greater mutation rate in the rat than in the guinea pig. The same thing is true if one compares the rat with the mouse (i.e., the rat's evident mutation rate is much higher than that of the mouse).
This is especially interesting since many of the DNA mutations are synonymous (see Link). Why should essentially neutral mutations become fixed to a much greater extent in the rat gene pool as compared to the other gene pools? Wouldn't this significant mutation rate difference, by itself, seem to suggest a mutationally "hot" region - at least in the rat?
Beyond this, several loci differences are not exclusive to the rat/mouse gene pools and therefore suggest mutational hotspots beyond the general overall "hotness" or propensity for mutations in this particular genetic sequence.
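The "1 or 2 mutations" expectation mentioned above for Exon X can be reproduced with simple arithmetic. The mutation rate, exon length and 30 Ma time span are the figures quoted in the text; the generation times of 0.5 and 1 year are assumptions made here for illustration, since the text does not state which value was used.

```python
def expected_mutations(rate_per_site_per_gen, n_sites, years, gen_time_years):
    """Expected number of mutations in a stretch of DNA along one lineage."""
    generations = years / gen_time_years
    return rate_per_site_per_gen * n_sites * generations


RATE = 2e-10      # per nucleotide per generation (figure quoted in the text)
EXON_X = 164      # nucleotides in Exon X (figure quoted in the text)
YEARS = 30e6      # assumed divergence time (figure quoted in the text)

# Generation times below are assumed values, not taken from the text.
for gen_time in (0.5, 1.0):
    n = expected_mutations(RATE, EXON_X, YEARS, gen_time)
    print(f"generation time {gen_time} yr: about {n:.2f} expected mutations")
```

With these assumed generation times the estimate lands at roughly one to two mutations, so seven or eight rat-specific differences would be several times higher than this simple model predicts.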
Some have noted that although the shared mutations may be the result of hotspots, there are many more mutational differences between humans and rats/guinea pigs as compared to apes. Therefore, regardless of hotspots, humans and apes are clearly more closely related than are humans and rats/guinea pigs.
The problem with this argument is that the rate at which mutations occur is related to the average generation time. Those creatures that have a shorter generation time have a correspondingly higher mutation rate over the same absolute period of time - like 100 years. Therefore, it is only to be expected that those creatures with relatively long generation times, like humans and apes, would have fewer mutational differences relative to each other over the same period of time relative to those creatures with much shorter generation times - like rats and guinea pigs.
This same sort of thing is seen to a fairly significant degree in the GULO region. Many of the same regional mutations are shared between humans and guinea pigs. Consider the following illustration yet again:
Why would both humans and guinea pigs share major deletions of exons I, V and VI as well as four stop codons if these mutations were truly random? In addition to this, a mutant group of Danish pigs has also been found to show a loss of GULO functionality. And, guess what, the key mutation in these pigs was a loss of a sizable portion of exon VIII. This loss also matches the loss of primate exon VIII. In addition, there is a frame shift in intron 8 which results in a loss of correct coding for exons 9-12. This also reflects a very similar loss in this region in primates (see Link). That's quite a few key similarities that were clearly not the result of common ancestry for the GULO region. This seems to be very good evidence that many if not all of the mutations of the GULO region are indeed the result of similar genetic instabilities that are prone to similar mutations - especially in similar animals.
As an aside, many other genetic mutations that result in functional losses are known to commonly affect the same genetic loci in the same or similar manner outside of common descent. For example, achondroplasia is a spontaneous mutation in humans in about 85% of the cases. In humans achondroplasia is due to mutations in the FGFR2 gene. A remarkable observation on the FGFR2 gene is that the major part of the mutations are introduced at the same two spots (755 C->G and 755-757 CGC->TCT) independent of common descent. The short legs of the Dachshund are also due to the same mutation(s). The same allelic mutation has occurred in sheep as well.
For additional commentary on this topic see: ( Link )
Real Time Molecular Convergence
Another interesting example of this phenomenon has been studied in detail in more rapidly reproducing organisms, such as viruses. For example, an interesting study was published by Bull et al., on replicate lineages of the bacteriophage phiX174. Numerous mutations occurred in each genome during propagation. Across nine separate lineages 119 independent substitutions occurred at 68 nucleotide sites. What is interesting here is that over half of these substitutions at 1/3 of the sites were identical in the different lineages. Some convergent substitutions were specific to specific hosts while others were shared between the two separate hosts. Phylogenetic reconstruction using the complete genome sequence not only failed to recover the correct evolutionary history because of these convergent changes, but the true history was rejected as being a significantly inferior fit to the data. 27 In a subsequent similar study Bull et al argue that such results "point to a limited number of pathways taken during evolution in these viruses, and also raise the possibility that much of the amino-acid variation in the natural evolution of these viruses has been selected." 29 In other words, much of the variation in viral genomes is not neutral, but is in fact functional and therefore maintained by natural selection.
This is amazing! The implications here are quite stunning. If the convergent nature of molecular mutations like this cannot be adequately detected such mutations would interfere with any sort of reliable phylogenetic tree building or accurate determination of evolutionary relationships. If there is any sort of correlation with higher-level multicellular organisms, this could significantly undermine the entire science of evolutionary biology as it is currently understood. Real time studies like this are obviously needed on a wider scale to determine if such convergent mutations are more widespread. Obviously, the common assumption that convergent mutations on the molecular level are rare and the result of completely random chance is simply not true anymore for at least some (and possibly most if not all) genomes.
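The kind of tally described for the phiX174 lineages (identical substitutions recurring in independently propagated lines) can be sketched as follows. The lineage names, sites and bases are invented toy values used only to show the bookkeeping; they are not data from Bull et al.

```python
from collections import defaultdict


def convergent_substitutions(lineages):
    """Map each (site, new_base) substitution seen in two or more
    independent lineages to the sorted list of lineages carrying it."""
    carriers = defaultdict(set)
    for name, substitutions in lineages.items():
        for site, new_base in substitutions:
            carriers[(site, new_base)].add(name)
    return {sub: sorted(names) for sub, names in carriers.items() if len(names) >= 2}


# Hypothetical toy data: three replicate lineages and their substitutions.
toy_lineages = {
    "line_A": [(101, "T"), (250, "G"), (3012, "A")],
    "line_B": [(101, "T"), (977, "C")],
    "line_C": [(101, "T"), (250, "G"), (4100, "A")],
}

shared = convergent_substitutions(toy_lineages)
total_events = sum(len(subs) for subs in toy_lineages.values())
convergent_events = sum(len(names) for names in shared.values())

print(shared)
print(f"{convergent_events} of {total_events} substitution events are convergent")
```

A tree-building method that treats every shared substitution as inherited would group the lines carrying the same changes together, even though in this toy setup each line acquired them independently.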
A similar finding was described more recently by Cuevas et al. in a 2002 article published in Genetics dealing with RNA viruses (see Addendum).28 In this study the authors again demonstrated convergences in 12 variable sites in independent lineages. The authors were surprised to discover that convergences occurred not only within non-synonymous sites, but in synonymous sites and intergenic regions as well (usually thought to be neutral with respect to the effects of natural selection). The authors also noted that this phenomenon is not restricted to the laboratory, but is also a relatively widespread observation among HIV-1 virus clones in humans and in SHIV strains isolated from macaques, monkeys, and humans.
These same authors go on to note that, "Convergent evolution at the molecular level is not controversial as long as it can be reconciled with the neutralist and the selectionist theories. The neutral theory suggests that convergences are simply accidents, whereas within the framework of selectionism, there are two qualifications for convergences. The first explanation considers convergences as being adaptive and the result of organisms facing the same environment (as in the case of our experiments) with a few alternative pathways of adaptation (as expected for compacted genomes). Second, keeping in mind the model of clonal interference, beneficial mutations have to become fixed in an orderly way (Gerrish and Lenski 1998), with the best possible candidate fixed first, and then the second best candidate, and so on. This implies that, given a large enough population size to make clonal interference an important evolutionary factor, we should always expect the same mutations to be fixed."
According to the authors, the above argument is valid for nonsynonymous changes but an alternative explanation must be found for synonymous changes and for changes in the intergenic regions since these changes are generally thought to be selectively neutral. So, the authors note that, "Genomic RNA is involved in many RNA-RNA and RNA-protein interactions that affect viral replication. This is obvious for noncoding, regulatory regions (Stillman and Whitt 1997, 1998), but there is increasing evidence that capsid-coding regions in picornaviruses may also have an effect on viral replication (McKnight and Lemon 1998; Fares et al. 2001). Therefore, the RNA itself (apart from its protein-coding capacity) may contribute to the viral phenotype, and fitness may also be affected by synonymous replacements." This is an important point because, "Evidence for selection on synonymous sites has been inferred also in mammals (Eyre-Walker 1999), as a consequence of selection acting upon the base composition of isochors and large sections of junk DNA."
In other words, there doesn't seem to be much DNA, even in seemingly non-functional areas of DNA or even among synonymous changes, that is truly non-functional when it comes to viral genomes. The authors then go on to suggest a comparison with the genomes of high-level organisms, like hominids.
"For example, Fay et al (2001) reported that, in humans, the vast majority (80%) of amino acidic changes are deleterious to some extent and only a minor fraction are neutral. Among these deleterious amino acidic mutations, at least 20% are slightly deleterious. Here, we found that 15 amino acid sites changed, with only 5 being significantly advantageous. At this point, we can only speculate about the selective role of all the amino acid sites shown to be invariable in our study. The total number of amino acids in five genes of VSV is 3536. Assuming that changes in any of the 3536 - 15 = 3521 invariable amino acids would be deleterious (and thus washed out by purifying selection during our evolution experiment), then the fraction of amino acid replacements that are potentially harmful would be 3521/3536 = ~99.58%; the fraction of neutral sites would be 10/3536 = ~0.28%; whereas only 5/3536 = ~0.14% would be beneficial. Despite the differences between humans and VSV in genome size and organization and in the nature of the nucleic acid used, in both cases the fraction of potentially deleterious amino acid substitutions is overwhelmingly larger than that of neutral or beneficial ones."
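The percentages in this quotation follow directly from the site counts it gives. A minimal check, using only the numbers quoted above (the variable names are chosen here for illustration):

```python
TOTAL_SITES = 3536   # amino acids in the five VSV genes
CHANGED = 15         # sites that changed during the experiment
BENEFICIAL = 5       # changed sites found to be significantly advantageous

invariable = TOTAL_SITES - CHANGED   # treated as potentially harmful if changed
neutral = CHANGED - BENEFICIAL       # changed sites that were not advantageous

for label, count in (("potentially harmful", invariable),
                     ("neutral", neutral),
                     ("beneficial", BENEFICIAL)):
    percent = 100 * count / TOTAL_SITES
    print(f"{label}: {count}/{TOTAL_SITES} = {percent:.2f}%")
```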
In other words, it is at least reasonable to suspect that very little coding DNA, even in hominids, is truly "neutral" or immune to all pressures of natural selection. This is becoming true of non-coding DNA as well given that much of what was once thought to be junk is now being found to be functional ( Link ). This strongly suggests that many of what were thought to be shared mutational errors might actually be functionally-maintained by similar creatures in similar environments. In this light, consider the following conclusions of Wood et al published in a 2005 edition of Genetica:
The most convincing evidence of parallel genotypic adaptation comes from artificial selection experiments involving microbial populations. In some experiments, up to half of the nucleotide substitutions found in independent lineages under uniform selection are the same. Phylogenetic studies provide a means for studying parallel genotypic adaptation in non-experimental systems, but conclusive evidence may be difficult to obtain because homoplasy can arise for other reasons. Nonetheless, phylogenetic approaches have provided evidence of parallel genotypic adaptation across all taxonomic levels, not just microbes. Quantitative genetic approaches also suggest parallel genotypic evolution across both closely and distantly related taxa, but it is important to note that this approach cannot distinguish between parallel changes at homologous loci versus convergent changes at closely linked non-homologous loci. The finding that parallel genotypic adaptation appears to be frequent and occurs at all taxonomic levels has important implications for phylogenetic and evolutionary studies. With respect to phylogenetic analyses, parallel genotypic changes, if common, may result in faulty estimates of phylogenetic relationships. [Emphasis added] 30
Notice that according to Wood et al, parallel and/or convergent mutations are "frequent" at "all taxonomic levels, not just microbes". That's very interesting and does indeed have very serious implications when it comes to determining phylogenetic relationships - relationships that are likely to be not only wrong, but meaningless as far as the evolutionary theory of common descent is concerned. Rather, phylogenetic similarities may be more a reflection of functional similarities and differences than of true evolutionary relationships.
Back to mutational hotspots, what makes hotspots so "hot"? Perhaps the answer lies in the chemical nature of the hotspot region. The type of molecular bonds, their stability or instability, or other molecular interactions may lend themselves to specific nucleotide pair switches, especially given certain environmental changes. No one really knows for sure except to say that mutational hot spots do exist. So, given that they do exist, similar genes should be expected to function in similar ways and this includes having similar mutational "hotspots" and/or "shared mistakes." 3 In any case, it is interesting to note that there are no such examples of "shared errors" between mammals and other groups of animals (although there are plenty of common "errors" that are shared by widely divergent mammalian groups).
There are no examples of 'shared errors' that link mammals to other branches of the genealogic tree of life on earth. . . Therefore, the evolutionary relationships between distant branches on the evolutionary genealogic tree must rest on other evidence besides 'shared errors.' 11
Of course the argument used to explain this fact is that mammals split off from other groups of animals over 200 million years ago. Given this amount of time, random mutations would have obliterated any trace of common genetic errors. 11 This is a very good point. The question remains, however, as to why some identifiable genetic errors are maintained as long as they are if they are in fact functionless. Also, "processed pseudogenes" are very similar to "movable genetic elements" which are often transmitted from animal to animal by viruses. Certain interspecies pseudogenes of this type might in fact share a common ancestor while the various types of animals themselves, that harbor certain of these genetic sequences, may not be related through common descent so much as they are partially related through common infection.
In any case, there really are no "foolproof" genetic markers of common descent. All of the ones proposed so far to be foolproof have been shown to have significant flaws. The prediction that pseudogenes, transposons (SINEs and LINEs) and other shared mutational mistakes are conclusive evidence for common descent has not held up over recent years. For example, consider the following excerpt from David Hillis' paper entitled, "SINEs of the perfect character," published in the Proceedings of the National Academy of Sciences, 1999:
What of the claim that the SINE/LINE insertion events are perfect markers of evolution (i.e., they exhibit no homoplasy)? Similar claims have been made for other kinds of data in the past, and in every case examples have been found to refute the claim. For instance, DNA-DNA hybridization data were once purported to be immune from convergence, but many sources of convergence have been discovered for this technique. Structural rearrangements of genomes were thought to be such complex events that convergence was highly unlikely, but now several examples of convergence in genome rearrangements have been discovered. Even simple insertions and deletions within coding regions have been considered to be unlikely to be homoplastic, but numerous examples of convergence and parallelism of these events are now known. Although individual nucleotides and amino acids are widely acknowledged to exhibit homoplasy, some authors have suggested that widespread simultaneous convergence in many nucleotides is virtually impossible. Nonetheless, examples of such convergence have been demonstrated in experimental evolution studies. 10
Introns and Common Ancestry
The term "intron" is short for "intragenic region." Introns are sections of DNA that lie within genes but do not themselves code for the gene product. Introns are common in eukaryotic organisms (animals, plants, fungi, etc), but not in prokaryotes (bacteria, etc). When genes containing introns are transcribed, introns are usually spliced out, leaving a transcript of the DNA sequence without the original intron sequence. The size of introns has a very wide range, from less than 20 base pairs to almost 500,000 base pairs. What is also interesting is that the total size of the introns within a given gene may be much larger than the coding regions of the gene itself - by over 90% on occasion.
What is also interesting about introns, for the purposes of this particular discussion, is that introns have long been thought to be evolutionary remnants of random insertions of DNA - random insertions which are almost always either harmful or neutral. Of course, if an insertion was neutral or near neutral, it could become fixed in a population's gene pool and be passed on over time. Overall, however, because of the notion that intron insertions are almost always detrimental, it had also been assumed that such insertion events were extremely rare.
However, some interesting research by Li et al. was published in 2009 suggesting that intron insertions are neither rare nor commonly detrimental, and that they are not necessarily the result of common ancestry when found at the same location within different organisms.
"Our molecular analyses have enabled us to reject a number of hypotheses for the mechanism of intron origins, while clearly indicating an entirely unexpected pathway -- emergence as accidents arising during the repair of double-strand breaks."56
The authors of this study also note that 17% of the intron insertions they observed were parallel insertions, being identical within independent genotypes or ancestral lines. In other words, intron insertions favor certain particular hotspots within the genome.
"The most intriguing finding for me is the multiple instances of parallel intron gains, because this means that Daphnia is in an active phase of intron proliferation," Li said. "This makes Daphnia an extraordinary system to study intron evolution. In addition, we believe our work facilitates a more accurate estimate of intron gain rates, and directly challenges the assumption that parallel intron gains are rare in many prior analyses...."
"Remarkably, we have found many cases of parallel intron gains at essentially the same sites in independent genotypes," Lynch said. "This strongly argues against the common assumption that when two species share introns at the same site, it is always due to inheritance from a common ancestor." 56
Endogenous Retroviruses (ERVs)
Updated: April, 2012
The case for common descent:
Endogenous retroviruses or "ERVs" are viral elements that are thought to have inserted themselves into the genomes of various creatures - to include humans and apes. ERVs are thought by many to be among the strongest evidences supporting the theory of common descent. For example, it is argued that the same ERVs in the same locations in the genomes of both humans and apes are best explained by a shared common ancestry between humans and apes. In other words, the common ancestor of humans and apes must have been the one to initially experience the ERV insertion into its genome. Then, later, when human and ape ancestors split off from this common ancestral lineage, the same ERV sequences were maintained in the same places in the genomes of both lineages.
This argument seems rather straightforward and even downright obvious at first approximation. However, there are several potential problems with this theory.
Signs of function:
One problem is that a number of ERVs, or at least portions of ERVs, are being discovered to be functionally beneficial.
The ERV known as enJSRV has been shown to "regulate trophectoderm growth and differentiation in the peri-implantation ovine conceptus. This work supports the hypothesis that ERVs play fundamental roles in placental morphogenesis and mammalian reproduction." 35
In fact, ERV elements are thought to control or aid in the transcription of over 20% of the human genome 46 and can trigger premature transcriptional termination at a distance. (Link)
It is also interesting to consider that ERVs and other supposedly "parasitic" DNA elements are found far more often in the genomes of more "complex" organisms - suggesting again that non-coding portions of DNA once thought to be nothing but "junk DNA" and evolutionary remnants are actually playing important functional roles in the genomes of more functionally complex organisms.
"With the accumulation of genomic sequence data, certain unexplained patterns of genome evolution have begun to emerge. One striking observation is the general tendency of genomes of higher organisms to evolve an ever decreasing gene density with higher order. For example, E. Coli has a gene density of about 2 Kb per gene, Drosophila 4 Kb per gene and mammalian about 30 Kb per gene. Much of the decreased density is due to the increase in the accumulation of non-coding or 'parasitic DNA' elements, such as type one and two transposons. Current evolutionary theory does not adequately account for this observation (81). In addition mammals appear to have retained the presence of at least some copies of non-defective 'genomic retroviruses', such as intercysternal A-type particles (IAP's) or endogenous retroviruses (ERVs). It is currently difficult to account for the selective pressure that retains these genomic viruses . . ." 38
In this same line, a subsequent paper presented evidence that proposes potential reasons for the previously observed "selective pressure" and the actual need for ERVs within the genomes of complex organisms like humans. In the journal Bioinformatics, Conley et al. write:
"We report the existence of 51,197 ERV-derived promoter sequences that initiate transcription within the human genome, including 1743 cases where transcription is initiated from ERV sequences that are located in gene proximal promoter or 5' untranslated regions (UTRs)...
Our analysis revealed that retroviral sequences in the human genome encode tens-of-thousands of active promoters; transcribed ERV sequences correspond to 1.16% of the human genome sequence and PET tags that capture transcripts initiated from ERVs cover 22.4% of the genome. These data suggest that ERVs may regulate human transcription on a large scale." 46
ERVs are also being shown to be protective against infection by harmful exogenous retroviruses:
"A possible biological role hypothesized for ERVs is to help the host resist infections of pathogenic exogenous retroviruses, affording a selective advantage to the host bearing them. For instance, some avian and murine ERVs can block infection of related exogenous retroviruses at entry by receptor interference; mouse Fv-1 blocks infection at a preintegration step, also can be viewed as an ERV." 50
ERVs may also aid in modulating the activity of the immune system:
"For example, the HERV-K sequence of the human teratocarcinoma derived virus type (HTDV), is reported to be able to make retrovirus like particle and can express gag, pol and env genes via vectors. Also, ERV 3 can express env gene in embryonic placental tissues. Such reports may now explain the numerous early observations of being able to find viral particles in human tissues. Although some HERV's are expressed in mammary tumors, the feline RD114, ERV-3, and HERV K10+ are all expressed in placental tissues. What then is the significance of nondefective ERVs and why is expression so common in embryos? . . . I and Venables et al. in the Boyd group have proposed that some of these HERV's may function during embryo implantation to help prevent immune recognition by the mother's immune system. . .
In addition, the ERV gag gene product may also be immuno-modulatory. The p70 (gag) of mouse IAP has been cloned and expressed and shown to be identical to IgE binding factor (IgE-BF) which is a regulator of B-cell ability to produce IgH. More recently, it has been reported that endogenous gag is Fv-1, an-Herv.L like endogenous virus which confers resistance to MLV tumors. Although some researchers disagree with the immunomodulatory role of p15E, an immune suppressing activity in culture assays has been clearly established. These supporting results seem sufficiently clear to warrant a serious investigation that both the env and gag gene products of ERV's may modulate immunity." 38
So, it isn't a necessary default that all ERV-like sequences are functionless evolutionary remnants of random viral infections as originally proposed by prominent evolutionists such as Richard Dawkins or Douglas Theobald. This fact was highlighted by Richard Sternberg in a 2002 issue of Annals of the New York Academy of Sciences in the following statement:
"The selfish DNA narrative and allied frameworks must join the other ‘icons’ of neo-Darwinian evolutionary theory that, despite their variance with empirical evidence, nevertheless persist in the literature." 47
Similar support for this concept is noted by Dr. Wang from the Haussler lab:
"These results raise new questions about the role of so-called 'junk DNA,' the vast regions of the genome that don't code for proteins. ERVs fall into that category. Many scientists once believed that such DNA served no purpose, but new data from the Haussler lab and other labs are challenging that view." 48
Origin of ERVs from exogenous retroviruses - or vice versa?:
There is also some evidence that exogenous retroviruses are occasionally derived from ERVs - instead of the other way around.
"Exogenous retroviruses may have originated from ERVs and ERV-Ls in particular may represent an intermediate between retrotransposons and exogenous viruses." 49
This concept is supported by the observation that there are no known examples of current exogenous retroviral insertions into the modern human genome. Also, no infectious exogenous counterparts of any human endogenous retroviruses are known to exist today. This is a very curious finding, considering the striking commonality of ERVs within the human and ape genomes, if the prevailing hypothesis is correct that these ERV sequences were in fact derived from exogenous infective retroviruses.
“No current transposition activity of HERVs or endogenization of human exogenous retroviruses has been documented so far.” 51
“Most of these elements represent ancient retroviral infections, as evidenced by their wide distribution in primate species, and no infectious counterparts of human endogenous retroviruses (HERVs) are known to exist today.” 52
This opens up the possibility that at some time in the past all exogenous retroviruses were originally derived from ERVs - not necessarily the other way around, with ERVs originally being derived from exogenous retroviral infections. In other words, it is possible that all sequences that are now thought to be viruses or viral elements were originally derived from functional genetic sequences that have since suffered degenerative changes and loss of genetic controls, resulting in the various parasitic features that we see in many viruses today - as well as the harmful effects of this loss of regulation, such as tumor development and the association of these elements with numerous types of cancers and other neoplastic processes.
Non-random viral insertions:
Beyond this, it has also been shown that the insertions of ERVs are not entirely random despite this common belief - even among mainstream scientists. ERVs actually show a preference for certain fairly specific locations in various genomes.
"Although retrovirus integration can occur throughout the genome, local "hot spots" for integration exist where a strong preference for particular sites over others can be demonstrated statistically. Recent work with HIV and murine leukemia virus has implied that there is also a preference for integration into transcribed regions of the host genome, in the case of murine leukemia virus, near transcriptional start sites. The basis for these preferences is unknown, but they may reflect interaction of the pre-integration complex with specific proteins or with specific DNA sequences or structures that are associated with transcription." 36
"But although this concept of retrovirus selectivity is currently prevailing, practically all genomic regions were reported to be used as primary integration targets, however, with different preferences. There were identified 'hot spots' containing integration sites used up to 280 times more frequently than predicted mathematically." 43
The odds against similar ERV germline insertions:
In order for an ERV to be in the same location in differing populations via common descent, one of two things had to have happened. Either many individuals in the same population were infected by the same virus, which inserted itself into the same position in all the different individuals (a highly unlikely scenario), or there was a very significant population bottleneck where only a very few individuals (perhaps just one) were infected and the offspring of that individual subsequently overtook the entire population to achieve fixation of the viral sequence within all individuals of the population.
In short, a viral event would have had to overtake all of the individuals in a population for each different ERV sequence in the genome (hundreds of thousands of them), and each ERV would have had to insert itself into gametic cells while not harming the reproductive fitness of the host - and all of this would have had to happen many times in different species. Such a scenario does not seem all that likely - especially when one considers the very high detrimental mutation rate for humans and apes and the very real odds that detrimental mutations would lead toward rapid genetic meltdown and extinction during extended population bottlenecks.
Another interesting aspect of ERVs is that they do not always show the expected evolutionary pattern of "inheritance". According to the proposed phylogenetic tree (shown to the right) chimps are closer to humans than to gorillas. Given this scenario, gorillas and chimps would only be expected to share an ERV if this same ERV were also present in humans. However, there are some ERVs that don't seem to fit this pattern. For example, the K family of ERVs (HERV-K provirus) is present in chimps and gorillas, but not in humans.40 Also, portions of ERVs known as CERV 2 and CERV 1 elements are present in chimpanzee, bonobo and gorilla (non-orthologous) but are absent in human, orangutan, old world monkeys, and new world monkeys.39
The usual explanation for such findings, of course, is that humans lost this or that particular ERV along the way. Of course, this post-hoc argument could be used to explain any aberrancy. It seems somewhat difficult to imagine, however, how an entire human population could lose ERVs that are preserved in both chimps and gorillas - outside of yet another significant population bottleneck, that is.
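The expectation described above can be sketched as a small check: an element inherited from a single ancestral insertion, with no later loss, should be carried by exactly the species that make up one branch (clade) of the tree. The following Python sketch is illustrative only; the three-species tree and the names TREE, leaves, clades and fits_single_insertion are assumptions made for this example and do not come from any of the cited papers.

```python
# Illustrative sketch only: the tree below is an assumed, simplified
# three-species tree, written as nested tuples: ((human, chimp), gorilla).
TREE = (("human", "chimp"), "gorilla")


def leaves(node):
    """Return the set of species names found under a node of the tree."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Return every clade (as a frozenset of species) in the tree,
    including single species and the whole tree."""
    found = [frozenset(leaves(node))]
    if not isinstance(node, str):
        for child in node:
            found.extend(clades(child))
    return found


def fits_single_insertion(tree, carriers):
    """True if the carrier species form exactly one clade, i.e. the pattern
    needs only one insertion and no later loss under this tree."""
    return frozenset(carriers) in clades(tree)


# Pattern expected under the tree: shared by human and chimp only.
expected_pattern = fits_single_insertion(TREE, {"human", "chimp"})

# Pattern reported in the text for the HERV-K provirus: chimp and gorilla only.
reported_pattern = fits_single_insertion(TREE, {"chimp", "gorilla"})
```

Here expected_pattern is True and reported_pattern is False. A False result does not say which explanation is correct; it only flags that, under the assumed tree, the pattern needs something extra, such as a loss in one lineage or more than one insertion, which is the point under discussion in the text.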
There are also other even more problematic phylogenetic inconsistencies with ERVs:
"We performed two analyses to determine whether these 12 shared map intervals might indeed be orthologous. First, we examined the distribution of shared sites between species (Table S3). We found that the distribution is inconsistent with the generally accepted phylogeny of catarrhine primates. This is particularly relevant for the human/great ape lineage. For example, only one interval is shared by gorilla and chimpanzee; however, two intervals are shared by gorilla and baboon; while three intervals are apparently shared by macaque and chimpanzee. Our Southern analysis shows that human and orangutan completely lack PTERV1 sequence (see Figure 2A). If these sites were truly orthologous and, thus, ancestral in the human/ape ancestor, it would require that at least six of these sites were deleted in the human lineage. Moreover, the same exact six sites would also have had to have been deleted in the orangutan lineage if the generally accepted phylogeny is correct. Such a series of independent deletion events at the same precise locations in the genome is unlikely (Figure S3). . .
Several lines of evidence indicate that chimpanzee and gorilla PTERV1 copies arose from an exogenous source. First, there is virtually no overlap (less than 4%) between the location of insertions among chimpanzee, gorilla, macaque, and baboon, making it unlikely that endogenous copies existed in a common ancestor and then became subsequently deleted in the human lineage and orangutan lineage. Second, the PTERV1 phylogenetic tree is inconsistent with the generally accepted species tree for primates, suggesting a horizontal transmission as opposed to a vertical transmission from a common ape ancestor. An alternative explanation may be that the primate phylogeny is grossly incorrect, as has been proposed by a minority of anthropologists." [emphasis added] 42
"Inconsistencies do exist with phylogenetic analyses and are often explained by ad hoc arguments without positive evidence." ( Link - last accessed 3/10/09)
In fact, it seems like just about any finding or data set can be explained within the evolutionary paradigm using this or that "ad hoc" explanation to make the data fit the theory. This produces a problem of bias when it comes to interpreting data sets. Such biases in the interpretation of ERV phylogenies have been recognized for some time now. For example, according to Posada and Crandall, in a 2001 paper published in Molecular Biology and Evolution:
"Wrong models of [retroviral] evolution lead to the estimation of trees that are in agreement with biochemical and immunological evidence and with previous phylogenetic studies. . .
When examining the results of the present study, only those trees estimated according to simple, likely wrong, models of evolution agree with current evidence. In most of the reconstructed trees, different genera appear as monophyletic groups. These groups have normally high bootstrap values indicating that, given the data sets at hand, we can be confident in the nodes defining these clusters. When more complex, more realistic, models of evolution are employed, fewer genera are recovered as monophyletic, the level of support is lower, and the topologies are very different from the assumed "known" trees.
Phylogenetic bias, by which "incorrect" models can give "correct" answers, has been identified in simulation studies. Why this bias occurs is a question that remains unsolved. . .
One possible factor contributing to the bias is most likely a problematic alignment, in which sequences belonging to the same group (genus) are easily aligned, whereas the opposite is true for sequences belonging to different groups. Complex models might be confounded when trying to extract information from the bad intragroup sequence alignment, while simpler models use basically the observed patterns. This would warrant a word of caution for the estimation of phylogenies from highly divergent data sets." 45
The sheer number of ERVs:
Not too long ago it was thought that around 30,000 ERVs existed within the human/ape genomes, comprising between 1-8% of each. 37,41,43 As of the 2005 report of the Chimpanzee Sequencing and Analysis Consortium, in which the entire chimpanzee genome was compared to the human genome, it is now thought that approximately 200,000 ERVs, or portions of ERVs, exist within the genomes of both humans and apes - totaling around 127 million base pairs (around 4% of the total genomic real estate).63 Some authors suggest a 45% ERV origin for the human genome at large (Mindell and Meyer 2001) and 50% for mammalian species in general ( Link ), if all small fragments of ERV sequences are included in the estimate. In any case, of these hundreds of thousands of recognizable portions of ERVs, the vast majority seem to match up, at the very same loci, between humans and chimps. As suggested in "Table 2" from the paper (to the right), less than 1% of the ERVs are lineage specific for either humans or apes. In other words, the vast majority of ERVs are shared or "orthologous" between humans and chimps (a significant increase from the seven or so that were once thought to have infected both humans and chimps at identical locations ( Link )).
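The percentage quoted above can be checked with simple arithmetic. The short Python sketch below uses the two figures given in the text (about 200,000 elements and about 127 million base pairs). The genome size of roughly 3.1 billion base pairs is an assumption added here for the calculation, and the function name percent_of_genome is made up for this example.

```python
# Figures quoted in the text.
ERV_COUNT = 200_000            # approximate number of ERVs or ERV portions
ERV_BASE_PAIRS = 127_000_000   # approximate total length of those elements

# Assumption for this sketch: approximate size of the haploid human genome.
GENOME_BASE_PAIRS = 3_100_000_000


def percent_of_genome(base_pairs, genome_base_pairs=GENOME_BASE_PAIRS):
    """Return the given number of base pairs as a percentage of the genome."""
    return 100.0 * base_pairs / genome_base_pairs


# Share of the genome made up of ERV sequence (about 4%).
erv_percent = percent_of_genome(ERV_BASE_PAIRS)

# Average length of one counted element, in base pairs.
mean_element_length = ERV_BASE_PAIRS / ERV_COUNT
```

With these inputs the share comes out at roughly 4%, in line with the quoted figure, and the average counted element is 635 base pairs long, which fits the wording "ERVs, or portions of ERVs" used above.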
For many this finding alone (of near universal homology among ERVs between species) might suggest overwhelming evidence in favor of common ancestry - not to mention the nested hierarchical patterns that most of these ERVs display between species. Certainly, it does not seem rational to deny a common origin of some kind. However, given the ever increasing discoveries of functionality for various elements of ERVs, in all their various forms and fragments, combined with the difficulty to explain truly exogenous sources for their existence, the theory of a common evolutionary ancestry becomes less tenable. The existence of ERVs or parts of ERVs at the same or similar places mostly doing the same or similar jobs over so much of the genome (often at a very high level of integrated complexity) in similar creatures strongly favors the theory of a common designer.
The Darwinian mechanism of random mutations and function-based natural selection is simply inadequate to explain the high-level functional complexity of ERV elements within very complex and heavily integrated genomes. Also, the fact is that there simply are no known examples of exogenous retroviruses inserting themselves into the germ lines of humans or chimps, or any other animal for that matter. If the common descent hypothesis were in fact true, that such events have occurred in the past at very rapid rates, one should expect to see at least a few real time examples of such. Why are there no such examples in action? The requirement for so many tens and hundreds of thousands of neutral retroviral infections to have achieved fixation in the original germ-line ancestry of humans and chimps, requiring very small population bottlenecks over long periods of time, is also strongly inconsistent with the hypothesis of common descent on many levels, while being right in line with the hypothesis of original intelligent design with degenerative changes acting over subsequent periods of time.
But why would any intelligent designer deliberately design retroviral elements as part of the genomes of highly complex living things? Aren't retroviral insertions almost always functionally detrimental? Given that so many beneficial features are now known to be tied to these endogenous retroviral elements, that much of our genome is actually controlled by these elements, and that we wouldn't be able to live without them, it seems much more likely that the original genome was in fact designed with these elements in place at the very beginning - that they have always been vital to the existence of humans, apes, and all other complex organisms. However, as is always the case when complex reproductive machines or lines of code are subject to random mutations and natural selection, degenerative changes take place whereby parasitic elements are quickly realized through the loss of pre-existing complexities that used to control or modify their activities. It is far more likely, therefore, that exogenous retroviruses that currently plague humanity (like HIV, HTLV-1, etc) were originally derived from endogenous retroviral sequences that suffered degenerative changes. Like a cancer that has escaped the normal systems of control and regulation, these exogenous viruses became "selfish" and parasitic, attacking and feeding off the host instead of contributing to the ideal function of the host.
There are many real time examples of such degenerative changes that result in parasitic functionality - such as the TTSS toxin injector system in bacteria that evolved through the loss of pre-existing structural elements from the rotary bacterial flagellum. The TTSS system is now used by toxic bacteria - like the bacteria that cause Bubonic Plague or "Black Death" (Yersinia pestis). Such are better referred to as devolutionary changes rather than evolutionary changes since they are based on a loss, rather than a true gain, of novel functional complexity.
See also the following excellent Review Article on ERVs.
Fusion of Chromosome 2
It is commonly argued that the human version of chromosome 2 has in fact suffered a fusion event sometime in the past. Telomeres, for example, are usually found only at the ends of chromosomes - not in the middle. However, within the middle of chromosome 2 there is a pretelomeric sequence, a telomeric sequence, an inverted telomeric sequence and an inverted pretelomeric sequence - in that order. There are also remnants of an extra centromere as well as similar banding patterns with the equivalent chromosomes (2p and 2q) of apes.53
So, it seems fairly obvious to most that there was a chromosomal fusion event sometime in the past in the ancestry of modern humans. However, the fusion event that resulted in the formation of modern human chromosome 2 is not at all inconsistent with a theory of a separate ancestry of humans and apes - despite the very common assertion by mainstream scientists and Darwinian apologists that the fusion of chromosome 2 is very clear evidence for the common ancestry of humans and apes.
How can I possibly even suggest such an argument - against the vast majority of mainstream scientists? Well, for one thing, chromosomal fusions happen to be fairly common - even within the same species. In fact, there are humans alive today that have chromosomal fusions - and surprise surprise, they're still human! - morphologically and functionally indistinguishable from other modern humans. Another example can be found with horses. The wild horse (Przewalski's horse) has 33 chromosomal pairs while the domesticated horse has 32 chromosomal pairs. Also, domestic dogs and wolves of the genus Canis have 78 chromosomes while foxes have a varied number from 38-78 chromosomes. Yet another example is the house mouse Mus musculus, which has 40 chromosomes, while a population of mice from the Italian Alps was found to have only 22 chromosomes (
So, the different chromosomal numbers between humans and apes don't necessarily indicate common ancestry. The fusion is not evidence for when the event took place, nor is it evidence for the ancestry prior to that event. It could just as easily mean that similar creatures with independent ancestries originally had the same chromosome number and general banding patterns - a number that was later altered by fusion mutations in the human population during a population bottleneck. Given another dramatic population bottleneck in the future, such a transmissible fusion could easily happen again - in either apes or humans . . . or any other creature for that matter. That's what's clearly predictable here. Even those who believe in intelligent design (ID) understand that not all genetic features require the input of intelligence. The simple fusion of two chromosomes, without any significant functional gain or loss, is easy to explain via random mindless processes and is actually fairly common. No big deal. Not very surprising or shocking - not even from an ID perspective. In fact, evolutionists would make exactly the same argument for the common ancestry of humans and apes without the fusion of chromosome 2. This fusion event really adds nothing to the argument. It simply presents no additional explanatory or predictive power to the argument for common descent beyond the simple observation that similarities suggest a common origin of some kind...
In other words, those evolutionists who present this argument do not provide any evidence that the human ancestor who originally had 48 chromosomes (as apes do) was actually any more closely "related" to apes, functionally or morphologically, than are modern humans. It isn't the fact that apes have 48 chromosomes that makes them look and act and function like apes rather than humans. If it were that simple, evolutionists would actually have a very good argument. The problem for Darwinists is that it isn't nearly that simple - not even close. This chromosomal number difference produces no obvious functional differences between apes and humans in and of itself - none at all. Therefore, it is not a stretch to assume that any 48 chromosome ancestor of modern humans might have also had a chromosomal scheme similar to that of apes, regardless of whether or not that individual was "related" to apes via common descent. Claiming that banding pattern similarities are evidence of common ancestry with apes simply invokes the “similarity = common descent” argument, and thus begs the question.
While it is quite reasonable that strong similarities, such as exist between humans and apes, do in fact indicate a common origin, that common origin is not necessarily based on common descent via slow genetic modifications selected by mindless nature over time from some shared common ancestor. Given the highly functionally complex differences between the two species which are being discovered more and more in recent years (especially in non-coding regions of the genome) it seems far more likely that the common origin of these differences, as well as the similarities, was based in deliberate highly intelligent design. The only event(s) that clearly do not require the input of high-level outside intelligence are events like random chromosomal fusions or other forms of random mutations which are very unlikely to produce any functional benefit beyond very low levels of functional complexity ( Link )
Again, it is entirely possible, quite likely in fact, that our human ancestors underwent a chromosomal fusion event during a population bottleneck in fairly recent history (i.e., within the past several thousand years at most), easily explaining the fusion of chromosome 2. This concept is supported by an article published in a 2004 issue of Nature by Rohde et al. where the authors make the following argument:
"These analyses suggest that the genealogies of all living humans overlap in remarkable ways in the recent past. In particular, the MRCA [most recent common ancestor] of all present-day humans lived just a few thousand years ago [~3,000] in these models. Moreover, among all individuals living more than just a few thousand years earlier than the MRCA, each present-day human has exactly the same set of genealogical ancestors." 54
Interstitial Telomeric Sequences
In this line it is also interesting to note that "interstitial" telomeric sequences (ITSs), with their repeats of TTAGGGTTAGGGTTAGGG..., are found scattered throughout the human and ape genomes. It used to be thought that these ITSs were simply junk sequences - left over evolutionary garbage from ages past. However, it has since been discovered that these ITSs are often functionally important to the genome. The chromatin organization of telomeres can silence genes and has been linked to epigenetic modes of inheritance. Furthermore, different classes of transcripts are derived from telomeres and their flanking repetitive DNA regions. These are involved in numerous cellular and developmental functions.
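The TTAGGG repeat pattern described above is simple enough to search for directly. The Python sketch below finds runs of the repeat, and of its reverse complement CCCTAA (how the same repeat reads on the opposite strand, as in an inverted telomeric sequence), in a DNA string. It is a minimal illustration only: the function name find_repeat_runs, the min_copies threshold and the toy sequence are all assumptions made for this example, and real ITS surveys use degenerate, imperfect repeats rather than exact matches.

```python
import re

TELOMERE_FORWARD = "TTAGGG"
TELOMERE_REVERSE = "CCCTAA"  # reverse complement of TTAGGG


def find_repeat_runs(sequence, unit, min_copies=3):
    """Return (start, end, copies) for each run of at least min_copies
    exact, back-to-back copies of unit in sequence (0-based, end exclusive)."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    runs = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(unit)
        runs.append((match.start(), match.end(), copies))
    return runs


# Toy sequence made up for this example: four forward repeats, a spacer,
# then three inverted repeats.
toy_sequence = "ACGT" + "TTAGGG" * 4 + "GATTACA" + "CCCTAA" * 3 + "TTGA"

forward_runs = find_repeat_runs(toy_sequence, TELOMERE_FORWARD)
inverted_runs = find_repeat_runs(toy_sequence, TELOMERE_REVERSE)
```

On the toy sequence, forward_runs holds one run of four copies and inverted_runs holds one run of three copies. The search is deliberately strict; loosening it to tolerate mismatches would be needed before applying it to real genomic data.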
What is also interesting is that of the numerous known ITS sequences within the human and ape genomes, only one of them, at the 2q13 ITS site, is actually shared by humans and chimps. In other words, of the many known ITSs in the genomes of both humans and apes (and cows, chickens, rats, etc), only the 2q13 ITS can be associated with an evolutionary breakpoint or fusion event. The other ITSs simply do not line up with chromosomal breakpoints in primates.60 So, to argue that the 2q13 ITS is typical of what is seen in human and chimp genomes seems to be a cherry picking of the available facts. Most of the known ITS sequences are not "DNA scars" in the way they have been portrayed. Rather, it seems more likely that ITSs are sites where TTAGGG repeats have simply been added to chromosomes by telomerases and that many of these ITS sites are associated with distinct sets of proteins which have been linked to important functional roles within the genome - such as recombination hotspots, etc.61
A New Paradigm
The Fractal Complexity of Life
Given the information presented so far, it seems quite obvious that the old notion that pseudogenes and other forms of shared "junk" DNA give clear evidence of common ancestry over common functional need has to be discarded - together with the notion that most of the human genome is functionless evolutionary garbage. Certainly if organisms share similar environments and have similar morphologic appearances and needs, one should not be too surprised to find similar functional genetic elements shared between such creatures. Therefore, such sequences cannot be used to clearly establish evolutionary trees or to estimate divergence times, since such beneficial sequences would be maintained over time via natural selection without any significant changes. The similarities and differences would not be based so much on evolutionary changes over the time since a shared common ancestor as they would be the result of similarities and differences in functional needs that have always been there, maintained by the forces of natural selection, since these creatures came to be.
But beyond this basic argument is the fascinating discovery of the fact that most of the non-coding DNA in the human genome is not junk after all. In fact, it is arguably the most functional part of the genome, directing the placement of the basic bricks and mortar of the organism - so to speak. Layer upon layer of information is being discovered. As in a fractal, the closer one looks, the more detailed and intricate and complex the functional elements of the genome appear - in layer after layer.
No one knows yet just what the big picture of genetics will look like once this hidden layer of information is made visible. "Indeed, what was damned as junk because it was not understood may, in fact, turn out to be the very basis of human complexity," Mattick suggests. Pseudogenes, riboswitches and all the rest aside, there is a good reason to suspect that is true. Active RNA, it is now coming out, helps to control the large-scale structure of the chromosomes and some crucial chemical modifications to them—an entirely different, epigenetic layer of information in the genome.16
In fact, the most detailed probe yet into the workings of the human genome has led scientists to conclude [as of June 14, 2007] that a cornerstone concept about the chemical code for life is badly flawed. Reporting in the British journal Nature and the US journal Genome Research on Thursday [June 14, 2007], they suggest that an established theory about the genome should be consigned to history.
In between the genes and the sequences known to regulate their activity are long, tedious stretches that appear to do nothing. The term for them is "junk" DNA, reflecting the presumption that they are merely driftwood from our evolutionary past and have no biological function. But the work by the ENCODE (ENCyclopaedia of DNA Elements) consortium implies that this nuggets-and-dross concept of DNA should be, well, junked.
The genome turns out to be a highly complex, interwoven machine with very few inactive stretches, the researchers report. Genes, it transpires, are just one of many types of DNA sequences that have a functional role. And "junk" DNA turns out to have an essential role in regulating the protein-making business. Previously written off as silent, it emerges as a singer with its own discreet voice, part of a vast, interacting molecular choir.
"The majority of the genome is copied, or transcribed, into RNA, which is the active molecule in our cells, relaying information from the archival DNA to the cellular machinery," said Tim Hubbard of the Wellcome Trust Sanger Institute, a British research group that was part of the team. "This is a remarkable finding, since most prior research suggested only a fraction of the genome was transcribed."
Francis Collins, director of the US National Human Genome Research Institute (NHGRI), which corralled 35 scientific groups from around the world into the ENCODE project, said the scientific community "will need to rethink some long-held views about what genes are and what they do."17
"We fooled ourselves into thinking the genome was going to be a transparent blueprint, but it’s not," says Mel Greaves, a cell biologist at the Institute of Cancer Research in Sutton, UK. Instead, as sequencing and other new technologies spew forth data, the complexity of biology has seemed to grow by orders of magnitude. Delving into it has been like zooming into a Mandelbrot set — a space that is determined by a simple equation, but that reveals ever more intricate patterns as one peers closer at its boundary....
"It seems like we’re climbing a mountain that keeps getting higher and higher," says Jennifer Doudna, a biochemist at the University of California, Berkeley. "The more we know, the more we realize there is to know."...
Researchers from an international collaborative project called the Encyclopedia of DNA Elements (ENCODE) showed that in a selected portion of the genome containing just a few per cent of protein-coding sequence, between 74% and 93% of DNA was transcribed into RNA. Much non-coding DNA has a regulatory role; small RNAs of different varieties seem to control gene expression at the level of both DNA and RNA transcripts in ways that are still only beginning to become clear. "Just the sheer existence of these exotic regulators suggests that our understanding about the most basic things — such as how a cell turns on and off — is incredibly naive," says Joshua Plotkin, a mathematical biologist at the University of Pennsylvania in Philadelphia.57
The human genome in numbers26
- 1.5% of the genome translated into proteins
- 27% of the genome transcribed as part of protein-coding gene expression but not translated into proteins
- 25% of the genome that is transcribed but not translated, and is not associated with protein-coding genes
- 250 microRNAs currently identified (as of June 2005)
- ~1,000 as of 2007 ( Link )
- 10,000 protein-coded genes estimated to be regulated by microRNAs; each microRNA can target several genes, and a particular gene may be regulated by several microRNAs
- 98% of genomic output that is non-coding RNA
- 9% of genes that appear to have associated antisense transcripts
- ~20,000 "pseudogenes" in the genome
This is very interesting. I mean, who would have thought that the majority of the genome would be copied or transcribed into RNA? - and that it would in fact be functional? Only a few years ago the scientific community believed that less than 5% of the genome was actually functional and the rest was non-functional evolutionary remnants. After all, "noncoding genomic regions account for 98% to 99% of the human genome and consist of introns found within protein-coding transcripts and the intergenic regions between them."25 Add to these numbers the very surprising finding that many genetic sequences that do not produce either proteins or RNA are also being found to be functional (see discussion of Pyknons)
Who would have predicted this - besides creationists and intelligent design theorists, that is? Creationists and intelligent design theorists have been claiming for many years that the concept of "Junk DNA" (as well as vestigial structures) was not entirely correct. I myself have been promoting this idea for years (since 1997). Yet, only now are mainstream scientists finally starting to realize the significant errors in their long-cherished beliefs when it comes to the ill-conceived notion of junk DNA - an idea which was based on ardently held evolutionary presuppositions that blinded mainstream science and prevented it from searching out the hidden treasures of so-called "junk DNA" for a fairly long time.
When are scientists going to start realizing that the creationist paradigm does indeed have very good predictive scientific value when it comes to accurately understanding and investigating the physical world and universe?
To add to this, consider the fairly recent finding (2006) of "pyknons" by Rigoutsos et al.24 Pyknons are variable-length patterns within DNA sequences that have identically conserved copies and multiplicities above what is expected by chance. They are also not transcribed into RNA (unlike miRNAs noted above) or translated into protein. Among the millions of discovered patterns, Rigoutsos et al. found a subset of 127,998 patterns, which they termed pyknons, that have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of ~22 nucleotides.
Pyknons are also very common in the human genome. They form 1/6th of the human intergenic and intronic regions for a total of 127,998 pyknons covering 898,424,004 DNA nucleotide positions on the forward and reverse strands of the human genome.
What is interesting here, of course, is that pyknons are associated with specific biologic processes - i.e., they are functional. Cross-genome comparisons reveal that many of the pyknons have instances in the 3' UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. This "unexpected finding" suggests, according to the authors, potential unique functional connections between the coding and noncoding parts of the human genome - such as a possible link with posttranscriptional gene silencing and RNA interference.
"Human pyknons are also present in other genomes, where they associate with similar biological processes. Notably, >600 million nucleotides that are associated with nongenic copies of pyknons in the human genome are absent from the mouse and rat genomes. Interestingly, the human pyknons have many instances in the intergenic and intronic regions of the phylogenetically distant worm and fruit fly genomes, covering ~1.6 million nucleotides in each."24
Given that genetic sequences that are transcribed or translated or both seem to account for the "majority" of the genome, and are thought to be functionally beneficial, it is interesting that certain types of genetic sequences that are neither translated nor transcribed are also being found to be functional. Taken together, it seems like the significant majority of the genome is indeed functional to at least some degree - well over 50% if not more like 85-90% or even higher?
The Key Human-Ape Differences
It is becoming more and more clear that the key functional differences between living things, like humans and apes, are not so much found in protein-coding genes, but in the non-coding regions of DNA once thought to be functionless "junk-DNA" - evolutionary remnants of past mistakes that are shared between various creatures. This notion is starting to be shed with more and more discoveries that show that many of these same regions are not just functional, they carry the vast majority of the genetic information. The "genes" that were once thought to be so important for genetic function are turning out to be equivalent to the most low-level basic building blocks within the genome, like bricks and mortar. Surprisingly, it is the non-coding regions of DNA that control what is done with these building blocks - that determine what kind of "house" to build, so to speak. The following article is very interesting in this regard:
"Seventy-five percent of known human miRNAs [microRNAs] cloned in this study were conserved in vertebrates and mammals, 14% were conserved in invertebrates, 10% were primate specific and 1% are human specific. The new miRNAs have a different conservation distribution: more than half of the human miRNAs were conserved only in primates, about 30% in mammals and 9% in nonmammalian vertebrates or invertebrates; 8% were specific to humans. We saw a similar distribution for the chimpanzee miRNAs.
The different miRNA repertoire, as well as differences in expression levels of conserved miRNAs, may contribute to gene expression differences observed in human and chimpanzee brain. Although the physiological relevance of miRNAs expressed at low levels remains to be shown, it is tempting to speculate that a pool of such miRNAs may contribute to the diversity of developmental programs and cellular processes . . . For example, miRNAs recently have been implicated in synaptic development and in memory formation. As the species specific miRNAs described here are expressed in the brain, which is the most complex tissue in the human body, with an estimated 10,000 different cell types, these miRNAs could have a role in establishing or maintaining cellular diversity and could thereby contribute to the differences in human and chimpanzee brain ... function." 23
Pseudogenes are also being found to have functionality similar to that of miRNAs. "Transcripts of processed pseudogenes can contain regions with significant antisense homology, which may suggest a regulatory role for transcribed pseudogenes through an RNAi-like mechanism" (see Link). Two recent studies have demonstrated that such transcribed pseudogenes can regulate transcription of homologous protein-coding genes. Transcription of a pseudogene in Lymnaea stagnalis that is homologous to the nitric oxide synthase gene decreases the expression levels for the gene through formation of an RNA duplex; this is thought to arise via a reverse-complement sequence found at the 5′ end of the pseudogene transcript (Link). In a second example, transcription of the makorin1-p1 TPΨg in mouse was required for the stability of the mRNA from a homologous gene, makorin1. This regulation was deduced to arise from an element in the 5′ areas of both the gene and the pseudogene (Link). More recently, Weil et al. discovered that the murine FGFR-3 pseudogene is transcribed in fetal tissues in an antisense direction. This prompted the following consideration:
As the regions of exact identity between FGFR-3 and its pseudogene can be up to 60 nt long, it may be envisioned that FGFR-3 transcripts could play a regulatory role in FGFR-3 expression. If these antisense transcripts could hybridize to sense FGFR-3 transcripts inside the cells, this may lead to either rapid degradation or inhibition of translation. (Link)
As Yao et al. predict, "Further studies on transcribed pseudogenes will add to our understanding of their potential roles as non-coding RNA genes or other new types of functional elements." (Link) It seems like many transcribed pseudogenes may act as giant miRNAs to regulate the function of protein-coding genes and other genetic elements.
Other interesting differences include the fact that over 6% of the genes in humans and apes are unique to either humans or apes - i.e., they are not shared.
"Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences. This genomic “revolving door” of gene gain and loss represents a large number of genetic differences separating humans from our closest relatives." 55
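As a quick check of the arithmetic behind the "at least 6%" figure, the quoted counts can be divided directly. This is only a sketch using the two numbers given in the quotation; the variable names are illustrative.

```python
# Counts taken from the quotation above.
differing_genes = 1418
total_genes = 22000

gene_complement_difference = differing_genes / total_genes

# Prints the fraction as a percentage with one decimal place: 6.4%
print(f"{gene_complement_difference:.1%}")
```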
And, if one thinks a 6% difference is impressive, what about a difference of more than 30 percent? Impossible? Think again. A study published by Nature in early 2010 shows just such a difference between the Y-chromosomes of humans and apes. The Y-chromosome for chimps had never been completely sequenced and mapped directly before this study was performed. It showed many striking differences between human and chimp chromosome structure, gene content, and even qualitatively unique genes between the two species. As far as looking at specific genes, the chimp and human Y-chromosomes seem to have a dramatic difference in gene content of up to 53 percent. In other words, the chimp is lacking approximately half of the genes found on a human Y-chromosome. Because genes occur in families or similarity categories, the researchers also sought to determine if there was any difference in actual gene categories. They found a shocking 33 percent difference. The human Y-chromosome contains a third more gene categories, entirely different classes of genes, compared to chimps.58
Under evolutionary assumptions of long and gradual genetic changes, the Y-chromosome structures, layouts, genes, and other sequences should be much the same in both species, given only six million years or so since chimpanzees and humans supposedly diverged from a common ancestor. Instead, the differences between the Y-chromosomes are marked. R. Scott Hawley, a genetics researcher at the Stowers Institute in Kansas City, though not involved in the research, told the Associated Press, "That result is astounding." 59
Because virtually every structural aspect of the human and chimp Y-chromosomes is different, it is hard to arrive at an overall similarity estimate between the two. The researchers did postulate an overall 70 percent similarity, which did not take into account size differences or structural arrangement differences. This was done by concluding that only 70 percent of the chimp sequence could be aligned with the human sequence - not taking into account differences within the alignments.
In other words, 70 percent was a conservative estimate, especially when considering that 50 percent of the human genes were missing from the chimp, and that the regions that did have some similarity were located in completely different patterns. When all aspects of non-similarity (sequence categories, genes, gene families, and gene position) are taken into account, it is safe to say that the overall similarity is actually much lower than 70 percent. In fact, this difference is so striking that the authors of the Nature article described the discrepancy with the standard evolutionary model in a rather intriguing way:
Indeed, at 6 million years of separation, the difference in MSY gene content in chimpanzee and human is more comparable to the difference in autosomal gene content in chicken and human, at 310 million years of separation.58
Given the standard evolutionary model of origins, it is indeed rather stunning to consider that the human Y-chromosome looks just as different from a chimp as the other human chromosomes do from a chicken. How is this explained within the evolutionary mindset? Obviously, the believer in mainstream evolutionary models is now forced to invent more just-so stories of major chromosomal rearrangements and rapid generation of many new genes, along with vast amounts of regulatory DNA, within very short spans of evolutionary time.
However, since each respective Y-chromosome appears fully integrated and interdependently stable with its host organism, the most logical inference from the Y-chromosome data, without any prior commitment to the evolutionary story of origins, is that humans and chimpanzees were each specially created as distinct creatures, or evolved over a far, far greater period of time...
Additional research carried out by scientists at the University of Oxford and the University of Chicago found that hotspot regions that determine the locations for genetic recombination during cellular meiosis in sexual reproduction showed "no overlap between humans and chimpanzees."62 This was an "extraordinarily unexpected finding"62 given the other similarities between humans and chimps. Professor McVean explains:
"Genetic recombination has been likened to shuffling a deck of cards, which ensures that children are given a different genetic 'hand' than their parents. We know that in many cases recombination occurs where a particular thirteen letter sequence is present -- this is like a run of hearts from ace to king determining where we cut the deck of cards. Because humans and chimpanzees are genetically very similar, we might expect that you can only 'cut the cards' at the same point -- in fact, we find that this is not true." 62
Additional information dealing with this most interesting topic is listed in a fairly extensive essay by Wade Schauer (used with permission).
José M. Cuevas, Santiago F. Elena and Andrés Moya, Molecular Basis of Adaptive Convergence in Experimental Populations of RNA Viruses, Genetics, October 2002, 162: 533–542 ( Link ):
Our experiment dealt with the existence of evolutionary convergences. Evolutionary convergences constitute a very slippery topic, since a result of convergent evolution can always be seen as a cross-contamination by those critical of the existence of convergence. The only serious way to address evolutionary convergences is to (1) design and run experiments in such a way that physical or temporal coexistence of evolving lineages is minimized and (2) test whether the results can be explained by potential contaminations at different experimental steps. With our experiments, we took all possible precautions to minimize the risk of cross-contaminations and, in fact, a detailed phylogenetic analysis of our results supports the view that our results are better explained by evolutionary convergences than by a general contamination at different steps. . .
One of the most amazing features illustrated in Figure 1 is the large amount of evolutionary convergences observed among independent lineages. Twelve of the variable sites were shared by different lineages. More surprisingly, convergences also occurred within synonymous sites and intergenic regions. Evolutionary convergences during the adaptation of viral lineages under identical artificial environmental conditions have been described previously (Bull et al. 1997; Wichman et al. 1999; Fares et al. 2001). However, this phenomenon is observed not only in the laboratory. It is also a relatively widespread observation among human immunodeficiency virus (HIV)-1 clones isolated from patients treated with different antiviral drugs; parallel changes are frequent, often following a common order of appearance (Larder et al. 1991; Boucher et al. 1992; Kellam et al. 1994; Condra et al. 1996; Martinez-Picado et al. 2000). Subsequent substitutions may confer increasing levels of drug resistance or, alternatively, may compensate for deleterious pleiotropic effects of earlier mutations (Molla et al. 1996; Martinez-Picado et al. 1999; Nijhuis et al. 1999). Also, molecular convergences have been observed between chimeric simian-human immunodeficiency viruses (strain SHIV-vpu+) isolated from pig-tailed macaques, rhesus monkeys, and humans after either chronic infections or rapid virus passage (Hofmann-Lehmann et al. 2002).
Convergent evolution at the molecular level is not controversial as long as it can be reconciled with the neutralist and the selectionist theories. The neutral theory suggests that convergences are simply accidents, whereas within the framework of selectionism, there are two qualifications for convergences. The first explanation considers convergences as being adaptive and the result of organisms facing the same environment (as in the case of our experiments) with a few alternative pathways of adaptation (as expected for compacted genomes). Second, keeping in mind the model of clonal interference, beneficial mutations have to become fixed in an orderly way (Gerrish and Lenski 1998), with the best possible candidate fixed first, and then the second best candidate, and so on. This implies that, given a large enough population size to make clonal interference an important evolutionary factor, we should always expect the same mutations to be fixed.
The above argument is valid for nonsynonymous changes but an alternative explanation must be found for synonymous changes and for changes in the intergenic regions. Genomic RNA is involved in many RNA-RNA and RNA-protein interactions that affect viral replication. This is obvious for noncoding, regulatory regions (Stillman and Whitt 1997, 1998), but there is increasing evidence that capsid-coding regions in picornaviruses may also have an effect on viral replication (McKnight and Lemon 1998; Fares et al. 2001). Therefore, the RNA itself (apart from its protein-coding capacity) may contribute to the viral phenotype, and fitness may also be affected by synonymous replacements. Evidence for selection on synonymous sites has been inferred also in mammals (Eyre-Walker 1999), as a consequence of selection acting upon the base composition of isochores and large sections of junk DNA.
For the sake of illustration, it would be interesting to compare the number of selectively important sites in the VSV genome with those estimated for other genomes. For example, Fay et al. (2001) reported that, in humans, the vast majority (80%) of amino acidic changes are deleterious to some extent and only a minor fraction are neutral. Among these deleterious amino acidic mutations, at least 20% are slightly deleterious. Here, we found that 15 amino acid sites changed, with only 5 being significantly advantageous. At this point, we can only speculate about the selective role of all the amino acid sites shown to be invariable in our study. The total number of amino acids in five genes of VSV is 3536. Assuming that changes in any of the 3536 - 15 = 3521 invariable amino acids would be deleterious (and thus washed out by purifying selection during our evolution experiment), then the fraction of amino acid replacements that are potentially harmful would be 3521/3536 = ~99.58%; the fraction of neutral sites would be 10/3536 = ~0.28%; whereas only 5/3536 = ~0.14% would be beneficial. Despite the differences between humans and VSV in genome size and organization and in the nature of the nucleic acid used, in both cases the fraction of potentially deleterious amino acid substitutions is overwhelmingly larger than that of neutral or beneficial ones. 28
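The percentages in the excerpt above can be reproduced from the three counts it gives (3536 total amino acid sites, 15 changed, 5 advantageous). The sketch below simply restates that arithmetic; the variable names are illustrative, and treating every invariable site as potentially deleterious is the excerpt's own stated assumption, not an independent result.

```python
# Counts taken from the excerpt above.
total_sites = 3536        # amino acids in the five VSV genes
changed_sites = 15        # sites that changed during the experiment
beneficial_sites = 5      # changed sites judged significantly advantageous

# Derived counts, following the excerpt's assumption that every
# invariable site would be deleterious if changed.
deleterious_sites = total_sites - changed_sites      # 3521
neutral_sites = changed_sites - beneficial_sites     # 10


def percent(count):
    """Express a site count as a percentage of all sites, to two decimals."""
    return round(100 * count / total_sites, 2)


fractions = {
    "potentially deleterious": percent(deleterious_sites),
    "neutral": percent(neutral_sites),
    "beneficial": percent(beneficial_sites),
}

for label, value in fractions.items():
    print(f"{label}: ~{value}%")
```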
Jacq C, Miller JR, Brownlee GG. A pseudogene structure in 5S DNA of Xenopus laevis, Cell 12:109-120. 1977.
Gibson L. J., Pseudogenes and Origins, Origins 21(2):91-108. 1994.
Menotti R.M., Starmer W.T., Sullivan D.T., Characterization of the structure and evolution of the Adh region of Genetics 127:355-366. 1991.
Lalley P.A., Davisson M.T., Graves J.A.M., O’Brien S.J., Womack J.E., Roderick T.H., Creau-Goldberg N., Hillyard A.L., Doolittle D.P., Rogers J.A., Report of the committee on comparative mapping, Cytogenetics and Cell Genetics 51:503-532. 1989.
Long M., Langley C.H., Natural selection and the origin of jingwei, a chimeric processed functional gene in Science 260:91-95. 1993.
Jerlstrom, Pierre. 2000. Pseudogenes. Creation Ex Nihilo Technical Journal 14 (no. 3):15.
Woodmorappe, John. 2000. Are Pseudogenes 'Shared Mistakes' Between Primate Genomes? Creation Ex Nihilo Technical Journal 14 (no. 3):58-71.
Abate, Tom. 2001. Genome Discovery Shocks Scientists. San Francisco Chronicle (February 11).
Cantrell, Michael A. and others. 2001. An Ancient Retrovirus-like Element Contains Hot Spots for SINE Insertion. Genetics 158:769-777.
Hillis, David M. 1999. SINEs of the perfect character. Proceedings of the National Academy of Sciences 96:9979-9981.
Max, Edward. Plagiarized Errors and Molecular Genetics. Creation/Evolution (XIX, p.34) 1986-2003. ( http://www.talkorigins.org/faqs/molgen/ )
Lee, Jeannie T., Complicity of the gene and pseudogene, Nature 423:26-28. 2003
Hirotsune, Shinji et al., An expressed pseudogene regulates the messenger-RNA stability of its homologous coding gene, Nature 423:91-96. 2003
Makalowski, Wojciech. 2003. Not Junk After All, Science 300:1246-1247
Balakirev, Evgeniy S., Ayala, Francisco J., PSEUDOGENES: Are They "Junk" or Functional DNA? Annual Review of Genetics, Vol. 37, pp. 123-151, December 2003 ( http://arjournals.annualreviews.org/doi/abs/10.1146%2Fannurev.genet.37.040103.103949 )
Wyatt Gibbs, The Unseen Genome: Gems among the Junk, Scientific American, November 2003, pp 45-53 ( Link )
ENCODE Project Consortium et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447, 799-816 (14 June 2007); Richard Ingham, Landmark study prompts rethink of genetic code, Yahoo News, accessed June 15, 2007 (Link1, Link2)
Nishikimi, M. and Yagi, K. (1991) Molecular basis for the deficiency in humans of gulonolactone oxidase, a key enzyme for ascorbic acid biosynthesis. Am. J. Clin. Nutr. 54(6 Suppl):1203S-1208S.
Nishikimi, M., Fukuyama, R., Minoshima, S., Shimizu, N. and Yagi. K. (1994) Cloning and chromosomal mapping of the human nonfunctional gene for L-gulono-gamma-lactone oxidase, the enzyme for L-ascorbic acid biosynthesis missing in man. J. Biol. Chem. 269:13685-13688.
Ohta, Y. and Nishikimi, M. (1999) Random nucleotide substitutions in primate nonfunctional gene for L-gulono-gamma-lactone oxidase, the missing enzyme in L-ascorbic acid biosynthesis. Biochim. Biophys. Acta. 1472:408-411.
Inai, Y., Ohta. Y., and Nishikimi, M. (2003) The whole structure of the human nonfunctional L-gulono-gamma-lactone oxidase gene--the gene responsible for scurvy--and the evolution of repetitive sequences thereon. J Nutr Sci Vitaminol (Tokyo) 49:315-319.
Peter Borger, Shared mutations: Common descent or common mechanism?, The Independent Research Institute on Origins, Accessed 8/10/07 ( Link )
Eugene Berezikov, Fritz Thuemmler, Linda W van Laake, Ivanela Kondova, Ronald Bontrop, Edwin Cuppen & Ronald H A Plasterk, "Diversity of microRNAs in human and chimpanzee brain", Nature Genetics, Vol 38 | Number 12 | December 2006 pp. 1375-1377. ( Link )
Richard Twyman, Small RNA: BIG NEWS, The Human Genome, January 2005 ( Link )
J. J. Bull, M. R. Badgett, H. A. Wichman, J. P. Huelsenbeck, D. M. Hillis, A. Gulati, C. Ho, and I. J. Molineux, Exceptional Convergent Evolution in a Virus, Genetics, 1997 December; 147(4): 1497–1507. ( Link )
José M. Cuevas, Santiago F. Elena and Andrés Moya, Molecular Basis of Adaptive Convergence in Experimental Populations of RNA Viruses, Genetics, October 2002, 162: 533–542 ( Link )
H A Wichman, L A Scott, C D Yarber, and J J Bull, Experimental evolution recapitulates natural evolution, Philos Trans R Soc Lond B Biol Sci. 2000 November 29; 355(1403): 1677–1684. ( Link )
Troy E. Wood, John M. Burke and Loren H. Rieseberg, Parallel genotypic adaptation: when evolution repeats itself, Genetica, February 2005, Volume 123, Numbers 1-2, pp. 157-170 ( Link )
F. Flam, Hints of a language in junk DNA, Science 266:1320, 1994.
Haussler and Gill Bejerano, Junk DNA, May 6, 2004 online version of Science. ( Link )
Stanford University Medical Center (2007, April 24). 'Junk' DNA Now Looks Like Powerful Regulator, Scientists Find. ScienceDaily. ( Link )
Nóbrega MA, Zhu Y, Plajzer-Frick I, Afzal V, Rubin EM., Megabase deletions of gene deserts result in viable mice, Nature. 2004 Oct 21;431(7011):988-93. ( Link )
Katherine Dunlap, et al., Endogenous retroviruses regulate periimplantation placental growth and differentiation, PNAS September 26, 2006 vol. 103 no. 39 14390-14395 ( Link )
Lori F. Maxfield, Camilla D. Franze, John M. Coffin, Relationship between retroviral DNA-integration-site selection and host cell transcription, PNAS February 1, 2005 vol. 102 no. 5 1436-1441 ( Link )
Sverdlov, E. D. (2000) "Retroviruses and primate evolution." BioEssays 22: 161-171. ( Link )
Luis P. Villarreal, The Viruses That Make Us: A Role for Endogenous Retroviruses in the Evolution of Placental Species, UCI Online Faculty Publication ( Link: last accessed 3/10/09).
Polavarapu N, Bowen NJ, McDonald JF. Identification, characterization and comparative genomics of chimpanzee endogenous retroviruses. Genome Biol. 2006;7(6):R51.
Barbulescu M, Turner G, Su M, Kim R, Jensen-Seaman MI, Deinard AS, Kidd KK, Lenz J., A HERV-K provirus in chimpanzees, bonobos and gorillas, but not humans. Curr Biol. 2001 May 15;11(10):779-83.
Patric Jern, Göran O. Sperber, and Jonas Blomberg, Divergent Patterns of Recent Retroviral Integrations in the Human and Chimpanzee Genomes: Probable Transmissions between Other Primates and Chimpanzees, Journal of Virology, February 2006, p. 1367-1375, Vol. 80, No. 3 ( Link - last accessed 3/11/09 )
Chris T Yohn, Zhaoshi Jiang, Sean D McGrath, Karen E Hayden, Philipp Khaitovich, Matthew E Johnson, Marla Y Eichler, John D McPherson, Shaying Zhao, Svante Pääbo, and Evan E Eichler, Lineage-Specific Expansions of Retroviral Insertions within the Genomes of African Great Apes but Not Humans and Orangutans, PLoS Biol. 2005 April; 3(4): e110.
Sverdlov, E. D. (2000) “Retroviruses and primate evolution.” BioEssays 22:161-171. ( Link )
Andrew B. Conley, Jittima Piriyapongsa and I. King Jordan, "Retroviral promoters in the human genome," Bioinformatics, Vol. 24(14):1563–1567 (2008) ( Link )
Richard Sternberg, "On the Roles of Repetitive DNA Elements in the Context of a Unified Genomic–Epigenetic System," Annals of the New York Academy of Sciences, Vol. 981: 154–88 (2002)
University of California, Santa Cruz, Ancient retroviruses spurred evolution of gene regulatory networks in humans and other primates, Online News Article, Physorg.com, November 14th, 2007 ( Link - last accessed 3/13/09 )
Alex D Greenwood, Claudia C Englbrecht, and Ross DE MacPhee, Characterization of an endogenous retrovirus class in elephants and their relatives, BMC Evol Biol. 2004; 4: 38. ( Link )
Manuela Mura, et al., Late viral interference induced by transdominant Gag of an endogenous retrovirus, PNAS, July 27, 2004 vol. 101 no. 30 ( Link )
Norbert Bannert and Reinhard Kurth, Retroelements and the human genome: New perspectives on an old relation, PNAS October 5, 2004, vol. 101 no. suppl 2 ( Link )
Jennifer F. Hughes and John M. Coffin, Human endogenous retrovirus K solo-LTR formation and insertional polymorphisms: Implications for human and viral evolution, PNAS February 10, 2004 vol. 101 no. 6 ( Link )
IJdo JW, Baldini A, Ward DC, Reeders ST, Wells RA, Origin of human chromosome 2: an ancestral telomere-telomere fusion. Proc Natl Acad Sci U S A 1991 Oct 15;88(20):9051-5; ( Link )
Douglas L. T. Rohde, Steve Olson & Joseph T. Chang, Modelling the recent common ancestry of all living humans, Nature 431, 562-566 (30 September 2004) | doi:10.1038/nature02842; Received 30 December 2003; Accepted 14 July 2004
Demuth JP, Bie TD, Stajich JE, Cristianini N, Hahn MW (2006) The Evolution of Mammalian Gene Families. PLoS ONE 1(1): e85. doi:10.1371/journal.pone.0000085 ( Link )
Li et al. Extensive, Recent Intron Gains in Daphnia Populations. Science, 2009; 326 (5957): 1260 DOI: 10.1126/science.1179302 | See also: Indiana University (2009, December 14). Introns -- nonsense DNA -- may be more important to evolution of genomes than thought. ScienceDaily. Retrieved December 15, 2009, ( Link ).
Erika Check Hayden, Human genome at ten: Life is complicated, Nature 464, 664-667, Published online 31 March 2010 ( Link )
Hughes, J.F. et al. 2010. Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content. Nature. 463 (7280): 536-539. ( Link ).
Borenstein, S. Men More Evolved? Y Chromosome Study Stirs Debate. Associated Press, January 13, 2010. ( Link )
Farré M, Ponsà M, Bosch M. 2009. "Interstitial telomeric sequences (ITSs) are not located at the exact evolutionary breakpoints in primates," Cytogenetic and Genome Research 124(2): 128-131.
Richard Sternberg, Guy Walks Into a Bar and Thinks He's a Chimpanzee: The Unbearable Lightness of Chimp-Human Genome Similarity, Evolution News and Views, May 14, 2009 ( Link )
Wellcome Trust. "Scientists map hotspots for genetic exchange in chimpanzees." ScienceDaily, 15 Mar. 2012. Web. 9 Apr. 2012. ( Link ); see also: A. Auton, A. Fledel-Alon, S. Pfeifer, O. Venn, L. Segurel, T. Street, E. M. Leffler, R. Bowden, I. Aneas, J. Broxholme, P. Humburg, Z. Iqbal, G. Lunter, J. Maller, R. D. Hernandez, C. Melton, A. Venkat, M. A. Nobrega, R. Bontrop, S. Myers, P. Donnelly, M. Przeworski, G. McVean. A Fine-Scale Chimpanzee Genetic Map from Population Sequencing. Science, 2012; DOI: 10.1126/science.1216872
The Chimpanzee Sequencing and Analysis Consortium, Initial sequence of the chimpanzee genome and comparison with the human genome, Nature 437, 69-87 (1 September 2005) (Link)
Andrew B. Conley, Jittima Piriyapongsa and I. King Jordan, "Retroviral promoters in the human genome," Bioinformatics, Vol. 24(14):1563--1567 (2008).