What does the latest DNA data say about evolution?


In the past few years, modern genome sequencing and computer technology have placed enormous volumes of DNA data at the fingertips of researchers worldwide. The first draft of the human genome was completed in 2000, after a ten-year effort that cost over US$500 million. But genome sequencing technology is advancing very rapidly: human genomes can now be sequenced for roughly $100,000, and some groups are targeting a price as low as $1,000 [Pollack2008]. The same sequencing technology has enabled biologists to study the genomes of thousands of other species, including many common (and not-so-common) plants and animals. The result is an enormous repository of data for studying evolution at its most basic level.

Amino acid data

One example of DNA-derived data is the table below, which compares the 146-unit amino acid sequences of beta globin (a component of hemoglobin) across various animal species. Amino acids are coded directly by triplets of DNA letters, so studying amino acid sequences is very close to studying the DNA sequences themselves. Note that human beta globin is identical to that of chimpanzees and differs at only one position from that of gorillas, yet is increasingly distinct from that of red foxes, dogs, polar bears, horses, rats, chickens and salmon. Anyone can generate similar data using online tools and databases [Evolution2009]:

Percent Agreement between Beta Globin Sequences of Various Species

Species     Human  Chimp  Gorilla  Red fox  Dog    Polar bear  Horse  Rat    Chicken  Salmon
Human       100.0  100.0  99.3     91.1     89.7   89.7        83.6   81.5   69.2     49.7
Chimp       100.0  100.0  99.3     91.1     89.7   89.7        83.6   81.5   69.2     49.7
Gorilla     99.3   99.3   100.0    91.8     90.4   90.4        82.9   80.8   68.5     49.0
Red fox     91.1   91.1   91.8     100.0    98.6   95.2        80.8   80.1   72.6     49.7
Dog         89.7   89.7   90.4     98.6     100.0  94.5        80.1   79.5   71.2     49.0
Polar bear  89.7   89.7   90.4     95.2     94.5   100.0       80.8   82.9   71.9     48.3
Horse       83.6   83.6   82.9     80.8     80.1   80.8        100.0  76.0   67.8     46.3
Rat         81.5   81.5   80.8     80.1     79.5   82.9        76.0   100.0  65.8     49.7
Chicken     69.2   69.2   68.5     72.6     71.2   71.9        67.8   65.8   100.0    54.4
Salmon      49.7   49.7   49.0     49.7     49.0   48.3        46.3   49.7   54.4     100.0
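The entries in this table are pairwise identity scores. As a minimal sketch, assuming the two sequences have already been aligned to the same length and ignoring gaps, percent agreement is simply the share of positions with matching letters. The helper names and toy sequences below are hypothetical, not the tool used to produce the table, and the online tools cited above may treat alignment gaps differently, so this will not necessarily reproduce every entry. A single difference out of 146 positions gives 145/146, or about 99.3%, which matches the human-gorilla entry.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions at which two pre-aligned sequences agree.

    Assumes both sequences are already aligned to the same length;
    gap handling is deliberately left out of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def identity_table(seqs: dict) -> dict:
    """Pairwise percent identities (rounded to 0.1) for every pair of named sequences."""
    table = {}
    for name_a, name_b in combinations(seqs, 2):
        table[(name_a, name_b)] = round(percent_identity(seqs[name_a], seqs[name_b]), 1)
    return table


# Hypothetical 146-residue toy sequences (NOT real beta globin data):
# "species_2" differs from "species_1" at exactly one position.
toy = {
    "species_1": "A" * 146,
    "species_2": "C" + "A" * 145,
}
print(identity_table(toy))  # {('species_1', 'species_2'): 99.3}
```

Extending `toy` with more named sequences fills in the rest of a table like the one above, one pair at a time.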


The picture is the same if we consider the pattern of mutations between closely related species. One particularly interesting example, uncovered only recently, is the “GULO” gene, an essential part of the machinery that makes Vitamin C in most animals. Humans lack a functioning copy of this gene: our copy is a highly mutated fragment, classified as a relic gene or pseudogene. Scurvy, that scourge of British sailors and of Mormon pioneers crossing the plains, occurs in humans who do not get enough Vitamin C. Interestingly, although the GULO pseudogene is highly mutated and utterly useless, humans and chimpanzees carry almost identical copies of it; the human and chimp versions are 98% identical. Evidently a common ancestor of humans and chimps had a diet rich in fruits and vegetables, so a chance mutation that disabled Vitamin C production was no longer fatal, and the broken gene was passed on to both lineages [Fairbanks2007, pg. 53-55; Coyne2009, pg. 67-69].
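One way to make the “same mutations” argument concrete is to compare two pseudogene copies against a functional version of the gene from another species and list the positions where both copies carry the identical change. The sketch below assumes the three sequences are already aligned; the function name and the short toy sequences are hypothetical illustrations, not real GULO data.

```python
def shared_changes(reference: str, seq_x: str, seq_y: str) -> list:
    """Positions where seq_x and seq_y carry the same change from reference.

    All three sequences are assumed to be pre-aligned to equal length.
    Returns (position, reference_letter, shared_letter) tuples.
    """
    if not (len(reference) == len(seq_x) == len(seq_y)):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref, x)
        for i, (ref, x, y) in enumerate(zip(reference, seq_x, seq_y))
        if x != ref and x == y
    ]


# Hypothetical toy sequences (NOT real GULO data): a functional
# "reference" gene and two disabled copies from related species.
reference = "ATGCCGTTAGCA"
copy_x = "ATGCTGTTAGGA"
copy_y = "ATGCTGTAAGGA"
print(shared_changes(reference, copy_x, copy_y))  # [(4, 'C', 'T'), (10, 'C', 'G')]
```

Under common descent such shared changes are expected, because each arose once in the common ancestor; independent decay of the gene in two separate lineages would rarely produce the same letter at the same position again and again.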


Another recent development in this arena is the analysis of “transposons” or “jumping genes”: sections of DNA that have been randomly copied from one part of an organism’s genome to another. Most of the time these insertions do no damage, because they “land” in relatively unimportant sections of DNA. But they provide an excellent means to classify species into their phylogenetic (“family tree”) relationships. It is exceedingly unlikely that the same random insertion would occur at the same spot in the genomes of two or more different species, unless, of course, each inherited it from a common ancestor. It is also exceedingly unlikely that species carrying truly random assortments of transposons could be organized consistently into a family tree. Transposon data have been used, for instance, to classify a large number of vertebrate species into a “family tree,” with a result that is virtually identical to what biologists had earlier reckoned based only on physical features and biological functions [Rogers2011, pg. 25-30].
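The second argument, that random assortments of transposons would not fit a tree, can be made precise. As a minimal sketch, assuming each insertion occurred once and was never lost, a set of presence/absence patterns fits a single family tree only if every pair of insertion groups is either nested (one contains the other) or non-overlapping; this is a classical compatibility criterion for such characters. The function name and the species labels P, Q, R, S below are hypothetical.

```python
from itertools import combinations


def tree_compatible(clusters: dict) -> bool:
    """Check whether presence/absence clusters can all sit on one family tree.

    `clusters` maps a transposon name to the set of species carrying it.
    Assuming each insertion happened once and was never lost, every pair of
    clusters must be nested or disjoint; a partial overlap cannot be
    explained by a single tree.
    """
    for a, b in combinations(clusters.values(), 2):
        nested = a <= b or b <= a
        disjoint = not (a & b)
        if not (nested or disjoint):
            return False
    return True


# Hypothetical species P, Q, R, S and insertions t1..t3:
tree_like = {"t1": {"P", "Q"}, "t2": {"P", "Q", "R"}, "t3": {"S"}}
scrambled = {"t1": {"P", "Q"}, "t2": {"Q", "R"}}
print(tree_compatible(tree_like))  # True
print(tree_compatible(scrambled))  # False
```

In the `scrambled` case, Q would need to share an ancestor with P that R lacks and, at the same time, an ancestor with R that P lacks, which no single tree allows.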

Here is an example of how transposon data can be used to determine the phylogenetic relationships (i.e., the “family tree”) of various primates, including humans. The columns labeled A through E denote five blocks of transposons; x and o denote, respectively, that the block is present or absent in the genome of the given species. Block B, for instance, is shared only by humans, chimpanzees and bonobos, so it is clear from these data that our closest primate relatives are chimpanzees and bonobos [Rogers2011, pg. 89; Salem2003].

                              Transposon blocks
                   Species    A  B  C  D  E
        /--------- Human      o  x  x  x  x
       /---------- Bonobo     x  x  x  x  x
      / \--------- Chimp      x  x  x  x  x
     /------------ Gorilla    o  o  x  x  x
-----|------------ Orangutan  o  o  o  x  x
     \------------ Gibbon     o  o  o  o  o
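Because the blocks in this table are nested (each group of carriers either contains or excludes every other), the tree can be read off mechanically. The sketch below is a hypothetical illustration, not the procedure used by Rogers or Salem: it turns each block into the set of species carrying it and nests those sets into a tree written in Newick notation, a common plain-text format for phylogenetic trees. The presence/absence rows are copied from the table; the function names are assumptions.

```python
# Presence (x) / absence (o) of transposon blocks A-E, copied from the table above.
TABLE = {
    "Human":     "oxxxx",
    "Bonobo":    "xxxxx",
    "Chimp":     "xxxxx",
    "Gorilla":   "ooxxx",
    "Orangutan": "oooxx",
    "Gibbon":    "ooooo",
}


def clusters_from_table(table: dict) -> set:
    """One cluster per block: the set of species in which that block is present."""
    n_blocks = len(next(iter(table.values())))
    clusters = set()
    for i in range(n_blocks):
        carriers = frozenset(sp for sp, row in table.items() if row[i] == "x")
        if len(carriers) >= 2:  # a block in 0 or 1 species says nothing about grouping
            clusters.add(carriers)
    return clusters


def newick(taxa: frozenset, clusters: set) -> str:
    """Nest species into a Newick string, assuming the clusters are tree-compatible."""
    inner = {c for c in clusters if c < taxa}
    # The largest clusters strictly inside `taxa` become its child subtrees.
    maximal = {c for c in inner if not any(c < d for d in inner)}
    covered = set().union(*maximal)
    parts = [newick(c, clusters) for c in maximal] + sorted(taxa - covered)
    if len(parts) == 1:
        return parts[0]
    return "(" + ",".join(sorted(parts)) + ")"


all_species = frozenset(TABLE)
print(newick(all_species, clusters_from_table(TABLE)) + ";")
# (((((Bonobo,Chimp),Human),Gorilla),Orangutan),Gibbon);
```

Blocks D and E have identical carriers, so they collapse into one cluster; the result groups bonobos with chimps first, then humans, exactly as the diagram shows.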

Other areas of research

Another research arena exploding with activity is the analysis of DNA from groups of existing species using advanced statistical methods (e.g., “maximum likelihood analysis”), running on powerful computer systems, to reconstruct the most likely family tree for a given set of organisms. Soon much of evolutionary history will be deducible purely from this type of automated computer-based analysis, and significant results have already been obtained. In May 2010, Prof. Douglas L. Theobald of Brandeis University announced, on the basis of a very carefully performed statistical analysis, that the hypothesis of a “universal common ancestor” (a conjecture dating back to Charles Darwin that all life arose from a single ancestral organism) had been resoundingly confirmed. Theobald found that a universal common ancestor is at least 10^2860 times more likely to have produced the protein sequences we observe in living organisms today than the next most probable scenario, which involves multiple original ancestors [Harmon2010; Theobald2010].
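A ratio such as 10^2860 is far too large to handle directly, since standard double-precision floating-point numbers top out near 10^308. Likelihood comparisons are therefore carried out on log-likelihoods. The sketch below, using made-up numbers, converts a difference in natural-log likelihoods into a power of ten; the function name and both log-likelihood values are hypothetical and are not taken from Theobald's paper.

```python
import math


def log10_likelihood_ratio(ln_l_a: float, ln_l_b: float) -> float:
    """log10 of L_a / L_b, computed from natural-log likelihoods."""
    return (ln_l_a - ln_l_b) / math.log(10)


# Hypothetical log-likelihoods, chosen only so that the ratio is about 10^2860.
# They are NOT the values reported by Theobald (2010).
ln_l_single = -100000.0
ln_l_multiple = ln_l_single - 2860 * math.log(10)  # about 6585.4 lower

print(round(log10_likelihood_ratio(ln_l_single, ln_l_multiple)))  # 2860

try:
    math.exp(ln_l_single - ln_l_multiple)  # the raw ratio itself, as a float
except OverflowError:
    print("ratio exceeds float range; compare in log space instead")
```

A difference of only about 6,585 in natural-log likelihood already corresponds to a factor of 10^2860, which is why such enormous ratios arise routinely when many sequence positions each contribute a little evidence.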

Researchers are also combining analyses of DNA sequences with paleontological (fossil) data, yielding more precise dates for various branches in the tree of life. For example, a study published in November 2010 that combined paleontological and molecular data concluded that humans and chimpanzees very likely diverged eight million years ago, rather than five to six million years ago as had generally been believed until recently [SD2010d; Wilkinson2010].
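The simplest way to see how fossils and DNA can be combined is a strict molecular clock: a fossil-dated split calibrates the rate at which sequence differences accumulate, and that rate then converts other sequence distances into dates. This is a deliberately simplified sketch with hypothetical numbers and function names; it is not the method of the study cited above, and real analyses typically allow rates to vary among lineages.

```python
def clock_rate(calib_distance: float, calib_time_myr: float) -> float:
    """Substitutions per site per million years along each lineage.

    Under a simple strict clock, two lineages that split T million years ago
    have each accumulated r*T changes, so the pairwise distance is d = 2*r*T.
    """
    return calib_distance / (2 * calib_time_myr)


def divergence_time(distance: float, rate: float) -> float:
    """Invert d = 2*r*T to estimate the split time T in million years."""
    return distance / (2 * rate)


# Hypothetical numbers, for illustration only: a fossil-dated split at
# 20 Myr with pairwise distance 0.06 calibrates the rate, which is then
# applied to a second pair of species whose distance is 0.012.
rate = clock_rate(0.06, 20.0)
print(round(divergence_time(0.012, rate), 2))  # 4.0
```

Because the estimated date scales inversely with the calibrated rate, revising the fossil calibration shifts every molecular date derived from it, which is one reason better fossil data can move estimates such as the human-chimp split.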


The explosion of genome sequences and DNA data banks in recent years has provided an enormous storehouse of data for biologists. Analyses of these data have dramatically confirmed the central tenets of evolution, including the common ancestry of all biological organisms, arranged convincingly in a phylogenetic family tree that in most cases matches exactly what had previously been reckoned from similarities of physical form and biological function alone. As anthropologist Alan R. Rogers recently noted, “Phylogenetic pattern is everywhere in nature. It makes sense only if all living things evolved from a single ancestor” [Rogers2011, pg. 31]. Similarly, geneticist Daniel J. Fairbanks emphasizes [Fairbanks2007, pg. 170]:

[The] obvious hierarchical arrangement of life, and the literally millions of ancestral relics in our DNA — all undeniably attest to our common evolutionary origin with the rest of life. If someone can believe that all living organisms share the same creator, why not consider that all living organisms share a common genetic heritage?

[This was previously posted at SMR blog.]
