evolution within the universal genome

Diagram from: Peter Ward & Don Brownlee, University of Washington [likely based on rRNA]

1. The phylogenetic problem

2. The solution

3. The proof

4. The data

5. The future

The phylogenetic problem

That all life on Earth is related by descent from a last universal common ancestor (LUCA) is not disputed; it is the nature of the relationship between extant taxa that remains, in some cases, in dispute.

1. mammalian radiation [100 mya]

The difficulty, as in any radiation, is that the divergence itself occupied a short period of time, so the changes to the genome that record it are few, and it was followed by a relatively long period of time in which unrelated changes accumulated. This is a signal-to-noise problem.


"This is true of all thirty-two orders of mammals...The earliest and most primitive known members of every order already have the basic ordinal characters, and in no case is an approximately continuous sequence from one order to another known. In most cases the break is so sharp and the gap so large that the origin of the order is speculative and much disputed...

This regular absence of transitional forms is not confined to mammals, but is an almost universal phenomenon, as has long been noted by paleontologists. It is true of almost all classes of animals, both vertebrate and invertebrate...it is true of the classes, and of the major animal phyla, and it is apparently also true of analogous categories of plants."

- George Gaylord Simpson (1944) Tempo and Mode in Evolution.

2. Cambrian explosion (565-525 mya)

Development of nearly all modern metazoan phyla in a radiation of ~25 million years

Tree of life, bilateria

3. protozoa

These lineages have diverged for over a billion years

They have reduced genomes, and so genomic structure is not generally conserved. [structural entropy is high]

Lateral gene transfer [?]

The nature of characters

Phylogenetic inference is the study of the evolutionary relationships between organisms. It uses sets of characters shared by some, but not all, of the organisms being studied to infer that the organisms possessing the shared characters are more closely related.

homoplasy: shared characters that do not result from descent from a common ancestor.

Saturation:

the expected frequency of base identity in two random sequences = 25%

before you reach saturation the sequence is no longer reliably 'alignable'
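
The 25% figure holds when the four bases are equally frequent; more generally, the expected identity between two unrelated sequences is the sum of the squared base frequencies. The sketch below illustrates this and, assuming the Jukes-Cantor (JC69) model (an assumption, since these notes do not name a substitution model), shows how expected identity decays toward that floor as distance grows. The AT-rich composition is hypothetical.

```python
import math

def random_identity(freqs):
    """Expected fraction of identical sites between two unrelated sequences
    with the given base frequencies (sum of squared frequencies)."""
    return sum(f * f for f in freqs.values())

def jc69_identity(d):
    """Expected identity under JC69 at a distance of d substitutions per site."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
at_rich = {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35}  # hypothetical composition

print(random_identity(uniform))            # 0.25
print(round(random_identity(at_rich), 3))  # higher floor with skewed composition

# Identity approaches the 25% floor as distance increases.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(jc69_identity(d), 3))
```

Under this model the identity curve flattens well before it reaches 25%, which is one way to see why alignments become unreliable before true saturation.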

rate of evolution (molecular clock) is variable in different lineages

characters: DNA and protein within coding regions

substitution rates (substitutions/nt/yr):

non-synonymous 0.67 x 10^-9

four-fold degenerate 3.33 x 10^-9

intron 3.2 x 10^-9

UTR 2.1 x 10^-9

mitochondrial 5.7 x 10^-8

Graur and Li (2000)
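
To relate these rates to the radiations discussed earlier, a rough sketch can convert each rate into the expected distance between two lineages that split T years ago, d = 2rT. This assumes a strict clock (which, as noted above, does not hold across lineages) and that every listed rate is in substitutions/nt/yr; the function name is illustrative only.

```python
# Rates as listed above (Graur and Li 2000), assumed to be substitutions/nt/yr.
RATES = {
    "non-synonymous": 0.67e-9,
    "four-fold degenerate": 3.33e-9,
    "intron": 3.2e-9,
    "UTR": 2.1e-9,
    "mitochondrial": 5.7e-8,
}

def pairwise_distance(rate, years_since_split):
    """Expected substitutions per site separating two lineages.
    Both branches accumulate change, hence the factor of 2."""
    return 2.0 * rate * years_since_split

for name, rate in RATES.items():
    d_mammal = pairwise_distance(rate, 100e6)    # mammalian radiation, ~100 mya
    d_cambrian = pairwise_distance(rate, 525e6)  # end of the Cambrian window
    print(f"{name:22s} {d_mammal:7.2f} {d_cambrian:7.2f}")
```

At 100 million years, four-fold degenerate sites come out at roughly 0.67 substitutions per site and mitochondrial sites at roughly 11, so the faster classes have typically been hit more than once and carry little recoverable signal at these depths.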

The stronger the functional constraints on a macromolecule, the slower the rate of evolution.

Kimura (1983)

Long Branch attraction.

protein characters degenerate over time in accordance with the functional entropy of the protein sequence

the result is that at large evolutionary distances the sequences will be alike at regions of low entropy and unalike at regions of high entropy. Distant sequences will therefore be grouped together, but with low support. This might be seen as 'functional saturation'.
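
These notes do not say how entropy is measured; one common choice is the Shannon entropy of each alignment column. The sketch below uses that assumption on a toy alignment (hypothetical sequences, not real SUMO or RUB1 data) to show how constrained and unconstrained columns separate.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column,
    ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # p * log2(1/p) summed over observed residues
    return sum((k / n) * math.log2(n / k) for k in counts.values())

def alignment_entropy(seqs):
    """Per-column entropies for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]

# Hypothetical toy alignment: columns 1, 4 and 5 are invariant.
toy = ["MKVLG", "MKILG", "MRALG", "MKTLG"]
print([round(h, 2) for h in alignment_entropy(toy)])
```

Invariant columns score 0 bits and a column with four different residues scores 2 bits; at large distances only the low-entropy columns still match between sequences.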

example

sumo

rub1

EMBL-EBI clustalw server

genomic structure as characters

1. low entropy

2. high number of possible states

3. model for intron loss and acquisition

4. neutral character, unselected, so rate will not depend on maintenance of function
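
As a sketch of how genomic structure could be scored as characters, each intron position (mapped onto a column of the protein alignment) can be treated as a presence/absence character. The taxon names and positions below are hypothetical, intron phase is ignored, and the Jaccard index is only one possible similarity measure, not one named in these notes.

```python
from itertools import combinations

# Hypothetical intron positions (alignment column of the interrupted codon)
# for one gene in four taxa; illustrative values only.
introns = {
    "taxon_A": {12, 47, 88},
    "taxon_B": {12, 47, 88, 130},
    "taxon_C": {12, 88},
    "taxon_D": {61},
}

def character_matrix(introns):
    """Binary presence/absence matrix: one column per observed intron position."""
    positions = sorted(set().union(*introns.values()))
    rows = {t: [int(p in s) for p in positions] for t, s in introns.items()}
    return positions, rows

def shared_fraction(a, b):
    """Jaccard index of two intron-position sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

positions, rows = character_matrix(introns)
print(positions)
for taxon, row in rows.items():
    print(taxon, row)
for x, y in combinations(introns, 2):
    print(x, y, round(shared_fraction(introns[x], introns[y]), 2))
```

Because each gene has many possible intron positions, two taxa sharing several exact positions by chance is unlikely, which is the appeal of a character with a high number of possible states.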

example, sumo alignment (fasta file)

apg12

1. The presence of introns at common locations suggests two possible hypotheses to explain the correlation

a. insertional bias at common locations, and so introns are lost and regained at these common sites

b. early origin and maintenance within regions of high constraint.

based on the lack of intronless gene duplicates in the pathways examined, there does not appear to be a gDNA > mRNA > gDNA replacement pathway (reverse transcription of mRNA back into the genome). The genomic structure is long lived.

furthermore, this argues even more strongly against xenologous transfer of nuclear genes between species.

Current and future work

Comparative RNA and protein expression within paralogous genes in marsupials and other mammals