THE ORGANIZATION AND CONTROL OF EUKARYOTIC GENOMES

Gene expression in eukaryotes has two main differences from the same process in prokaryotes.

The typical multicellular eukaryotic genome is much larger than that of a bacterium.
Cell specialization limits the expression of many genes to specific cells.
The human genome, with an estimated 35,000 genes, includes an enormous amount of DNA that does not program the synthesis of RNA or protein.

Chromatin structure

Eukaryotic DNA is precisely combined with large amounts of protein.
During interphase, chromatin fibers are highly extended.
If extended, each DNA molecule would be about 6 cm long.
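As a rough check on the 6 cm figure, extended length can be related to base-pair count. This is only a sketch: it assumes the standard B-form spacing of about 0.34 nm per base pair, which these notes do not state, and the function names are made up for illustration.

```python
# Assumption (not from the notes): B-form DNA rises about 0.34 nm per base pair.
NM_PER_BASE_PAIR = 0.34
CM_PER_NM = 1e-7  # 1 nm = 1e-7 cm


def extended_length_cm(base_pairs: float) -> float:
    """Length of fully extended DNA, in centimeters."""
    return base_pairs * NM_PER_BASE_PAIR * CM_PER_NM


def base_pairs_for_length(length_cm: float) -> float:
    """Number of base pairs needed to span a given extended length."""
    return length_cm / (NM_PER_BASE_PAIR * CM_PER_NM)


if __name__ == "__main__":
    print(f"{base_pairs_for_length(6.0):.2e} base pairs span about 6 cm")
```

Under that assumption, a 6 cm molecule corresponds to a little under 200 million base pairs.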

DNA packing Fig 19.1

First level - Histone proteins
Their positively charged amino acids bind tightly to negatively charged DNA.
The five types of histones are very similar from one eukaryote to another, and similar proteins are even present in bacteria.
Unfolded chromatin has the appearance of beads on a string. Each bead is a nucleosome, in which DNA winds around a core of histone proteins.
The beaded string seems to remain essentially intact throughout the cell cycle.
Histones leave the DNA only transiently during DNA replication.
They stay with the DNA during transcription.
By changing shape and position, nucleosomes allow RNA-synthesizing polymerases to move along the DNA.

Level two - As chromosomes enter mitosis the beaded string coils to form the 30-nm chromatin fiber.
 
Level three - This fiber forms looped domains attached to a scaffold of nonhistone proteins.
 
Level four - the looped domains coil and fold to produce the characteristic metaphase chromosome.
Interphase chromatin is generally much less condensed than the chromatin of mitosis, but the 30-nm fibers and looped domains remain intact.
The chromatin of each chromosome occupies a restricted area within the interphase nucleus.
Interphase chromosomes have areas that remain highly condensed, heterochromatin, and less compacted areas, euchromatin.

Genome Organization at the DNA Level

In eukaryotes, most of the DNA (about 97% in humans) does not code for protein or RNA.
     1. Some noncoding regions are regulatory sequences.
     2. Others are introns.
     3. Much of the rest is repetitive DNA, present in many copies in the genome.
 
In mammals, about 10-15% of the genome is tandemly repetitive DNA, or satellite DNA.
     These differ in density from other regions, so they form a separate band after differential ultracentrifugation.
     There are three types of satellite DNA, differentiated by the total length of DNA at each site. Table 19.1.
 
Some genetic disorders are caused by abnormally long stretches of tandemly repeated nucleotide triplets within the affected gene.
     Fragile X syndrome is caused by hundreds to thousands of repeats of CGG in the fragile X gene.
     Huntington's disease occurs due to repeats of CAG that are translated into a protein with a long string of glutamines.
     The severity and the age of onset of these diseases are correlated with the number of repeats.
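Because these disorders depend on how many times a triplet is repeated in a row, the repeat count can be measured directly from a sequence. The following is a minimal sketch; the function name and the example sequence are invented for illustration and do not represent a real gene or a clinical test.

```python
import re


def longest_triplet_run(dna: str, triplet: str) -> int:
    """Largest number of back-to-back copies of `triplet` found anywhere in `dna`."""
    # The lookahead lets a match start at every position, so a run that
    # begins inside an earlier match is not missed.
    pattern = re.compile(f"(?=((?:{re.escape(triplet.upper())})+))")
    runs = [len(m.group(1)) // len(triplet) for m in pattern.finditer(dna.upper())]
    return max(runs, default=0)


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    example = "ATG" + "CAG" * 5 + "TTC"
    print(longest_triplet_run(example, "CAG"))
```

The same function works for CGG (fragile X) or CAG (Huntington's) by changing the `triplet` argument.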
 
About 25-40% of most mammalian genomes consists of interspersed repetitive DNA.
     These sequences appear at multiple sites in the genome.
     The copies are similar but usually not identical to each other.

Gene families

While most genes are present as a single copy per haploid set of chromosomes, multigene families exist as a collection of identical or very similar genes.
These likely evolved from a single ancestral gene.
The members of multigene families may be clustered or dispersed in the genome.
 
Identical genes: multigene families of identical genes are clustered tandemly. Fig 19.2.
These usually consist of the genes for RNA products or those for histone proteins.
The three largest rRNA molecules are encoded in a single transcription unit that is repeated tandemly hundreds to thousands of times.
This transcript is cleaved to yield three rRNA molecules that combine with proteins and one other kind of rRNA to form ribosomal subunits.
 
Nonidentical genes
An example is the two related families of globin genes, α (alpha) and β (beta), which encode the subunits of hemoglobin and are located on different chromosomes. Fig 19.3.
The different versions of each globin subunit are expressed at different times in development.
Within both families are sequences that are expressed during the embryonic, fetal, and/or adult stage of development.
The embryonic and fetal hemoglobins have higher affinity for oxygen than do adult forms, ensuring transfer of oxygen from mother to developing fetus.
 
The differences in genes arise from mutations that accumulate in the gene copies over generations.
These mutations may even lead to enough changes to form pseudogenes, DNA segments that have sequences similar to real genes but that do not yield functional proteins.

Gene amplification, loss, or rearrangement

The nucleotide sequence of an organism's genome may be altered in a systematic way during its lifetime.
     These changes do not affect gametes.
     Their effects are confined to particular cells and tissues.
 
In gene amplification, certain genes are replicated as a way to increase expression of these genes.
In amphibians, the genes for rRNA not only have a normal complement of multiple copies but millions of additional copies are synthesized in a developing ovum.
This assists the cell in producing enormous numbers of ribosomes for protein synthesis after fertilization.
 
In some insect cells, whole chromosomes or parts of chromosomes are lost early in development.
 
Rearrangement of the loci of genes in somatic cells may have a powerful effect on gene expression.
     Transposons are stretches of DNA that can move from one location to another within the genome.
     About 10% of the human genome consists of transposons.
If one "jumps" into a coding sequence of another gene, it can prevent normal gene function.
If the transposon is inserted in a regulatory area, it may increase or decrease transcription.
 
Most transposons are retrotransposons (Fig 19.5), in which the transcribed RNA includes the code for an enzyme that catalyzes the insertion of the retrotransposon and may include a gene for reverse transcriptase.
Reverse transcriptase uses the RNA molecule originally transcribed from the retrotransposon as a template to synthesize a double-stranded DNA copy.
This can populate the eukaryotic genome with multiple copies of its sequence.
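The copy-and-paste cycle described above (transcription, reverse transcription, insertion) can be sketched as a toy model. Everything here is a simplification: sequences are single coding strands, the insertion site is chosen at random, and all names are hypothetical.

```python
import random


def transcribe(dna: str) -> str:
    """RNA copy of a DNA coding strand (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """DNA copy made from an RNA template, as reverse transcriptase would."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, element: str, rng: random.Random) -> str:
    """Insert a new DNA copy of `element` at a random site; the original stays put."""
    if element not in genome:
        raise ValueError("element is not present in the genome")
    rna = transcribe(element)
    dna_copy = reverse_transcribe(rna)
    site = rng.randrange(len(genome) + 1)  # toy assumption: any position is allowed
    return genome[:site] + dna_copy + genome[site:]


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "AAAA" + "GATTACA" + "CCCC"  # made-up sequences
    for _ in range(3):
        genome = retrotranspose(genome, "GATTACA", rng)
    print(len(genome), genome)
```

Each call lengthens the genome by one element, which is the sense in which a retrotransposon can fill a genome with copies of itself. Because the site is random, a new copy may land inside an older one and disrupt it.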
 
Major rearrangements of at least one set of genes occur during immune system differentiation.
B lymphocytes produce immunoglobulins, or antibodies, that specifically recognize and combat viruses, bacteria, and other invaders. Fig 19.6.
Each differentiated cell produces one specific type of antibody that attacks a specific invader.
Functional antibody genes are pieced together from physically separated DNA regions.
Each immunoglobulin consists of four polypeptide chains, each with a constant region and a variable region, giving each antibody its unique function.
As a B lymphocyte differentiates, one of several hundred possible variable segments is connected to the constant section by deleting the intervening DNA.
The random combinations of different variable and constant regions create an enormous variety of different polypeptides, which combine with others to form complete antibody molecules.
As a result, the mature immune system can make millions of different kinds of antibodies from millions of subpopulations of B lymphocytes.
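The arithmetic behind this diversity is multiplicative, which a short sketch can show. The segment counts below are placeholders, not measured values, and the split into two chain types ("light" and "heavy") is standard immunology terminology assumed here rather than taken from these notes.

```python
def polypeptide_variants(variable_segments: int, constant_segments: int = 1) -> int:
    """Distinct polypeptides from joining one variable segment to one constant segment."""
    return variable_segments * constant_segments


def antibody_variants(light_variants: int, heavy_variants: int) -> int:
    """Distinct antibodies from pairing one light-chain type with one heavy-chain type."""
    return light_variants * heavy_variants


if __name__ == "__main__":
    # Placeholder counts ("several hundred" variable segments per chain type).
    light = polypeptide_variants(300)
    heavy = polypeptide_variants(300)
    print(antibody_variants(light, heavy))
```

Even with these placeholder numbers, a few hundred segments per chain multiply into tens of thousands of antibodies, so modest increases in the number of choices at each step scale the total quickly.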

The Control of Gene Expression

Each cell expresses only a small fraction of its genes.
Genes are continually turned on and off in response to signals from the cell's internal and external environments.
Gene expression must be controlled on a long-term basis during cellular differentiation.
Highly specialized cells express only a tiny fraction of their genes.
Problems with gene expression and control can lead to imbalance and diseases, including cancers.

The control of gene expression can occur at any step in the pathway from gene to functional protein. Fig 19.7
These levels of control include chromatin packing, transcription, RNA processing, translation, and various alterations to the protein product.

Chromatin packing modifications

Genes of densely condensed heterochromatin are usually not expressed.

Chemical modifications of chromatin play a key role in chromatin structure and transcription regulation.
 
DNA methylation
Inactive DNA is generally highly methylated compared to DNA that is actively transcribed.
For example, the inactivated mammalian X chromosome in females is heavily methylated.
After each round of DNA replication, methylation enzymes correctly methylate the daughter strands, so methylation patterns are passed on to daughter cells.
This accounts for genomic imprinting in which methylation turns off either the maternal or paternal alleles.
 
Histone acetylation and deacetylation appear to play a direct role in the regulation of gene transcription.
Acetylated histones grip DNA less tightly, providing easier access for transcription proteins in this region.
Some of the enzymes responsible for acetylation or deacetylation are associated with or are components of transcription factors that bind to promoters.
DNA methylation and histone deacetylation may cooperate to repress transcription.

Initiation of transcription is the most important and universally used control point in gene expression.

Control elements are noncoding DNA segments that regulate transcription by binding transcription factors. Fig 19.8
Eukaryotic RNA polymerase is dependent on transcription factors before transcription begins.
One transcription factor recognizes the TATA box.

Distal control elements, enhancers, may be thousands of nucleotides away from the promoter or even downstream of the gene or within an intron. Fig 19.9.
Bending of DNA enables transcription factors called activators, bound to enhancers, to contact the transcription initiation complex at the promoter.

Eukaryotic genes also have repressor proteins that bind to DNA control elements called silencers.
Repression may operate mostly at the level of chromatin modification.

Each transcription factor generally has a DNA-binding domain that binds to DNA and a protein-binding domain that recognizes other transcription factors.

Genes coding for the enzymes of a metabolic pathway may be scattered over different chromosomes.
Coordinate gene expression depends on the association of a specific control element or collection of control elements with every gene of a dispersed group.
A common group of transcription factors binds to them, promoting simultaneous gene transcription.
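The logic of coordinate control can be sketched as a lookup: genes that share a control element respond together when that element is bound. This is a deliberately simplified model. Gene and element names are hypothetical, and the rule that every listed element must be bound is an assumption made for illustration.

```python
from typing import Dict, List, Set


def transcribed_genes(gene_elements: Dict[str, Set[str]], bound_elements: Set[str]) -> List[str]:
    """Genes whose control elements are all bound by transcription factors."""
    return sorted(
        gene
        for gene, elements in gene_elements.items()
        if elements and elements <= bound_elements
    )


if __name__ == "__main__":
    # Hypothetical genes scattered over different chromosomes.
    genes = {
        "enzyme1": {"elementA"},
        "enzyme2": {"elementA"},
        "unrelated": {"elementB"},
    }
    print(transcribed_genes(genes, {"elementA"}))
```

In this sketch, binding at `elementA` switches on both pathway enzymes at once, even though nothing in the model places them near each other.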

Post-transcriptional mechanisms
Gene expression may be blocked or stimulated by any post-transcriptional step.

In alternative RNA splicing, different mRNA molecules are produced from the same primary transcript, depending on which RNA segments are treated as exons and which as introns. Fig 19.11.
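A toy sketch makes the idea concrete: the same ordered list of RNA segments yields different mRNAs depending on which segments are kept as exons. Segment names and sequences are invented for illustration.

```python
from typing import Iterable, List, Tuple


def splice(primary_transcript: List[Tuple[str, str]], exons: Iterable[str]) -> str:
    """Join, in their original order, the segments that are treated as exons."""
    keep = set(exons)
    return "".join(seq for name, seq in primary_transcript if name in keep)


if __name__ == "__main__":
    # Made-up primary transcript: (segment name, RNA sequence) in 5' to 3' order.
    transcript = [("s1", "AUGGC"), ("s2", "GUAAG"), ("s3", "CCUGA"), ("s4", "UUUAA")]
    print(splice(transcript, {"s1", "s3"}))  # s2 and s4 removed as introns
    print(splice(transcript, {"s1", "s4"}))  # s2 and s3 removed as introns
```

Segment order is always preserved; only the choice of which segments to keep changes between the two mRNAs.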
 
Regulation of mRNA degradation.
    Prokaryotic mRNA molecules may be degraded after only a few minutes.
    Eukaryotic mRNAs typically endure for hours and can even last days or weeks.
    For example, in red blood cells the mRNAs for the hemoglobin polypeptides are unusually stable and are translated repeatedly in these cells.
A common pathway of mRNA breakdown begins with enzymatic shortening of the poly(A) tail.
This triggers the enzymatic removal of the 5' cap.
This is followed by rapid degradation of the mRNA by nucleases.
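The effect of mRNA lifetime on expression can be illustrated with simple exponential decay. This is a sketch under an assumed first-order decay model, and the half-lives used are hypothetical values chosen only to contrast "minutes" with "hours".

```python
def fraction_remaining(elapsed_minutes: float, half_life_minutes: float) -> float:
    """Fraction of an mRNA population still intact, assuming first-order decay."""
    return 0.5 ** (elapsed_minutes / half_life_minutes)


if __name__ == "__main__":
    # Hypothetical half-lives for illustration only.
    short_lived = 3.0    # minutes
    long_lived = 600.0   # minutes (10 hours)
    for label, half_life in (("short-lived", short_lived), ("long-lived", long_lived)):
        left = fraction_remaining(60.0, half_life)
        print(f"{label} mRNA after 1 hour: {left:.6f} remaining")
```

A long-lived message can be translated many more times than a short-lived one, which is why mRNA stability acts as a control point.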

Control of translation
Translation of specific mRNAs can be blocked by regulatory proteins that bind to specific sequences or structures within the 5' leader region of mRNA.
This prevents attachment to ribosomes.
Protein factors required to initiate translation in eukaryotes offer targets for simultaneously controlling translation of all the mRNA in a cell.
This allows the cell to shut down translation if environmental conditions are poor.
 
Eukaryotic polypeptides must often be processed to yield functional proteins.
Regulation may occur during cleavage, chemical modification, or transport to the appropriate destination.
For example, cystic fibrosis results from mutations in the gene for a chloride ion channel protein that prevent the protein from reaching the plasma membrane.
The defective protein is rapidly degraded.
The cell limits the lifetimes of normal proteins by selective degradation.

Proteins intended for degradation are marked by the attachment of ubiquitin proteins. Fig 19.12.
Giant proteasomes recognize the ubiquitin and degrade the tagged protein.

The Molecular Biology of Cancer

Cancer is a disease in which cells escape from the control mechanisms that normally regulate cell growth and division.
The gene changes responsible can result from random spontaneous mutations or from environmental influences such as chemical carcinogens or physical mutagens.
Cancer-causing genes, oncogenes, arise from proto-oncogenes, normal genes that code for proteins that stimulate normal cell growth and division and have essential functions in normal cells. Fig 19.13.
An oncogene arises from a genetic change that leads to an increase in the amount of the proto-oncogene's protein product or in the activity of each protein molecule.
 
These genetic changes include movements of DNA within the genome, amplification of proto-oncogenes, and point mutations in the gene.
 
Malignant cells frequently have chromosomes that have been broken and rejoined incorrectly.
This may translocate a fragment to a location near an active promoter or other control element.
 
Amplification increases the number of gene copies.
 
A point mutation may lead to translation of a protein that is more active or longer-lived.
Mutations to genes whose normal products inhibit cell division, tumor-suppressor genes, also contribute to cancer.
Some tumor-suppressor proteins normally repair damaged DNA.
Others control the adhesion of cells to each other or to an extracellular matrix, crucial for normal tissues.
Still others are components of cell-signaling pathways that inhibit the cell cycle.

Oncogene proteins and faulty tumor-suppressor proteins interfere with normal signaling pathways. Fig 19.14.

Mutations in two key genes, the ras proto-oncogene and the p53 tumor-suppressor gene, occur in about 30% and 50% of human cancers, respectively.
Both are components of signal-transduction pathways that convey external signals to the DNA.
 
Ras, the product of the ras gene, is a G protein that relays a growth signal through a cascade of protein kinases, leading to the synthesis of a protein that stimulates the cell cycle.
Many ras oncogenes have a point mutation that leads to a hyperactive version of the Ras protein that can issue signals on its own, resulting in excessive cell division.
 
The tumor-suppressor protein encoded by the normal p53 gene is a transcription factor that promotes synthesis of growth-inhibiting proteins.
A mutation that knocks out the p53 gene can lead to excessive cell growth and cancer.

The p53 gene is often called the "guardian angel of the genome".
Damage to the cell's DNA leads to expression of the p53 gene.
The p53 protein can:
    activate the p21 gene, which halts the cell cycle.
    turn on genes involved in DNA repair.
    activate "suicide genes" whose protein products cause cell death.

Multiple mutations underlie the development of cancer

If cancer results from an accumulation of mutations, and if mutations occur throughout life, then the longer we live, the more likely we are to develop cancer.

 
Colorectal cancer (Fig 19.15), with 135,000 new cases in the U.S. each year, illustrates a multi-step cancer path.
The first sign is often a polyp, a small benign growth in the colon lining with fast-dividing cells.
Through gradual accumulation of mutations that activate oncogenes and knock out tumor-suppressor genes, the polyp can develop into a malignant tumor.
 
About a half dozen DNA changes must occur for a cell to become fully cancerous.
These usually include the appearance of at least one active oncogene and the mutation or loss of several tumor-suppressor genes.
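The link between a multi-step requirement and age can be illustrated with a toy probability model. This is a sketch, not an epidemiological model. It assumes that each required change arises independently at a constant rate, and the rate used is an arbitrary placeholder. Only the "about a half dozen" count comes from these notes.

```python
import math


def multi_hit_probability(age_years: float, hits: int = 6, rate_per_year: float = 0.01) -> float:
    """Chance that all `hits` independent changes have occurred by a given age.

    Assumptions (illustrative only): each change occurs independently and
    with a constant yearly rate, so each has probability 1 - exp(-rate * age).
    """
    per_hit = 1.0 - math.exp(-rate_per_year * age_years)
    return per_hit ** hits


if __name__ == "__main__":
    for age in (20, 40, 80):
        print(age, f"{multi_hit_probability(age):.2e}")
```

Because the per-change probability is raised to the sixth power, the result rises much faster than age itself. This matches the expectation that cancer becomes more likely the longer we live.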

Viruses, especially retroviruses, play a role in about 15% of human cancer cases worldwide.
These include some types of leukemia, liver cancer, and cancer of the cervix.
Viruses promote cancer development by integrating their DNA into that of infected cells.
By this process, a retrovirus may donate an oncogene to the cell.
Alternatively, insertion of viral DNA may disrupt a tumor-suppressor gene or convert a proto-oncogene to an oncogene.
 
The fact that multiple genetic changes are required to produce a cancer cell helps explain the predispositions to cancer that run in some families.
An individual inheriting an oncogene or a mutant allele of a tumor-suppressor gene will be one step closer to accumulating the necessary mutations for cancer to develop.
    About 15% of colorectal cancers involve inherited mutations.
    Between 5% and 10% of breast cancer cases show an inherited predisposition.