Eastern Biotech & Life Sciences Dubai

Molecular Basis of Genetics

» The Cell and Its Components

» Some Types of Chemical Bonds

» Carbohydrates

» Lipids (Fats)

» Nucleotides and Nucleic Acids

» Amino Acids

» Proteins

» DNA as Carrier of Genetic Information

» DNA and Its Components

» DNA Structure

» Alternative DNA Structures

» DNA Replication

» The Flow of Genetic Information: Transcription and Translation

» Genes and Mutation

» Genetic Code
» The Structure of Eukaryotic Genes

» DNA Sequencing

» Automated DNA Sequencing

» DNA Cloning

» cDNA Cloning

» DNA Libraries

» Restriction Analysis by Southern Blot Analysis

» Restriction Mapping

» DNA Amplification by Polymerase Chain Reaction (PCR)

» Changes in DNA

» Mutation Due to Different Base Modifications

» DNA Polymorphism

» Recombination

» Transposition

» Trinucleotide Repeat Expansion

» DNA Repair

» Xeroderma Pigmentosum

The Cell and Its Components

Cells are the smallest organized structural units able to maintain an individual, albeit limited, life span while carrying out a wide variety of functions. Cells have evolved on earth during the past 3.5 billion years, presumably originating from suitable early molecular aggregations. Each cell originates from another living cell, as postulated by R. Virchow in 1855 (“omnis cellula e cellula”). The living world consists of two basic types of cells: prokaryotic cells, which carry their functional information in a circular genome without a nucleus, and eukaryotic cells, which contain their genome in individual chromosomes in a nucleus and have a well-organized internal structure. Cells communicate with each other by means of a broad repertoire of molecular signals. Great progress has been made since 1839, when cells were first recognized as the “elementary particles of organisms” by M. Schleiden and T. Schwann. Today we understand most of the biological processes of cells at the molecular level.

  • Eukaryotic cells

    A eukaryotic cell consists of cytoplasm and a nucleus. It is enclosed by a plasma membrane. The cytoplasm contains a complex system of inner membranes that form cellular structures (organelles). The main organelles are the mitochondria (in which important energy–delivering chemical reactions take place), the endoplasmic reticulum (consisting of a series of membranes in which glycoproteins and lipids are formed), the Golgi apparatus (for certain transport functions), and peroxisomes (for the formation or degradation of certain substances). Eukaryotic cells contain lysosomes, in which numerous proteins, nucleic acids, and lipids are broken down. Centrioles, small cylindrical particles made up of microtubules, play an essential role in cell division. Ribosomes are the sites of protein synthesis.

  • Nucleus of the Cell

    The eukaryotic cell nucleus contains the genetic information. It is enclosed by an inner and an outer membrane, which contain pores for the transport of substances between the nucleus and the cytoplasm. The nucleus contains a nucleolus and a fibrous matrix with different DNA–protein complexes.
  • Plasma membrane of the cell

    The environment of cells, whether blood or other body fluids, is water-based, and the chemical processes inside a cell involve water-soluble molecules. In order to maintain their integrity, cells must prevent water and other molecules from flowing in or out uncontrolled. This is accomplished by a water-resistant membrane composed of bipartite molecules of fatty acids, the plasma membrane. These molecules are phospholipids arranged in a double layer (bilayer) with a fatty interior. The plasma membrane itself contains numerous molecules that traverse the lipid bilayer once or many times to perform special functions. Different types of membrane proteins can be distinguished: (i) transmembrane proteins used as channels for transport of molecules into or out of the cell, (ii) proteins connected with each other to provide stability, (iii) receptor molecules involved in signal transduction, and (iv) molecules with enzyme function to catalyze internal chemical reactions in response to an external signal. (Figure redrawn from Alberts et al., 1998.)

  • Comparison of animal and plant cells

    Plant and animal cells have many similar characteristics. One fundamental difference is that plant cells contain chloroplasts for photosynthesis. In addition, plant cells are surrounded by a rigid wall of cellulose and other polymeric molecules and contain vacuoles for water, ions, sugar, nitrogen-containing compounds, or waste products. Vacuoles are permeable to water but not to the other substances enclosed in the vacuoles. (Figures in A, B and D adapted from de Duve, 1984.)

  • References

    Alberts, B. et al.: Essential Cell Biology. An Introduction
    to the Molecular Biology of the Cell.
    Garland Publishing Co., New York, 1998.
    de Duve, C.: A Guided Tour of the Living Cell. Vol.
    I and II. Scientific American Books, Inc., New
    York, 1984.
    Lodish, H. et al.: Molecular Cell Biology (with an
    animated CD-ROM). 4th ed. W.H. Freeman &
    Co., New York, 2000.
Some Types of Chemical Bonds
Close to 99% of the weight of a living cell is composed of just four elements: carbon (C), hydrogen (H), nitrogen (N), and oxygen (O). Almost 50% of the atoms are hydrogen atoms; about 25% are carbon, and 25% oxygen. Apart from water (about 70% of the weight of the cell) almost all components are carbon compounds. Carbon, a small atom with four electrons in its outer shell, can form four strong covalent bonds with other atoms. But most importantly, carbon atoms can combine with each other to build chains and rings, and thus large complex molecules with specific biological properties.
  • Compounds of hydrogen (H), oxygen (O), and carbon (C)

    Four simple combinations of these atoms occur frequently in biologically important molecules: hydroxyl (—OH; alcohols), methyl (—CH3), carboxyl (—COOH), and carbonyl (C=O; aldehydes and ketones) groups. They impart to the molecules characteristic chemical properties, including possibilities to form compounds.
  • Acids and esters

    Many biological substances contain a carbon–oxygen bond with weak acidic or basic (alkaline) properties. The degree of acidity is expressed by the pH value, which indicates the concentration of H+ ions in a solution, ranging from 10^-1 mol/L (pH 1, strongly acidic) to 10^-14 mol/L (pH 14, strongly alkaline). Pure water contains 10^-7 moles H+ per liter (pH 7.0). An ester is formed when an acid reacts with an alcohol. Esters are frequently found in lipids and phosphate compounds.
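
    The relation between H+ concentration and pH quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard definition pH = -log10 of the H+ concentration in mol/L; the function names are illustrative and not taken from the source text.

```python
import math


def ph_from_concentration(h_molar: float) -> float:
    """Return the pH for an H+ concentration given in mol/L."""
    if h_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(h_molar)


def concentration_from_ph(ph: float) -> float:
    """Return the H+ concentration in mol/L for a given pH."""
    return 10.0 ** (-ph)


# The three reference points mentioned in the text.
for concentration in (1e-1, 1e-7, 1e-14):
    print(f"[H+] = {concentration:.0e} mol/L -> pH {ph_from_concentration(concentration):.1f}")
```

    Because the scale is logarithmic, each pH unit corresponds to a tenfold change in H+ concentration.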

  • Carbon–nitrogen bonds (C—N)

    C—N bonds occur in many biologically important molecules: in amino groups, amines, and amides, especially in proteins. Of paramount significance are the amino acids, which are the subunits of proteins. All proteins have a specific role in the functioning of an organism.

  • References

    Alberts, B. et al.: Molecular Biology of the Cell.
    3rd ed. Garland Publishing Co., New York,
    Koolman, J., Röhm K.H.: Color Atlas of Biochemistry.
    Thieme, Stuttgart – New York, 1996.
    Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
    Co., New York, 1995.

Carbohydrates
Carbohydrates in their various chemical forms and their derivatives are an important group of biomolecules for genetics. They provide the basic framework of DNA and RNA. Their flexibility makes them especially suitable for transferring genetic information from cell to cell. Along with nucleic acids, lipids, and proteins, carbohydrates are one of the most important classes of biomolecules. Their main functions can be classified into three groups: (i) to deliver and store energy, (ii) to help form DNA and RNA, the information-carrying molecules, (iii) to help form cell walls of bacteria and plants. Carbohydrates are often bound to proteins and lipids. As polysaccharides, carbohydrates are important structural elements of the cells of animals, bacteria, and plants. They form cell surface structures (receptors) used in conducting signals from cell to cell. Combined with numerous proteins and lipids, carbohydrates are important components of numerous cell structures. Finally, they function to transfer and store energy in intermediary metabolism.
  • Monosaccharides

    Monosaccharides (simple sugars) are aldehydes (—C=O, —H) or ketones (>C=O) with two or more hydroxy groups (general structural formula (CH2O)n). The aldehyde or ketone group can react with one of the hydroxy groups to form a ring. This is the usual configuration of sugars that have five or six C atoms (pentoses and hexoses). The C atoms are numbered. The D- and the L-forms of sugars are mirror-image isomers of the same molecule. The naturally occurring forms are the D-(dextro) forms. These further include α- and β-forms as stereoisomers. In the cyclic forms the C atoms of sugars do not lie in one plane but take the three-dimensional shape of a chair or a boat. The β-D-glucopyranose configuration (glucose) is the energetically favored one, since all the axial positions are occupied by H atoms. The arrangement of the —OH groups can differ, so that stereoisomers such as mannose or galactose are formed.

  • Disaccharides

    These are compounds of two monosaccharides. The aldehyde or ketone group of one can bind to an α-hydroxy or a β-hydroxy group of the other. Sucrose and lactose are frequently occurring disaccharides.

  • Derivatives of sugars

    When certain hydroxy groups are replaced by other groups, sugar derivatives are formed. These occur especially in polysaccharides. In a large group of genetically determined syndromes, complex polysaccharides cannot be degraded owing to reduced or absent enzyme function (mucopolysaccharidoses, mucolipidoses).

  • Polysaccharides

    Short (oligosaccharides) and long chains of sugars and sugar derivatives (polysaccharides) form essential structural elements of the cell. Complex oligosaccharides with bonds to proteins or lipids are part of cell surface structures, e.g., blood group antigens.

Lipids (Fats)
Lipids usually occur as large molecules (macromolecules). They are essential components of membranes and precursors of other important biomolecules, such as steroids for the formation of hormones and other molecules for transmitting intercellular signals. In addition to fatty acids, compounds with carbohydrates (glycolipids), phosphate groups (phospholipids), and other molecules are especially important. A special characteristic is their pronounced polarity, with a hydrophilic (water-attracting) and a hydrophobic (water-repelling) region. This makes lipids especially suited for forming the outer limits of the cell (cell membrane).
  • Fatty acids

    Fatty acids are composed of a hydrocarbon chain with a terminal carboxylic acid group. Thus, they are polar, with a hydrophilic (—COOH) and a hydrophobic end (—CH3), and differ in the length of the chain and its degree of saturation. When one or more double bonds occur in the chain, the fatty acid is referred to as unsaturated. A double bond makes the chain relatively rigid and causes a kink. Fatty acids form the basic framework of many important macromolecules. The free carboxyl group (—COOH) of a fatty acid is ionized (—COO–).

  • Lipids

    Fatty acids can combine with other groups of molecules to form other types of lipids. As water-insoluble (hydrophobic) molecules, they are soluble only in organic solvents. The carboxyl group can enter into an ester or an amide bond. Triglycerides are compounds of fatty acids with glycerol. Glycolipids (lipids with sugar residues) and phospholipids (lipids with a phosphate group attached to an alcohol derivative) are the structural bases of important macromolecules. Their intracellular degradation requires the presence of numerous enzymes, disorders of which have a genetic basis and lead to numerous genetically determined diseases. Sphingolipids are an important group of molecules in biological membranes. Here, sphingosine, instead of glycerol, is the fatty acid-binding molecule. Sphingomyelin and gangliosides contain sphingosine. Gangliosides make up 6% of the central nervous system lipids. They are degraded by a series of enzymes. Genetically determined disorders of their catabolism lead to severe diseases, e.g., Tay–Sachs disease due to defective degradation of ganglioside GM2 (deficiency of β-N-acetylhexosaminidase).

  • Lipid aggregates

    Owing to their bipolar properties, fatty acids can form lipid aggregates in water. The hydrophilic ends are attracted to their aqueous surroundings; the hydrophobic ends protrude from the surface of the water and form a surface film. If completely under the surface, they may form a micelle, compact and dry within. Phospholipids and glycolipids can form two-layered membranes (lipid membrane bilayer). These are the basic structural elements of cell membranes, which prevent molecules in the surrounding aqueous solution from invading the cell.

  • Other lipids: steroids

    Steroids are small molecules consisting of four different rings of carbon atoms. Cholesterol is the precursor of five major classes of steroid hormones: progestagens, glucocorticoids, mineralocorticoids, androgens, and estrogens. Each of these hormone classes is responsible for important biological functions such as maintenance of pregnancy, fat and protein metabolism, maintenance of blood volume and blood pressure, and development of sex characteristics.

Nucleotides and Nucleic Acids
Nucleotides participate in almost all biological processes. They are the subunits of DNA and RNA, the molecules that carry genetic information. Nucleotide derivatives are involved in the biosynthesis of numerous molecules; they convey energy, are part of essential coenzymes, and regulate numerous metabolic functions. Since all these functions are based on genetic information of the cells, nucleotides represent a central class of molecules for genetics. Nucleotides are composed of three integral parts: phosphates, sugars, and purine or pyrimidine bases.
  • Phosphate groups

    Phosphate groups may occur alone (monophosphates), in twos (diphosphates) or in threes (triphosphates). They are normally bound to the hydroxy group of the C atom in position 5 of a five-C-atom sugar (pentose).

  • Sugar residues

    The sugar residues in nucleotides are usually derived from either ribose (in ribonucleic acid, RNA) or deoxyribose (in deoxyribonucleic acid, DNA). A base plus its respective sugar is a nucleoside (ribonucleoside or deoxyribonucleoside).

  • Nucleotide bases of pyrimidine

    Cytosine (C), thymine (T), and uracil (U) are the three pyrimidine nucleotide bases. They differ from each other in their side chains (—NH2 on C4 in cytosine, —CH3 on C5 in thymine, O on C4 in uracil) and in the presence or absence of a double bond between N3 and C4 (present in cytosine).

  • Nucleotide bases of purine

    Adenine (A) and guanine (G) are the two nucleotide bases of purine. They differ in their side chains and a double bond (between N1 and C6).

Amino Acids
Amino acids are the basic structural units of proteins. A defined linear sequence of the amino acids and a specific three-dimensional structure confer quite specific physicochemical properties to each protein. An amino acid consists of a “central” carbon with one bond to an amino group (—NH2), one to a carboxyl group (—COOH), one to a hydrogen atom, and the fourth to a variable side chain. Amino acids are ionized in neutral solutions, since the amino group takes on a proton (—NH3+) and the carboxyl group dissociates (—COO–). The side chain determines the distinguishing characteristics of an amino acid, including the size, form, electrical charge or hydrogen-bonding ability, and the total specific chemical reactivity. Amino acids can be differentiated according to whether they are neutral or not neutral (basic or acidic) and whether they have a polar or nonpolar side chain. Each amino acid has its own three-letter and one-letter abbreviations. Essential amino acids in vertebrates are His, Ile, Leu, Lys, Met, Phe, Thr, Trp, and Val.
  • Neutral amino acids, nonpolar side chains

    All neutral amino acids have a —COO– and an —NH3+ group. The simplest amino acids have a simple aliphatic side chain. For glycine this is merely a hydrogen atom (—H); for alanine it is a methyl group (—CH3). Larger side chains occur on valine, leucine, and isoleucine. These larger side chains are hydrophobic (water-repellent) and make their respective amino acids less water-soluble than do hydrophilic (water-attracting) chains. Proline has an aliphatic side chain that, unlike in other amino acids, is bound to both the central carbon and to the amino group, so that a ringlike structure is formed. Aromatic side chains occur in phenylalanine (a phenyl group bound via a methylene group, —CH2—) and tryptophan (an indole ring bound via a methylene group). These amino acids are very hydrophobic. Two amino acids contain sulfur (S) atoms. In cysteine this is in the form of a sulfhydryl group (—SH); in methionine it is a thioether (—S—CH3). Both are hydrophobic. The sulfhydryl group in cysteine is very reactive and participates in forming disulfide bonds (—S—S—). These play an important role in stabilizing the three-dimensional forms of proteins.

  • Hydrophilic amino acids, polar side chains

    Serine, threonine, and tyrosine contain hydroxyl groups (—OH). Thus, they are hydroxylated forms of glycine, alanine, and phenylalanine. The hydroxyl groups make them hydrophilic and more reactive than the nonhydroxylated forms. Asparagine and glutamine both contain an amino and an amide group. At physiological pH their side chains are polar but uncharged.

  • Charged amino acids

    These amino acids have either two ionized amino groups (basic) or two carboxyl groups (acidic). Basic amino acids (positively charged) are arginine, lysine, and histidine. Histidine has an imidazole ring and can be uncharged or positively charged, depending on its surroundings. It is frequently found in the reactive centers of proteins, where it takes part in alternating bonds (e.g., in the oxygen-binding region of hemoglobin). Aspartic acid and glutamic acid each have two carboxyl groups (—COOH) and are thus (as a rule) acidic. Seven of the 20 amino acids have slightly ionizable side chains, making them highly reactive (Asp, Glu, His, Cys, Tyr, Lys, Arg).
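
    The grouping described in this section can be written down as a small lookup table. The sketch below follows the three headings used above (nonpolar, polar, charged) and uses the standard three-letter and one-letter abbreviations; the names `GROUPS`, `ONE_LETTER`, and `composition` are illustrative choices, not part of the source text.

```python
from collections import Counter

# Grouping taken from the categories described in the text above.
GROUPS = {
    "neutral, nonpolar": ["Gly", "Ala", "Val", "Leu", "Ile",
                          "Pro", "Phe", "Trp", "Cys", "Met"],
    "neutral, polar": ["Ser", "Thr", "Tyr", "Asn", "Gln"],
    "basic": ["Arg", "Lys", "His"],
    "acidic": ["Asp", "Glu"],
}

# Standard three-letter to one-letter abbreviations.
ONE_LETTER = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Pro": "P", "Phe": "F", "Trp": "W", "Cys": "C", "Met": "M",
    "Ser": "S", "Thr": "T", "Tyr": "Y", "Asn": "N", "Gln": "Q",
    "Arg": "R", "Lys": "K", "His": "H", "Asp": "D", "Glu": "E",
}

# Reverse lookup: one-letter code -> group name.
GROUP_BY_LETTER = {
    ONE_LETTER[name]: group
    for group, members in GROUPS.items()
    for name in members
}


def composition(sequence: str) -> Counter:
    """Count how many residues of a one-letter sequence fall into each group."""
    return Counter(GROUP_BY_LETTER[residue] for residue in sequence.upper())


# Hypothetical six-residue peptide, used only to exercise the lookup.
print(composition("GAVKDE"))
```

    A table like this makes it easy to estimate, for any sequence written in one-letter code, how hydrophobic or how charged a polypeptide is likely to be.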

Proteins
Proteins are involved in practically all chemical processes in living organisms. Their universal significance is apparent in that, as enzymes, they drive chemical reactions in living cells. Without enzymatic catalysis, the macromolecules involved would not react spontaneously. All enzymes are the products of one or more genes. Proteins also serve to transport small molecules, ions, or metals. Proteins have important functions in cell division during growth and in cell and tissue differentiation. They control the coordination of movements by regulating muscle cells and the production and transmission of impulses within and between nerve cells. They control hemostasis (blood clotting) and immune defense. They carry out mechanical functions in skin, bone, blood vessels, and other areas.
  • Joining of amino acids

    The basic units of proteins, amino acids, can be joined together very easily owing to their dipolar ionization (zwitterions). The carboxyl group of one amino acid binds to the amino group of the next (a peptide bond, sometimes also referred to as an amide bond). When many amino acids are bound together by peptide bonds, they form a polypeptide chain. Each polypeptide chain has a defined direction, determined by the amino group (—NH2) at one end and the carboxyl group (—COOH) at the other. By convention, the amino group represents the beginning, and the carboxyl group the end of a peptide chain.

  • Primary structure of a protein

    The determination of the amino acid sequence of insulin by Frederick Sanger in 1955 was a landmark accomplishment. It showed for the first time that a protein, in genetic terms a gene product, has a precisely defined amino acid sequence. The amino acid sequence yields important information about the function and evolutionary origin of a protein. The primary structure of a protein is its linear (one-dimensional) amino acid sequence. Like many other proteins, insulin is synthesized from precursor molecules: preproinsulin and proinsulin. Preproinsulin consists of 110 amino acids including 24 amino acids of a leader sequence at the amino end. The leader sequence directs the molecule to the correct site in the cell and is then removed to yield proinsulin. This is converted to insulin by removal of the connecting peptide (C peptide) consisting of amino acids 31–65. Amino acids 1–30 form the B chain; the remaining (66–86) amino acids form the A chain. The A and the B chains are connected by two disulfide bridges joining the cysteines in position 7 and position 20 of the A chain to those of positions 7 and 19, respectively, of the B chain. The A chain contains a disulfide bridge between positions 6 and 11. The positions of the cysteines reflect the spatial arrangements of the amino acids, called the secondary structure.
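
    The processing steps from preproinsulin to insulin are position arithmetic on a sequence, which the following sketch makes explicit. It uses only the lengths and positions given above; the placeholder string is not the real insulin sequence (its letters merely label which segment each position belongs to), and the function names are hypothetical.

```python
LEADER_LENGTH = 24      # leader sequence at the amino end of preproinsulin
B_CHAIN = (1, 30)       # positions within proinsulin, 1-based and inclusive
C_PEPTIDE = (31, 65)
A_CHAIN = (66, 86)


def segment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based inclusive positions."""
    return sequence[start - 1:end]


def process_preproinsulin(preproinsulin: str) -> dict:
    """Split a 110-residue precursor into leader, B chain, C peptide, and A chain."""
    if len(preproinsulin) != 110:
        raise ValueError("preproinsulin is expected to have 110 residues")
    proinsulin = preproinsulin[LEADER_LENGTH:]  # leader removed
    return {
        "leader": preproinsulin[:LEADER_LENGTH],
        "B chain": segment(proinsulin, *B_CHAIN),
        "C peptide": segment(proinsulin, *C_PEPTIDE),
        "A chain": segment(proinsulin, *A_CHAIN),
    }


# Placeholder precursor: each letter marks a segment, not an amino acid.
placeholder = "L" * 24 + "B" * 30 + "C" * 35 + "A" * 21
for name, part in process_preproinsulin(placeholder).items():
    print(f"{name}: {len(part)} residues")
```

    Mature insulin then consists of the B chain (30 residues) and the A chain (21 residues), held together by the disulfide bridges described above.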
  • Secondary structural units, the α helix and the β sheet

    Two basic units of globular proteins are the α helix and a flat sheet (β pleated sheet). Panel C shows a schematic drawing of a unit of one α helix between two β sheets, called a βαβ unit (Figure redrawn from Stryer, 1995).

  • Tertiary structure of insulin

    All functional proteins assume a well-defined three-dimensional structure. This structure is defined by the sequence of amino acids and their physicochemical properties. Tertiary structure is defined by the spatial arrangement of amino acid residues that are far apart in the linear sequence. The quaternary structure is the folding of the protein resulting in a specific three-dimensional spatial arrangement of the subunits and the nature of their contacts. The correct quaternary structure ensures proper function. (Figure from Koolman & Röhm, 1996).

  • References

    Koolman, J., Röhm, K.-H.: Color Atlas of Biochemistry.
    Thieme, Stuttgart–New York, 1996.
    Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
    Co., New York, 1995.

DNA as Carrier of Genetic Information
Although DNA was discovered in 1869 by Friedrich Miescher as a new, acidic, phosphorus-containing substance made up of very large molecules that he named “nuclein”, its biological role was not recognized. In 1889 Richard Altmann introduced the term “nucleic acid”. By 1900 the purine and pyrimidine bases were known. Twenty years later, the two kinds of nucleic acids, RNA and DNA, were distinguished. An incidental but precise observation (1928) and relevant investigations (1944) indicated that DNA could be the carrier of genetic information.
  • The observation of Griffith

    In 1928 the English microbiologist Fred Griffith made a remarkable observation. While investigating various strains of Pneumococcus, he determined that mice injected with strain S (smooth) died (1). On the other hand, animals injected with strain R (rough) lived (2). When he inactivated the lethal S strain by heat, there were no sequelae, and the animal survived (3). Surprisingly, a mixture of the nonlethal R strain and the heat-inactivated S strain had a lethal effect like the S strain (4). And he found normal living pneumococci of the S strain in the animal’s blood. Apparently, cells of the R strain were changed into cells of the S strain (transformed). For a time, this surprising result could not be explained and was met with skepticism. Its relevance for genetics was not apparent.

  • The transforming principle is DNA

    Griffith’s findings formed the basis for investigations by Avery, MacLeod, and McCarty (1944). Avery and co-workers at the Rockefeller Institute in New York elucidated the chemical basis of the transforming principle. From cultures of an S strain (1) they produced an extract of lysed cells (cell-free extract) (2). After all its proteins, lipids, and polysaccharides had been removed, the extract still retained the ability to transform pneumococci of the R strain to pneumococci of the S strain (transforming principle) (3). With further studies, Avery and co-workers determined that this was attributable to the DNA alone. Thus, the DNA must contain the corresponding genetic information. This explained Griffith’s observation. Heat inactivation had left the DNA of the bacterial chromosomes intact. The section of the chromosome with the gene responsible for capsule formation (S gene) could be released from the destroyed S cells and be taken up by some R cells in subsequent cultures. After the S gene was incorporated into its DNA, an R cell was transformed into an S cell (4). Bacteria can take up foreign DNA in this way, so that some of their genetic attributes are altered correspondingly.

  • Genetic information is transmitted by DNA alone

    The final evidence that DNA, and no other molecule, transmits genetic information was provided by Hershey and Chase in 1952. They labeled the capsular protein of bacteriophages with radioactive sulfur (35S) and the DNA with radioactive phosphorus (32P). When bacteria were infected with the labeled bacteriophage, only 32P (DNA) entered the cells, and not the 35S (capsular protein). The subsequent formation of new, complete phage particles in the cell proved that DNA was the exclusive carrier of the genetic information needed to form new phage particles, including their capsular protein. Next, the structure and function of DNA needed to be clarified. The genes of all cells and some viruses consist of DNA, a long-chained threadlike molecule.

  • References

    Avery, O.T., MacLeod, C.M., McCarty, M.: Studies
    on the chemical nature of the substance inducing
    transformation of pneumococcal
    types. J. Exp. Med. 79:137–158, 1944.
    Griffith, F.: The significance of pneumococcal
    types. J. Hyg. 27:113–159, 1928.
    Hershey, A.D., Chase, M.: Independent functions
    of viral protein and nucleic acid in
    growth of bacteriophage. J. Gen. Physiol.
    36:39–56, 1952.
    Judson, H.F.: The Eighth Day of Creation.
    Makers of the Revolution in Biology. Expanded
    Edition. Cold Spring Harbor Laboratory
    Press, New York, 1996.
    McCarty, M.: The Transforming Principle. Discovering
    that Genes are made of DNA. W.W.
    Norton & Co., New York–London, 1985.

DNA and Its Components

The information for the development and specific functions of cells and tissues is stored in the genes. A gene is a portion of the genetic information, definable according to structure and function. Genes lie on chromosomes in the nuclei of cells. They consist of a complex long-chained molecule, deoxyribonucleic acid (DNA). In the following, the constituents of the DNA molecule will be presented. DNA is a nucleic acid. Its chemical components are nucleotide bases, a sugar (deoxyribose), and phosphate groups. They determine the three-dimensional structure of DNA, from which it derives its functional consequence.
  • Nucleotide bases

    The nucleotide bases in DNA are heterocyclic molecules derived from either pyrimidine or purine. Five bases occur in the two types of nucleic acids, DNA and RNA. The purine bases are adenine (A) and guanine (G). The pyrimidine bases are thymine (T) and cytosine (C) in DNA. In RNA, uracil (U) is present instead of thymine. The nucleotide bases are part of a subunit of DNA, the nucleotide. This consists of one of the four nucleotide bases, a sugar (deoxyribose), and a phosphate group. The nitrogen atom in position 9 of a purine or in position 1 of a pyrimidine is bound to the carbon in position 1 of the sugar (N-glycosidic bond). Ribonucleic acid (RNA) differs from DNA in two respects: it contains ribose instead of deoxyribose (unlike the latter, ribose has a hydroxyl group on the position 2 carbon atom) and uracil (U) instead of thymine. Uracil does not have a methyl group at position C5.

  • Nucleotide chain

    DNA is a polymer of deoxyribonucleotide units. The nucleotide chain is formed by joining a hydroxyl group on the sugar of one nucleotide to the phosphate group attached to the sugar of the next nucleotide. The sugars linked together by the phosphate groups form the invariant part of the DNA. The variable part is in the sequence of the nucleotide bases A, T, C, and G. A DNA nucleotide chain is polar. The polarity results from the way the sugars are attached to each other. The phosphate group at position C5 (the 5′ carbon) of one sugar joins to the hydroxyl group at position C3 (the 3′ carbon) of the next sugar by means of a phosphate diester bridge. Thus, one end of the chain has a 5′ triphosphate group free and the other end has a 3′ hydroxy group free (5′ end and 3′ end, respectively). By convention, the sequence of nucleotide bases is written in the 5′ to 3′ direction.

  • Spatial relationship

    The chemical structure of the nucleotide bases determines a defined spatial relationship. Within the double helix, a purine (adenine or guanine) always lies opposite a pyrimidine (thymine or cytosine). Three hydrogen-bond bridges are formed between cytosine and guanine, and two between thymine and adenine. Therefore, only guanine and cytosine or adenine and thymine can lie opposite and pair with each other (complementary base pairs G–C and A–T). Other spatial relationships are not usually possible.
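
    The pairing rule and the hydrogen-bond counts given above can be expressed as a short lookup. This is an illustrative sketch; the names `PAIRS`, `HYDROGEN_BONDS`, and `count_hydrogen_bonds` are assumptions made for the example.

```python
# Allowed base pairs and the number of hydrogen bonds each one forms,
# as stated in the text: three for G-C, two for A-T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def can_pair(base_1: str, base_2: str) -> bool:
    """Return True if the two bases form a complementary pair."""
    return PAIRS.get(base_1.upper()) == base_2.upper()


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding this strand to its complementary strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(can_pair("G", "C"), can_pair("G", "T"))
print(count_hydrogen_bonds("ATGC"))
```

    A stretch rich in G–C pairs is held together by more hydrogen bonds than an A–T-rich stretch of the same length.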

  • DNA double strand

    DNA forms a double strand. As a result of the spatial relationships of the nucleotide bases, a cytosine will always lie opposite to a guanine and a thymine to an adenine. The sequence of the nucleotide bases on one strand of DNA (in the 5′ to 3′ direction) is complementary to the nucleotide base sequence (or simply the base sequence) of the other strand in the 3′ to 5′ direction. The specificity of base pairing is the most important structural characteristic of DNA.
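
    Because the two strands run in opposite directions, the partner of a sequence written 5' to 3' is obtained by complementing each base and then reversing the result. The following is a minimal sketch of that operation; the function names are illustrative.

```python
# Translation table implementing the pairing rule A-T and G-C.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Complement each base; the result reads 3'->5' if the input reads 5'->3'."""
    return strand.upper().translate(_COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


strand = "ATGCCG"
print("5'-" + strand + "-3'")
print("3'-" + complement(strand) + "-5'")
print("partner, 5'->3':", reverse_complement(strand))
```

    Applying `reverse_complement` twice returns the original sequence, which reflects the fact that each strand fully determines the other.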

DNA Structure
In 1953, James Watson and Francis Crick recognized that DNA must exist as a double helix. This structure explains two important functional aspects: replication and the transmission of genetic information. The elucidation of the structure of DNA is considered the beginning of modern genetics. With it, gene structure and function can be understood at the molecular level.
  • DNA as a double helix

    The double helix is the characteristic structural feature of DNA. The two helical polynucleotide chains are wound around each other along a common axis. The nucleotide base pairs (bp), either A–T or G–C, lie within. The diameter of the helix is 20 Å (2 × 10⁻⁶ mm). Neighboring bases lie 3.4 Å apart. The helical structure repeats itself at intervals of 34 Å, or every ten base pairs. Because of the fixed spatial relationship of the nucleotide bases within the double helix and opposite each other, the two chains of the double helix are exactly complementary. The form illustrated here is the so-called B form (B-DNA). Under certain conditions, DNA can also assume other forms (Z-DNA, A-DNA; see Alternative DNA Structures).
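
    The dimensions quoted above lend themselves to a quick calculation. The following sketch uses only the B-DNA figures from the text (3.4 Å per base pair, ten base pairs per turn); the function names are hypothetical.

```python
# B-DNA dimensions as given in the text.
RISE_PER_BP_ANGSTROM = 3.4   # distance between neighboring base pairs
BP_PER_TURN = 10             # base pairs per complete helical turn
ANGSTROM_PER_NM = 10.0


def helix_length_nm(n_base_pairs: int) -> float:
    """Contour length of an idealized B-DNA segment in nanometers."""
    return n_base_pairs * RISE_PER_BP_ANGSTROM / ANGSTROM_PER_NM


def helical_turns(n_base_pairs: int) -> float:
    """Number of complete turns in an idealized B-DNA segment."""
    return n_base_pairs / BP_PER_TURN


if __name__ == "__main__":
    for n in (10, 1000):
        print(n, "bp:", round(helix_length_nm(n), 2), "nm,",
              helical_turns(n), "turns")
```

    Ten base pairs give one turn of 3.4 nm (34 Å), matching the repeat distance stated above.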

  • Replication

    Since the nucleotide chains lying opposite each other within the double helix are strictly complementary, each can serve as a pattern (template) for the formation (replication) of a new chain when the helix is opened. DNA replication is semiconservative, i.e., one completely new strand will be formed and one strand retained.

  • Denaturation and renaturation

    The noncovalent hydrogen bonds between the nucleotide base pairs are weak. Nevertheless, DNA is stable at physiological temperatures because it is a very long molecule. The two complementary strands can be separated (denaturation) by means of relatively weak chemical reagents (e.g., alkali, formamide, or urea) or by careful heating. The resulting single-stranded molecules are relatively stable. With cooling, complementary single strands can reunite to form double-stranded molecules (renaturation). Noncomplementary single strands do not unite. This is the basis of an important method of identifying nucleic acids: with a single strand of defined origin, it can be determined with which other single strand it will bind (hybridize). The hybridization of complementary segments of DNA is an important principle in the analysis of genes.

  • Transmission of genetic information

    Genetic information lies in the sequence of nucleotide base pairs (A–T or G–C). A sequence of three base pairs represents a codeword (codon) for an amino acid. The codon sequence determines a corresponding sequence of amino acids. These form a polypeptide (gene product). The sequence of the nucleotide bases is first transferred (transcription) from one DNA strand to a further information-bearing molecule (mRNA, messenger RNA). Then the nucleotide base sequence of the mRNA serves as a template for a sequence of amino acids corresponding to the order of the codons (translation). A gene can be defined as a section of DNA responsible for the formation of a polypeptide (one gene, one polypeptide). One or more polypeptides form a protein. Thus, several genes may be involved in the formation of a protein.

  • References

    Crick, F.: What Mad Pursuit. A Personal View of
    Scientific Discovery. Basic Books, Inc., New
    York, 1988.
    Judson, H.F.: The Eighth Day of Creation. Makers of
    the Revolution in Biology. Expanded Edition.
    Cold Spring Harbor Laboratory Press,
    New York, 1996.
    Stent, G.S., ed.: The Double Helix. Weidenfeld &
    Nicolson, London, 1981.
    Watson, J.D.: The Double Helix. A Personal Account
    of the Structure of DNA. Atheneum,
    New York, 1968.
    Watson, J.D., Crick, F.H.C.: Molecular structure
    of nucleic acid. Nature 171:737–738, 1953.
    Watson, J.D., Crick, F.H.C.: Genetic implications
    of the structure of DNA. Nature 171:964–
    967, 1953.
    Wilkins, M.F.H., Stokes, A.R., Wilson, H.R.:
    Molecular structure of DNA. Nature
    171:738–740, 1953.

Alternative DNA Structures
Gene expression and transcription can be influenced by changes of DNA topology. However, this type of control of gene expression is relatively universal and nonspecific. Thus, it is more suitable for permanent suppression of transcription, e.g., in genes that are expressed only in certain tissues or are active only during the embryonic period and later become permanently inactive.
  • Three forms of DNA

    The DNA double helix does not occur as a single structure, but rather represents a structural family of different types. The original classic form, determined by Watson and Crick in 1953, is B-DNA. The essential structural characteristic of B-DNA is the formation of two grooves, one large (major groove) and one small (minor groove). There are at least two further, alternative forms of the DNA double helix, Z-DNA and the rare form A-DNA. While B-DNA forms a right-handed helix, Z-DNA shows a left-handed conformation. This leads to a greater distance (0.77 nm) between the base pairs than in B-DNA and a zigzag form (thus the designation Z-DNA). A-DNA is rare. It exists only in the dehydrated state and differs from the B form by a 20-degree rotation of the perpendicular axis of the helix. A-DNA has a deep major groove and a flat minor groove (Figures from Watson et al., 1987).

  • Major and minor grooves in B-DNA

    The base pairing in DNA (adenine–thymine and guanine–cytosine) leads to the formation of a large and a small groove because the glycosidic bonds to deoxyribose (dRib) are not diametrically opposed. In B-DNA, the purine and pyrimidine rings lie 0.34 nm apart. DNA has ten base pairs per turn of the double helix. The distance from one complete turn to the next is 3.4 nm. In this way, localized curves arise in the double helix. The result is a somewhat larger and a somewhat smaller groove.

  • Transition from B-DNA to Z-DNA

    B-DNA is a perfect regular double helix except that the base pairs opposite each other do not lie exactly at the same level. They are twisted in a propeller-like manner. In this way, DNA can easily be bent without causing essential changes in the local structures. In Z-DNA the sugar–phosphate skeleton has a zigzag pattern; the single Z-DNA groove has a greater density of negatively charged molecules. Z-DNA may occur in limited segments in vivo. A segment of B-DNA consisting of GC pairs can be converted into Z-DNA when the bases are rotated 180 degrees. Normally, Z-DNA is thermodynamically relatively unstable. However, transition to Z-DNA is facilitated when cytosine is methylated in position 5 (C5). The modification of DNA by methylation of cytosine is frequent in certain regions of DNA of eukaryotes. There are specific proteins that bind to Z-DNA, but their significance for the regulation of transcription is not clear.

  • References

    Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
    Co., New York, 1995.
    Watson, J.D. et al.: Molecular Biology of the
    Gene. 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.

DNA Replication
DNA synthesis involves a highly coordinated action of many proteins. Precision and speed are required. The two new DNA chains are assembled at a rate of about 1000 nucleotides per second in E. coli. The principal enzymatic proteins are polymerases, which carry out template-directed synthesis; helicases, which separate the two strands to generate the replication fork; primases, which initiate chain synthesis at preferred sites; initiation proteins, which recognize the origin of replication; and proteins that remodel the double helix. The entire complex is called the replisome. In their paper elucidating the structure of DNA, Watson and Crick (1953) noted in closing, “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material,” at that time an unsolved problem. Although biochemically complex, DNA replication is genetically relatively simple. During replication, each strand of DNA serves as a template for the formation of a new strand (semiconservative replication).
  • Prokaryote replication begins at a single site

    In prokaryote cells, replication begins at a defined point in the ring-shaped bacterial chromosome, the origin of replication (1). From here, new DNA is formed at the same speed in both directions until the DNA has been completely duplicated and two chromosomes are formed. Replication can be visualized by autoradiography after the newly replicated DNA has incorporated tritium (3H)-labeled thymidine (2).

  • Eukaryote replication begins at several sites

    DNA synthesis occurs during a defined phase of the cell cycle (S phase). This would take a very long time if there were only one starting point. However, replication of eukaryotic DNA begins at numerous sites (replicons) (1). It proceeds in both directions from each replicon until neighboring replicons fuse (2) and all of the DNA is duplicated (3). The electron micrograph (4) shows replicons at three sites.
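
    Why multiple origins matter can be shown with a rough estimate. The sketch below assumes idealized conditions that are not in the text: origins are evenly spaced, all fire at once, and every fork moves at the same constant speed. The rate of 1000 nucleotides per second is the E. coli figure quoted above; the genome size of about 4.6 million base pairs for E. coli is an added assumption, and the function name is hypothetical.

```python
def replication_time_seconds(genome_bp: float,
                             rate_nt_per_s: float = 1000.0,
                             origins: int = 1) -> float:
    """Idealized time to duplicate a genome.

    Each origin starts two forks moving in opposite directions, so the
    DNA is copied by 2 * origins forks working in parallel.
    """
    if origins < 1:
        raise ValueError("at least one origin of replication is required")
    forks = 2 * origins
    return genome_bp / (rate_nt_per_s * forks)


if __name__ == "__main__":
    # Assumed genome size for E. coli: about 4.6 million base pairs.
    single = replication_time_seconds(4_600_000)
    print(f"one origin: {single / 60:.1f} min")
    # Purely illustrative: the same DNA with ten simultaneous origins.
    many = replication_time_seconds(4_600_000, origins=10)
    print(f"ten origins: {many / 60:.1f} min")
```

    Under these assumptions the time falls in proportion to the number of origins, which is the point made above for eukaryotic DNA.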

  • Scheme of replication

    New DNA is synthesized in the 5′ to 3′ direction, but not in the 3′ to 5′ direction. A new nucleotide cannot be attached to the 5′ end of the new nucleotide chain. Only at the 3′ end can nucleotides be attached continuously. New DNA at the 5′ end is replicated in small segments. This represents an obstacle at the end of a chromosome (telomere).

  • Replication fork

    At the replication fork, each of the two DNA strands serves as a template for the synthesis of new DNA. First, the double helix at the replication fork region is unwound by an enzyme system (helicases, supported by topoisomerases). Since the parent strands are antiparallel, DNA replication can proceed continuously in only one DNA strand (5′ to 3′ direction) (leading strand). Along the 3′ to 5′ strand (lagging strand), the new DNA is formed in small segments of 1000–2000 bases (Okazaki fragments). In this strand a short piece of RNA is required as a primer to start replication. This is formed by an RNA polymerase (primase). The RNA primer is subsequently removed; DNA is inserted into the gap by DNA polymerase I and, finally, the DNA fragments are linked by DNA ligase. The enzyme responsible for DNA synthesis (DNA polymerase III) is complex and comprises several subunits. There are different enzymes for the leading and lagging strands in eukaryotes. During replication, mistakes are eliminated by a complex proofreading mechanism that removes any incorrectly incorporated bases and replaces them with the correct ones.

  • References

    Cairns, J.: The bacterial chromosome and its
    manner of replication as seen by autoradiography.
    J. Mol. Biol. 6:208–213, 1963.
    Lodish, H. et al.: Molecular Cell Biology. 4th ed.
    Scientific American Books, W.H. Freeman &
    Co., New York, 2000.
    Marx, J.: How DNA replication originates.
    Science 270:1585–1587, 1995.
    Meselson, M., Stahl, F.W.: The replication of
    DNA in Escherichia coli. Proc. Natl. Acad. Sci.
    44:671–682, 1958.
    Watson, J.D. et al.: Molecular Biology of the
    Gene, 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.

The Flow of Genetic Information: Transcription and Translation
The information contained in the nucleotide sequence of a gene must be converted into useful biological function. This is accomplished by proteins, either directly, by being involved in a biochemical pathway, or indirectly, by regulating the activity of a gene. The flow of genetic information is unidirectional and requires two major steps: transcription and translation. First, the information of the coding sequences of a gene is transcribed into an intermediary RNA molecule, which is synthesized in sequences that are precisely complementary to those of the coding strand of DNA (transcription). During the second major step the sequence information in the messenger RNA molecule (mRNA) is translated into a corresponding sequence of amino acids (translation). The length and sequence of the amino acid chain specified by a gene result in a polypeptide with a biological function (gene product).
  • Transcription

    First, the nucleotide sequence of one strand of DNA is transcribed into a complementary molecule of RNA (messenger RNA, mRNA). The DNA helix is opened by a complex set of proteins. The DNA strand in the 3′ to 5′ direction (the template strand, here called the coding strand) serves as the template for the transcription into RNA, which is synthesized in the 5′ to 3′ direction. It is called the RNA sense strand. RNA transcribed under experimental conditions from the opposing DNA strand is called antisense RNA.
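
    Transcription as described above amounts to base-by-base complementation with uracil in place of thymine. A minimal sketch, with a hypothetical function name and an invented example template:

```python
# Base pairing during transcription: the RNA base placed opposite
# each DNA template base (uracil pairs with adenine in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') for a template strand given 3' to 5'.

    Because the template is read 3' to 5' while the RNA grows 5' to 3',
    the two sequences line up position by position without reversal.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    # Invented 12-base template, written 3' to 5'.
    print(transcribe("TACCCAAGATAA"))  # AUGGGUUCUAUU
```

    The example template was chosen so that the resulting mRNA begins with the start codon AUG.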

  • Translation

    During translation the sequence of codons made up of the nucleotide bases in mRNA is converted into a corresponding sequence of amino acids. Translation occurs in a reading frame which is defined at the start of translation (start codon). Amino acids are joined in the sequence determined by the mRNA nucleotide bases by a further class of RNA, transfer RNA (tRNA). Each amino acid has its own tRNA, which has a region that is complementary to its codon of the mRNA (anticodon). The codons 1, 2, 3, and 4 of the mRNA are translated into the amino acid sequence methionine (Met), glycine (Gly), serine (Ser), and isoleucine (Ile), etc. Codon 1 is always AUG (start codon).

  • Stages of translation

    Translation (protein synthesis) in eukaryotes occurs outside of the cell nucleus in ribosomes in the cytoplasm. Ribosomes consist of subunits of numerous associated proteins and RNA molecules (ribosomal RNA, rRNA). Translation begins with initiation (1): an initiation complex comprising mRNA, a ribosome, and tRNA is formed. This requires a number of initiation factors (IF1, IF2, IF3, etc.). Then elongation (2) follows: a further amino acid, determined by the next codon, is attached. A three-phase elongation cycle develops, with codon recognition, peptide bond formation with the next amino acid residue, and movement (translocation) of the ribosome three nucleotides further in the 3′ direction of the mRNA. Translation ends with termination (3), when one of three mRNA stop codons (UAA, UGA, or UAG) is reached. The polypeptide chain formed leaves the ribosome, which dissociates into its subunits. The biochemical processes of the stages shown here have been greatly simplified.
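
    The three stages can be imitated, very loosely, by a loop that finds the start codon, reads one codon at a time, and stops at a stop codon. The sketch below uses the standard genetic code in single-letter amino acid notation; the function name and the example mRNA are hypothetical, and ribosomes, tRNA, and initiation factors are not modeled.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# (first, second, third base each running through U, C, A, G).
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate an mRNA (5' to 3') into single-letter amino acids."""
    mrna = mrna.upper()
    start = mrna.find("AUG")          # initiation: first start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # elongation: codon by codon
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":         # termination: UAA, UAG, or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    # Met-Gly-Ser-Ile followed by a stop codon.
    print(translate("AUGGGUUCUAUUUAA"))  # MGSI
```

    The example reproduces the sequence methionine, glycine, serine, isoleucine mentioned in the text (M, G, S, I in single-letter notation).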

  • Structure of transfer RNA (tRNA)

    Transfer RNA has a characteristic, cloverleaf-like structure, illustrated here by yeast phenylalanine tRNA (1). It has three single-stranded loop regions and four double-stranded “stem” regions. The three-dimensional structure (2) is complex, but various functional areas can be differentiated, such as the recognition site (anticodon) for the mRNA codon and the binding site for the respective amino acid (acceptor stem) on the 3′ end (acceptor end).

  • References

    Brenner, S., Jacob, F., Meselson, M.: An unstable
    intermediate carrying information from
    genes to ribosomes for protein synthesis.
    Nature 190:576–581, 1961.
    Ibba, M., Söll, D.: Quality control mechanisms
    during translation. Science 286:1893–1897,
    Watson J.D. et al.: Molecular Biology of the
    Gene. 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.

Genes and Mutation
The double helix structure of DNA is the basis of both replication and transcription, as seen in the preceding sections. The information transmitted during replication and transcription is arranged in units called genes. The term gene was introduced in 1909 by the Danish biologist Wilhelm Johannsen (along with the terms genotype and phenotype). Until it was realized that a gene consists of DNA, it was defined in somewhat abstract terms as a factor (Mendel’s term) that confers certain heritable properties to a plant or an animal. However, it was not apparent how mutations could be related to the structure of a gene. The discovery that mutations also occur in bacteria and other microorganisms paved the way to understanding their nature. The organization of genes differs in prokaryotes and eukaryotes as shown below.
  • Transcription in prokaryotes and eukaryotes

    Transcription differs in unicellular organisms without a nucleus, such as bacteria (prokaryotes, 1), and in multicellular organisms (eukaryotes, 2), which have a cell nucleus. In prokaryotes, the mRNA serves directly as a template for translation. The sequences of DNA and mRNA correspond in a strict 1:1 relationship, i.e., they are colinear. Translation begins even before transcription has completely ended. In contrast, a primary transcript of RNA (precursor mRNA) is formed first in eukaryotic cells. This is a preliminary form of the mature mRNA. The mature mRNA is formed when the noncoding sections are removed from the primary transcript, before it leaves the nucleus to act as a template for forming a polypeptide (RNA processing). The reason for these important differences is that functionally related genes generally lie together in prokaryotes and that noncoding segments (introns) are present in the genes of eukaryotes.

  • DNA and mutation

    Coding DNA and its corresponding polypeptide are colinear. An alteration (mutation) of the DNA base sequence may lead to a different codon. The position of the resulting change in the sequence of amino acids corresponds to the position of the mutation (1). Panel B shows the gene for the protein tryptophan synthetase A of E. coli bacteria and mutations at four positions. At position 22, phenylalanine (Phe) has been replaced by leucine (Leu); at position 49, glutamic acid (Glu) by glutamine (Gln); at position 177, Leu by arginine (Arg). Every mutation has a defined position. Whether it leads to incorporation of another amino acid depends on how the corresponding codon has been altered. Different mutations at one position (one codon) in different DNA molecules are possible (2). Two different mutations have been observed at position 211: glycine (Gly) to arginine (Arg) and Gly to glutamic acid (Glu). Normally (in the wild type), codon 211 is GGA and codes for glycine (3). A mutation of GGA to AGA leads to a codon for arginine; a mutation to GAA leads to a codon for glutamic acid (4).
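
    The two mutations at codon 211 can be traced with a very small sketch. Only the three codons named in the text are included in the lookup table; the function name is hypothetical.

```python
# Only the codons discussed in the text for position 211.
CODON_211_MEANING = {"GGA": "Gly", "AGA": "Arg", "GAA": "Glu"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0, 1, or 2) replaced."""
    if position not in (0, 1, 2):
        raise ValueError("a codon has exactly three positions")
    return codon[:position] + new_base + codon[position + 1:]


if __name__ == "__main__":
    wild_type = "GGA"
    for position in (0, 1):
        mutant = substitute(wild_type, position, "A")
        print(wild_type, CODON_211_MEANING[wild_type], "->",
              mutant, CODON_211_MEANING[mutant])
```

    The same base change (G to A) at the first or the second position of the codon gives arginine or glutamic acid, respectively, which is why the outcome depends on how the codon has been altered.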

  • Types of mutation

    Basically, there are three different types of mutation involving single nucleotides (point mutation): substitution (exchange), deletion (loss), and insertion (addition). With substitution, the consequences depend on how a codon has been altered. Two types of substitution are distinguished: transition (exchange of one purine for another purine or of one pyrimidine for another) and transversion (exchange of a purine for a pyrimidine, or vice versa). A substitution may alter a codon so that a wrong amino acid is present at this site but has no effect on the reading frame (missense mutation), whereas a deletion or insertion causes a shift of the reading frame (frameshift mutation). Thus the sequences that follow no longer code for a functional gene product (nonsense mutation).
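
    The distinctions drawn above can be made concrete with a short sketch: one function classifies a substitution as transition or transversion, and another splits a sequence into codons to show how a single-base deletion shifts the reading frame. All names and the example sequence are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base exchange as 'transition' or 'transversion'."""
    if old == new:
        raise ValueError("no substitution: both bases are identical")
    pair = {old, new}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def codons(sequence: str) -> list:
    """Split a sequence into complete triplets, starting at position 0."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(substitution_type("G", "A"))   # transition
    print(substitution_type("G", "C"))   # transversion

    original = "AUGGGUUCUAUU"
    deleted = original[:3] + original[4:]    # lose one base after codon 1
    print(codons(original))   # ['AUG', 'GGU', 'UCU', 'AUU']
    print(codons(deleted))    # ['AUG', 'GUU', 'CUA']
```

    After the deletion every codon downstream of the first one is different, whereas a substitution would change a single codon only.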

  • References

    Alberts, B. et al.: Molecular Biology of the Cell.
    3rd ed. Garland Publishing, New York, 1994.
    Alberts, B. et al.: Essential Cell Biology. An Introduction
    to the Molecular Biology of the Cell.
    Garland Publishing, New York, 1998.
    Lodish, H. et al.: Molecular Cell Biology. 4th ed.
    Scientific American Books, W.H. Freeman &
    Co., New York, 2000.
    Watson, J.D. et al.: Molecular Biology of the
    Gene, 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.

Genetic Code
The genetic code is the set of biological rules by which DNA nucleotide base pair sequences are translated into corresponding sequences of amino acids. Genes do not code for proteins directly, but do so through a messenger molecule (messenger RNA, mRNA). A codeword (codon) for an amino acid consists of a sequence of three nucleotide base pairs (triplet codon). The genetic code also includes sequences for the beginning (start codon) and for the end (stop codon) of the coding region. The genetic code is universal; the same codons are used by different organisms.
  • Genetic code in mRNA for all amino acids

    Each codon corresponds to one amino acid, but one amino acid may be coded for by different codons (redundancy of the code). For example, there are two possibilities to code for the amino acid phenylalanine: UUU and UUC, and there are six possibilities to code for the amino acid serine: UCU, UCC, UCA, UCG, AGU, and AGC. Many amino acids are determined by more than one codon. The greatest variation is in the third position (at the 3′ end of the triplet). The genetic code was elucidated in 1966 by analyzing how triplets transmit information from the genes to proteins. mRNA added to bacteria could be directly converted into a corresponding protein. Synthetic RNA polymers such as polyuridylate (poly(U)), polyadenylate (poly(A)), and polycytidylate (poly(C)) could be directly translated into polyphenylalanine, polylysine, and polyproline in extracts of E. coli bacteria. This showed that UUU must code for phenylalanine, AAA for lysine, and CCC for proline. By further experiments with mixed polymers of different proportions of two or three nucleotides, the genetic code was determined for all amino acids and all nucleotide compositions.

  • Abbreviated code

    Sequences of amino acids are designated with the single-letter abbreviations (“alphabetic code”). The start codon is AUG (methionine). Stop codons are UAA, UAG, and UGA. The only amino acids that are encoded by a single codon are methionine (AUG) and tryptophan (UGG).
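
    The redundancy of the code can be checked directly by grouping the 64 codons by the amino acid they specify. The sketch below builds the standard code table in single-letter notation (with * marking stop codons); the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_by_amino_acid() -> dict:
    """Group all codons by the single-letter amino acid they encode."""
    groups = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        groups[amino_acid].append(codon)
    return dict(groups)


if __name__ == "__main__":
    groups = codons_by_amino_acid()
    print("Phe (F):", groups["F"])
    print("Ser (S):", groups["S"])
    print("stop (*):", groups["*"])
    single = sorted(aa for aa, c in groups.items() if len(c) == 1)
    print("encoded by one codon only:", single)   # ['M', 'W']
```

    The grouping reproduces the figures given above: two codons for phenylalanine, six for serine, three stop codons, and a single codon each for methionine (M) and tryptophan (W).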

  • Open reading frame (ORF)

    A segment of a nucleotide sequence can correspond to one of three reading frames (e.g., A, B, or C); however, only one is correct (open reading frame). In the example shown, the reading frames B and C are interrupted by a stop codon after three and five codons, respectively. Thus they cannot serve as reading frames for a coding sequence. On the other hand, A must be the correct reading frame: it begins with the start codon AUG and yields a sequence without stop codons (open reading frame).
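
    The search for an open reading frame can be sketched as a scan of the three possible frames for stop codons. The example sequence below is invented for illustration and is not the one shown in the original figure; the function names are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_stop(sequence: str, offset: int):
    """Index of the first stop codon in the frame starting at `offset`.

    Returns None when the frame contains no stop codon.
    """
    triplets = [sequence[i:i + 3]
                for i in range(offset, len(sequence) - 2, 3)]
    for index, codon in enumerate(triplets):
        if codon in STOP_CODONS:
            return index
    return None


def open_frames(sequence: str) -> list:
    """Offsets (0, 1, 2) of frames that start with AUG and have no stop."""
    return [offset for offset in range(3)
            if sequence[offset:offset + 3] == "AUG"
            and first_stop(sequence, offset) is None]


if __name__ == "__main__":
    mrna = "AUGAUAGCUGACCGU"      # invented example sequence
    for offset in range(3):
        print("frame", offset, "first stop at codon:",
              first_stop(mrna, offset))
    print("open reading frame(s):", open_frames(mrna))   # [0]
```

    In this example only the frame starting at the first base begins with AUG and runs through without a stop codon; the other two frames are interrupted.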

  • Coding by several different nucleotide sequences

    Since the genetic code has redundancy, it is possible that different nucleotide sequences code for the same amino acid sequence. However, the differences are limited to one (or at most two) positions of a given triplet codon.

  • References

    Alberts, B. et al.: Essential Cell Biology. An Introduction
    to the Molecular Biology of the Cell.
    Garland Publishing, New York, 1998.
    Crick, F.H.C. et al.: General nature of the genetic
    code for proteins. Nature 192:1227–1232,
    Lodish, H. et al.: Molecular Cell Biology. 4th ed.
    Scientific American Books, W.H. Freeman &
    Co., New York, 2000.
    Rosenthal, N.: DNA and the genetic code. New
    Eng. J. Med. 331:39–41, 1995.
    Singer, M., Berg, P.: Genes and Genomes: a
    changing perspective. Blackwell Scientific
    Publications, Oxford–London, 1991.

The Structure of Eukaryotic Genes
Eukaryotic genes consist of coding and noncoding segments of DNA, called exons and introns, respectively. At first glance it seems to be an unnecessary burden to carry DNA without obvious functions within a gene. However, it has been recognized that this has great evolutionary advantages. When parts of different genes are rearranged on new chromosomal sites during evolution, new genes may be constructed from parts of previously existing genes.
  • Exons and introns

    In 1977, it was unexpectedly found that the DNA of a eukaryotic gene is longer than its corresponding mRNA. The reason is that certain sections of the initially formed primary RNA transcript are removed before translation occurs. Electron micrographs show that DNA and its corresponding transcript (RNA) are of different lengths (1). When mRNA and its complementary single-stranded DNA are hybridized, loops of single-stranded DNA arise because mRNA hybridizes only with certain sections of the single-stranded DNA. In (2), seven loops (A to G) and eight hybridizing sections are shown (1 to 7 and the leading section L). Of the total 7700 DNA base pairs of this gene (3), only 1825 hybridize with mRNA. A hybridizing segment is called an exon. An initially transcribed DNA section that is subsequently removed from the primary transcript is an intron. The size and arrangement of exons and introns are characteristic for every eukaryotic gene (exon/intron structure). (Electron micrograph from Watson et al., 1987).

  • Intervening DNA sequences (introns)

    In prokaryotes, DNA is colinear with mRNA and contains no introns (1). In eukaryotes, mature mRNA is complementary to only certain sections of DNA because the latter contains introns (2). (Figure adapted from Stryer, 1995).

  • Basic eukaryotic gene structure

    Exons and introns are numbered in the 5′ to 3′ direction of the coding strand. Both exons and introns are transcribed into a precursor RNA (primary transcript). The first and the last exons usually contain sequences that are not translated. These are called the 5′ untranslated region (5′ UTR) of exon 1 and the 3′ UTR at the 3′ end of the last exon. The noncoding segments (introns) are removed from the primary transcript and the exons on either side are connected by a process called splicing. Splicing must be very precise to avoid an undesirable change of the correct reading frame. Introns almost always start with the nucleotides GT in the 5′ to 3′ strand (GU in RNA) and end with AG. The sequence at the 5′ end of the intron, beginning with GT, is called the splice donor site, and the sequence at the 3′ end, ending with AG, is called the splice acceptor site. Mature mRNA is modified at the 5′ end by adding a stabilizing structure called a “cap” and by adding many adenines at the 3′ end (polyadenylation).
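
    The GU–AG rule and the joining of exons can be illustrated with a simple sketch that removes introns at known coordinates from a primary transcript. The coordinates are assumed to be given; the sketch does not attempt to find splice sites, and the function name and example sequences are hypothetical.

```python
def splice(primary_transcript: str, introns: list) -> str:
    """Remove introns and join the flanking exons.

    `introns` is a list of (start, end) positions, with `start`
    inclusive and `end` exclusive. Each intron must begin with GU
    (splice donor site) and end with AG (splice acceptor site).
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = primary_transcript[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} violates the GU-AG rule")
        exons.append(primary_transcript[cursor:start])
        cursor = end
    exons.append(primary_transcript[cursor:])
    return "".join(exons)


if __name__ == "__main__":
    exon_1 = "AUGGCU"
    intron = "GUAAGUCCUAACUUUCAG"     # invented intron: GU ... AG
    exon_2 = "GGAUAA"
    transcript = exon_1 + intron + exon_2
    boundaries = [(len(exon_1), len(exon_1) + len(intron))]
    print(splice(transcript, boundaries))   # AUGGCUGGAUAA
```

    Shifting either boundary by a single nucleotide would change the reading frame of everything downstream, which is why splicing must be precise.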

  • Splicing pathway in GU–AG introns

    RNA splicing is a complex process mediated by a large RNA-containing protein complex called a spliceosome. This consists of five types of small nuclear RNA molecules (snRNA) and more than 50 proteins (small nuclear ribonucleoprotein particles). The basic mechanism of splicing schematically involves autocatalytic cleavage at the 5′ end of the intron resulting in lariat formation. This is an intermediate circular structure formed by connecting the 5′ terminus (GU) to a base (A) within the intron. This site is called the branch site. In the next stage, cleavage at the 3′ site releases the intron in lariat form. At the same time the right exon is ligated (spliced) to the left exon. The lariat is debranched to yield a linear intron and this is rapidly degraded. The branch site identifies the 3′ end for precise cleavage at the splice acceptor site. It lies 18–40 nucleotides upstream (in 5′ direction) of the 3′ splice site. (Figure adapted from Strachan and Read, 1999).

  • References

    Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
    Strachan, T., Read A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.
    Stryer, L.: Biochemistry, 4th ed. W.H. Freeman &
    Co., New York, 1995.
    Watson, J.D. et al.: Molecular Biology of the
    Gene, 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.
DNA Sequencing
Knowledge of the nucleotide sequence of a gene provides important information about its structure, function, and evolutionary relationship to other similar genes in the same or different organisms. Thus, the development in the 1970s of relatively simple methods for sequencing DNA has had a great impact on genetics. Two basic methods for DNA sequencing have been developed: a chemical cleavage method (A. M. Maxam and W. Gilbert, 1977) and an enzymatic method (F. Sanger, 1977). A brief outline of the underlying principles follows.
  • Sequencing by chemical degradation

    This method utilizes base-specific cleavage of DNA by certain chemicals. Four different chemicals are used in four reactions, one for each base. Each reaction produces a set of DNA fragments of different sizes. The sizes of the fragments in a reaction mixture are determined by positions in the DNA of the nucleotide that has been cleaved. A double-stranded or single-stranded fragment of DNA to be sequenced is processed to obtain a single strand labeled with a radioactive isotope at the 5′ end (1). This DNA strand is treated with one of the four chemicals for one of the four reactions. Here the reaction at guanine sites (G) by dimethyl sulfate (DMS) is shown. Dimethyl sulfate attaches a methyl group to the purine ring of G nucleotides. The amount of DMS used is limited so that on average just one G nucleotide per strand is methylated, not the others (shown here in four different positions of G). When a second chemical, piperidine, is added, the nucleotide purine ring is removed and the DNA molecule is cleaved at the phosphodiester bond just upstream of the site without the base. The overall procedure results in a set of labeled fragments of defined sizes according to the positions of G in the DNA sample being sequenced. Similar reactions are carried out for the other three bases (A, T, and C, not shown). The four reaction mixtures, one for each of the bases, are run in separate lanes of a polyacrylamide gel. Each of the four lanes represents one of the four bases G, A, T, or C. The smallest fragment will migrate the farthest downward, the next a little less far, etc. One can then read the sequence in the direction opposite to migration to obtain the sequence in the 5′ to 3′ direction (here TAGTCGCAGTACCGTA).
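
    The relation between the positions of a base and the fragment sizes in its lane can be sketched for the example sequence given above. This is a simplification: it assumes one cleavage per strand, counts only the fragment carrying the 5′ label, and treats cleavage as removing the targeted nucleotide, so that the labeled fragment contains the bases lying before it. The function name is hypothetical.

```python
def labeled_fragment_lengths(sequence: str, base: str) -> list:
    """Lengths of the 5'-labeled fragments produced by cleaving at `base`.

    A strand cut at a base in position i (counting from 0) leaves a
    labeled fragment made up of the i nucleotides that precede it.
    """
    return [index for index, nucleotide in enumerate(sequence.upper())
            if nucleotide == base.upper() and index > 0]


if __name__ == "__main__":
    sample = "TAGTCGCAGTACCGTA"     # example sequence from the text
    print("G lane:", labeled_fragment_lengths(sample, "G"))
```

    For this sequence the G reaction yields four labeled fragments, one for each of the four G positions, and the shortest fragment migrates farthest in the gel.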

  • Sequencing by chain termination

    This method, now much more widely used than the chemical cleavage method, rests on the principle that DNA synthesis is terminated when, instead of a normal deoxynucleotide (dATP, dTTP, dGTP, dCTP), a dideoxynucleotide (ddATP, ddTTP, ddGTP, ddCTP) is used. A dideoxynucleotide (ddNTP) is an analogue of the normal dNTP. It differs by lack of a hydroxyl group at the 3′ carbon position. When a dideoxynucleotide is incorporated during DNA synthesis, no bond between its 3′ position and the next nucleotide is possible because the ddNTP lacks the 3′ hydroxyl group. Thus, synthesis of the new chain is terminated at this site. The DNA fragment to be sequenced has to be single-stranded (1). DNA synthesis is initiated using a primer and one of the four ddNTPs labeled with 32P in the phosphate groups or, for automated sequencing, with a fluorophore (see Automated DNA Sequencing). Here an example of chain termination using ddATP is shown (3). Wherever an adenine (A) occurs in the sequence, the dideoxyadenosine triphosphate will cause termination of the new DNA chain being synthesized. This will produce a set of different DNA fragments whose sizes are determined by the positions of the adenine residues occurring in the fragment to be sequenced. Similar reactions are done for the other three nucleotides. The four parallel reactions will yield a set of fragments with defined sizes according to the positions of the nucleotides where the new DNA synthesis has been terminated. The fragments are separated according to size by gel electrophoresis as in the chemical method. The sequence gel is read in the direction from small fragments to large fragments to derive the nucleotide sequence in the 5′ to 3′ direction. An example of an actual sequencing gel is shown between panel A and B.
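
    Reading a chain-termination gel can be imitated in a few lines. The sketch assumes an idealized experiment in which every possible terminated chain is present, ignores the primer length, and takes the newly synthesized strand as input; the function names are hypothetical and the example sequence is borrowed from the chemical method above purely for illustration.

```python
def termination_lanes(new_strand: str) -> dict:
    """Fragment lengths per ddNTP reaction for a newly synthesized strand.

    A chain ending with a dideoxynucleotide at position n (counting
    from 1) is n nucleotides long, so each lane lists the positions
    of its base.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand.upper(), start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence 5' to 3' from the smallest to the largest fragment."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = termination_lanes("TAGTCGCAGTACCGTA")
    print("ddATP lane:", lanes["A"])   # [2, 8, 11, 16]
    print(read_gel(lanes))             # TAGTCGCAGTACCGTA
```

    Because each fragment length occurs in exactly one lane, sorting the bands by size across the four lanes recovers the sequence.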

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Rosenthal, N.: Fine structure of a gene—DNA
    sequencing. New Eng. J. Med. 332:589–591,
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,

Automated DNA Sequencing
Large-scale DNA sequencing requires automated procedures based on fluorescence labeling of DNA and suitable detection systems. In general, a fluorescent label can be used either directly or indirectly. Direct fluorescent labels, as used in automated sequencing, are fluorophores. These are molecules that emit a distinct fluorescent color when exposed to light of a specific wavelength. Examples of fluorophores used in sequencing are fluorescein, which fluoresces pale green when exposed to a wavelength of 494 nm; rhodamine, which fluoresces red at 555 nm; and aminomethylcoumarin acetic acid, which fluoresces blue at 399 nm. In addition, a combination of different fluorophores can be used to produce a fourth color. Thus, each of the four bases can be distinctly labeled. Another approach is to use PCR-amplified products (thermal cycle sequencing, see below). This has the advantage that double-stranded rather than single-stranded DNA can be used as the starting material. And since small amounts of template DNA are sufficient, the DNA to be sequenced does not have to be cloned beforehand.
  • Thermal cycle sequencing

    The DNA to be sequenced is contained in vector DNA (1). The primer, a short oligonucleotide with a sequence complementary to the site of attachment on the single-stranded DNA, is used as a starting point. For sequencing short stretches of DNA, a universal primer is sufficient. This is an oligonucleotide that will bind to vector DNA adjacent to the DNA to be sequenced. However, if the latter is longer than about 750 bp, only part of it will be sequenced. Therefore, additional internal primers are required. These anneal to different sites and amplify the DNA in a series of contiguous, overlapping chain termination experiments (2). Here, each primer determines which region of the template DNA is being sequenced. In thermal cycle sequencing (3), only one primer is used to carry out PCR reactions, each with one dideoxynucleotide (ddA, ddT, ddG, or ddC) in the reaction mixture. This generates a series of different chain-terminated strands, each dependent on the position of the particular nucleotide base where the chain is being terminated (4). After many cycles and with electrophoresis, the sequence can be read as shown in the previous plate. One advantage of thermal cycle sequencing is that double-stranded DNA can be used as starting material. (Illustration based on Figures 4.5 and 4.6 in Brown, 1999).

  • Automated DNA sequencing (principle)

    Automated DNA sequencing involves four fluorophores, one for each of the four nucleotide bases. The resulting fluorescent signal is recorded at a fixed point when DNA passes through a capillary containing an electrophoretic gel. The base-specific fluorescent labels are attached to appropriate dideoxynucleotide triphosphates (ddNTP). Each ddNTP is labeled with a different color, e.g., ddATP green, ddCTP blue, ddGTP yellow, and ddTTP red (1). (The actual colors for each nucleotide may be different.) All chains terminated at an adenine (A) will yield a green signal; all chains terminated at a cytosine (C) will yield a blue signal, and so on. The sequencing reactions based on this kind of chain termination at labeled nucleotides (2) are carried out automatically in sequencing capillaries (3). During electrophoretic migration through the gel in the capillary, the ddNTP-labeled chains pass in front of a laser beam focused on a fixed position. The laser induces a fluorescent signal that depends on the specific label representing one of the four nucleotides. The sequence is electronically read and recorded and is visualized as alternating peaks in one of the four colors, representing the alternating nucleotides in their sequence positions. In practice the peaks do not necessarily show the same maximal intensity as in the schematic diagram shown here. (Illustration based on Brown, 1999, and Strachan and Read, 1999).
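
    The readout can be sketched in a few lines of Python. The dye-to-base assignment follows the example colors given above. The peak list is invented for illustration, and real base-calling software also has to deal with uneven peak heights and overlapping signals.

```python
# Assumed dye-to-base assignment, taken from the example colors in the text.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def call_bases(peaks):
    """peaks: (migration_time, color) pairs recorded at the fixed detector."""
    ordered = sorted(peaks)  # the shortest chains reach the laser first
    return "".join(DYE_TO_BASE[color] for _, color in ordered)

# Hypothetical peaks, deliberately listed out of order.
trace = [(12.4, "red"), (10.1, "green"), (11.3, "yellow"), (13.0, "blue")]
print(call_bases(trace))
```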

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Rosenthal, N.: Fine structure of a gene—DNA
    sequencing. New Eng. J. Med. 332:589–591,
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.
    Wilson, R.K., et al.: Development of an automated
    procedure for fluorescent DNA
    sequencing. Genomics 6:626–636, 1990.

DNA Cloning
To obtain sufficient amounts of a specific DNA sequence (e.g., a gene of interest) for study, it must be selectively amplified. This is accomplished by DNA cloning, which produces a homogeneous population of DNA fragments from a mixture of very different DNA molecules or from all the DNA of the genome. Here procedures are required to identify DNA from the correct region in the genome, to separate it from other DNA, and to multiply (clone) it selectively. Identification of the correct DNA fragment utilizes the specific hybridization of complementary single-stranded DNA (molecular hybridization). A short segment of single-stranded DNA, a probe, originating from the sequence to be studied, will hybridize to its complementary sequences after these have been denatured (made single-stranded, see Southern blot analysis, p. 62). After the hybridized sequence has been separated from other DNA, it can be cloned. The selected DNA sequences can be amplified in two basic ways: in cells (cell-based cloning) or by cell-free cloning (see polymerase chain reaction, PCR, p. 66).
  • Cell-based DNA cloning

    Cell-based DNA cloning requires four initial steps. First, a collection of different DNA fragments (here labeled 1, 2, and 3) are obtained from the desired DNA (target DNA) by cleaving it with a restriction enzyme (see p. 64). Since fragments resulting from restriction enzyme cleavage have a short single-stranded end of a specific sequence at both ends, they can be ligated to other DNA fragments that have been cleaved with the same enzyme. The fragments produced in step 1 are joined to DNA fragments containing the origin of replication (OR) of a replicon, which enables them to replicate (2). In addition, a fragment may be joined to a selectable marker, e.g., a DNA sequence containing an antibiotic resistance gene. The recombinant DNA molecules are transferred into host cells (bacterial or yeast cells). Here the recombinant DNA molecules can replicate independently of the host cell genome (3). Usually the host cell takes up only one (although occasionally more than one) foreign DNA molecule. The host cells transformed by recombinant (foreign) DNA are grown in culture and multiplied (propagation, 4). Selective growth of one of the cell clones allows isolation of one type of recombinant DNA molecule (5). After further propagation, a homogeneous population of recombinant DNA molecules is obtained (6). A collection of different fragments of cloned DNA is called a clone library (7, see DNA libraries). In cell-based cloning, the replicon-containing DNA molecules are referred to as vector molecules. (Figure adapted from Strachan and Read, 1999)

  • A plasmid vector for cloning

    Many different vector systems exist for cloning DNA fragments of different sizes. Plasmid vectors are used to clone small fragments. The experiment is designed in such a way that incorporation of the fragment to be cloned changes the plasmid’s antibiotic resistance to allow selection for these recombinant plasmids. A formerly frequently used plasmid vector (pBR322) is presented. This plasmid contains recognition sites for the restriction enzymes PstI, EcoRI, and SalI in addition to genes for ampicillin and tetracycline resistance (1). If a foreign DNA fragment is incorporated into the plasmid at the site of the EcoRI recognition sequence, then tetracycline and ampicillin resistance will be retained (2). If the enzyme PstI is used to incorporate the fragment to be cloned, ampicillin resistance is lost (the bacterium becomes ampicillin sensitive), but tetracycline resistance is retained. If the enzyme SalI is used to incorporate the fragment, tetracycline resistance disappears (the bacterium becomes tetracycline sensitive), but ampicillin resistance is retained. Thus, depending on how the fragment has been incorporated, recombinant plasmids containing the DNA fragment to be cloned can be distinguished from nonrecombinant plasmids by altered antibiotic resistance. Cloning in plasmids (bacteria) has become less important since yeast artificial chromosomes (YACs) have become available for cloning relatively large DNA fragments.
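
    The selection scheme described for pBR322 amounts to a small lookup table. The following Python sketch encodes only what is stated above; the names DISRUPTED_BY_SITE and resistance_phenotype are hypothetical.

```python
# Which resistance gene each insertion site interrupts, as described in the text.
DISRUPTED_BY_SITE = {"EcoRI": None, "PstI": "ampicillin", "SalI": "tetracycline"}

def resistance_phenotype(insertion_site=None):
    """Return the antibiotics a bacterium carrying the plasmid still resists."""
    resistant = {"ampicillin", "tetracycline"}
    if insertion_site is not None:
        # Removing None leaves the set unchanged (EcoRI disrupts neither gene).
        resistant.discard(DISRUPTED_BY_SITE[insertion_site])
    return resistant

for site in (None, "EcoRI", "PstI", "SalI"):
    print(site, sorted(resistance_phenotype(site)))
```

    A colony that grows on tetracycline but not on ampicillin would therefore be expected to carry a fragment inserted at the PstI site.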

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford, 1999. Strachan, T., Read, A.P.: Human Molecular Genetics. 2nd ed. Bios Scientific Publishers, Oxford, 1999.

cDNA Cloning
cDNA is a single-stranded segment of DNA that is complementary to the mRNA of a coding DNA segment or of a whole gene. It can be used as a probe (cDNA probe as opposed to a genomic probe) for the corresponding gene because it is complementary to coding sections (exons) of the gene. If the gene has been altered by structural rearrangement at a corresponding site, e.g., by deletion, the normal and mutated DNA can be differentiated. Thus, the preparation and cloning of cDNA is of great importance. From the cDNA sequence, essential inferences can be made about a gene and its gene product.
  • Preparation of cDNA

    cDNA is prepared from mRNA. Therefore, a tissue is required in which the respective gene is transcribed and mRNA is produced in sufficient quantities. First, mRNA is isolated. Then a primer is attached so that the enzyme reverse transcriptase can form complementary DNA (cDNA) from the mRNA. Since mRNA contains poly(A) at its 3′ end, a primer of poly(T) can be attached. From here, the enzyme reverse transcriptase can start forming cDNA in the 5′ to 3′ direction. The RNA is then removed by ribonuclease. The cDNA serves as a template for the formation of a new strand of DNA. This requires the enzyme DNA polymerase. The result is a double strand of DNA, one strand of which is complementary to the original mRNA. To this DNA, single sequences (linkers) are attached that are complementary to the single-stranded ends produced by the restriction enzyme to be used. The same enzyme is used to cut the vector, e.g., a plasmid, so that the cDNA can be incorporated for cloning.

  • Cloning vectors

    The cell-based cloning of DNA fragments of different sizes is facilitated by a wide variety of vector systems. Plasmid vectors are used to clone small DNA fragments in bacteria. Their main disadvantage is that only 5–10 kb of foreign DNA can be cloned. A plasmid cloning vector that has taken up a DNA fragment (recombinant vector), e.g., pUC8 with 2.7 kb of DNA, must be distinguished from one that has not. In addition, an ampicillin resistance gene (Amp+) serves to distinguish bacteria that have taken up plasmids from those that have not. Several unique restriction sites in the plasmid DNA segment where a DNA fragment might be inserted serve as markers along with a marker gene, such as the lacZ gene. The uptake of a DNA fragment by a plasmid vector disrupts the plasmid's marker gene. Thus, in the recombinant plasmid the enzyme β-galactosidase will not be produced by the disrupted lacZ gene, whereas in the plasmid without a DNA insert (nonrecombinant) the enzyme is produced by the still intact lacZ gene. The activity of the gene and the presence or absence of the enzyme are determined by observing a difference in color of the colonies in the presence of an artificial substrate sugar. β-Galactosidase splits an artificial sugar (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) that is similar to lactose, the natural substrate for this enzyme, into two components, one of which is blue. Thus, bacterial colonies containing nonrecombinant plasmids with an intact lacZ gene are blue. In contrast, colonies that have taken up a recombinant vector remain pale white. The latter are grown in a medium containing ampicillin (the selectable marker for the uptake of plasmid vectors). Subsequently, a clone library can be constructed. (Figure adapted from Brown, 1999)
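
    The two markers (ampicillin resistance and lacZ) combine into three possible outcomes. A minimal Python sketch of this decision logic follows, assuming a plate that contains both ampicillin and the artificial substrate; the function name is hypothetical.

```python
def colony_outcome(has_plasmid, has_insert):
    """Expected result on a plate with ampicillin and the artificial substrate."""
    if not has_plasmid:
        return "no growth"      # no ampicillin resistance gene
    if has_insert:
        return "white colony"   # lacZ disrupted, no beta-galactosidase
    return "blue colony"        # intact lacZ, substrate is split

for plasmid, insert in [(False, False), (True, False), (True, True)]:
    print(plasmid, insert, colony_outcome(plasmid, insert))
```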

  • cDNA cloning

    Only those bacteria that have incorporated a recombinant plasmid become ampicillin resistant. Recombinant plasmids, which contain the gene for ampicillin resistance, transform ampicillin-sensitive bacteria into ampicillin-resistant bacteria. In an ampicillin-containing medium, only those bacteria grow that contain the recombinant plasmid with the desired DNA fragment. By further replication in these bacteria, the fragment can be cloned until there is enough material to be studied. (Figures after Watson et al., 1987).

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Watson, J.D., et al.: Molecular Biology of the
    Gene, 3rd ed. Benjamin/Cummings Publishing
    Co., Menlo Park, California, 1987.

DNA Libraries
A DNA library is a collection of DNA fragments that in their entirety represent the genome, that is, a particular gene being sought and all remaining DNA. It is the starting point for cloning a gene of unknown chromosomal location. To produce a library, the total DNA is digested with a restriction enzyme, and the resulting fragments are incorporated into vectors and replicated in bacteria. A sufficient number of clones must be present so that every segment is represented at least once. This is a question of the size of the genome being investigated and the size of the fragments. Plasmids and phages are used as vectors. For larger DNA fragments, yeast cells may be employed. There are two different types of libraries: genomic DNA and cDNA.
  • Genomic DNA library

    Clones of genomic DNA are copies of DNA fragments from all of the chromosomes (1). They contain coding and noncoding sequences. Restriction enzymes are used to cleave the genomic DNA into many fragments. Here four fragments are schematically shown, containing two genes, A and B (2). These are incorporated into vectors, e.g., into phage DNA, and are replicated in bacteria. The complete collection of recombinant DNA molecules, containing all DNA sequences of a species or individual, is called a genomic library. To find a particular gene, a screening procedure is required.
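
    How many clones a library needs depends on genome size and fragment size. One common textbook estimate, which is not given in the source, is N = ln(1 − P) / ln(1 − f), where f is the fraction of the genome contained in one clone and P is the desired probability that a given segment is represented. The Python sketch below applies it. The genome size of 3 × 10⁹ bp and the 17 kb insert are illustrative assumptions, and the formula assumes randomly distributed, equally sized fragments.

```python
import math

def clones_needed(genome_bp, insert_bp, probability=0.99):
    """Clones required so that a given segment is present with `probability`."""
    fraction = insert_bp / genome_bp  # share of the genome held by one clone
    return math.ceil(math.log(1 - probability) / math.log1p(-fraction))

# Assumed values for illustration only: 3e9 bp genome, 17 kb inserts.
print(clones_needed(3_000_000_000, 17_000))
print(clones_needed(3_000_000_000, 17_000, probability=0.90))
```

    Larger inserts or a lower required probability reduce the number of clones, which is one reason vectors with a large capacity are attractive for large genomes.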

  • cDNA library

    Unlike a genomic library, which is complete and contains coding and noncoding DNA, a cDNA library consists only of coding DNA sequences. This specificity offers considerable advantages over genomic DNA. However, it requires that mRNA be available and does not yield information about the structure of the gene. mRNA can be obtained only from cells in which the respective gene is transcribed, i.e., in which mRNA is produced (1). In eukaryotes, the RNA formed during transcription (primary transcript) undergoes splicing to form mRNA (2, see p. 50). Complementary DNA (cDNA) is formed from mRNA by the enzyme reverse transcriptase (3). The cDNA can serve as a template for synthesis of a complementary DNA strand, so that complete double-stranded DNA can be formed (cDNA clone). Its sequence corresponds to the coding sequences of the gene exons. Thus it is well suited for use as a probe (cDNA probe). The subsequent steps, incorporation into a vector and replication in bacteria, correspond to those of the procedure to produce a genomic library. cDNA clones can only be obtained from coding regions of an active (mRNA-producing) gene; thus, the cDNA clones of different tissues differ according to genetic activity. Since cDNA clones correspond to the coding sequences of a gene (exons) and contain no noncoding sections (introns), cloned cDNA is the preferred starting material when further information about a gene product is sought by analyzing the gene. The sequence of amino acids in a protein can be determined from cloned and sequenced cDNA. Also, large amounts of a protein can be produced by having the cloned gene expressed in bacteria or yeast cells.
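
    Deriving an amino acid sequence from sequenced cDNA is a table lookup, because cDNA contains no introns. The following Python sketch uses the standard genetic code in its compact T, C, A, G ordering. It assumes that the input is the sense strand and already starts in the correct reading frame, and the example sequence is invented.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(cdna):
    """Translate the sense strand (5' to 3') codon by codon up to a stop codon."""
    protein = []
    for start in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate("ATGGCCAAGTGA"))  # hypothetical coding sequence
```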

  • Screening of a DNA library

    Bacteria that have taken up the vectors can grow on an agar-coated Petri dish, where they form colonies (1). A replica imprint of the culture is taken on a membrane (2), and the DNA that sticks to the membrane is denatured with an alkaline solution (3). DNA of the gene segment being sought can then be identified by hybridization with a radioactively (or otherwise) labeled probe (4). After hybridization, a signal appears on the membrane at the site of the gene segment (5). DNA complementary to the labeled probe is located here; its exact position in the culture corresponds to that of the signal on the membrane (6). A sample is taken from the corresponding area of the culture (5). It will contain the desired DNA segment, which can now be further replicated (cloned) in bacteria. By this means, the desired segment can be enriched and is available for subsequent studies.

  • References

    Rosenthal, N.: Stalking the gene—DNA libraries.
    New Eng. J. Med. 331:599–600, 1994.
    Watson, J.D. et al.: Recombinant DNA. 2nd ed.
    Scientific American Books, New York, 1992.

Restriction Analysis by Southern Blot Analysis
Restriction endonucleases are DNA-cleaving enzymes with defined sequences as targets (see next plate). They are often simply called restriction enzymes. Since each enzyme cleaves DNA only at its specific recognition sequence, the total DNA of an individual present in nucleated cells can be cut into pieces of manageable and defined size in a reproducible way. Individual DNA fragments can then be selected, ligated into suitable vectors, multiplied, and examined. Owing to the uneven distribution of recognition sites, the DNA fragments differ in size. A starting mixture of DNA fragments is sorted according to size. Two procedures detect target DNA or RNA fragments after they have been arranged by size in gel electrophoresis: the Southern blot hybridization for DNA (named after E. Southern, who developed this method in 1975) and the Northern blot hybridization for RNA (a word play on Southern, not named after a Dr. Northern). Immunoblotting (Western blot) detects proteins by an antibody-based procedure.
  • Southern blot hybridization

    The analysis starts with total DNA (1). The DNA is isolated and cut with restriction enzymes (2). One of the not yet identified fragments contains the gene being sought or part of the gene. The fragments are sorted by size in a gel (usually agarose) in an electric field (electrophoresis) (3). The smaller the fragment, the faster it migrates; the larger, the slower it migrates. Next, the blot is carried out: The fragments contained in the gel are transferred to a nitrocellulose or nylon membrane (4). There the DNA is denatured (made single-stranded) with alkali and fixed to the membrane by moderate heating (about 80 °C) or UV cross-linkage. The sample is incubated with a probe of complementary single-stranded DNA (genomic DNA or cDNA) from the gene (5). The probe hybridizes solely with the complementary fragment being sought, and not with others (6). Since the probe is labeled with radioactive ³²P, the fragment being sought can be identified by placing an X-ray film on the membrane, where it appears as a black band on the film after development (autoradiogram) (6). The size, corresponding to position, is determined by running DNA fragments of known size in the electrophoresis.

  • Restriction fragment length polymorphism (RFLP)

    In about every 100 base pairs of a DNA segment, the nucleotide sequence differs in some individuals (DNA polymorphism). As a result, the recognition sequence of a restriction enzyme may be present on one chromosome but not the other. In this case the restriction fragment sizes differ at this site (restriction fragment length polymorphism, RFLP). An example is shown for two 5 kb (5000 base pair) DNA segments. In one, a restriction site in the middle is present (allele 1); in the other (allele 2) it is absent. With a Southern blot, it can be determined whether in this location an individual is homozygous 1–1 (two alleles 1, no 5 kb fragment), heterozygous 1–2 (one allele each, 1 and 2), or homozygous 2–2 (two alleles 2). If the mutation being sought lies on the chromosome carrying the 5 kb fragment, the presence of this fragment indicates presence of the mutation. The absence of this fragment would indicate that the mutation is absent. It is important to understand that the RFLP itself is unrelated to the mutation. It simply distinguishes DNA fragments of different sizes from the same region. These can be used as markers to distinguish alleles in a segregation analysis (see p. 134). In addition to RFLPs, other types of DNA polymorphism can be detected by Southern blot hybridization, although polymerase chain reaction-based analysis of microsatellites is now used more frequently.
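
    The three genotypes of the 5 kb example can be told apart by their band patterns. The Python sketch below assumes that the site lies exactly in the middle, so that allele 1 yields two 2.5 kb fragments seen as a single band, and that the probe detects all fragments of the region. The names are hypothetical.

```python
# Fragment sizes in kb per allele: allele 1 carries the middle restriction site.
FRAGMENTS_KB = {1: [2.5, 2.5], 2: [5.0]}

def expected_bands(allele_a, allele_b):
    """Distinct band sizes (kb) expected on the Southern blot for a genotype."""
    return sorted(set(FRAGMENTS_KB[allele_a] + FRAGMENTS_KB[allele_b]))

def genotype_from_bands(bands):
    """Infer the genotype from observed band sizes, or None if nothing fits."""
    observed = sorted(set(bands))
    for genotype in [(1, 1), (1, 2), (2, 2)]:
        if expected_bands(*genotype) == observed:
            return genotype
    return None

for genotype in [(1, 1), (1, 2), (2, 2)]:
    print(genotype, expected_bands(*genotype))
```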

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Housman, D.: Human DNA polymorphism. New
    Engl. J. Med. 332:318–320, 1995.
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.

Restriction Mapping
Restriction endonucleases (restriction enzymes) are DNA-cutting enzymes. They are obtained from bacteria, which produce the enzymes as protection from foreign DNA. A given enzyme recognizes a specific sequence of 4–8 (usually 6) nucleotides (a restriction site) where it cleaves the DNA. The sizes of the DNA fragments produced depend on the distribution of the restriction sites. More than 400 different types of restriction enzymes have been isolated.
  • DNA cleavage by restriction nucleases

    The cleavage patterns (recognition sequences) of three frequently used restriction enzymes, EcoRI, HindIII, and HpaI, are presented. For EcoRI and HindIII the recognition sequence is “palindromic,” and the cut is staggered around the axis of symmetry, so that mirror-image complementary single-stranded DNA segments arise. Each corresponds to its opposite-lying strand in the reverse direction. Therefore, they can be joined to a DNA fragment whose ends contain complementary single-stranded sequences. HpaI cuts both strands so that no single-stranded ends are formed. Frequently cutting and seldom cutting enzymes can be distinguished according to the frequency of occurrence of their recognition sites.
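
    Both properties, the palindromic recognition sequence and the type of end produced, can be checked with a short Python sketch. The recognition sequences and cut positions used here (G^AATTC for EcoRI, A^AGCTT for HindIII, GTT^AAC for HpaI) are the commonly cited ones and are not spelled out in the text above. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Recognition sequence and number of bases before the cut on the top strand.
ENZYMES = {"EcoRI": ("GAATTC", 1), "HindIII": ("AAGCTT", 1), "HpaI": ("GTTAAC", 3)}

def end_type(site, cut):
    """Blunt if both strands are cut at the same position, otherwise staggered."""
    return "blunt" if cut == len(site) - cut else "single-stranded overhang"

for name, (site, cut) in ENZYMES.items():
    print(name, site == reverse_complement(site), end_type(site, cut))
```

    All three sites read the same on both strands; only the position of the cut relative to the axis of symmetry decides whether single-stranded ends arise.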

  • Examples of restriction enzymes

    The recognition sequences of some restriction enzymes are shown. The names of the enzymes are derived from those of the bacteria in which they occur, e.g., EcoRI from Escherichia coli Restriction enzyme I, etc. Some enzymes have a cutting site with limited specificity. In HindII it suffices that the two middle nucleotides are a pyrimidine and a purine (GTPyPuAC), and it does not matter whether the former is thymine (T) or cytosine (C), and whether the latter is adenine (A) or guanine (G). Such a recognition site occurs frequently and produces many relatively small fragments, whereas enzymes that cut very infrequently produce few and large DNA fragments.

  • Restriction fragments

    In a given DNA segment, the recognition sequence of a restriction enzyme occurs irregularly. Thus, the distances between restriction sites differ. DNA fragments of various sizes (restriction fragments) result from digestion with a restriction enzyme. A given restriction enzyme will cleave a given segment of DNA into a series of DNA fragments of characteristic sizes. This leads to a pattern that can be employed for diagnostic purposes.
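
    A digestion can be imitated in Python by locating every recognition site and measuring the distances between cuts. The sketch below assumes linear DNA and complete digestion. The sequence is invented, and the cut offsets (EcoRI after the first base, HindII in the middle of GTPyPuAC) are commonly cited values rather than data from this text.

```python
import re

def digest(dna, site_pattern, cut_offset):
    """Return fragment lengths after cutting a linear DNA at every site."""
    # The lookahead lets overlapping sites be found as well.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={site_pattern})", dna)]
    edges = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(edges, edges[1:])]

dna = "AAGAATTCAAAAAAGTCGACAAGAATTCAA"  # hypothetical 30-nucleotide sequence
print(digest(dna, "GAATTC", 1))         # EcoRI
print(digest(dna, "GT[CT][AG]AC", 3))   # HindII with limited specificity
```

    The character classes in the second pattern express the limited specificity of HindII: any pyrimidine followed by any purine is accepted in the middle of the site.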

  • Determination of the locations of restriction sites

    Since the fragment sizes reflect the relative positions of the cutting sites, they can be used to characterize a DNA segment (restriction map). If a 10-kb DNA segment cut by two enzymes, A and B, yields three fragments, of 2 kb, 3 kb, and 5 kb, then the relative location of the cutting sites can be determined by using enzymes A and B alone in further experiments. If enzyme A yields two fragments of 3 kb and 7 kb, and enzyme B two fragments of 2 kb and 8 kb, then the two cutting sites of enzymes A and B must lie 5 kb apart. To the left of the restriction site of enzyme A are 3 kb; to the right of the restriction site of enzyme B, 2 kb (1 kb = 1000 base pairs).
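
    The reasoning of this example can be written as a small search in Python. The sketch assumes a linear segment that each enzyme cuts exactly once, so the fragment sizes of a single digest are also the two possible distances of that site from the left end. The function names are hypothetical.

```python
from itertools import product

def fragments(length, sites):
    """Sorted fragment sizes produced by cutting at the given positions."""
    edges = [0] + sorted(sites) + [length]
    return sorted(b - a for a, b in zip(edges, edges[1:]))

def place_sites(length, single_a, single_b, double):
    """Keep the site positions that reproduce the double-digest fragments."""
    solutions = []
    for site_a, site_b in product(single_a, single_b):
        if fragments(length, [site_a, site_b]) == sorted(double):
            solutions.append((site_a, site_b))
    return solutions

# 10-kb segment: A alone gives 3 + 7, B alone gives 2 + 8, both give 2 + 3 + 5.
print(place_sites(10, [3, 7], [2, 8], [2, 3, 5]))
```

    The two solutions are mirror images of the same map: in both, the sites of A and B lie 5 kb apart, with 3 kb beyond the A site and 2 kb beyond the B site.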

  • Restriction map

    A given DNA segment can be characterized by the distribution pattern of restriction sites. In the example shown, a DNA segment is characterized by the distribution of the cutting sites for enzymes E (EcoRI) and H (HindIII). The individual sites are separated by intervals defined by the size of the fragments after digestion with the enzyme. A restriction map is a linear sequence of restriction sites at defined intervals along the DNA. Restriction mapping is of considerable importance in medical genetics and evolutionary research.
DNA Amplification by Polymerase Chain Reaction (PCR)
The introduction of cell-free methods for multiplying DNA fragments of defined origin (DNA amplification) in 1985 ushered in a new era in molecular genetics (the principle of PCR is contained in earlier publications). This fundamental technology has spread dramatically with the development of automated equipment used in basic and applied research.
  • Polymerase chain reaction (PCR)

    PCR is a cell-free, rapid, and sensitive method for cloning DNA fragments. A standard reaction and a wide variety of PCR-based methods have been developed to assay for polymorphisms and mutations. Standard PCR is an in vitro procedure for amplifying defined target DNA sequences, even from very small amounts of material or material of ancient origin. Selective amplification requires some prior information about DNA sequences flanking the target DNA. Based on this information, two oligonucleotide primers of about 15–25 nucleotides in length are designed. The primers are complementary to sequences outside the 3′ ends of the target site and bind specifically to these. PCR is a chain reaction because newly synthesized DNA strands act as templates for further DNA synthesis for about 25–35 subsequent cycles. Theoretically each cycle doubles the amount of DNA amplified. At the end, at least 10⁵ copies of the specific target sequence are present. This can be visualized as a distinct band of a specific size after gel electrophoresis. Each cycle, involving three precisely time-controlled and temperature-controlled reactions in automated thermal cyclers, takes about 1–5 min. The three steps in each cycle are (1) denaturation of double-stranded DNA, at about 93–95 °C for human DNA, (2) primer annealing at about 50–70 °C depending on the expected melting temperature of the duplex DNA, and (3) DNA synthesis using heat-stable DNA polymerase (from microorganisms living in hot springs, such as Thermus aquaticus, Taq polymerase), typically at about 70–75 °C. At each subsequent cycle the template (shown in blue) and the DNA newly synthesized during the preceding cycle (shown in red) act as templates for another round of synthesis. The first cycle results in newly synthesized DNA of varied lengths (shown with an arrow) at the 3′ ends because synthesis is continued beyond the target sequences. The same happens during subsequent cycles, but the variable strands are rapidly outnumbered by new DNA of fixed length at both ends because synthesis cannot proceed past the terminus of the primer at the opposite template DNA.
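
    How quickly the strands of fixed length outnumber the variable ones can be followed with a short Python sketch. It assumes an idealized reaction that starts from a single double-stranded template and copies every strand in every cycle; real reactions are less efficient. The function name is hypothetical.

```python
def strand_counts(cycles):
    """Count strand types after ideal PCR from one double-stranded template."""
    original, variable, fixed = 2, 0, 0
    for _ in range(cycles):
        # Original strands template variable-length strands; variable and
        # fixed-length strands both template strands of fixed length.
        variable, fixed = variable + original, 2 * fixed + variable
    return original, variable, fixed

for n in (1, 2, 3, 10, 25):
    print(n, strand_counts(n))
```

    Under these assumptions the variable strands grow only linearly (two per cycle), whereas the fixed-length strands roughly double, so the total number of strands after n cycles is 2ⁿ⁺¹.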

  • cDNA amplification and RT-PCR

    A partially known amino acid sequence of a polypeptide can be used to obtain the sequence information required for PCR. From its mRNA one can derive cDNA (see complementary DNA, p. 58) and determine the sequence of the sense and the antisense strand to prepare appropriate oligonucleotide primers (1). When different RNAs are available in small amounts, rapid PCR-based methods are employed to amplify cDNA from different exons of a gene. cDNA is obtained by reverse transcriptase from mRNA, which is then removed by alkaline hydrolysis (2). After a complementary new DNA strand has been synthesized, the DNA can be amplified by PCR (3). Reverse transcriptase PCR (RT-PCR) can be used when the known exon sequences are widely separated within a gene. With rapid amplification of cDNA ends (RACE-PCR), the 5′ and 3′ end sequences can be isolated from cDNA.

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Erlich, H.A., Gelfand D., Sninsky, J.J.: Recent advances
    in the polymerase chain reaction.
    Science 252:1643–1651, 1991.
    Erlich, H.A., Arnheim, N.: Genetic analysis with
    the polymerase chain reaction. Ann. Rev.
    Genet. 26:479–506, 1992.
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.
    Volkenandt, M., Löhr, M., Dicker, A.P.: Gen-
    Amplification durch Polymerase-Kettenreaktion.
    Dtsch. Med. Wschr. 17:670–676,
    White, T.J., Arnheim, N., Erlich, H.A.: The polymerase
    chain reaction. Trends Genet. 5:185–
    189, 1989.

Changes in DNA
When it was recognized that changes (mutations) in genes occur spontaneously (T. H. Morgan, 1910) and can be induced by X-rays (H. J. Muller, 1927), the mutation theory of heredity became a cornerstone of early genetics. Genes were defined as mutable units, but the question of what genes and mutations are remained. Today we know that mutations are changes in the structure of DNA and their functional consequences. The study of mutations is important for several reasons. Mutations cause diseases, including all forms of cancer. They can be induced by chemicals and by irradiation. Thus, they represent a link between heredity and environment. And without mutations, well-organized forms of life would not have evolved. The following two plates summarize the chemical nature of mutations.
  • Error in replication

    The synthesis of a new strand of DNA occurs by semiconservative replication based on complementary base pairing (see DNA replication). Errors in replication occur at a rate of about 1 in 10⁵. This rate is reduced to about 1 in 10⁷ to 10⁹ by proofreading mechanisms. When an error in replication occurs before the next cell division (here referred to as the first division after the mutation), e.g., when a cytosine (C) is incorporated instead of an adenine (A) at the fifth base pair as shown here, the resulting mismatch will be recognized and eliminated by mismatch repair (see DNA repair) in most cases. However, if the error is undetected and allowed to stand, the next (second) division will result in a mutant molecule containing a CG instead of an AT pair at this position. This mutation will be perpetuated in all daughter cells. Depending on its location within or outside of the coding region of a gene, functional consequences due to a change in a codon could result.

  • Mutagenic alteration of a nucleotide

    A mutation may result when a structural change of a nucleotide affects its base-pairing capability. The altered nucleotide is usually present in one strand of the parent molecule. If this leads to incorporation of a wrong base, such as a C instead of a T in the fifth base pair as shown here, the next (second) round of replication will result in two mutant molecules.

  • Replication slippage

    A different class of mutations does not involve an alteration of individual nucleotides, but results from incorrect alignment between allelic or nonallelic DNA sequences during replication. When the template strand contains short tandem repeats, e.g., CA repeats as in microsatellites (see DNA polymorphism and Part II, Genomics), the newly replicated strand and the template strand may shift their positions relative to each other. With replication or polymerase slippage, leading to incorrect pairing of repeats, some repeats are copied twice or not at all, depending on the direction of the shift. One can distinguish forward slippage (shown here) and backward slippage with respect to the newly replicated strand. If the newly synthesized DNA strand slips forward, a region of nonpairing remains in the parental strand. Forward slippage results in an insertion. Backward slippage of the new strand results in deletion. Microsatellite instability is a characteristic feature of hereditary nonpolyposis cancer of the colon (HNPCC). HNPCC genes are localized on human chromosomes at 2p15–22 and 3p21.3. About 15% of all colorectal, gastric, and endometrial carcinomas show microsatellite instability. Replication slippage must be distinguished from unequal crossing-over during meiosis. This is the result of recombination between adjacent, but not allelic, sequences on nonsister chromatids of homologous chromosomes (Figures redrawn from Brown, 1999).

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Dover, G.A.: Slippery DNA runs on and on and
    on ... Nature Genet. 10:254–256, 1995.
    Lewin, B.: Genes VII. Oxford University Press,
    Oxford, 2000.
    Rubinstein, D.C., et al.: Microsatellite evolution
    and evidence for directionality and variation
    in rate between species. Nature Genet.
    10:337–343, 1995.
    Strachan, T.A., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publ., Oxford,
    Vogel, F., Rathenberg, R.: Spontaneous mutation
    in man. Adv. Hum. Genet. 5:223–318, 1975.

Mutation Due to Different Base Modifications
Mutations can result from chemical or physical events that lead to base modification. When they affect the base-pairing pattern, they interfere with replication or transcription. Chemical substances able to induce such changes are called mutagens. Mutagens cause mutations in different ways. Spontaneous oxidation, hydrolysis, uncontrolled methylation, alkylation, and ultraviolet irradiation result in alterations that modify nucleotide bases. DNA-reactive chemicals change nucleotide bases into different chemical structures or remove a base.
  • Deamination and methylation

    Cytosine, adenine, and guanine contain an amino group. When this is removed (deamination), a modified base with a different base-pairing pattern results. Nitrous acid typically removes the amino group. Deamination also occurs spontaneously at a rate of 100 bases per genome per day (Alberts et al., 1994, p. 245). Deamination of cytosine removes the amino group in position 4 (1). The resulting molecule is uracil (2). This pairs with adenine rather than guanine. Normally this change is efficiently repaired by uracil-DNA glycosylase. Deamination at the RNA level occurs in RNA editing (see Expression of genes). Methylation of the carbon atom in position 5 of cytosine results in 5-methylcytosine, containing a methyl group in position 5 (3). Deamination of 5-methylcytosine will result in a change to thymine, containing an oxygen in position 4 instead of an amino group (4). This mutation will not be corrected because thymine is a natural base. Adenine (5) can be deaminated in position 6 to form hypoxanthine, which contains an oxygen in this position instead of an amino group (6), and which pairs with cytosine instead of thymine. The resulting change after DNA replication is a cytosine instead of a thymine in the mutant strand.
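
    The base-pairing consequences described above can be traced in a few lines of Python. This is a minimal sketch, not part of the original text: the single-letter code H for hypoxanthine and all function names are assumptions made for illustration, and strands are compared position by position (the antiparallel orientation of the two strands is ignored).

```python
# Pairing partners: the four regular bases plus the two deaminated bases.
PAIRS = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "U": "A",  # uracil (deaminated cytosine) pairs with adenine
    "H": "C",  # "H" is an assumed code for hypoxanthine (deaminated adenine)
}

# Product of deamination for each base considered here.
DEAMINATED = {"C": "U", "A": "H"}


def complement(strand):
    """Return the strand synthesized opposite `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


def deaminate(strand, pos):
    """Replace the base at `pos` by its deamination product."""
    base = strand[pos]
    if base not in DEAMINATED:
        raise ValueError(f"base {base!r} is not modelled here")
    return strand[:pos] + DEAMINATED[base] + strand[pos + 1:]


def replicate_twice(template):
    """Copy the template, then copy that copy.

    The result is the sequence that takes the place of the damaged strand
    after two rounds of replication.
    """
    return complement(complement(template))


original = "GACT"
after_c_deamination = replicate_twice(deaminate(original, 2))
after_a_deamination = replicate_twice(deaminate(original, 1))
```

    With these assumptions, deaminating the cytosine in GACT yields GATT after two rounds of copying (a C-to-T change), and deaminating the adenine yields GGCT (an A-to-G change), provided no repair intervenes.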
  • Depurination

    About 5000 purine bases (adenine and guanine) are lost per day from DNA in each cell (depurination) owing to thermal fluctuations. Depurination of DNA involves hydrolytic cleavage of the N-glycosyl linkage of deoxyribose to the guanine nitrogen in position 9. This leaves a depurinated sugar. The loss of a base pair will lead to a deletion after the next replication if not repaired in time (see DNA repair).

  • Alkylation

    Alkylation is the introduction of a methyl or an ethyl group into a molecule. The alkylation of guanine involves the replacement of the hydrogen bond to the oxygen atom in position 6 by a methyl group, to form 6-methylguanine. This can no longer pair with cytosine. Instead, it will pair with thymine. Thus, after the next replication the opposite cytosine (C) is replaced by a thymine (T) in the mutant daughter molecule. Important alkylating agents are ethylnitrosourea (ENU), ethylmethane sulfonate (EMS), dimethylnitrosamine, and N-methyl-N′-nitro-N-nitrosoguanidine.

  • Nucleotide base analogue

    Base analogs are purines or pyrimidines that are similar enough to the regular nucleotide DNA bases to be incorporated into the new strand during replication. 5-Bromodeoxyuridine (5-BrdU) is an analog of thymine. It contains a bromine atom instead of the methyl group in position 5. Thus, it can be incorporated into the new DNA strand during replication. However, the presence of the bromine atom causes ambiguous and often wrong base pairing.

  • UV-light-induced thymine dimers

    Ultraviolet irradiation at 260 nm wavelength induces covalent bonds between adjacent thymine residues at carbon positions 5 and 6. If located within a gene, this will interfere with replication and transcription unless repaired. Another important type of UV-induced change is a photoproduct consisting of a covalent bond between the carbons in positions 4 and 6 of two adjacent nucleotides, the 4–6 photoproduct (not shown). (Figures redrawn from Lewin, 2000).
  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford, 1999. Lewin, B.: Genes VII. Oxford Univ. Press, Oxford, 2000. Strachan, T., Read, A.P.: Human Molecular Genetics. 2nd ed. Bios Scientific Publishers, Oxford, 1999.

DNA Polymorphism
Genetic polymorphism is the existence of variants with respect to a gene locus (alleles), a chromosome structure (e.g., size of centromeric heterochromatin), a gene product (variants in enzymatic activity or binding affinity), or a phenotype. The term DNA polymorphism refers to a wide range of variations in nucleotide base composition, length of nucleotide repeats, or single nucleotide variants. DNA polymorphisms are important as genetic markers to identify and distinguish alleles at a gene locus and to determine their parental origin.
  • Single nucleotide polymorphism (SNP)

    These allelic variants differ in a single nucleotide at a specific position. At least one in a thousand DNA bases differs among individuals (1). The detection of SNPs does not require gel electrophoresis. This facilitates large-scale detection. A SNP can be visualized in a Southern blot as a restriction fragment length polymorphism (RFLP) if the difference in the two alleles corresponds to a difference in the recognition site of a restriction enzyme (see Southern blot, p. 62).

  • Simple sequence length polymorphism (SSLP)

    These allelic variants differ in the number of tandemly repeated short nucleotide sequences in noncoding DNA. Short tandem repeats (STRs) consist of units of 1, 2, 3, or 4 base pairs repeated from 3 to about 10 times. Typical short tandem repeats are CA repeats in the 5′ to 3′ strand, i.e., alternating CG and AT base pairs in the double strand. Each allele is defined by the number of CA repeats, e.g., 3 and 5, as shown (1). These are also called microsatellites. The size differences due to the number of repeats are determined by PCR. Variable number of tandem repeats (VNTR), also called minisatellites, consist of repeat units of 20–200 base pairs (2).

  • Detection of SNP by oligonucleotide hybridization analysis

    Oligonucleotides, short stretches of about 20 nucleotides with a complementary sequence to the single-stranded DNA to be examined, will hybridize completely only if perfectly matched. If there is a difference of even one base, such as due to an SNP, the resulting mismatch can be detected because the DNA hybrid is unstable and gives no signal.
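
    The all-or-nothing principle of this assay can be illustrated with a short Python sketch. It is a simplification under stated assumptions: hybridization is reduced to exact complementarity (real assays depend on temperature and salt conditions), and the two allele sequences and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe, target):
    """True only if the probe is perfectly complementary to part of the target."""
    return reverse_complement(probe) in target


# Two invented alleles that differ at a single position (index 12: C versus T).
allele_1 = "CCTGAAGTCGATCGGATCCAGTTACGAT"
allele_2 = allele_1[:12] + "T" + allele_1[13:]

# A 20-nucleotide probe designed against allele 1, spanning the variable site.
probe = reverse_complement(allele_1[4:24])

signal_allele_1 = hybridizes(probe, allele_1)  # perfect match
signal_allele_2 = hybridizes(probe, allele_2)  # one mismatch, no signal
```

    In this model the probe gives a signal with allele 1 only. A second probe designed against allele 2 would be needed to score both alleles positively.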

  • Detection of STRs by PCR

    Short tandem repeats (STRs) can be detected by the polymerase chain reaction (PCR). The allelic regions of a stretch of DNA are amplified; the resulting DNA fragments of different sizes are subjected to electrophoresis; and their sizes are determined.
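
    The relation between repeat number and fragment size can be sketched as follows. The flanking sequences, which stand in for the primer-binding sites, and the function names are hypothetical; the repeat numbers 3 and 5 are the alleles used as an example in the text.

```python
# Hypothetical flanking sequences that serve as primer-binding sites.
LEFT_FLANK = "GGATCCTTGA"
RIGHT_FLANK = "TGCAAGCTTC"
UNIT = "CA"


def build_allele(n_repeats):
    """Return an allele with `n_repeats` copies of the repeat unit."""
    return LEFT_FLANK + UNIT * n_repeats + RIGHT_FLANK


def fragment_length(allele):
    """Length of the amplified fragment, from the start of the left
    primer site to the end of the right primer site."""
    start = allele.index(LEFT_FLANK)
    end = allele.index(RIGHT_FLANK, start + len(LEFT_FLANK)) + len(RIGHT_FLANK)
    return end - start


def repeats_from_length(length):
    """Infer the number of repeat units from the fragment length."""
    return (length - len(LEFT_FLANK) - len(RIGHT_FLANK)) // len(UNIT)


sizes = [fragment_length(build_allele(n)) for n in (3, 5)]
genotype = [repeats_from_length(size) for size in sizes]
```

    In this toy example the two alleles give fragments of 26 and 30 base pairs, which would separate as two distinct bands on a gel.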

  • CEPH families

    An important step in gene identification is the analysis of large families by linkage analysis of polymorphic marker loci on a specific chromosomal region near a locus of interest. Large families are of particular value. DNA from such families has been collected by the Centre pour l’Étude du Polymorphisme Humain (CEPH) in Paris, now called the Centre Jean Dausset, after the founder. Immortalized cell lines are stored from each family. A CEPH family consists of four grandparents, the two parents, and eight children. If four alleles are present at a given locus they are designated A, B, C, and D. Starting with the grandparents, the inheritance of each allele through the parents to the grandchildren can be traced (shown here as a schematic pattern in a Southern blot). Of the four grandparents shown, three are heterozygous (AB, CD, BC) and one is homozygous (CC). Since the parents are heterozygous for different alleles (AD the father and BC the mother), all eight children are heterozygous (BD, AB, AC, or CD).
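
    The genotypes in this pedigree can be checked by enumerating the possible allele combinations. The sketch below is illustrative only; the function name is an assumption, and the assignment of grandparent pairs to each parent is inferred from the genotypes given above.

```python
from itertools import product


def offspring_genotypes(parent_1, parent_2):
    """Return the sorted set of genotypes possible in the offspring of two
    parents, each given as a two-letter genotype such as "AD"."""
    return sorted({"".join(sorted(pair)) for pair in product(parent_1, parent_2)})


# Grandparents AB x CD can have a child with genotype AD (the father);
# grandparents BC x CC can have a child with genotype BC (the mother).
paternal_side = offspring_genotypes("AB", "CD")
maternal_side = offspring_genotypes("BC", "CC")

# Children of the parents AD x BC.
children = offspring_genotypes("AD", "BC")
```

    The enumeration gives AB, AC, BD, and CD for the children, all heterozygous, which agrees with the pattern described in the text.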

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Collins, F.S. , Guyer, M.S. , Chakravarti, A.: Variations
    on a theme: cataloguing human DNA
    sequence variation. Science 282:682–689,
    Deloukas, P., Schuler, G., Gyapay, G., et al.: A
    physical map of 30,000 human genes.
    Science 282:744–746, 1998.
    Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.

Recombination
Recombination lends the genome flexibility. Without genetic recombination, the genes on each individual chromosome would remain fixed in their particular position. Changes could occur by mutation only, which would be hazardous. Recombination provides the means to achieve extensive restructuring, eliminate unfavorable mutations, maintain and spread favorable mutations, and endow each individual with a unique set of genetic information. This greatly enhances the evolutionary potential of the genome. Recombination must occur between precisely corresponding sequences (homologous recombination) to ensure that not one base pair is lost or added. The newly combined (recombined) stretches of DNA must retain their original structure in order to function properly. Two types of recombination can be distinguished: (1) generalized or homologous recombination, which in eukaryotes occurs at meiosis (see p. 116) and (2) site-specific recombination. A third process, transposition, utilizes recombination to insert one DNA sequence into another without regard to sequence homology. Here we consider homologous recombination, a complex biochemical reaction between two duplexes of DNA. The reaction can involve any pair of homologous sequences; the enzymes required are not considered here. Two general models can be distinguished: recombination initiated from a single-strand DNA break and recombination initiated from a double-strand break.
  • Recombination initiated by single-strand breaks

    This model assumes that the process starts with breaks at corresponding positions of one of the strands of homologous DNA (same sequences of different parental origin) (1). A nick is made by a single-strand-breaking enzyme (endonuclease) in each molecule at the corresponding site (2), but see below. This allows the free ends of one nicked strand to join with the free ends of the other nicked strand, from the other molecule, to form single-strand exchanges between the two duplex molecules at the recombination joint (3). The recombination joint moves along the duplex (branch migration) (4). This is an important feature because it ensures that sufficient distance for the second nick is present in each of the other strands (5). After the two other strands have joined and gaps have been sealed (6), a reciprocal recombinant molecule is generated (7). Recombination involving DNA duplexes requires topological changes, i.e., either the molecules must be free to rotate or the restraint must be relieved in some other way. This model has an unresolved difficulty: How is it assured that the single-strand nicks shown in step 2 occur at precisely the same position in the two double helix DNA molecules?

  • Recombination initiated by double-strand breaks

    The current model for recombination is based on initial double-strand breaks in one of the two homologous DNA molecules (1). Both strands are cleaved by an endonuclease, and the break is enlarged to a gap by an exonuclease that removes the new 5′ ends of the strands at the break and leaves 3′ single-stranded ends (2). One free 3′ end recombines with a homologous strand of the other molecule (3). This generates a D loop consisting of a displaced strand from the “donor” duplex. The D loop is extended by repair synthesis until the entire gap of the recipient molecule is closed (4). This displaced strand anneals to the single-stranded complementary homologous sequences of the recipient strand and closes the gap (5). DNA repair synthesis from the other 3′ end closes the remaining gap (6). The integrity of the two molecules is restored by two rounds of single-strand repair synthesis. In contrast to the single-strand exchange model, the double-strand breaks result in heteroduplex DNA in the entire region that has undergone recombination. An apparent disadvantage is the temporary loss of information in the gaps after the initial cleavage. However, the ability to retrieve this information by resynthesis from the other duplex avoids permanent loss. (Figures redrawn from Lewin, 2000).

  • References

    Alberts, B. et al.: Essential Cell Biology. An Introduction
    to the Molecular Biology of the Cell.
    Garland Publishing, New York, 1998.
    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,

Transposition
Aside from homologous recombination, the overall stability of the genome is interrupted by mobile sequences called transposable elements or transposons. There are different classes of distinct DNA sequences that are able to transport themselves to other locations within the genome. This process utilizes recombination but does not result in an exchange. Rather, a transposon moves directly from one site of the genome to another without an intermediary such as plasmid or phage DNA (see section on prokaryotes). This results in rearrangements that create new sequences and change the functions of target sequences. Transposons may be a major source of evolutionary changes in the genome. In some cases they cause disease when inserted into a functioning gene. Three examples are presented below: insertion sequences (IS), transposons (Tn), and retroelements transposing via an RNA intermediate.
  • Insertion sequences (IS) and transposons (Tn)

    A characteristic feature of IS transposition is the presence of a pair of short direct repeats of target DNA at either end. The IS itself carries inverted repeats of about 9–13 bp at both ends and depending on the particular class consists of about 750–1500 bp, which contain a single long coding region for transposase (the enzyme responsible for transposition of mobile sequences). Target selection is either random or at particular sites. The presence of inverted terminal repeats and the short direct repeats of host DNA result in a characteristic structure (1). Transposons carry in addition a central region with genetic markers unrelated to transposition, e.g., antibiotic resistance (2). They are flanked either by direct repeats (same direction) or by inverted repeats (opposite direction, 3).
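
    The characteristic structure of an inserted IS element (inverted terminal repeats within the element, short direct repeats of host DNA outside it) can be tested on a sequence as sketched below. All sequences are invented, the 5-bp length of the direct repeat is an assumption chosen for the example, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def has_inverted_terminal_repeats(element, n):
    """True if the first n bases equal the reverse complement of the last n."""
    return element[:n] == reverse_complement(element[-n:])


def flanked_by_direct_repeats(genome, element, n):
    """True if the same n-bp host sequence lies directly on both sides
    of the element."""
    i = genome.find(element)
    if i < n:  # element absent (-1) or too close to the start
        return False
    j = i + len(element)
    return genome[i - n:i] == genome[j:j + n]


# Toy data, invented for illustration.
terminal_repeat = "GGTAGCCTA"            # 9 bp, within the 9-13 bp range
coding_region = "ATGAAACCCGGGTTTTAA"     # stand-in for the transposase gene
is_element = terminal_repeat + coding_region + reverse_complement(terminal_repeat)
target_repeat = "ATGCA"                  # duplicated host sequence
genome = "CCGT" + target_repeat + is_element + target_repeat + "TTAG"

inverted_ok = has_inverted_terminal_repeats(is_element, 9)
direct_ok = flanked_by_direct_repeats(genome, is_element, 5)
```

    Both checks succeed for the toy element, whereas the coding region on its own has no inverted terminal repeats.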

  • Replicative and nonreplicative transposition

    With replicative transposition (1) the original transposon remains in place and creates a new copy of itself, which inserts into a recipient site elsewhere. Thus, this mechanism leads to an increase in the number of copies of the transposon in the genome. This type involves two enzymatic activities: a transposase acting on the ends of the original transposon and resolvase acting on the duplicated copies. In nonreplicative transposition (2) the transposing element itself moves as a physical entity directly to another site. The donor site is either repaired (in eukaryotes) or may be destroyed (in bacteria) if more than one copy of the chromosome is present.

  • Transposition of retroelements

    Retrotransposition requires synthesis of an RNA copy of the inserted retroelement. Retroviruses including the human immunodeficiency virus and RNA tumor viruses are important retroelements (see p. 100 and p. 314). The first step in retrotransposition is the synthesis of an RNA copy of the inserted retroelement, followed by reverse transcription up to the polyadenylation sequence in the 3′ long terminal repeat (3′ LTR). Three important classes of mammalian transposons that undergo or have undergone retrotransposition through an RNA intermediary are shown. Endogenous retroviruses (1) are sequences that resemble retroviruses but cannot infect new cells and are restricted to one genome. Nonviral retrotransposons (2) lack LTRs and usually other parts of retroviruses. Both types contain reverse transcriptase and are therefore capable of independent transposition. Processed pseudogenes (3) or retropseudogenes lack reverse transcriptase and cannot transpose independently. They contain two groups: low copy number of processed pseudogenes transcribed by RNA polymerase II and high copy number of mammalian SINE sequences, such as human Alu and the mouse B1 repeat families.

  • References

    Brown, T.A.: Genomes. Bios Scientific Publ., Oxford,
    Lewin, B.: Genes VII. Oxford Univ. Press, Oxford,
    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.

Trinucleotide Repeat Expansion
The human genome contains tandem repeats of trinucleotides. Normally they occur in groups of 5–35 repeats. When their number exceeds a certain threshold and they occur in a gene or close to it, they cause diseases. Once the normal, variable length has expanded, the increased number of repeats tends to increase even further when passed through the germline or during mitosis. Thus, trinucleotide expansions form a class of unstable mutations, to date observed in humans only.
  • Different types of trinucleotide repeats and their expansions

    Trinucleotide repeats can be distinguished according to their localization with respect to a gene. Expansions are greater outside genes and more moderate within coding regions. In several severe neurological diseases, abnormally expanded CAG repeats are part of the gene. CAG repeats encode a series of glutamines (polyglutamine tracts). Within a normal number of repeats, which varies according to the gene involved, the gene functions normally (1). However, an expanded number of repeats leads to an abnormal gene product with altered function. Trinucleotide repeats also occur in noncoding regions of a gene (2). Fairly common types are CGG and GCC repeats. The increase in the number of these repeats can be drastic, up to 1000 or more repeats. The first stages of expansion usually do not lead to clinical signs of a disease, but they do predispose to increased expansion of the repeat in the offspring of a carrier (premutation).

  • Unstable trinucleotide repeats in different diseases

    Disorders due to expansion of trinucleotide repeats can be distinguished according to the type of trinucleotide repeat, i.e., the sequence of the three nucleotides, their location with respect to the gene involved, and their clinical features. All involve the central or the peripheral nervous system. Type I trinucleotide diseases are characterized by CAG trinucleotide expansion within the coding region of different genes. The triplet CAG codes for glutamine. About 20 CAG repeats occur normally in these genes, so that about 20 glutamines occur in the gene product. In the disease state the number of glutamines is greatly increased in the protein. Hence, they are collectively referred to as polyglutamine disorders. Type II trinucleotide diseases are characterized by expansion of CTG, GAA, GCC, or CGG trinucleotides within a noncoding region of the gene involved, either at the 5′ end (GCC in fragile X syndrome type A, FRAXA), at the 3′ end (CGG in FRAXE; CTG in myotonic dystrophy), or in an intron (GAA in Friedreich ataxia). A brief review of these disorders is given on p. 394.
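
    The link between the number of CAG repeats and the length of the polyglutamine tract can be sketched in Python. The sequences are invented, the function names are hypothetical, and the default threshold of 35 repeats is an assumption taken from the upper end of the normal range quoted above; actual thresholds differ from gene to gene.

```python
import re


def longest_repeat_run(seq, unit="CAG"):
    """Return the number of units in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


def polyglutamine_tract(coding_seq):
    """Each CAG codon encodes one glutamine (Q), so the tract has one Q
    per repeat."""
    return "Q" * longest_repeat_run(coding_seq, "CAG")


def is_expanded(n_repeats, threshold=35):
    """Assumed cut-off; real thresholds are gene-specific."""
    return n_repeats > threshold


# Invented alleles with 20 and 45 CAG repeats.
normal_allele = "ATG" + "CAG" * 20 + "CCGTAA"
expanded_allele = "ATG" + "CAG" * 45 + "CCGTAA"

n_normal = longest_repeat_run(normal_allele)
n_expanded = longest_repeat_run(expanded_allele)
```

    Under these assumptions the first allele encodes a tract of 20 glutamines and is classified as normal, while the second encodes 45 glutamines and is classified as expanded.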

  • Principle of laboratory diagnosis of unstable trinucleotide repeats

    The laboratory diagnosis compares the sizes of the trinucleotide repeats in the two alleles of the gene examined. One can distinguish very large expansions of repeats outside coding sequences (50 to more than 1000 repeats) and moderate expansion within coding sequences (20 to 100–200). The figure shows 11 lanes, each representing one person: normal controls in lanes 1–3; confirmed patients in lanes 4–6; and a family with an affected father (lane 7), an affected son (lane 10), the unaffected mother (lane 11), and two unaffected children: a son (lane 8) and a daughter (lane 9). Size markers are shown at the left. Each lane represents a polyacrylamide gel and the (CAG)n repeat of the Huntington locus amplified by polymerase chain reaction shown as a band of defined size. Each person shows the two alleles. In the affected persons the band representing one allele lies above the threshold in the expanded region (in practice the bands are somewhat blurred because the exact repeat size varies in DNA from different cells).

  • References

    Strachan, T., Read, A.P.: Human Molecular
    Genetics. 2nd ed. Bios Scientific Publishers,
    Oxford, 1999.
    Warren, S. T.: The expanding world of trinucleotide
    repeats. Science 271: 1374–1375,
    Rosenberg, R.N.: DNA-triplet repeats and neurologic
    disease. New Engl. J. Med. 335:1222–1224, 1996.
    Zoghbi, H.Y.: Spinocerebellar ataxia and other
    disorders of trinucleotide repeats, pp. 913–
    920, In: Principles of Molecular Medicine,
    J.C. Jameson, ed. Humana Press, Totowa, NJ,

DNA Repair
Life would not be possible without the ability to repair damaged DNA. Since replication errors, including mismatch, and harmful exogenous factors are everyday problems for a living organism, a broad repertoire of repair genes has evolved in prokaryotes and eukaryotes. The following types of DNA repair can be distinguished by their basic mechanisms: (1) excision repair to remove a damaged DNA site, such as a strand with a thymine dimer; (2) mismatch repair to correct errors of replication by excising a stretch of single-stranded DNA containing the wrong base; (3) repair of UV-damaged DNA during replication; and (4) transcription-coupled repair in active genes.
  • Excision repair

    The damaged strand of DNA is distorted and can be recognized by a set of three proteins, the UvrA, UvrB, and UvrC endonucleases in prokaryotes and XPA, XPB, and XPC in human cells. This DNA strand is cleaved on both sides of the damage by an exonuclease protein complex, and a stretch of about 12 or 13 nucleotides in prokaryotes and 27 to 29 nucleotides in eukaryotes is removed. DNA repair synthesis restores the missing stretch and a DNA ligase closes the gap.
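
    The cut, remove, and resynthesize sequence of excision repair can be modelled in a few lines. This is a sketch under simplifying assumptions: the two strands are written position by position without regard to their antiparallel orientation, a thymine dimer is marked by lowercase "tt", the patch is centered on the lesion, and the sequences and function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Return the opposite strand, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)


def excision_repair(damaged, intact_partner, lesion_pos, patch=12):
    """Remove `patch` nucleotides around the lesion and fill the gap by
    copying the intact partner strand.

    patch=12 follows the prokaryotic figure given in the text;
    a value of 27 to 29 would correspond to eukaryotes.
    """
    start = max(0, lesion_pos - patch // 2)
    end = min(len(damaged), start + patch)
    resynthesized = complement(intact_partner[start:end])
    return damaged[:start] + resynthesized + damaged[end:]


original = "ACGTACGGATCCTTAGGCATCGATCGA"
partner = complement(original)

# Mark a thymine dimer at positions 12 and 13 of the damaged strand.
damaged = original[:12] + "tt" + original[14:]

repaired = excision_repair(damaged, partner, lesion_pos=12)
```

    The repaired strand is identical to the original sequence, because the information lost with the excised stretch is recovered from the undamaged partner strand.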

  • Mismatch repair

    Mismatch repair corrects errors of replication. However, the newly synthesized DNA strand containing the wrong base must be distinguished from the parent strand, and the site of a mismatch identified. The former is based on a difference in methylation in prokaryotes. The daughter strand is undermethylated at this stage. E. coli has three mismatch repair systems: long patch, short patch, and very short patch. The long patch system can replace 1 kb DNA and more. It requires three repair proteins, MutH, MutL, and MutS, which have the human homologues hMSH1, hMLH1, and hMSH2. Mutations in their respective genes lead to cancer due to defective mismatch repair.

  • Replication repair of UV-damaged DNA

    DNA damage interferes with replication, especially in the leading strand. Large stretches remain unreplicated beyond the damaged site (in the 3′ direction of the new strand) unless swiftly repaired. The lagging strand is not affected as much because Okazaki fragments (about 100 nucleotides in length) of newly synthesized DNA are also formed beyond the damaged site. This leads to an asymmetric replication fork and single-stranded regions of the leading strand. Aside from repair by recombination, the damaged site can be bypassed.

  • Double-strand repair by homologous recombination

    Double-strand damage is a common consequence of γ radiation. An important human pathway for mediating repair requires three proteins, encoded by the genes ATM, BRCA1, and BRCA2. Their names are derived from important diseases that result from mutations in these genes: ataxia telangiectasia (see p. 334) and hereditary predisposition to breast cancer (BRCA1 and BRCA2, see p. 328). ATM, a member of a protein kinase family, is activated in response to DNA damage (1). Its active form phosphorylates BRCA1 at specific sites (2). Phosphorylated BRCA1 induces homologous recombination in cooperation with BRCA2 and mRAD51, the mammalian homologue of the E. coli RecA repair protein (3). This is required for efficient repair of DNA double-strand breaks. Phosphorylated BRCA1 may also be involved in transcription and transcription-coupled DNA repair (4). (Figure redrawn from Venkitaraman, 1999).

  • References

    Buermeyer, A.B. et al.: Mammalian DNA mismatch
    repair. Ann. Rev. Genet. 33:533–564,
    Cleaver, J.E.: Stopping DNA replication in its
    tracks. Science 285:212–213, 1999.
    Cortez D., et al.: Requirement of ATM-dependent
    phosphorylation of Brca1 in the DNA
    damage response to double-strand breaks.
    Science 286:1162–1166, 1999.
    Masutani, C., et al.: The XPV (xeroderma pigmentosum
    variant) gene encodes human
    DNA polymerase η. Nature 399:700–704,
    Sancar, A.: Excision repair invades the territory
    of mismatch repair. Nature Genet. 21:247–
    249, 1999.
    Venkitaraman, A.R.: Breast cancer genes and
    DNA repair. Science 286:1100–1101, 1999.

Xeroderma Pigmentosum
Xeroderma pigmentosum (XP) is a heterogeneous group of genetically determined skin disorders due to unusual sensitivity to ultraviolet light. They are manifested by dryness and pigmentation of the exposed regions of skin (xeroderma pigmentosum = “dry, pigmented skin”). The exposed areas of skin also show a tendency to develop tumors. The causes are different genetic defects of DNA repair. Repair involves mechanisms similar to those involved in transcription and replication. The necessary enzymes are encoded by at least a dozen genes, which are highly conserved in bacteria, yeast, and mammals.
  • Clinical phenotype

    The skin changes are limited to UV-exposed areas (1 and 2). Unexposed areas show no changes. Thus it is important to protect patients from UV light. An especially important feature is the tendency for multiple skin tumors to develop in the exposed areas (3). These may even occur in childhood or early adolescence. The types of tumors are the same as those occurring in healthy individuals after prolonged UV exposure.

  • Cellular phenotype

    The UV sensitivity of cells can be demonstrated in vitro. When cultured fibroblasts from the skin of patients are exposed to UV light, the cells show a distinct dose-dependent decrease in survival rate compared with normal cells (1). Different degrees of UV sensitivity can be demonstrated. The short segment of new DNA normally formed during excision repair can be demonstrated by culturing cells in the presence of [3H]thymidine and exposing them to UV light. The DNA synthesis induced for DNA repair can be made visible in autoradiographs. Since [3H]thymidine is incorporated during DNA repair, these bases are visible as small dots caused by the isotope on the film (2). In contrast, xeroderma (XP) cells show markedly decreased or almost absent repair synthesis. (Photograph of Bootsma & Hoeijmakers, 1999).

  • Genetic complementation in cell hybrids

    If skin cells (fibroblasts) from normal persons and from patients (XP) are fused (cell hybrids) in culture and exposed to UV light, the cellular XP phenotype will be corrected (1). Normal DNA repair occurs. Also, hybrid cells from two different forms of XP show normal DNA synthesis (2) because cells with different repair defects correct each other (genetic complementation). However, if the mutant cells have the same defect (3), they are not able to correct each other (4) because they belong to the same complementation group. At present about ten complementation groups are known in xeroderma pigmentosum. They differ clinically in terms of severity and central nervous system involvement. Each complementation group is based on a mutation at a different gene locus. Several of these genes have been cloned and show homology with repair genes of other organisms, including yeast and bacteria.

  • References

    Berneburg, M. et al.: UV damage causes uncontrolled
    DNA breakage in cells from patients
    with combined features of XP-D and Cockayne
    syndrome. Embo J. 19:1157–1166,
    Bootsma, D.A., Hoeijmakers, J.H.J.: The genetic
    basis of xeroderma pigmentosum. Ann.
    Génét. 34:143–150, 1991.
    Cleaver, J.E., et al.: A summary of mutations in
    the UV-sensitive disorders: xeroderma pigmentosum,
    Cockayne syndrome, and trichothiodystrophy.
    Hum. Mutat. 14:9–22,
    Cleaver, J.E.: Common pathways for ultraviolet
    skin carcinogenesis in the repair and replication
    defective groups of xeroderma pigmentosum.
    J. Dermatol. Sci. 23:1–11, 2000.
    de Boer, J., Hoeijmakers J.H.: Nucleotide excision
    repair and human syndromes. Carcinogenesis
    21:453–460, 2000.
    Hanawalt, P.C.: Transcription-coupled repair
    and human diseases. Science 266:1957–
    1958, 1994.
    Sancar, A.: Mechanisms of DNA excision repair.
    Science 266:1954–1956, 1994.
    Taylor, E.M., et al.: Xeroderma pigmentosum
    and trichothiodystrophy are associated
    with different mutations in the XPD
    (ERCC2). Proc. Natl. Acad. Sci. 94: 8658–
    8663, 1997.
