Difference between revisions of "DNA" - New World Encyclopedia

From New World Encyclopedia
(Added article form Wikipedia and credit and category tags)
 
 
(84 intermediate revisions by 8 users not shown)
Line 1: Line 1:
[[Image:DNA_Overview.png|thumb|270px|The general structure of a section of DNA]]
+
{{Approved}}{{Images OK}}{{Submitted}}{{Paid}}{{copyedited}}
'''Deoxyribonucleic acid''' ('''DNA''') is a [[nucleic acid]] —usually in the form of a double [[helix]]— that contains the [[genetics|genetic]] instructions specifying the [[developmental biology|biological development]] of all [[Cell (biology)|cellular]] forms of [[life]], and most [[virus]]es.  DNA is a long [[polymer]] of [[nucleotides]] and encodes the sequence of the [[amino acid residue]]s in [[protein]]s using the [[genetic code]], a triplet code of [[nucleotide]]s.
 
  
In complex [[eukaryote|eukaryotic]] [[Cell (biology)|cells]] such as those from [[plant]]s, [[animal]]s, [[fungi]] and [[protist]]s, most of the DNA is located in the [[cell nucleus]]. By contrast, in simpler cells called [[prokaryotes]], including the [[bacterium|eubacteria]] and [[archaea]], DNA is not separated from the [[cytoplasm]] by a [[nuclear envelope]]. The cellular [[organelle]]s known as [[chloroplast]]s and [[mitochondria]] also carry DNA.
+
[[Image:DNA Overview.png|thumb|300px|The structure of part of a DNA double helix]]
 +
'''Deoxyribonucleic acid''' '''(DNA)''' is a [[nucleic acid]] that contains the [[genetics|genetic]] instructions used in the [[developmental biology|development]] and functioning of all known [[life|living organisms]]. The main role of DNA [[molecule]]s is the long-term storage of [[information]]. DNA is often compared to a set of [[blueprint]]s, since it contains the instructions needed to construct other components of [[cell (biology)|cell]]s, such as [[protein]]s and [[RNA]] molecules. The DNA segments that carry this genetic information are called [[gene]]s, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information.
  
DNA is often referred to as the molecule of [[heredity]] as it is responsible for the genetic propagation of most [[biological inheritance|inherited]] [[Trait (biological)|trait]]s. In humans, these traits can range from hair colour to disease susceptibility. During [[cell division]], DNA is [[DNA replication|replicated]] and can be transmitted to offspring during [[reproduction]]. [[Kinship and descent|Lineage]] studies can be done based on the facts that the [[mitochondrial DNA]] only comes from the mother, and the male [[Y chromosome]] only comes from the father.
+
Chemically, DNA is a long [[polymer]] of simple units called [[nucleotide]]s, with a backbone made of sugars (deoxyribose) and phosphate groups joined by [[ester]] bonds. Attached to each sugar is one of four types of molecules called [[nucleobase|bases]]. It is the sequence of these four bases along the backbone that encodes information. This information is read using the [[genetic code]], which specifies the sequence of the [[amino acid]]s within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called [[transcription (genetics)|transcription]]. Most of these RNA molecules are used to synthesize proteins, but others are used directly in structures such as [[ribosome]]s and [[spliceosome]]s. RNA also serves as a genetic blueprint for certain viruses.
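As a rough illustration of how a base sequence is read in triplets, the sketch below translates a short coding sequence with a deliberately partial codon table. The function name <code>translate</code> and the example sequence are hypothetical; the codon assignments are a small subset of the standard genetic code, and any codon outside that subset is reported as "?".

```python
# Partial codon table (sense-strand DNA codons -> one-letter amino acid codes).
# Only a handful of the 64 codons are listed; this is an illustrative subset.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start codon
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
}


def translate(coding_dna):
    """Read non-overlapping triplets until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        codon = coding_dna[i:i + 3]
        amino_acid = CODON_TABLE.get(codon, "?")  # "?" marks a codon not in this subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTTTGGCAAATAA"))  # prints MFGK
```

In a cell the triplets are read from the RNA copy rather than from the DNA itself; the sketch skips that step and reads the sense-strand DNA directly.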
  
Every person's DNA, their [[genome]], is inherited from both parents. The mother's [[mitochondrial DNA]] together with twenty-three [[chromosome]]s from each parent combine to form the genome of a [[zygote]], the [[fertilization|fertilized]] [[ovum|egg]]. As a result, with certain exceptions such as [[red blood cell]]s, most human cells contain 23 pairs of chromosomes, together with mitochondrial DNA inherited from the mother.
+
Within cells, DNA is organized into structures called [[chromosome|chromosomes]]. These chromosomes are duplicated before cells [[cell division|divide]], in a process called [[DNA replication]]. [[Eukaryote|Eukaryotic organisms]] such as [[animal]]s, [[plant]]s, and [[fungi]] store their DNA inside the [[cell nucleus]], while in [[prokaryote]]s such as [[bacteria]], which lack a cell nucleus, it is found in the cell's [[cytoplasm]]. Within the chromosomes, [[chromatin]] proteins such as [[histone]]s compact and organize DNA, which helps control its interactions with other proteins and thereby control which [[genes]] are transcribed. Some eukaryotic cell organelles, [[mitochondria]] and [[chloroplast]]s, also contain DNA, giving rise to the endosymbiotic theory that these organelles may have arisen from prokaryotes in a symbiotic relationship.
  
==Overview==
+
The identification of DNA, combined with human creativity, has been of tremendous importance not only for understanding [[life]] but for practical applications in [[medicine]], [[agriculture]], and other areas. Technologies have been developed using [[recombinant DNA]] to mass produce medically important proteins, such as [[insulin]], and have found application in agriculture to make plants with desirable qualities. Through understanding the [[allele]]s that one is carrying for particular genes, one can gain an understanding of the probability that one's offspring may inherit certain genetic disorders, or one's own predisposition for a particular [[disease]]. DNA technology is used in forensics, anthropology, and many other areas as well.
[[Image:DNA123.png|thumb|right|125px|Space-filling model of a section of DNA molecule]]
+
{{toc}}
[[Image:Dna_pairing_aa.gif|thumb|300px|DNA base pairing]]
+
DNA and the biological processes centered on its activities (translation, transcription, replication, [[genetic recombination]], and so forth) are amazing in their complexity and coordination. The presence of DNA also reflects on the unity of life, since organisms share nucleic acids as genetic blueprints and share a nearly universal genetic code. On the other hand, the discovery of DNA has at times led to an overemphasis on DNA to the point of believing that life can be totally explained by physico-chemical processes alone.
  
Contrary to a common misconception, the DNA is not a single molecule, but rather a pair of molecules joined by [[hydrogen bond]]s: it is organized as two complementary strands, head-to-toe, with the hydrogen bonds between them. Each strand of DNA is a chain of chemical "building blocks", called [[nucleotide]]s, of which there are four types: [[adenine]] (abbreviated A), [[cytosine]] (C), [[guanine]] (G) and [[thymine]] (T). (Thymine should not be confused with [[thiamine]], which is vitamin B<sub>1</sub>.) In some organisms, most notably the PBS1 [[phage]], [[Uracil]] (U) replaces T in the organism's DNA.<ref>I. Takahashi and J. Marmur. Replacement of thymidylic acid by deoxyuridylic acid in the deoxyribonucleic acid of a transducing phage for Bacillus subtilis. ''Nature'' 197, 794&ndash;795, 1963.</ref> These allowable base components of nucleic acids can be [[polymerized]] in any order giving the molecules a high degree of uniqueness.
+
==History==
 +
[[Image:Francis Crick.png|thumb|400px|right|[[Francis Crick]]]]
  
Between the two strands, each base can only "pair up" with one single predetermined other base: A+T, T+A, C+G and G+C are the only possible combinations; that is, an "A" on one strand of double-stranded DNA will "mate" properly only with a "T" on the other, complementary strand; therefore, naming the bases on the conventionally chosen side of the strand is enough to describe the entire double-strand sequence. Two nucleotides paired together are called a [[base pair]]. On rare occasions, wrong pairing can happen, when [[thymine]] goes into its [[enol]] form or [[cytosine]] goes into its [[imino]] form. The double-stranded structure of DNA provides a simple mechanism for [[DNA replication]]: the DNA double strand is first "unzipped" down the middle, and the "other half" of each new single strand is recreated by exposing each half to a mixture of the four bases. An enzyme makes a new strand by finding the correct base in the mixture and pairing it with the original strand. In this way, the base on the old strand dictates which base will be on the new strand, and the cell ends up with an extra copy of its DNA.
+
[[File:James Watson 2012 TTChao Symposium.jpg|thumb|350px|right|[[James D. Watson|James Watson]] in 2012]]
  
DNA contains the genetic [[information]] that is inherited by the offspring of an organism; this information is determined by the [[DNA sequence|sequence]] of base pairs along its length. A strand of DNA contains [[gene]]s, areas that [[gene regulation|regulate genes]], and areas that either have no function, or a function [[junk DNA|yet unknown]]. Genes can be loosely viewed as the organism's "cookbook" or "blueprint".
+
DNA was first isolated by the [[Switzerland|Swiss]] physician [[Friedrich Miescher]] who, in 1869, discovered a microscopic substance in the [[pus]] of discarded surgical bandages. As it resided in the nuclei of cells, he called it "nuclein."<ref>R. Dahm, "Friedrich Miescher and the discovery of DNA," ''Dev Biol'' 278 (2005): 274-88. PMID 15680349.</ref> In 1919, this discovery was followed by [[Phoebus Levene]]'s identification of the base, sugar, and phosphate nucleotide unit.<ref>P. Levene, "The structure of yeast nucleic acid," ''J Biol Chem'' 40(2) (1919):415-424. </ref> Levene suggested that DNA consisted of a string of nucleotide units linked together through the phosphate groups. However, Levene thought the chain was short and the bases repeated in a fixed order. In 1937, [[William Astbury]] produced the first [[X-ray diffraction]] patterns that showed that DNA had a regular structure.<ref>W. Astbury, "Nucleic acid," ''Symp. Soc. Exp. Biol.'' 1 (1947).</ref>
  
[[Image:DNA Under electron microscope Image 3576B-PH.jpg|thumb|left|250px|DNA Under an electron microscope]]
+
In 1928, [[Frederick Griffith]] discovered that [[trait (biology)|traits]] of the "smooth" form of the ''Pneumococcus'' bacteria could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" bacteria with the live "rough" form.<ref>M. G. Lorenz and W. Wackernagel, "Bacterial gene transfer by natural genetic transformation in the environment," ''Microbiol. Rev.'' 58(1994): 563–602. PMID 7968924. </ref> This system provided the first clear suggestion that DNA carried genetic information, when [[Oswald Theodore Avery]], along with coworkers [[Colin MacLeod]] and [[Maclyn McCarty]], identified DNA as the [[transforming principle]] in 1943.<ref>O. Avery, C. MacLeod, and M. McCarty, "Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III," ''J Exp Med'' 79 (1944): 137-158. </ref> DNA's role in [[heredity]] was confirmed in 1952, when [[Alfred Hershey]] and [[Martha Chase]], in the [[Hershey-Chase experiment]], showed that DNA is the [[genetic material]] of the [[T2 phage]].<ref>A. Hershey and M. Chase, "Independent functions of viral protein and nucleic acid in growth of bacteriophage," ''J Gen Physiol'' 36(1952): 39-56. PMID 12981234.</ref>
  
Other interesting points:
+
In 1953, based on [[Photo 51|X-ray diffraction images]] taken by [[Rosalind Franklin]] and the information that the bases were paired, [[James D. Watson]] and [[Francis Crick]] suggested what is now accepted as the first accurate model of [[Molecular structure of Nucleic Acids|DNA structure]] in the journal ''Nature''.<ref name=FWPUB>J. D. Watson and F. H. C. Crick, "Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid," ''Nature'' 171 (1953): 737-738.</ref> Experimental evidence for Watson and Crick's model was published in a series of five articles in the same issue of ''Nature''.<ref name=NatureDNA50>Nature Archives, "Double helix of DNA: 50 Years," ''Nature'' (2003). </ref> Of these, Franklin and [[Raymond Gosling]]'s paper was the first publication of X-ray diffraction data that supported the Watson and Crick model.<ref name=NatFranGos>R. Franklin, and R. G. Gosling, "Molecular configuration in sodium thymonucleate," ''Nature'' 171 (1953):740-741. </ref> This issue also contained an article on DNA structure by [[Maurice Wilkins]] and his colleagues.<ref name=NatWilk>M. H. F. Wilkins, A. R. Stokes, and H. R. Wilson, "Molecular structure of deoxypentose nucleic acids," ''Nature'' 171 (1953): 738-740. </ref> In 1962, after Franklin's death, Watson, Crick, and Wilkins jointly received the [[Nobel Prize]] in [[Nobel Prize in Physiology or Medicine|Physiology or Medicine]]. However, speculation continues on who should have received credit for the discovery, as it was based on Franklin's data.
  
 +
In an influential presentation in 1957, Crick laid out the [[central dogma of molecular biology|"Central Dogma" of molecular biology]], which foretold the relationship between DNA, RNA, and proteins, and articulated the "adaptor hypothesis". Final confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 through the [[Meselson-Stahl experiment]].<ref>M. Meselson, and F. Stahl, "The replication of DNA in ''Escherichia coli''," ''Proc Natl Acad Sci U S A'' 44 (1958): 671-682. PMID 16590258.</ref> Further work by Crick and coworkers showed that the genetic code was based on non-overlapping triplets of bases, called codons, allowing [[Har Gobind Khorana]], [[Robert W. Holley]], and [[Marshall Warren Nirenberg]] to decipher the [[genetic code]].<ref>Nobel Foundation, [https://www.nobelprize.org/prizes/medicine/1968/summary/ "The Nobel Prize in Physiology or Medicine 1968,"] ''Nobelprize.org''. Retrieved January 23, 2023.</ref> These findings represent the birth of [[molecular biology]].
  
 +
==Physical and chemical properties==
 +
[[Image:DNA chemical structure.svg|right|thumb|350px|The chemical structure of DNA.]]
  
* DNA is an acid because of the phosphate groups between each deoxyribose.  This is the primary reason why DNA has a negative charge.
+
DNA is a long [[polymer]] made from repeating units called [[nucleotide]]s.<ref name="Alberts">B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walter, ''Molecular Biology of the Cell,'' 4th edition. (New York: Garland Science, 2002, ISBN 0815332181).</ref><ref name=Butler>J. Butler, ''Forensic DNA Typing'' (San Diego: Academic Press, 2001, ISBN 9780121479510).</ref> The DNA chain is 22 to 26&nbsp;[[Ångström]]s wide (2.2 to 2.6&nbsp;[[nanometre]]s), and one nucleotide unit is 3.3&nbsp;Ångströms (0.33&nbsp;nanometres) long.<ref>M. Mandelkern, J. Elias, D. Eden, and D. Crothers, "The dimensions of DNA in solution," ''J Mol Biol'' 152 (1981): 153–161.</ref> Although each individual repeating unit is very small, DNA polymers can be enormous molecules containing millions of nucleotides. For instance, the largest human [[chromosome]], chromosome number 1, is 220 million [[base pair]]s long.<ref>S. Gregory, et al., "The DNA sequence and biological annotation of human chromosome 1," ''Nature'' 441(7091) (2006):315-321. PMID 16710414.</ref>
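The two figures quoted above can be combined into a back-of-the-envelope estimate of how long chromosome 1 would be if it were stretched out straight. The sketch below is only that arithmetic; the function name <code>contour_length_mm</code> is hypothetical, and the calculation assumes every base pair contributes the same 0.33 nanometres.

```python
BASE_PAIRS_CHROMOSOME_1 = 220_000_000  # base pairs, figure quoted in the text
RISE_PER_NUCLEOTIDE_NM = 0.33          # nanometres per nucleotide unit, from the text


def contour_length_mm(base_pairs, rise_nm=RISE_PER_NUCLEOTIDE_NM):
    """Length of a fully extended double helix, in millimetres."""
    length_nm = base_pairs * rise_nm
    return length_nm / 1_000_000  # 1 mm = 1,000,000 nm


print(round(contour_length_mm(BASE_PAIRS_CHROMOSOME_1), 1))  # about 72.6 mm
```

A molecule only about 2 nanometres wide would thus be roughly 7 centimetres long when extended, which illustrates why DNA has to be compacted inside the cell.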
* The "polarity" of each pair is important: A+T is not the same as T+A, just as C+G is not the same as G+C (note that "polarity" as such is never used in this context — it's just a suggestive way to get the idea across)
 
* [[Mutation]]s are chemical imperfections in this process, where a base is accidentally skipped, inserted, or incorrectly copied, or the chain is trimmed, or added to; many basic mutations can be described as combinations of these accidental "operations". Mutations can also occur through chemical damage (through [[mutagens]]), light ([[Ultraviolet|UV]] damage), or through other more complicated gene swapping events.
 
*[[Deoxyribozyme|DNA molecules that act as enzymes]] are known in laboratories, but none has so far been found in living organisms.
 
* In addition to the traditionally viewed duplex form of DNA, DNA can also acquire triplex and quadruplex forms. Here instead of the Watson-Crick base pairing, [[Hoogsteen base pair|Hoogsteen base pairing]] comes into the picture.
 
* DNA differs from [[ribonucleic acid]] (RNA) by having a sugar 2-deoxyribose instead of [[ribose]] in its backbone. This is the basic chemical distinction between RNA and DNA. In addition, in RNA, the base [[thymine]] (T) is replaced by [[uracil]] (U).
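The description of mutations as combinations of elementary "operations" in the list above can be made concrete with plain string manipulation. The sketch below is illustrative only: the three helper names and the example sequence are hypothetical, and positions are zero-based indices into the sequence.

```python
def substitute(seq, pos, base):
    """Replace the base at position pos (an incorrectly copied base)."""
    return seq[:pos] + base + seq[pos + 1:]


def insert(seq, pos, bases):
    """Insert one or more extra bases before position pos."""
    return seq[:pos] + bases + seq[pos:]


def delete(seq, pos, length=1):
    """Skip length bases starting at position pos."""
    return seq[:pos] + seq[pos + length:]


original = "ATGGCC"
print(substitute(original, 2, "A"))  # ATAGCC
print(insert(original, 3, "T"))      # ATGTGCC
print(delete(original, 0, 2))        # GGCC
```

More complicated changes can be expressed by applying several of these operations in sequence.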
 
  
==DNA in practice==
+
In living organisms, DNA does not usually exist as a single molecule, but instead as a tightly-associated pair of molecules.<ref name=FWPUB/><ref name=berg>J. Berg, J. Tymoczko, and L. Stryer, ''Biochemistry'' (W. H. Freeman and Company, 2002, ISBN 0716749556).</ref> These two long strands entwine like vines, in the shape of a [[helix|double helix]]. The nucleotide repeats contain both the segment of the backbone of the molecule, which holds the chain together, and a base, which interacts with the other DNA strand in the helix. In general, a base linked to a sugar is called a [[nucleoside]] and a base linked to a sugar and one or more phosphate groups is called a [[nucleotide]]. If multiple nucleotides are linked together, as in DNA, this polymer is referred to as a [[polynucleotide]].
  
===DNA in crime===
+
The backbone of the DNA strand is made from alternating [[phosphate]] and [[carbohydrate|sugar]] residues.<ref name=Ghosh>A. Ghosh, and M. Bansal, "A glossary of DNA structures from A to Z," ''Acta Crystallogr D Biol Crystallogr'' 59 (2003): 620–626. PMID 12657780.</ref> The sugar in DNA is 2-deoxyribose, which is a [[pentose]] (five-[[carbon]]) sugar. The sugars are joined together by phosphate groups that form [[phosphodiester bond]]s between the third and fifth carbon [[atom]]s of adjacent sugar rings. These asymmetric [[covalent bond|bonds]] mean a strand of DNA has a direction. In a double helix, the direction of the nucleotides in one strand is opposite to their direction in the other strand. This arrangement of DNA strands is called antiparallel. The asymmetric ends of DNA strands are referred to as the [[directionality (molecular biology)|5′]] ''(five prime)'' and 3′ ''(three prime)'' ends. One of the major differences between DNA and RNA is the sugar, with 2-deoxyribose being replaced by the alternative pentose sugar [[ribose]] in RNA.<ref name=berg/>
{{main|Genetic fingerprinting}}
 
[[Forensic science|Forensic scientists]] can use DNA located in [[blood]], [[semen]], [[skin]], [[saliva]] or hair left at the scene of a crime to identify a possible suspect, a process called [[genetic fingerprinting]] or DNA profiling. In DNA profiling the relative lengths of sections of repetitive DNA, such as [[short tandem repeats]] and [[minisatellite]]s, are compared.  DNA profiling was developed in 1984 by English geneticist [[Alec Jeffreys]] of the [[University of Leicester]], and was first used to convict Colin Pitchfork in 1988 in the [[Enderby murders]] case in [[Leicestershire]], [[England]].  Many jurisdictions require convicts of certain types of crimes to provide a sample of DNA for inclusion in a computerized database. This has helped investigators solve old cases where the perpetrator was unknown and only a DNA sample was obtained from the scene (particularly in [[rape]] cases between strangers). This method is one of the most reliable techniques for identifying a criminal, but is not always perfect, for example if no DNA can be retrieved, or if the scene is contaminated with the DNA of several possible suspects.
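The comparison of repeat lengths described above can be sketched in a few lines. The example below counts how many back-to-back copies of a repeat unit occur in a sequence; the function name <code>longest_tandem_run</code>, the repeat unit "GATA", and both sample sequences are hypothetical, and real profiling measures fragment lengths at many loci in the laboratory rather than reading raw sequence in this way.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Walk forward one repeat unit at a time while the motif keeps matching.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical fragments from the same locus in two samples.
sample_a = "TTC" + "GATA" * 4 + "CC"
sample_b = "TTC" + "GATA" * 3 + "CC"

print(longest_tandem_run(sample_a, "GATA"))  # 4
print(longest_tandem_run(sample_b, "GATA"))  # 3
```

Two samples whose repeat counts differ at a locus, as here, do not match at that locus; a profile compares such counts across several loci.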
 
  
===DNA in computation ===
+
The DNA double helix is stabilized by [[hydrogen bond]]s between the bases attached to the two strands. The four bases found in DNA are [[adenine]] (abbreviated A), [[cytosine]] (C), [[guanine]] (G), and [[thymine]] (T). These four bases are shown below and are attached to the sugar/phosphate to form the complete nucleotide, as shown for adenosine monophosphate.
DNA plays an important role in [[computer science]], both as a motivating research problem and as a method of computation in itself.
 
  
Research on [[string searching algorithm]]s, which find an occurrence of a sequence of letters inside a larger sequence of letters, was motivated in part by DNA research, where it is used to find specific sequences of nucleotides in a large sequence.<ref>Gusfield, Dan. ''Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology''. Cambridge University Press, 15 January [[1997]]. ISBN 0521585198.</ref> In other applications such as [[text editor]]s, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behavior due to their small number of distinct characters.
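As a minimal sketch of the search problem described above, the function below lists every position at which a short pattern occurs in a longer sequence, including overlapping occurrences. It relies on Python's built-in <code>str.find</code> rather than on any specialized algorithm, and the name <code>find_all</code> and the example sequence are hypothetical.

```python
def find_all(text, pattern):
    """Return every start index at which pattern occurs in text (overlaps included)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so that overlapping matches are not missed.
        start = text.find(pattern, start + 1)
    return positions


genome = "GATATATGCATATACTT"
print(find_all(genome, "ATAT"))  # [1, 3, 9]
```

With only four distinct letters, partial matches such as "ATA" followed by a mismatch are common, which is one reason simple scanning approaches do more work on DNA than on ordinary text.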
+
These bases are classified into two types; adenine and guanine are fused five- and six-membered [[heterocyclic compound]]s called [[purine]]s, while cytosine and thymine are six-membered rings called [[pyrimidine]]s.<ref name=berg/> A fifth pyrimidine base, called [[uracil]] (U), usually takes the place of thymine in RNA and differs from thymine by lacking a [[methyl group]] on its ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine, but a very rare exception to this rule is a [[phage|bacterial virus]] called PBS1 that contains uracil in its DNA.<ref name="nature1963-takahashi">I. Takahashi, and J. Marmur, "Replacement of thymidylic acid by deoxyuridylic acid in the deoxyribonucleic acid of a transducing phage for Bacillus subtilis," ''Nature'' 197 (1963): 794–795. PMID 13980287.</ref> In contrast, following synthesis of certain RNA molecules, a significant number of the uracils are converted to thymines by the enzymatic addition of the missing methyl group. This occurs mostly on structural and enzymatic RNAs like [[transfer RNA]]s and [[ribosomal RNA]].<ref>P. Agris, "Decoding the genome: a modified view," ''Nucleic Acids Res'' 32 (2004): 223–238. PMID 14715921.</ref>
  
[[Database]] theory has been influenced by DNA research, which poses special problems for storing and manipulating DNA sequences. Databases specialized for DNA research are called [[genomic database]]s, and must address a number of unique technical challenges associated with the operations of approximate matching, sequence comparison, finding repeating patterns, and homology searching.
+
===Major and minor grooves===
 +
[[Image:DNA orbit animated small.gif|frame|right|Animation of the structure of a section of DNA. The bases lie horizontally between the two spiraling strands. Created from [http://www.rcsb.org/pdb/cgi/explore.cgi?pdbId=1D65 PDB 1D65].]]
  
In 1994, [[Leonard Adleman]] of the [[University of Southern California]] made headlines when he discovered a way of solving the directed [[Hamiltonian path problem]], an [[NP-complete]] problem, using tools from molecular biology, in particular DNA. The new approach, dubbed [[DNA computing]], has practical advantages over traditional computers in power use, space use, and efficiency, due to its ability to highly parallelize the computation (see [[parallel computing]]), although retrieving the answers takes a nontrivial amount of labor. A number of other problems, including simulation of various [[abstract machine]]s, the [[boolean satisfiability problem]], and the bounded version of the [[Post correspondence problem]], have since been analyzed using DNA computing.
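The directed Hamiltonian path problem itself can be stated compactly in code. The sketch below is a conventional, sequential generate-and-filter search, offered only as a loose software analogue of forming many candidate paths at once and then discarding the invalid ones; it is not a description of the laboratory procedure. The function name and the four-vertex example graph are hypothetical.

```python
from itertools import permutations


def hamiltonian_paths(vertices, edges, start, end):
    """Enumerate vertex orderings and keep those that form a valid directed path."""
    edge_set = set(edges)
    found = []
    for order in permutations(vertices):
        # Filter 1: the path must begin and end at the required vertices.
        if order[0] != start or order[-1] != end:
            continue
        # Filter 2: every consecutive pair must be joined by a directed edge.
        if all((a, b) in edge_set for a, b in zip(order, order[1:])):
            found.append(order)
    return found


vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (2, 1)]
print(hamiltonian_paths(vertices, edges, 0, 3))  # [(0, 1, 2, 3), (0, 2, 1, 3)]
```

Because the number of orderings grows factorially with the number of vertices, this exhaustive approach quickly becomes impractical on a conventional computer, which is what makes massive parallelism attractive for the problem.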
+
The double helix is a right-handed spiral. As the DNA strands wind around each other, they leave gaps between each set of phosphate backbones, revealing the sides of the bases inside (see animation). There are two of these grooves twisting around the surface of the double helix: one groove, the major groove, is 22&nbsp;Å wide and the other, the minor groove, is 12&nbsp;Å wide.<ref>R. Wing, H. Drew, T. Takano, C. Broka, S. Tanaka, K. Itakura, and R. Dickerson, "Crystal structure analysis of a complete turn of B-DNA," ''Nature'' 287 (1980): 755–758. PMID 7432492.</ref> The narrowness of the minor groove means that the edges of the bases are more accessible in the major groove. As a result, proteins like [[transcription factor]]s that can bind to specific sequences in double-stranded DNA usually make contacts to the sides of the bases exposed in the major groove.<ref>C. Pabo, and R. Sauer, "Protein-DNA recognition," ''Annu Rev Biochem'' 53 (1984): 293–321. PMID 6236744.</ref>
  
Due to its compactness, DNA also has a theoretical role in [[cryptography]], where in particular it allows unbreakable [[one-time pad]]s to be efficiently constructed and used.<ref>Ashish Gehani, Thomas LaBean and John Reif. [http://citeseer.ist.psu.edu/gehani99dnabased.html DNA-Based Cryptography]. Proceedings of the 5th DIMACS Workshop on DNA Based Computers, Cambridge, MA, USA, 14&ndash;15 June 1999.</ref>
+
===Base pairing===
 
  
=== DNA in historical and anthropological study ===
+
<div class="thumb tright" style="background-color: #f9f9f9; border: 1px solid #CCCCCC; margin:0.5em;">
 +
{|border="0" width=230px border="0" cellpadding="2" cellspacing="0" style="font-size: 85%; border: 1px solid #CCCCCC; margin: 0.3em;"
 +
|[[Image:GC DNA base pair.svg|281px]]
 +
|}
 +
{|border="0" width=230px border="0" cellpadding="2" cellspacing="0" style="font-size: 85%; border: 1px solid #CCCCCC; margin: 0.3em;"
 +
|[[Image:AT DNA base pair.svg|281px]]
 +
|}
 +
<div style="border: none; width:281px;"><div class="thumbcaption">At top, a '''GC''' base pair with three [[hydrogen bond]]s. At the bottom, '''AT''' base pair with two hydrogen bonds. Hydrogen bonds are shown as dashed lines.</div></div></div>
  
Because DNA collects mutations over time, which are then passed down from parent to offspring, it contains information about processes that have occurred in the past. By comparing different DNA sequences, geneticists can attempt to infer the history of organisms.  
+
Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary [[base pair]]ing. Here, purines form [[hydrogen bond]]s to pyrimidines, with A bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. In a double helix, the two strands are also held together via [[force]]s generated by the [[hydrophobic effect]] and [[pi stacking]], which are not influenced by the sequence of the DNA.<ref>P. Ponnuswamy, and M. Gromiha, "On the conformational stability of oligonucleotide duplexes and tRNA molecules," ''J Theor Biol'' 169(4): 419–432. PMID 7526075.</ref> As hydrogen bonds are not [[covalent bond|covalent]], they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high [[temperature]].<ref>H. Clausen-Schaumann, M. Rief, C. Tolksdorf, and H. Gaub, "Mechanical stability of single DNA molecules,"  ''Biophys J'' 78 (2000, issue 4): 1997–2007. PMID 10733978. </ref> As a result of this complementarity, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms.<ref name=Alberts/>
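Because each base pairs with exactly one partner, the sequence of one strand determines the other. The sketch below expresses that rule as a lookup table; the names <code>complement</code> and <code>is_complementary</code> are hypothetical, and the functions compare strands position by position without modelling strand direction.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the bases that pair, position by position, with the given strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def is_complementary(strand_a, strand_b):
    """True if every base of strand_a pairs with the base opposite it in strand_b."""
    return len(strand_a) == len(strand_b) and complement(strand_a) == strand_b


print(complement("ATCG"))                # TAGC
print(is_complementary("ATCG", "TAGC"))  # True
```

Applying <code>complement</code> twice returns the original sequence, which mirrors the statement that the information in the double helix is duplicated on each strand.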
  
If DNA sequences from different [[species]] are compared, then the resulting family tree, or [[phylogeny]] can be used to study the [[evolution]] of these species. This field of [[phylogenetics]] is a powerful tool in [[evolutionary biology]]. If DNA sequences within a species are compared, [[population genetics|population geneticists]] can glean information on the history of particular populations. This can be used in studies ranging from [[ecological genetics]] to [[anthropology]] (for example, DNA evidence is also being used to try to identify the [[Ten Lost Tribes of Israel]]<ref>''Lost Tribes of Israel'', [[NOVA (TV series)|NOVA]], PBS airdate: 22 February 2000. Transcript available from http://www.pbs.org/wgbh/nova/transcripts/2706israel.html (last accessed on 4 March 2006)</ref><ref>{{cite web| url=http://www.aish.com/societywork/sciencenature/the_cohanim_-_dna_connection.asp|  title=The Cohanim/DNA Connection| first= Yaakov | last=Kleiman| accessdate=2006-03-04}}</ref>).
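One of the simplest quantities used when comparing sequences is the proportion of positions at which two aligned sequences differ. The sketch below computes it for three hypothetical sequences; the function name <code>p_distance</code> is an assumption, the sequences are assumed to be already aligned and of equal length, and actual phylogenetic methods use considerably more elaborate models.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# Hypothetical aligned sequences from three organisms.
species_a = "ATGGCCTA"
species_b = "ATGGCTTA"
species_c = "ATAGTTTA"

print(p_distance(species_a, species_b))  # 0.125
print(p_distance(species_a, species_c))  # 0.375
```

On this measure the first two sequences are more similar to each other than either is to the third, which is the kind of evidence used to group organisms in a tree.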
+
The two types of base pairs form different numbers of hydrogen bonds, AT forming two hydrogen bonds, and GC forming three hydrogen bonds (see figures, left). The GC base pair is therefore stronger than the AT base pair. As a result, it is both the percentage of GC base pairs and the overall length of a DNA double helix that determine the strength of the association between the two strands of DNA. Long DNA helices with a high GC content have stronger-interacting strands, while short helices with high AT content have weaker-interacting strands.<ref>T. Chalikian, J. Völker, G. Plum, and K. Breslauer, "A more unified picture for the thermodynamics of nucleic acid duplex melting: a characterization by calorimetric and volumetric techniques," ''Proc Natl Acad Sci U S A'' 96(14) (1999): 7853–7858. PMID 10393911. </ref> Parts of the DNA double helix that need to separate easily, such as the TATAAT [[Pribnow box]] in bacterial [[promoter]]s, tend to have sequences with a high AT content, making the strands easier to pull apart.<ref>P. deHaseth and J. Helmann, "Open complex formation by Escherichia coli RNA polymerase: the mechanism of polymerase-induced strand separation of double helical DNA," ''Mol Microbiol'' 16(5) (1995): 817–824. PMID 7476180.</ref> In the laboratory, the strength of this interaction can be measured by finding the temperature required to break the hydrogen bonds, their [[melting temperature]] (also called ''T<sub>m</sub>'' value). When all the base pairs in a DNA double helix melt, the strands separate and exist in solution as two entirely independent molecules. These single-stranded DNA molecules have no single common shape, but some conformations are more stable than others.<ref>J. Isaksson, S. Acharya, J. Barman, P. Cheruku, and J. Chattopadhyaya, "Single-stranded adenine-rich DNA and RNA retain structural characteristics of their respective double-stranded conformations and show directional differences in stacking pattern," ''Biochemistry'' 43(51) (2004): 15996–16010. PMID 15609994.</ref>
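The dependence of strand association on GC content and length can be illustrated with a rough calculation. The sketch below computes the GC fraction of a sequence and applies the Wallace rule, a rule of thumb that is not taken from this article and that assigns about 2 °C per A or T and 4 °C per G or C; it is only a crude estimate for short oligonucleotides and should not be read as a measured ''T<sub>m</sub>''. Both function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm_celsius(seq):
    """Rule-of-thumb melting temperature for a short oligonucleotide (Wallace rule)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


pribnow_box = "TATAAT"  # AT-rich sequence mentioned in the text
gc_rich = "GCGCGC"      # hypothetical GC-rich sequence of the same length

print(gc_content(pribnow_box), wallace_tm_celsius(pribnow_box))  # 0.0 12
print(gc_content(gc_rich), wallace_tm_celsius(gc_rich))          # 1.0 24
```

For two sequences of equal length, the estimate is higher for the GC-rich one, consistent with the three hydrogen bonds of a GC pair against the two of an AT pair.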
  
DNA has also been used to look at fairly recent issues of family relationships, such as establishing some manner of familial relationship between the descendants of [[Sally Hemings]] and the family of [[Thomas Jefferson]]. This usage is closely related to the use of DNA in criminal investigations detailed above. Indeed, some criminal investigations have been solved when DNA from crime scenes has fortuitously matched relatives of the guilty individual.[http://www.newscientist.com/article.ns?id=dn4908][http://news.bbc.co.uk/1/hi/wales/3044282.stm]
+
===Sense and antisense===
 +
A DNA sequence is called "sense" if its sequence is the same as that of a [[messenger RNA]] copy that is translated into protein. The sequence on the opposite strand is complementary to the sense sequence and is therefore called the "antisense" sequence. Since [[RNA polymerase]]s work by making a complementary copy of their templates, it is this antisense strand that is the template for producing the sense messenger RNA. Both sense and antisense sequences can exist on different parts of the same strand of DNA (that is, both strands contain both sense and antisense sequences).  
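The relationship between the sense sequence, the antisense template, and the messenger RNA can be checked with a small sketch. Here the antisense strand is written 3′ to 5′ so that it lines up base by base with the sense strand; the function name <code>transcribe</code> and the six-base example are hypothetical.

```python
# Pairing used when RNA is copied from a DNA template: A in the template gives U in the RNA.
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Build the mRNA (written 5' to 3') complementary to a template written 3' to 5'."""
    return "".join(RNA_PAIR[base] for base in template_3_to_5)


sense = "ATGGCC"             # sense sequence, 5' to 3'
antisense_3_to_5 = "TACCGG"  # its antisense partner, written 3' to 5'

mrna = transcribe(antisense_3_to_5)
print(mrna)                             # AUGGCC
print(mrna == sense.replace("T", "U"))  # True
```

The messenger RNA has the same sequence as the sense strand, with uracil in place of thymine, even though it was copied from the antisense strand.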
  
==Molecular structure==
+
In both [[prokaryote]]s and [[eukaryote]]s, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear.<ref>A. Hüttenhofer, P. Schattner, and N. Polacek, "Non-coding RNAs: hope or hype?" ''Trends Genet'' 21(5) (2005): 289–297. PMID 15851066.</ref> One proposal is that antisense RNAs are involved in regulating [[gene expression]] through RNA-RNA base pairing.<ref>S. Munroe, "Diversity of antisense regulation in eukaryotes: multiple mechanisms, emerging patterns," ''J Cell Biochem'' 93(4) (2004): 664–671. PMID 15389973.</ref>
[[Image:NA-comparedto-DNA thymineAndUracilCorrected.png|right|400px|thumb|Comparisons between DNA and single stranded RNA with the diagram of the bases showing.]]
 
Although sometimes called "the molecule of heredity", DNA macromolecules as people typically think of them are not single molecules. Rather, they are pairs of molecules, which entwine like vines to form a '''double [[helix]]''' (see the illustration at the right).
 
  
Each vine-like molecule is a strand of DNA: '''a chemically linked chain of [[nucleotide]]s, each of which consists of a [[sugar]] ([[deoxyribose]]), a [[phosphate]] and one of five kinds of [[nucleobase]]s ("bases")'''. Because DNA strands are composed of these nucleotide subunits, they are [[polymer]]s.
+
A few DNA sequences in prokaryotes and eukaryotes, and more in [[plasmid]]s and [[virus]]es, blur the distinction made above between sense and antisense strands by having overlapping genes.<ref>I. Makalowska, C. Lin, and W. Makalowski, "Overlapping genes in vertebrate genomes," ''Comput Biol Chem'' 29 (1) (2005): 1-12. PMID 15680581.</ref> In these cases, some DNA sequences do double duty, encoding one protein when read 5′ to 3′ along one strand, and a second protein when read in the opposite direction (still 5′ to 3′) along the other strand. In [[bacteria]], this overlap may be involved in the regulation of gene transcription,<ref>Z. Johnson and S. Chisholm, "Properties of overlapping genes are conserved across microbial genomes," ''Genome Res'' 14(11) (2004): 2268–2272. PMID 15520290.</ref> while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.<ref>R. Lamb and C. Horvath, "Diversity of coding strategies in influenza viruses," ''Trends Genet'' 7(8) (1991): 261–266. PMID 1771674.</ref> Another way of reducing genome size is seen in some viruses that contain linear or circular single-stranded DNA as their genetic material.<ref>J. Davies and J. Stanley, "Geminivirus genes and vectors," ''Trends Genet'' 5(3) (1989): 77–81. PMID 2660364.</ref><ref>K. Berns, "Parvovirus replication," ''Microbiol Rev'' 54(3) (1990): 316–329. PMID 2215424.</ref>
  
The diversity of the bases means that there are five kinds of nucleotides, which are commonly referred to by the identity of their bases. These are [[adenine]] (A), [[thymine]] (T), [[uracil]] (U), [[cytosine]] (C), and [[guanine]] (G). U is rarely found in DNA except as a result of chemical degradation of C, but in some viruses, notably PBS1 phage DNA, U completely replaces the usual T in its DNA. Similarly, RNA usually contains U in place of T, but in certain RNAs such as [[transfer RNA]], T is always found in some positions. Thus, the only true difference between DNA and RNA is the sugar, 2-deoxyribose in DNA and ribose in RNA.
+
===Supercoiling===
 +
DNA can be twisted like a rope in a process called [[DNA supercoil]]ing. With DNA in its "relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base pairs, but if the DNA is twisted the strands become more tightly or more loosely wound.<ref>C. Benham and S. Mielke, "DNA mechanics," ''Annu Rev Biomed Eng'' 7 (2005): 21–53. PMID 16004565.</ref> If the DNA is twisted in the direction of the helix, this is positive supercoiling, and the bases are held more tightly together. If they are twisted in the opposite direction, this is negative supercoiling, and the bases come apart more easily.  
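The degree of supercoiling is often summarized numerically. The sketch below uses the 10.4 base pairs per turn quoted above together with two standard quantities that this article does not itself define, the linking number and the superhelical density; the plasmid size and linking number are hypothetical values chosen for illustration.

```python
BP_PER_TURN = 10.4  # relaxed helical repeat quoted in the text


def relaxed_linking_number(n_bp: int) -> float:
    """Number of helical turns in a relaxed duplex of n_bp base pairs."""
    return n_bp / BP_PER_TURN


def superhelical_density(n_bp: int, linking_number: float) -> float:
    """Fractional over- or under-winding relative to the relaxed state.

    Negative values mean the DNA is underwound (negative supercoiling);
    positive values mean it is overwound (positive supercoiling).
    """
    lk0 = relaxed_linking_number(n_bp)
    return (linking_number - lk0) / lk0


# Hypothetical 5,200 bp circular DNA that has lost 30 of its ~500 turns.
sigma = superhelical_density(5200, 470)
```

With these assumed numbers the density comes out at about -0.06, that is, roughly 6% fewer turns than the relaxed molecule.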
  
In a DNA double helix, two polynucleotide strands can associate through the [[hydrophobic effect]] and [[pi stacking]]. Specificity of which strands stay associated is determined by [[base pair|complementary pairing]]. Each base forms [[hydrogen bond]]s readily to only one other — A to T and C to G — so that the identity of the base on one strand dictates the strength of the association; the more complementary bases exist, the stronger and longer-lasting the association.
+
In nature, most DNA has slight negative supercoiling that is introduced by [[enzyme]]s called [[topoisomerase]]s.<ref name=Champoux>J. Champoux, "DNA topoisomerases: structure, function, and mechanism," ''Annu Rev Biochem'' 70 (2001): 369–413. PMID 11395412.</ref> These enzymes are also needed to relieve the twisting stresses introduced into DNA strands during processes such as [[transcription (genetics)|transcription]] and [[DNA replication]].<ref name=Wang>J. Wang, "Cellular roles of DNA topoisomerases: a molecular perspective," ''Nat Rev Mol Cell Biol'' 3(6) (2002): 430–440. PMID 12042765.</ref>
  
The cell's machinery is capable of ''melting'' or dissociating a DNA double helix, and using each DNA strand as a template for synthesizing a new complementary strand, which is nearly identical to the template's former partner strand. Errors that occur in the synthesis are known as [[mutations]]. The [[Polymerase chain reaction|polymerase chain reaction]] (PCR) mimics this process [[in vitro]], that is, in a nonliving system.
+
[[Image:A-DNA, B-DNA and Z-DNA.png|thumb|right|400px|From left to right, the structures of A, B and Z DNA]]
  
Because pairing causes the nucleotide bases to face the helical axis, the sugar and phosphate groups of the nucleotides run along the outside; the two chains they form are sometimes called the "'''backbones'''" of the helix. In fact, it is chemical bonds between the phosphates and the sugars that link one nucleotide to the next in the DNA strand.
+
===Alternative double-helical structures===
 +
DNA exists in several possible [[Conformational isomerism|conformations]]. The conformations identified so far are [[A-DNA]], B-DNA, [[C-DNA]], D-DNA,<ref name=Hayashi2005>G. Hayashi, M. Hagihara, and K. Nakatani, "Application of L-DNA as a molecular tag," ''Nucleic Acids Symp Ser (Oxf)'' 49 (2005): 261–262. PMID 17150733.</ref> E-DNA,<ref name=Vargason2000>J. M. Vargason, B. F. Eichman, and P. S. Ho, "The extended and eccentric E-DNA structure induced by cytosine methylation or bromination," ''Nature Structural Biology'' 7 (2000): 758-761. PMID 10966645.</ref> H-DNA,<ref name=Wang2006>G. Wang and K. M. Vasquez, "Non-B DNA structure-induced genetic instability," ''Mutat Res'' 598(1-2) (2006): 103-119. PMID 16516932.</ref> L-DNA,<ref name=Hayashi2005/> P-DNA,<ref name="Allemand1998">Allemand et al., "Stretched and overwound DNA forms a Pauling-like structure with exposed bases," ''PNAS'' 24(1998): 14152-14157. PMID 9826669.</ref> and [[Z-DNA]].<ref name=Ghosh/><ref>E. Palecek, "Local supercoil-stabilized DNA structures," ''Critical Reviews in Biochemistry and Molecular Biology'' 26(2) (1991): 151-226. PMID 1914495.</ref> However, only A-DNA, B-DNA, and Z-DNA have been observed in naturally occurring biological systems.
  
{{multi-video start}}
+
Which conformation DNA adopts depends on the sequence of the DNA, the amount and direction of supercoiling, chemical modifications of the bases, and also solution conditions, such as the concentration of [[metal]] [[ion]]s and [[polyamine]]s.<ref>H. Basu, B. Feuerstein, D. Zarling, R. Shafer, and L. Marton, "Recognition of Z-RNA and Z-DNA determinants by polyamines in solution: experimental and theoretical studies," ''J Biomol Struct Dyn'' 6 (2) (1988): 299-309. PMID 2482766.</ref> Of these three conformations, the "B" form described above is most common under the conditions found in cells.<ref>A. G. Leslie, S. Arnott, R. Chandrasekaran, and R. L. Ratliff, "Polymorphism of DNA double helices," ''J. Mol. Biol.'' 143(1) (1980): 49–72. PMID 7441761.</ref> The two alternative double-helical forms of DNA differ in their geometry and dimensions.
{{multi-video item |
 
  filename      = ADN animation.gif |
 
  title        = Rotating DNA stick model |
 
  description  = Animation of a section of DNA rotating. (1.00 [[Megabyte|MB]], [[animated GIF]] format). |
 
  format        = [[animated GIF]]
 
}}
 
{{multi-video end}}
 
  
==Sequence role==
+
The A form is a wider right-handed spiral, with a shallow, wide minor groove and a narrower, deeper major groove. The A form occurs under non-physiological conditions in dehydrated samples of DNA, while in the cell it may be produced in hybrid pairings of DNA and RNA strands, as well as in enzyme-DNA complexes.<ref>M. Wahl and M. Sundaralingam, "Crystal structures of A-DNA duplexes," ''Biopolymers'' 44(1) (1997): 45-63. PMID 9097733.</ref><ref>X. J. Lu, Z. Shakked, and W. K. Olson, "A-form conformational motifs in ligand-bound DNA structures," ''J. Mol. Biol.'' 300(4) (2000): 819-840. PMID.</ref> Segments of DNA where the bases have been chemically modified by [[methylation]] may undergo a larger change in conformation and adopt the [[Z-DNA|Z form]]. Here, the strands turn about the helical axis in a left-handed spiral, the opposite of the more common B form.<ref>S. Rothenburg, F. Koch-Nolte, and F. Haag, "DNA methylation and Z-DNA formation as mediators of quantitative differences in the expression of alleles," ''Immunol Rev'' 184 (2001): 286-298. PMID 12086319.</ref> These unusual structures can be recognized by specific Z-DNA binding proteins and may be involved in the regulation of transcription.<ref>D. Oh, Y. Kim, and A. Rich, "Z-DNA-binding proteins can act as potent effectors of gene expression in vivo," ''Proc. Natl. Acad. Sci. U.S.A.'' 99(26) (2002): 16666-16671. PMID 12486233.</ref>
Within a gene, the sequence of [[nucleotides]] along a DNA strand defines a messenger RNA sequence, which in turn defines a [[protein]] that an [[organism]] is liable to manufacture or "[[gene expression|express]]" at one or several points in its life. The relationship between the nucleotide sequence and the [[amino acid|amino-acid]] sequence of the protein is determined by simple cellular rules of [[Translation (genetics)|translation]], known collectively as the [[genetic code]]. The genetic code consists of three-letter 'words' (termed codons), each formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT). The codons are copied into [[messenger RNA]] and then read by [[transfer RNA]], with each codon corresponding to a particular amino acid. There are 64 possible codons (4 bases in each of 3 positions, <math>4^3 = 64</math>); 61 of them encode the 20 amino acids, so most amino acids have more than one possible codon. The remaining three are 'stop' or 'nonsense' codons signifying the end of the coding region: UAA, UGA, and UAG in messenger RNA.
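The codon arithmetic above can be checked with a short sketch. It builds the standard genetic code from its usual compact 64-letter listing (codons ordered with bases T, C, A, G; `*` marks a stop) and translates a DNA coding sequence directly, so the stop codons appear as TAA, TAG, and TGA rather than their RNA spellings. The function name and the example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate a DNA coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
n_amino_acids = len(set(AMINO_ACIDS) - {"*"})
example = translate("ACTCAGTTT")  # the three example codons from the text
```

The three example codons ACT, CAG, and TTT translate to threonine, glutamine, and phenylalanine (one-letter codes T, Q, F).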
 
  
In many [[species]], only a small fraction of the total sequence of the [[genome]] appears to encode protein. For example, only about 1.5% of the [[human genome]] consists of protein-coding [[exons]]. The function of the rest is a matter of speculation. It is known that certain nucleotide sequences specify affinity for [[DNA binding protein]]s, which play a wide variety of vital roles, in particular through control of replication and transcription. These sequences are frequently called [[regulatory sequence]]s, and researchers assume that so far they have identified only a tiny fraction of the total that exist. "[[Junk DNA]]" represents sequences that do not yet appear to contain genes or to have a function. The reasons for the presence of so much [[non-coding DNA]] in [[eukaryotic]] genomes and the extraordinary differences in [[genome size]] ("[[C-value]]") among species represent a long-standing puzzle in DNA research known as the "[[C-value enigma]]".
+
[[Image:Parallel telomere quadruple.png|thumb|right|350px|Structure of a DNA quadruplex formed by [[telomere]] repeats. The conformation of the DNA backbone diverges significantly from the typical helical structure.]]
  
Some DNA sequences play structural roles in chromosomes. [[Telomere]]s and [[centromere]]s typically contain few (if any) protein-coding genes, but are important for the function and stability of chromosomes. Some genes are "RNA genes" whose final product is an RNA rather than a protein (see [[tRNA]] and [[rRNA]]). Some RNA genes code for transcripts that function as regulatory RNAs (see [[RNA interference|siRNA]]) that influence the function of other RNA molecules. The intron-exon structure of some genes (such as immunoglobulin and protocadherin genes) is important for allowing alternative splicing of pre-mRNA, which allows several different proteins to be made from the same gene. Some non-coding DNA represents [[pseudogene]]s that can be used as raw material for the creation of new genes with new functions. Some non-coding DNA provides hot-spots for duplication of short DNA regions; such sequence duplication has been the major form of genetic change in the human lineage (see evidence from the [[Chimpanzee Genome Project]]). Exons interspersed with introns allow for "exon shuffling" and the creation of modified genes that might have new adaptive functions. Large amounts of non-coding DNA are probably adaptive in that they provide chromosomal regions where [[Genetic recombination|recombination]] between homologous portions of chromosomes can take place without disrupting the function of genes. Some biologists such as [[Stuart Kauffman]] have speculated that non-coding DNA may modify the rate of evolution of a species.{{fact}}
+
===Quadruplex structures===
 +
At the ends of the linear [[chromosome]]s are specialized regions of DNA called [[telomere]]s. The main function of these regions is to allow the cell to replicate chromosome ends using the enzyme [[telomerase]], as the enzymes that normally replicate DNA cannot copy the extreme 3′ ends of chromosomes.<ref name=Greider>C. Greider, and E. Blackburn, "Identification of a specific telomere terminal transferase activity in Tetrahymena extracts," ''Cell'' 43(2 pt 1) (1985): 405-413. PMID 3907856.</ref> As a result, if a chromosome lacked telomeres it would become shorter each time it was replicated. These specialized chromosome caps also help protect the DNA ends from [[exonuclease]]s and stop the [[DNA repair]] systems in the cell from treating them as damage to be corrected.<ref name=Nugent>C. Nugent and V. Lundblad, [http://genesdev.cshlp.org/content/12/8/1073.full "The telomerase reverse transcriptase: components and regulation,"] ''Genes Dev'' 12(8) (1998): 1073-1085. PMID 9553037. Retrieved January 23, 2023.</ref> In human cells, telomeres are usually lengths of single-stranded DNA containing several thousand repeats of a simple TTAGGG sequence.<ref>W. Wright, V. Tesmer, K. Huffman, S. Levene, and J. Shay, [http://genesdev.cshlp.org/content/11/21/2801.full "Normal human chromosomes have long G-rich telomeric overhangs at one end,"] ''Genes Dev'' 11(21) (1997): 2801-2809. PMID 9353250. Retrieved January 23, 2023.</ref>
  
Sequence also determines a DNA segment's susceptibility to cleavage by [[restriction enzyme]]s, the quintessential tools of [[genetic engineering]]. The position of cleavage sites throughout an individual's genome determines one kind of an individual's "[[DNA fingerprinting|DNA fingerprint]]".
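How the positions of cleavage sites produce a sequence-dependent pattern can be sketched as follows. The example assumes the enzyme EcoRI, which recognizes GAATTC and cuts after the first G; the two short "alleles" are invented for illustration, and the sketch treats the DNA as a linear molecule and scans only one strand.

```python
def find_sites(sequence: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start positions of every occurrence of the site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Lengths of the fragments left after cutting a linear molecule at each site."""
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    edges = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(edges, edges[1:])]


# Hypothetical alleles differing by one base inside the second recognition site.
allele_a = "TTGAATTCAAAAGAATTCGG"
allele_b = "TTGAATTCAAAAGAGTTCGG"

pattern_a = fragment_lengths(allele_a)
pattern_b = fragment_lengths(allele_b)
```

A single base change removes one recognition site from the second allele, so the two alleles yield different sets of fragment lengths; differences of this kind underlie one form of DNA fingerprint.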
+
These guanine-rich sequences may stabilize chromosome ends by forming very unusual structures of stacked sets of four-base units, rather than the usual base pairs found in other DNA molecules. Here, four guanine bases form a flat plate and these flat four-base units then stack on top of each other, to form a stable ''[[G-quadruplex]]'' structure.<ref name=Burge>S. Burge, G. Parkinson, P. Hazel, A. Todd, and S. Neidle, "Quadruplex DNA: sequence, topology and structure," ''Nucleic Acids Res'' 34(19) (2006): 5402-5415. PMID 17012276. </ref> These structures are stabilized by hydrogen bonding between the edges of the bases and [[chelation]] of a metal ion in the centre of each four-base unit. The structure shown to the left is a top view of the quadruplex formed by a DNA sequence found in human telomere repeats. The single DNA strand forms a loop, with the sets of four bases stacking in a central quadruplex three plates deep. In the space at the center of the stacked bases are three chelated [[potassium]] ions.<ref>G. Parkinson, M. Lee, and S. Neidle, "Crystal structure of parallel quadruplexes from human telomeric DNA," ''Nature'' 417(6891) (2002): 876-880. PMID 12050675.</ref> Other structures can also be formed, with the central set of four bases coming from either a single strand folded around the bases, or several different parallel strands, each contributing one base to the central structure.
  
==Replication==
+
In addition to these stacked structures, telomeres also form large loop structures called telomere loops, or T-loops. Here, the single-stranded DNA curls around in a long circle stabilized by telomere-binding proteins.<ref>J. Griffith, L. Comeau, S. Rosenfield, R. Stansel, A. Bianchi, H. Moss and T. de Lange, "Mammalian telomeres end in a large duplex loop," ''Cell'' 97(4) (1999): 503-514. PMID 10338214.</ref> At the very end of the T-loop, the single-stranded telomere DNA is held onto a region of double-stranded DNA by the telomere strand disrupting the double-helical DNA and base pairing to one of the two strands. This [[Triple-stranded DNA|triple-stranded]] structure is called a displacement loop or [[D-loop]].<ref name=Burge/>
''Main article:'' [[DNA replication]]
 
[[image:dna-split.png|frame|DNA replication]]
 
<!-- summary has been added, below, also include any extra context relevant for this article as well
 
  
..[[origin of replication]]...chromosome...plasmid...DNA polymerase...[[mutation]]...[a paragraph including these ideas would be useful and go well here]
+
==Chemical modifications==
-->
+
<div class="thumb tright" style="background-color: #f9f9f9; border: 1px solid #CCCCCC; margin:0.5em;">
DNA replication or DNA synthesis is the process of copying the double-stranded DNA prior to [[cell division]]. Each of the two resulting double strands consists of one original and one newly synthesized strand; this is called ''[[semiconservative replication]]''. The two copies are generally almost perfectly identical, but occasionally errors in replication, or exposure to chemicals or radiation, can result in a less than perfect copy (see [[mutation]]). The process of replication consists of three steps: ''initiation'', ''elongation'', and ''termination''.
+
{|border="0" width=400px border="0" cellpadding="2" cellspacing="0" style="font-size: 85%; border: 1px solid #CCCCCC; margin: 0.3em;"
 +
|[[Image:Cytosine chemical structure.png|100px]]
 +
|[[File:5-Methylcytosine.png|125px]]
 +
|[[Image:Thymine chemical structure.png|125px]]
 +
|-
 +
|align=center|[[cytosine]]
 +
|align=center|[[5-Methylcytosine|5-methylcytosine]]
 +
|align=center|[[thymine]]
 +
|}
 +
<div style="border: none; width:300px;font-size: 85%;"><div class="thumbcaption">Structure of cytosine with and without the 5-methyl group. After deamination the 5-methylcytosine has the same structure as thymine</div></div></div>
 +
===Base modifications===
  
==Mechanical biological properties==
+
The expression of genes is influenced by the [[chromatin]] structure of a chromosome and regions of [[heterochromatin]] (low or no gene expression) correlate with the [[methylation]] of [[cytosine]]. For example, cytosine methylation, to produce [[5-Methylcytosine|5-methylcytosine]], is important for [[X-inactivation|X-chromosome inactivation]].<ref>R. Klose, and A. Bird, "Genomic DNA methylation: the mark and its mediators," ''Trends Biochem Sci'' 31(2) (2006): 89-97. PMID 16403636.</ref> The average level of methylation varies between organisms, with ''[[Caenorhabditis elegans]]'' lacking cytosine methylation, while [[vertebrate]]s show higher levels, with up to 1% of their DNA containing 5-methylcytosine.<ref>A. Bird, "DNA methylation patterns and epigenetic memory," ''Genes Dev'' 16(1) (2002): 6-21. PMID 11782440.</ref> Despite the biological role of 5-methylcytosine it is susceptible to spontaneous [[deamination]] to leave the thymine base, and methylated cytosines are therefore [[mutation]] hotspots.<ref>C. Walsh, and G. Xu, "Cytosine methylation and DNA repair," ''Curr Top Microbiol Immunol'' 301 (2006): 283-315. PMID 16570853.</ref> Other base modifications include adenine methylation in bacteria and the [[glycosylation]] of uracil to produce the "J-base" in [[kinetoplastid]]s.<ref>D. Ratel, J. Ravanat, F. Berger, and D. Wion, "N6-methyladenine: the other methylated base of DNA," ''Bioessays'' 28(3) (2006): 309-315. PMID 16479578.</ref><ref>J. Gommers-Ampt, F. Van Leeuwen, A. de Beer, J. Vliegenthart, M. Dizdaroglu, J. Kowalak, P. Crain, and P. Borst, "Beta-D-glucosyl-hydroxymethyluracil: a novel modified base present in the DNA of the parasitic protozoan T. brucei," ''Cell'' 75(6) (1993): 1129-1136. PMID 8261512.</ref>
''Main article:'' [[Mechanical properties of DNA]].
 
  
===Strands association and dissociation===
+
===DNA damage===
The hydrogen bonds between the strands of the double helix are weak enough that the strands can be easily separated by [[enzyme]]s. Enzymes known as [[helicase]]s unwind the strands to facilitate the advance of sequence-reading enzymes such as [[DNA polymerase]]. Helicases do not cut the DNA; the twisting strain that unwinding generates is relieved by [[topoisomerase]]s, which transiently cleave the phosphate backbone of one or both strands so that the strands can swivel around each other or pass through one another. The strands can also be separated by gentle heating, as used in [[PCR]], provided they have fewer than about 10,000 '''base pairs''' (10 kilobase pairs, or 10 kbp). The intertwining of the DNA strands makes long segments difficult to separate.
+
{{further|[[Mutation]]}}
  
===Circular DNA===
+
[[Image:Benzopyrene DNA adduct 1JDG.png|thumb|right|300px|[[Benzopyrene]], the major mutagen in [[tobacco smoking|tobacco smoke]], in an adduct to DNA.]]
When the ends of a piece of double-helical DNA are joined so that it forms a circle, as in [[plasmid]] DNA, the strands are [[knot theory|topologically]] knotted. This means they cannot be separated by gentle heating or by any process that does not involve breaking a strand. The task of unknotting topologically linked strands of DNA falls to enzymes known as [[topoisomerase]]s. Some of these enzymes unknot circular DNA by cleaving two strands so that another double-stranded segment can pass through. Unknotting is required for the replication of circular DNA as well as for various types of [[recombination]] in linear DNA.
+
DNA can be damaged by many different sorts of [[mutagen]]s. These include [[oxidizing agent]]s, [[alkylating agent]]s, and also high-energy [[electromagnetic radiation]] such as [[ultraviolet]] light and [[x-ray]]s. The type of DNA damage produced depends on the type of mutagen. For example, UV light mostly damages DNA by producing [[thymine dimer]]s, which are cross-links between adjacent pyrimidine bases in a DNA strand.<ref>T. Douki, A. Reynaud-Angelin, J. Cadet, and E. Sage, "Bipyrimidine photoproducts rather than oxidative lesions are the main type of DNA damage involved in the genotoxic effect of solar UVA radiation," ''Biochemistry'' 42(30) (2003): 9221-9226. PMID 12885257.</ref> On the other hand, oxidants such as [[free radical]]s or [[hydrogen peroxide]] produce multiple forms of damage, including base modifications, particularly of guanosine, as well as double-strand breaks.<ref>J. Cadet, T. Delatour, T. Douki, D. Gasparutto, J. Pouget, J. Ravanat, and S. Sauvaigo, "Hydroxyl radicals and DNA base damage," ''Mutat Res'' 424(1-2) (1999): 9-21. PMID 10064846.</ref> It has been estimated that in each human cell, about 500 bases suffer oxidative damage per day.<ref>M. Shigenaga, C. Gimeno, and B. Ames, "Urinary 8-hydroxy-2′-deoxyguanosine as a biological marker of ''in vivo'' oxidative DNA damage,"  ''Proc Natl Acad Sci U S A'' 86(24) (1989): 9697-9701. PMID 2602371.</ref><ref>R. Cathcart, E. Schwiers, R. Saul, and B. Ames, "Thymine glycol and thymidine glycol in human and rat urine: A possible assay for oxidative DNA damage," ''Proc Natl Acad Sci U S A'' 81(18) (1984): 5633-5637. PMID 6592579.</ref> Of these oxidative lesions, the most dangerous are double-strand breaks, as these lesions are difficult to repair and can produce [[point mutation]]s, [[Insertion (genetics)|insertions]] and [[Genetic deletion|deletions]] from the DNA sequence, as well as [[chromosomal translocation]]s.<ref>K. Valerie and L. 
Povirk, "Regulation and mechanisms of mammalian double-strand break repair," ''Oncogene'' 22(37) (2003): 5792-5812. PMID 12947387.</ref>
  
===Great length versus tiny breadth===
+
Many mutagens [[intercalation (chemistry)|intercalate]] into the space between two adjacent base pairs. Intercalators are mostly [[aromaticity|aromatic]] and planar molecules, and include [[ethidium]], [[daunomycin]], [[doxorubicin]], and [[thalidomide]]. In order for an intercalator to fit between base pairs, the bases must separate, distorting the DNA strands by unwinding of the double helix. These structural changes inhibit both transcription and DNA replication, causing toxicity and mutations. As a result, DNA intercalators are often [[carcinogen]]s, with [[benzopyrene|benzopyrene diol epoxide]], [[acridine]]s, [[aflatoxin]], and [[ethidium bromide]] being well-known examples.<ref>L. Ferguson and W. Denny, "The genetic toxicology of acridines," ''Mutat Res'' 258(2) (1991): 123-160. PMID 1881402.</ref><ref>A. Jeffrey, "DNA modification by chemical carcinogens," ''Pharmacol Ther'' 28(2) (1985): 237&ndash;272. PMID 3936066.</ref><ref>T. Stephens, C. Bunde, and B. Fillmore, "Mechanism of action in thalidomide teratogenesis," ''Biochem Pharmacol'' 59(12) (2000): 1489&ndash;1499. PMID 10799645.</ref> Nevertheless, due to their properties of inhibiting DNA transcription and replication, they are also used in [[chemotherapy]] to inhibit rapidly-growing [[cancer]] cells.<ref>M. Braña, M. Cacho, A. Gradillas, B. Pascual-Teresa, and A. Ramos, "Intercalators as anticancer drugs," ''Curr Pharm Des'' 7(17) (2001): 1745&ndash;1780. PMID 11562309.</ref>
The narrow breadth of the double helix makes it difficult to detect by conventional [[transmission electron microscope|electron microscopy]] without heavy staining. At the same time, the DNA found in many cells can be macroscopic in length: approximately 2 [[meter]]s in total for the DNA of a single human cell.<ref>{{cite web| url=http://hypertextbook.com/facts/1998/StevenChen.shtml| title=Length of a Human DNA Molecule| accessdate=2006-03-04}}</ref> Consequently, cells must compact or "package" DNA to carry it within them. This is one of the functions of the chromosomes, which contain spool-like [[protein]]s known as [[histone]]s, around which DNA winds.
 
  
===Entropic stretching behavior===
+
==Overview of biological functions==
When DNA is in solution, it undergoes conformational fluctuations due to the energy available in the [[thermal bath]]. For [[Entropy|entropic]] reasons, floppy states are more thermally accessible than stretched out states; for this reason, a single molecule of DNA stretches similarly to a rubber band. Using [[optical tweezers]], the entropic stretching behavior of DNA has been studied and analyzed from a [[polymer physics]] perspective, and it has been found that DNA behaves like the ''Kratky-Porod'' [[worm-like chain]] model with a persistence length of about 53 nm.
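The entropic, rubber-band-like elasticity can be made concrete with a small numerical sketch. It uses the Marko-Siggia interpolation formula, a widely used approximation to the worm-like chain that is not named in this article, with the 53 nm persistence length quoted above; the thermal energy of about 4.11 pN·nm assumes a temperature near 298 K.

```python
K_B_T_PN_NM = 4.11            # thermal energy near 298 K, in pN*nm (assumed)
PERSISTENCE_LENGTH_NM = 53.0  # persistence length quoted in the text


def wlc_force_pn(relative_extension: float,
                 persistence_nm: float = PERSISTENCE_LENGTH_NM,
                 kbt: float = K_B_T_PN_NM) -> float:
    """Approximate force (pN) needed to hold a worm-like chain at a given
    end-to-end extension, expressed as a fraction of its contour length."""
    if not 0.0 <= relative_extension < 1.0:
        raise ValueError("relative extension must be in [0, 1)")
    x = relative_extension
    return (kbt / persistence_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


# Force rises slowly at first, then steeply as the chain nears full extension.
forces = [wlc_force_pn(x) for x in (0.1, 0.5, 0.9, 0.99)]
```

Under these assumptions the force stays well below 1 pN until the molecule is stretched to most of its contour length, which is the sense in which floppy states are thermally favored over stretched ones.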
+
DNA usually occurs as linear [[chromosome]]s in [[eukaryote]]s, and circular chromosomes in [[prokaryote]]s. The set of chromosomes in a cell makes up its [[genome]]. The [[human genome]] has approximately 3 billion base pairs of DNA arranged into 46 chromosomes.<ref>J. Venter, et al., "The sequence of the human genome," ''Science'' 291(5507) (2001): 1304&ndash;1351. PMID 11181995.</ref>
  
Furthermore, DNA undergoes a stretching [[phase transition]] at a force of 65 [[Newtons|pN]]; above this force, DNA is thought to take the form that [[Linus Pauling]] originally hypothesized, with the phosphates in the middle and bases splayed outward. This proposed structure for overstretched DNA has been called "P-form DNA," in honor of Pauling.
+
The information carried by DNA is held in the [[DNA sequence|sequence]] of pieces of DNA called [[gene]]s. [[Transmission (genetics)|Transmission]] of genetic information in genes is achieved via complementary base pairing. For example, in transcription, when a cell uses the information in a gene, the DNA sequence is copied into a complementary RNA sequence through the attraction between the DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to make a matching protein sequence in a process called [[Translation (biology)|translation]], which depends on the same interaction between RNA nucleotides. Alternatively, a cell may simply copy its genetic information in a process called DNA replication. The details of these functions are covered in other articles; here we focus on the interactions between DNA and other molecules that mediate the function of the genome.
  
===Different helix geometries===
+
===Genome structure===
The DNA helix can assume one of three slightly different geometries, of which the "B" form described by [[James D. Watson]] and [[Francis Crick]] is believed to predominate in cells. It is 2 [[nanometre]]s wide and extends 3.4 nanometres per 10 [[Base pair|bp]] of sequence. This is also the approximate length of sequence in which the double helix makes one complete turn about its axis. This frequency of twist (known as the helical ''pitch'') depends largely on stacking forces that each base exerts on its neighbors in the chain.
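These dimensions allow a rough estimate of how long a DNA molecule is. The sketch below takes the rise of 3.4 nanometres per 10 bp from the paragraph above and, as an assumption, the figure of roughly 3 billion base pairs for one copy of the human genome; the function names are hypothetical.

```python
RISE_NM_PER_BP = 3.4 / 10  # B-form rise per base pair, from the text
BP_PER_TURN = 10           # approximate helical repeat used in the text


def contour_length_m(n_bp: int) -> float:
    """End-to-end length in metres of fully extended B-form DNA."""
    return n_bp * RISE_NM_PER_BP * 1e-9


def helical_turns(n_bp: int) -> float:
    """Approximate number of complete turns of the double helix."""
    return n_bp / BP_PER_TURN


genome_bp = 3_000_000_000  # assumed size of one human genome copy
one_copy_m = contour_length_m(genome_bp)
two_copies_m = 2 * one_copy_m  # a cell carrying two genome copies
```

With these inputs one genome copy is about 1 metre long and two copies about 2 metres, even though the helix is only 2 nanometres wide.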
+
{{further|[[Chromosome]], [[Gene]]}}
 +
Genomic DNA is located in the [[cell nucleus]] of eukaryotes, as well as small amounts in [[mitochondrion|mitochondria]] and [[chloroplast]]s. In prokaryotes, the DNA is held within an irregularly shaped body in the cytoplasm called the [[nucleoid]].<ref>M. Thanbichler, S. Wang, and L. Shapiro, "The bacterial nucleoid: a highly organized and dynamic structure," ''J Cell Biochem'' 96(3) (2005): 506&ndash;521. PMID 15988757.</ref>
  
====Supercoiled DNA====
+
The genetic information in a genome is held within [[gene]]s. A gene is a unit of [[heredity]] and is a region of DNA that influences a particular characteristic in an organism. Genes contain an [[open reading frame]] that can be transcribed, as well as [[regulatory sequence]]s such as [[promoter]]s and [[enhancer (genetics)|enhancers]], which control the expression of the open reading frame.  
{{main|Supercoil}}
 
The B form of the DNA helix twists 360° roughly every 10 bp (about 10.4 to 10.5 bp in solution) in the absence of strain. But many molecular biological processes can induce strain. A DNA segment with excess or insufficient helical twisting is referred to, respectively, as positively or negatively "supercoiled". DNA ''in vivo'' is typically negatively supercoiled, which facilitates the unwinding of the double helix required for [[transcription (genetics)|RNA transcription]].
 
  
====Sugar pucker====
+
In many [[species]], only a small fraction of the total sequence of the [[genome]] encodes protein. For example, only about 1.5% of the human genome consists of protein-coding [[exon]]s, with over 50% of human DNA consisting of non-coding [[repeated sequence (DNA)|repetitive sequences]].<ref>T. Wolfsberg, J. McEntyre, and G. Schuler, "Guide to the draft human genome," ''Nature'' 409(6822) (2001): 824&ndash;826. PMID 11236998.</ref> The reasons for the presence of so much [[noncoding DNA|non-coding DNA]] in eukaryotic genomes and the extraordinary differences in [[genome size]], or ''[[C-value]]'', among species represent a long-standing puzzle known as the "[[C-value enigma]]."<ref>T. Gregory, [https://academic.oup.com/aob/article/95/1/133/198525?login=false "The C-value enigma in plants and animals: a review of parallels and an appeal for partnership,"] ''Ann Bot (Lond) '' 95(1) (2005): 133&ndash;146. PMID 15596463. Retrieved January 23, 2023.</ref>
There are four conformations that the [[ribofuranose]] rings in nucleotides can acquire:
 
# C-2' endo
 
# C-2' exo
 
# C-3' endo
 
# C-3' exo
 
Ribose is usually in the C-3' endo conformation, while deoxyribose is usually in the C-2' endo sugar pucker conformation.
 
The A and B forms differ mainly in their ''sugar pucker''. In the A form, the C3' atom lies above the sugar ring, whilst the C2' atom lies below it. Thus, the A form is described as "C3'-endo." Likewise, in the B form, the C2' atom lies above the sugar ring, whilst C3' lies below; this is called "C2'-endo." Altered sugar puckering in A-DNA shortens the distance between adjacent phosphates by around one angstrom. This gives 11 to 12 base pairs per helical turn, instead of the 10.5 in B-DNA. The altered sugar pucker gives A-DNA a uniform ribbon shape, a cylindrical open core, and a deep major groove that is narrower and more pronounced than the grooves found in B-DNA.
 
  
====A and Z helices formation====
+
However, DNA sequences that do not code protein may still encode functional [[non-coding RNA]] molecules, which are involved in the regulation of gene expression.<ref>The ENCODE Project Consortium, "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project," ''Nature'' 447(7146) (2007): 799-816. doi:10.1038/nature05874</ref>
The two other known double-helical forms of DNA, called A and [[Z-DNA|Z]], differ modestly in their geometry and dimensions. The A form appears likely to occur only in dehydrated samples of DNA, such as those used in [[crystallography|crystallographic]] experiments, and possibly in hybrid pairings of DNA and [[RNA]] strands. Segments of DNA that cells have [[methylation|methylated]] for regulatory purposes may adopt the Z geometry, in which the strands turn about the helical axis like a mirror image of the B form.
+
[[Image:T7 RNA polymerase at work.png|thumb|right|400px|[[T7 RNA polymerase]] (blue) producing a mRNA (green) from a DNA template (orange). Created from [http://www.rcsb.org/pdb/explore/explore.do?structureId=1MSW PDB 1MSW].]]
 +
Some non-coding DNA sequences play structural roles in chromosomes. [[Telomere]]s and [[centromere]]s typically contain few genes, but are important for the function and stability of chromosomes.<ref>A. Pidoux, and R. Allshire, "The role of heterochromatin in centromere function," ''Philos Trans R Soc Lond B Biol Sci'' 360(1455) (2005): 569&ndash;579. PMID 15905142. </ref> An abundant form of non-coding DNA in humans are [[pseudogene]]s, which are copies of genes that have been disabled by mutation.<ref>P. Harrison, H. Hegyi, S. Balasubramanian, N. Luscombe, P. Bertone, N. Echols, T. Johnson, and M. Gerstein, [https://genome.cshlp.org/content/12/2/272.full "Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22,"] ''Genome Res'' 12(2) (2002): 272&ndash;280. PMID 11827946. Retrieved January 23, 2023.</ref> These sequences are usually just molecular [[fossil]]s, although they can occasionally serve as raw genetic material for the creation of new genes through the process of [[gene duplication]] and [[divergent evolution|divergence]].<ref>P. Harrison, and M. Gerstein, "Studying genomes through the aeons: protein families, pseudogenes and proteome evolution," ''J Mol Biol'' 318(5) (2002): 1155&ndash;1174. PMID 12083509.</ref>
  
====Properties of different helical forms====
+
===Transcription and translation===
{| border="0" align="center" style="border: 1px solid #999; background-color:#FFFFFF"
+
A [[gene]] is a sequence of DNA that contains genetic information and can influence the [[phenotype]] of an organism. Within a gene, the sequence of bases along a DNA strand defines a [[messenger RNA]] sequence, which then defines one or more protein sequences. The relationship between the nucleotide sequences of genes and the [[amino acid|amino-acid]] sequences of proteins is determined by the rules of [[translation (genetics)|translation]], known collectively as the [[genetic code]]. The genetic code consists of three-letter "words" called ''codons'' formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT).
|-align="center" bgcolor="#CCCCCC"
+
 
!Geometry attribute
+
In transcription, the codons of a gene are copied into messenger RNA by [[RNA polymerase]]. This RNA copy is then decoded by a [[ribosome]] that reads the RNA sequence by base-pairing the messenger RNA to [[transfer RNA]], which carries amino acids. Since there are 4 bases in 3-letter combinations, there are 64 possible codons (<math>4^3</math> combinations). These encode the twenty [[list of standard amino acids|standard amino acids]], giving most amino acids more than one possible codon. There are also three "stop" or "nonsense" codons signifying the end of the coding region; these are the TAA, TGA and TAG codons.
!A-form
+
 
!B-form
+
[[Image:DNA replication.svg|thumb|450px|right|DNA replication. The double helix is unwound by a [[helicase]] and [[topoisomerase]]. Next, one [[DNA polymerase]] produces the [[leading strand]] copy. Another DNA polymerase binds to the [[lagging strand]]. This enzyme makes discontinuous segments (called [[Okazaki fragment]]s) before [[DNA ligase]] joins them together.]]
!Z-form
 
|-
 
|Helix sense ||align="center"| right-handed ||align="center"| right-handed ||align="center"| left-handed
 
|--bgcolor="#EFEFEF"
 
|Repeating unit ||align="right"| 1 bp ||align="right"| 1 bp ||align="right"| 2 bp
 
|-----
 
|Rotation/bp ||align="right"| 33.6° ||align="right"| 35.9° ||align="right"| 60°/2
 
|--bgcolor="#EFEFEF"
 
|Mean bp/turn ||align="right"| 10.7 ||align="right"| 10.4 ||align="right"| 12
 
|-----
 
|Inclination of bp to axis ||align="right"| +19° ||align="right"| -1.2° ||align="right"| -9°
 
|--bgcolor="#EFEFEF"
 
|Rise/bp along axis ||align="right"| 0.23 nm ||align="right"| 0.332 nm ||align="right"| 0.38 nm
 
|-----
 
|Pitch/turn of helix ||align="right"| 2.46 nm ||align="right"| 3.32 nm ||align="right"| 4.56 nm
 
|--bgcolor="#EFEFEF"
 
|Mean propeller twist ||align="right"| +18° ||align="right"| +16° ||align="right"| 0°
 
|-----
 
|Glycosyl angle ||align="center"| anti ||align="center"| anti ||align="center"| C: anti,<br> G: syn
 
|--bgcolor="#EFEFEF"
 
|Sugar pucker ||align="center"| C3'-endo ||align="center"| C2'-endo ||align="center"| C: C2'-endo,<br>G: C2'-exo
 
|-----
 
|Diameter ||align="right"| 2.6 nm ||align="right"| 2.0 nm ||align="right"| 1.8 nm
 
|--bgcolor="#EFEFEF"
 
|}
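The rows of the table above can be cross-checked against one another, because the pitch of a helix is by definition the axial rise per base pair multiplied by the number of base pairs per turn. The following is a minimal sketch of that check; the dictionary and function names are illustrative assumptions, and the numbers are copied from the table rather than taken from any independent source.

```python
# Values copied from the table above:
# (rise per bp in nm, mean bp per turn, listed pitch per turn in nm).
HELIX_TABLE = {
    "A": (0.23, 10.7, 2.46),
    "B": (0.332, 10.4, 3.32),
    "Z": (0.38, 12.0, 4.56),
}


def computed_pitch(rise_nm, bp_per_turn):
    """Pitch of one helical turn = axial rise per base pair x base pairs per turn."""
    return rise_nm * bp_per_turn


def pitch_report(table=HELIX_TABLE):
    """Return {form: (computed pitch, listed pitch)}, computed value rounded to 0.01 nm."""
    return {
        form: (round(computed_pitch(rise, bp), 2), listed)
        for form, (rise, bp, listed) in table.items()
    }


if __name__ == "__main__":
    for form, (calc, listed) in pitch_report().items():
        print(f"{form}-form: computed {calc:.2f} nm, listed {listed:.2f} nm")
```

With these inputs the A and Z rows multiply out to their listed pitches, while the B row gives about 3.45 nm against a listed 3.32 nm. The listed B-form pitch equals 0.332 nm × 10 bp, so it appears to correspond to the rounded figure of 10 bp per turn rather than the mean of 10.4 shown in the same column.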
 
  
===Non-helical forms===
+
===Replication===
There is an argument to be made that the native, intracellular form of DNA is not the B-form double helix, as commonly supposed. Rather, this argument proposes, the strands of DNA remain almost entirely separate in their normal states.
+
[[Cell division]] is essential for an organism to grow, but when a cell divides it must replicate the DNA in its genome so that the two daughter cells have the same genetic information as their parent.  
Information on this alternative theory is available from this online book, presented in PDF format:
 
  
http://www.notahelix.com/delmonte/new_struct_mol_biol.pdf
+
The double-stranded structure of DNA provides a simple mechanism for [[DNA replication]]. Here, the two strands are separated and then each strand's complementary DNA sequence is recreated by an [[enzyme]] called [[DNA polymerase]]. This enzyme makes the complementary strand by finding the correct base through complementary base pairing, and bonding it onto the original strand. As DNA polymerases can only extend a DNA strand in a 5′ to 3′ direction, different mechanisms are used to copy the antiparallel strands of the double helix.<ref>M. M. Albà, "Replicative DNA polymerases," ''Genome Biol'' 2(1) (2001). PMID 11178285. </ref> In this way, the base on the old strand dictates which base appears on the new strand, and the cell ends up with a perfect copy of its DNA.
  
and a recent research paper summarises some key experimental data which are better explained by side-by-side (SBS) models than by the double helix:
+
==Interactions with proteins==
 +
All the functions of DNA depend on interactions with [[protein]]s. These protein interactions can be non-specific, or the protein can bind specifically to a single DNA sequence. [[Enzyme]]s can also bind to DNA and of these, the polymerases that copy the DNA base sequence in transcription and DNA replication are particularly important.
  
http://www.ias.ac.in/currsci/dec102003/1564.pdf
+
===DNA-binding proteins===
 +
<div class="thumb tright" style="background-color: #f9f9f9; border: 1px solid #CCCCCC; margin:0.5em;">
 +
{|border="0" width=400px border="0" cellpadding="0" cellspacing="0" style="font-size: 85%; border: 1px solid #CCCCCC; margin: 0.3em;"
 +
|[[Image:Nucleosome 2.jpg|400px]]
 +
|-
 +
|}
 +
<div style="border: none; width:400px;"><div class="thumbcaption">Interaction of DNA with [[histone]]s (shown in white, top). These proteins' basic amino acids (below left, blue) bind to the acidic phosphate groups on DNA (below right, red).</div></div></div>
  
with subsequent correspondence:
+
Structural proteins that bind DNA are well-understood examples of non-specific DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural proteins. These proteins organize the DNA into a compact structure called [[chromatin]]. In [[eukaryote]]s, this structure involves DNA binding to a complex of small basic proteins called [[histone]]s, while in [[prokaryote]]s multiple types of proteins are involved.<ref>K. Sandman, S. Pereira, and J. Reeve, "Diversity of prokaryotic chromosomal proteins and the origin of the nucleosome," ''Cell Mol Life Sci '' 54(12) (1998): 1350&ndash;1364. PMID 9893710.</ref><ref>R. T. Dame, "The role of nucleoid-associated proteins in the organization and compaction of bacterial chromatin," ''Mol. Microbiol.'' 56(4) (2005): 858-870. PMID 15853876.</ref> The histones form a disk-shaped complex called a [[nucleosome]], which contains two complete turns of double-stranded DNA wrapped around its surface. These non-specific interactions are formed through basic residues in the histones making [[ionic bond]]s to the acidic sugar-phosphate backbone of the DNA, and are therefore largely independent of the base sequence.<ref>K. Luger, A. Mäder, R. Richmond, D. Sargent, and T. Richmond, "Crystal structure of the nucleosome core particle at 2.8 A resolution," ''Nature'' 389(6648) (1997): 251&ndash;260. PMID 9305837.</ref> Chemical modifications of these basic amino acid residues include [[methylation]], [[phosphorylation]], and [[acetylation]].<ref>T. Jenuwein, and C. Allis, "Translating the histone code," ''Science'' 293(5532) (2001): 1074-1080. PMID 11498575.</ref> These chemical changes alter the strength of the interaction between the DNA and the histones, making the DNA more or less accessible to [[transcription factor]]s and changing the rate of transcription.<ref>T. Ito, "Nucleosome assembly and remodelling," ''Curr Top Microbiol Immunol'' 274(2003): 1&ndash;22. 
PMID 12596902.</ref> Other non-specific DNA-binding proteins found in chromatin include the high-mobility group proteins, which bind preferentially to bent or distorted DNA.<ref>J. Thomas, "HMG1 and 2: architectural DNA-binding proteins," ''Biochem Soc Trans'' 29(4) (2001): 395&ndash;401. PMID 11497996.</ref> These proteins are important in bending arrays of nucleosomes and arranging them into more complex chromatin structures.<ref>R. Grosschedl, K. Giese, and J. Pagel, "HMG domain proteins: architectural elements in the assembly of nucleoprotein structures," ''Trends Genet'' 10(3) (1994): 94–100. PMID 8178371.</ref>
  
http://www.ias.ac.in/currsci/may252004/1352.pdf
+
A distinct group of DNA-binding proteins are the single-stranded-DNA-binding proteins that specifically bind single-stranded DNA. In humans, replication protein A is the best-characterized member of this family and is essential for most processes where the double helix is separated, including DNA replication, recombination, and DNA repair.<ref>C. Iftode, Y. Daniely, and J. Borowiec, "Replication protein A (RPA): the eukaryotic SSB," ''Crit Rev Biochem Mol Biol'' 34(3) (1999): 141&ndash;180. PMID 10473346.</ref> These binding proteins seem to stabilize single-stranded DNA and protect it from forming [[stem loop]]s or being degraded by [[nuclease]]s.
  
However, these theories have problems of their own, such as explaining the near-perfect symmetry of DNA in cells and the activity of DNA repair in the absence of a base-paired strand for comparison. Additionally, the activity of [[topoisomerase|topoisomerases]] would be entirely redundant, and not nearly as important to cellular function as it patently is, if not for the fact that base-paired double-strands are at least the primary form of cellular DNA.
+
[[Image:Lambda repressor 1LMB.png|thumb|right|300px|The lambda repressor [[helix-turn-helix]] transcription factor bound to its DNA target<ref>Created from [https://www.rcsb.org/structure/1LMB PDB 1LMB] Retrieved January 23, 2023.</ref>]]
 +
In contrast, other proteins have evolved to specifically bind particular DNA sequences. The most intensively studied of these are the various classes of [[transcription factor]]s, which are proteins that regulate transcription. Each one of these proteins binds to one particular set of DNA sequences and thereby activates or inhibits the transcription of genes with these sequences close to their [[promoter]]s. The transcription factors do this in two ways. Firstly, they can bind the RNA polymerase responsible for transcription, either directly or through other mediator proteins; this locates the polymerase at the promoter and allows it to begin transcription.<ref>L. Myers, and R. Kornberg, "Mediator of transcriptional regulation," ''Annu Rev Biochem'' 69 (2000): 729&ndash;749. PMID 10966474.</ref> Alternatively, transcription factors can bind [[enzyme]]s that modify the histones at the promoter; this will change the accessibility of the DNA template to the polymerase.<ref>B. Spiegelman, and R. Heinrich, "Biological control through regulated transcriptional coactivators," ''Cell'' 119(2) (2004): 157-167. PMID 15479634.</ref>
  
==Strand direction==
+
As these DNA targets can occur throughout an organism's genome, changes in the activity of one type of transcription factor can affect thousands of genes.<ref>Z. Li, S. Van Calcar, C. Qu, W. Cavenee, M. Zhang, and B. Ren, "A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells," ''Proc Natl Acad Sci U S A'' 100(14) (2003): 8164&ndash;8169. PMID 12808131. Retrieved November 29, 2017.</ref> Consequently, these proteins are often the targets of the [[signal transduction]] processes that mediate responses to environmental changes or cellular differentiation and development. The specificity of these transcription factors' interactions with DNA comes from the proteins making multiple contacts to the edges of the DNA bases, allowing them to "read" the DNA sequence. Most of these base-interactions are made in the major groove, where the bases are most accessible.<ref>C. Pabo, and R. Sauer, "Protein-DNA recognition," ''Annu Rev Biochem'' 53 (1984): 293&ndash;321. PMID 6236744.</ref>
The asymmetric shape and linkage of nucleotides means that a DNA strand always has a discernible orientation or directionality. Because of this directionality, close inspection of a double helix reveals that nucleotides are heading one way along one strand (the "''ascending strand''"), and the other way along the other strand (the "''descending strand''"). This arrangement of the strands is called '''antiparallel'''.
 
  
===Chemical nomenclature ([[5' end|5']] and [[3' end|3']])===
+
[[Image:EcoRV 1RVA.png|thumb|right|400px|The [[restriction enzyme]] [[EcoRV]] (green) in a complex with its substrate DNA<ref>Created from [https://www.rcsb.org/structure/1RVA PDB 1RVA]. Retrieved January 23, 2023.</ref>]]
For reasons of chemical nomenclature, people who work with DNA refer to the asymmetric ends of a strand as the [[5' end|5']] and [[3' end|3']] ends ("five prime" and "three prime"). Within a cell, the enzymes that perform [[DNA replication|replication]] and [[DNA transcription|transcription]] read template DNA in the "'''[[3' end|3']] to [[5' end|5']] direction'''", while the enzymes that perform translation read in the opposite direction (on [[RNA|RNA]]). However, because chemically produced DNA is synthesized and manipulated in the opposite direction or in non-directional ways, the orientation should not be assumed. In a vertically oriented double helix, the [[3' end|3']] strand is said to be ascending while the [[5' end|5']] strand is said to be descending.
 
  
===Sense and antisense===
+
===DNA-modifying enzymes===
As a result of their antiparallel arrangement and the sequence-reading preferences of enzymes, even if both strands carried identical instead of complementary sequences, cells could properly translate only one of them. The other strand a cell can only read backwards. [[molecular biology|Molecular biologists]] call a sequence "'''sense'''" if it is translated or translatable, and they call its complement  "'''antisense'''". It follows then, somewhat paradoxically, that the template for transcription is the ''antisense'' strand. The resulting transcript is an RNA replica of the ''sense'' strand and is itself ''sense.''
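The relationship between the sense strand, the antisense template, and the RNA transcript can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names and the example sequence are hypothetical, every strand is written 5' to 3', and transcription is reduced to base pairing plus the substitution of uracil for thymine.

```python
# Watson-Crick pairing rules for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the antiparallel complementary strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(antisense_template):
    """Return the RNA built on an antisense template (both written 5'->3').

    The template is read 3'->5', so the transcript is its reverse
    complement with U in place of T.
    """
    return reverse_complement(antisense_template).replace("T", "U")


sense = "ATGGCCTAA"                    # hypothetical sense-strand sequence
antisense = reverse_complement(sense)  # "TTAGGCCAT"
rna = transcribe(antisense)            # "AUGGCCUAA"
```

The transcript carries the same sequence as the sense strand, apart from U replacing T, even though the antisense strand served as the template.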
+
====Nucleases and ligases====
 +
[[Nuclease]]s are [[enzyme]]s that cut DNA strands by catalyzing the [[hydrolysis]] of the [[phosphodiester bond]]s. Nucleases that hydrolyse nucleotides from the ends of DNA strands are called [[exonuclease]]s, while [[endonuclease]]s cut within strands. The most frequently used nucleases in [[molecular biology]] are the [[restriction enzyme|restriction endonucleases]], which cut DNA at specific sequences. For instance, the EcoRV enzyme shown in the figure recognizes the 6-base sequence 5′-GAT|ATC-3′ and makes a cut at the vertical line.
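The sequence-specific cutting described for EcoRV can be sketched as a simple string search. This is a minimal illustration, not a model of the enzyme: the function names and the example sequence are hypothetical, and only one strand is scanned, which is sufficient here because the site GATATC reads the same on both strands.

```python
ECORV_SITE = "GATATC"   # recognition sequence, written 5'->3'
ECORV_CUT_OFFSET = 3    # the cut falls between GAT and ATC


def ecorv_cut_positions(seq):
    """Return the indices in seq at which EcoRV would cut."""
    positions = []
    start = seq.find(ECORV_SITE)
    while start != -1:
        positions.append(start + ECORV_CUT_OFFSET)
        start = seq.find(ECORV_SITE, start + 1)
    return positions


def digest(seq):
    """Split seq into the fragments left after cutting at every EcoRV site."""
    fragments = []
    previous = 0
    for cut in ecorv_cut_positions(seq):
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


example = "AAGATATCCGGGATATCTT"   # hypothetical sequence with two sites
print(ecorv_cut_positions(example))
print(digest(example))
```

Joining the fragments in order reproduces the input sequence, which is a quick way to confirm that the digest neither drops nor duplicates bases.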
  
===Distinction between sense and antisense strands===
+
In nature, these enzymes protect [[bacteria]] against [[phage]] infection by digesting the phage DNA when it enters the bacterial cell, acting as part of the [[restriction modification system]].<ref>T. Bickle, and D. Krüger, "Biology of DNA restriction," ''Microbiol Rev'' 57(2) (1993): 434&ndash;450. PMID 8336674. </ref> In technology, these sequence-specific nucleases are used in [[clone (genetics)|molecular cloning]] and [[DNA fingerprinting]].
A small proportion of genes in [[prokaryotes]], and more in [[plasmids]] and [[viruses]], blur the distinction made above between sense and antisense strands. Certain sequences of their [[genome|genomes]] do double duty, encoding one protein when read 5' to 3' along one strand, and a second protein when read in the opposite direction (still 5' to 3') along the other strand. As a result, the genomes of these viruses are unusually compact for the number of genes they contain, which biologists view as an [[adaptation (biology)|adaptation]]. This merely confirms that there is no biological distinction between the two strands of the double helix. Typically each strand of a DNA double helix will act as sense and antisense in different regions.
 
  
===As viewed by topologists===
+
Enzymes called [[DNA ligase]]s can rejoin cut or broken DNA strands, using the energy from either [[adenosine triphosphate]] or [[nicotinamide adenine dinucleotide]].<ref name=Doherty>A. J. Doherty, and S. W. Suh, "Structural and mechanistic conservation in DNA ligases," ''Nucleic Acids Res'' 28(21) (2000): 4051&ndash;4058. PMID 11058099.</ref> Ligases are particularly important in [[lagging strand]] DNA replication, as they join together the short segments of DNA produced at the [[replication fork]] into a complete copy of the DNA template. They are also used in [[DNA repair]] and [[genetic recombination]].<ref name=Doherty/>
Topologists like to note that the juxtaposition of the [[3′ end]] of one DNA strand beside the [[5′ end]] of the other at both ends of a double-helical segment makes the arrangement a "[[crab canon]]".
 
  
==Single-stranded DNA (ssDNA) and repair of mutations==
+
====Topoisomerases and helicases====
In some [[virus]]es DNA appears in a non-helical, single-stranded form. Because many of the [[DNA repair]] mechanisms of cells work only on paired bases, viruses that carry single-stranded DNA [[genome]]s [[mutation|mutate]] more frequently than they would otherwise. As a result, such species may adapt more rapidly to avoid extinction. The result would not be so favorable in more complicated and more slowly replicating organisms, however, which may explain why only viruses carry single-stranded DNA. These viruses presumably also benefit from the lower cost of replicating one strand versus two.
+
[[Topoisomerase]]s are enzymes with both nuclease and ligase activity. These proteins change the amount of [[DNA supercoil|supercoiling]] in DNA. Some of these enzymes work by cutting the DNA helix and allowing one section to rotate, thereby reducing its level of supercoiling; the enzyme then seals the DNA break.<ref name=Champoux/> Other types of these enzymes are capable of cutting one DNA helix and then passing a second strand of DNA through this break, before rejoining the helix.<ref>A. Schoeffler, and J. Berger, "Recent advances in understanding structure-function relationships in the type II topoisomerase mechanism," ''Biochem Soc Trans'' 33(6) (2005): 1465&ndash;1470. PMID 16246147.</ref> Topoisomerases are required for many processes involving DNA, such as DNA replication and transcription.<ref name=Wang/>
  
==History of DNA research==
+
[[Helicase]]s are proteins that are a type of [[molecular motor]]. They use the chemical energy in [[nucleoside triphosphate]]s, predominantly [[Adenosine triphosphate|ATP]], to break hydrogen bonds between bases and unwind the DNA double helix into single strands.<ref>N. Tuteja, and R. Tuteja, "Unraveling DNA helicases. Motif, structure, mechanism and function," ''Eur J Biochem'' 271(10) (2004): 1849–1863. PMID 15128295.</ref> These enzymes are essential for most processes where enzymes need to access the DNA bases.
[[Image:JamesWatson.jpg|thumb|200px|[[James D. Watson|James Watson]] in the [[Cavendish Laboratory]] at the [[University of Cambridge]]]]
 
The discovery that DNA was the carrier of genetic information was a process that required many earlier discoveries. The existence of DNA was discovered in the mid-19th century. However, it was only in the early 20th century that researchers began suggesting that it might store genetic information. This was only accepted after the structure of DNA was elucidated by [[James D. Watson]] and [[Francis Crick]] in their 1953 [[Nature (journal)|''Nature'']] publication. Crick proposed the [[central dogma]] of molecular biology in 1957, describing the process whereby proteins are produced from [[cell nucleus|nuclear]] DNA. In 1962 Watson, Crick, and [[Maurice Wilkins]] jointly received the Nobel Prize in Physiology or Medicine for their determination of the structure of DNA. Their model relied heavily on the X-ray diffraction work of [[Rosalind Franklin]], including her famous image, Photo 51. Franklin died of ovarian cancer in 1958, before the prize was awarded, and her contribution received little attention until much later. Her exposure to X-ray radiation is often suggested as a possible cause of her cancer.
 
  
===First isolation of DNA===
+
====Polymerases====
Working in the 19th century, biochemists initially isolated DNA and RNA (mixed together) from cell nuclei. They were relatively quick to appreciate the polymeric nature of their "nucleic acid" isolates, but realized only later that nucleotides were of two types—one containing [[ribose]] and the other [[deoxyribose]]. It was this subsequent discovery that led to the identification and naming of DNA as a substance distinct from RNA.
+
[[Polymerase]]s are [[enzyme]]s that synthesise polynucleotide chains from [[nucleoside triphosphate]]s. They function by adding nucleotides onto the 3′ [[hydroxyl|hydroxyl group]] of the previous nucleotide in the DNA strand. As a consequence, all polymerases work in a 5′ to 3′ direction.<ref name=Joyce>C. Joyce and T. Steitz, "Polymerase structures and function: variations on a theme?" ''J Bacteriol'' 177(11) (1995): 6321&ndash;6329. PMID 7592405. </ref> In the [[active site]] of these enzymes, the nucleoside triphosphate substrate base-pairs to a single-stranded polynucleotide template: this allows polymerases to accurately synthesise the complementary strand of this template. Polymerases are classified according to the type of template that they use.
  
[[Friedrich Miescher]] (1844-1895) discovered a substance he called "nuclein" in 1869. Somewhat later, he isolated a pure sample of the material now known as DNA from the sperm of salmon, and in 1889 his pupil, [[Richard Altmann]], named it  "nucleic acid". This substance was found to exist only in the chromosomes.
+
In [[DNA replication]], a DNA-dependent [[DNA polymerase]] makes a DNA copy of a DNA sequence. Accuracy is vital in this process, so many of these polymerases have a [[Proofreading#Proofreading in biology|proofreading]] activity. Here, the polymerase recognizes the occasional mistakes in the synthesis reaction by the lack of base pairing between the mismatched nucleotides. If a mismatch is detected, a 3′ to 5′ [[exonuclease]] activity is activated and the incorrect base removed.<ref>U. Hubscher, G. Maga, and S. Spadari, "Eukaryotic DNA polymerases," ''Annu Rev Biochem'' 71 (2002): 133&ndash;163. PMID 12045093.</ref> In most organisms, DNA polymerases function in a large complex called the [[replisome]] that contains multiple accessory subunits, such as the [[DNA clamp]] or [[helicase]]s.<ref>A. Johnson, and M. O'Donnell, "Cellular DNA replicases: components and dynamics at the replication fork," ''Annu Rev Biochem'' 74 (2005): 283&ndash;315. PMID 15952889.</ref>
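The proofreading step described above can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the function name, the example sequences, and the idea of a pre-supplied list of "proposed" bases are all hypothetical, and the template is given in the order in which the polymerase traverses it. A base that fails to pair with the template is removed from the growing 3′ end and replaced with the correct one.

```python
# Watson-Crick pairing rules for DNA bases.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_with_proofreading(template, proposed):
    """Extend a new strand base by base, correcting mismatches as they occur.

    template: bases in the order the polymerase reads them.
    proposed: the (possibly wrong) base offered at each step.
    Returns the finished strand and the number of corrections made.
    """
    new_strand = []
    corrections = 0
    for template_base, base in zip(template, proposed):
        new_strand.append(base)               # polymerase adds a base at the 3' end
        if PAIRS[template_base] != base:      # no proper base pair was formed
            new_strand.pop()                  # exonuclease step removes the terminal base
            new_strand.append(PAIRS[template_base])
            corrections += 1
    return "".join(new_strand), corrections


# Hypothetical example: one wrong base (A offered opposite G) at the fourth step.
strand, fixed = synthesize_with_proofreading("TACGGT", "ATGACA")
```

In this toy run a single mismatch is excised and replaced, so the finished strand pairs with the template at every position.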
  
In 1929 [[Phoebus Levene]] at the [[Rockefeller Institute]] identified the components (the four bases, the sugar and the phosphate chain) and he showed that the components of DNA were linked in the order phosphate-sugar-base.  He called each of these units a [[nucleotide]] and suggested the DNA molecule consisted of a string of nucleotide units linked together through the phosphate groups, which are the 'backbone' of the molecule.  However Levene thought the chain was short and that the bases repeated in the same fixed order.  [[Torbjorn Oskar Caspersson|Torbjorn Caspersson]] and [[Einar Hammersten]] showed that DNA was a polymer.
+
RNA-dependent DNA polymerases are a specialized class of polymerases that copy the sequence of an RNA strand into DNA. They include [[reverse transcriptase]], which is a [[virus|viral]] enzyme involved in the infection of cells by [[retrovirus]]es, and [[telomerase]], which is required for the replication of [[telomere]]s.<ref>L. Tarrago-Litvak, M. Andréola, G. Nevinsky, L. Sarih-Cottin, and S. Litvak, "The reverse transcriptase of HIV-1: from enzymology to therapeutic intervention," ''FASEB J'' 8(8) (1994): 497–503. PMID 7514143.</ref><ref name=Greider/> Telomerase is an unusual polymerase because it contains its own RNA template as part of its structure.<ref name=Nugent/>
  
===Chromosomes and inherited traits===
+
Transcription is carried out by a DNA-dependent [[RNA polymerase]] that copies the sequence of a DNA strand into RNA. To begin transcribing a gene, the RNA polymerase binds to a sequence of DNA called a [[promoter]] and separates the DNA strands. It then copies the gene sequence into a [[messenger RNA]] transcript until it reaches a region of DNA called the [[terminator (genetics)|terminator]], where it halts and detaches from the DNA. As with human DNA-dependent DNA polymerases, RNA polymerase II, the enzyme that transcribes most of the genes in the human genome, operates as part of a large protein complex with multiple regulatory and accessory subunits.<ref>E. Martinez, "Multi-protein complexes in eukaryotic gene transcription," ''Plant Mol Biol'' 50(6) (2002): 925&ndash;947. PMID 12516863.</ref>
[[Max Delbrück]], [[Nikolai V. Timofeeff-Ressovsky]], and [[Karl G. Zimmer]] published results in 1935 suggesting that chromosomes are very large molecules the structure of which can be changed by treatment with [[X-ray]]s, and that by so changing their structure it was possible to change the heritable characteristics governed by those chromosomes. In 1937 [[William Astbury]] produced the first [[X-ray diffraction]] patterns from DNA. He was not able to propose the correct structure but the patterns showed that DNA had a regular structure and therefore it might be possible to deduce what this structure was.
 
  
In 1943, [[Oswald Theodore Avery]] and a team of scientists discovered that traits proper to the "smooth" form of the ''Pneumococcus'' could be transferred to the "rough" form of the same bacterium merely by making the killed "smooth" (S) form available to the live "rough" (R) form. Quite unexpectedly, the living R ''Pneumococcus'' bacteria were transformed into a new strain of the S form, and the transferred S characteristics turned out to be heritable. Avery called the medium of transfer of traits the [[transforming principle]]; he identified DNA as the transforming principle, and not [[protein]] as previously thought. His work built on [[Fredrick Griffith]]'s earlier transformation experiment. In 1952, [[Alfred Hershey]] and [[Martha Chase]] did an experiment ([[Hershey-Chase experiment]]) that showed, in [[T2 phage]], that DNA is the [[genetic material]] (Hershey later shared a Nobel Prize with Salvador Luria and Max Delbrück).
+
==Genetic recombination==
 +
[[Image:Chromosomal Recombination.svg|thumb|400px|right|Recombination involves the breakage and rejoining of two chromosomes (M and F) to produce two re-arranged chromosomes (C1 and C2).]]
 +
<div class="thumb tright" style="background-color: #f9f9f9; border: 1px solid #CCCCCC; margin:0.5em;">
 +
{|border="0" width=300px border="0" cellpadding="0" cellspacing="0" style="font-size: 85%; border: 1px solid #CCCCCC; margin: 0.3em;"
 +
|[[Image:Holliday Junction cropped.png|300px]]
 +
|-
 +
|[[Image:Holliday junction coloured.png|300px]]
 +
|}
 +
<div style="border: none; width:300px;"><div class="thumbcaption">Structure of the [[Holliday junction]] intermediate in [[genetic recombination]]. The four separate DNA strands are coloured red, blue, green and yellow.<ref>Created from [https://www.rcsb.org/structure/1M6G PDB 1M6G]. Retrieved January 23, 2023.</ref></div></div></div>
 +
{{further|[[Genetic recombination]]}}
  
[[Image:FirstSketchOfDNADoubleHelix.jpg|thumb|200px|[[Francis Crick]]'s first sketch of the [[deoxyribonucleic acid]] double-helix pattern]]
+
A DNA helix usually does not interact with other segments of DNA, and in human cells the different chromosomes even occupy separate areas in the nucleus called "chromosome territories."<ref>T. Cremer and C. Cremer, "Chromosome territories, nuclear architecture and gene regulation in mammalian cells," ''Nat Rev Genet'' 2(4) (2001): 292–301. PMID 11283701.</ref> This physical separation of different chromosomes is important for the ability of DNA to function as a stable repository for information, as one of the few times chromosomes interact is during [[chromosomal crossover]] when they [[genetic recombination|recombine]]. Chromosomal crossover is when two DNA helices break, swap a section and then rejoin.
In 1944, the renowned physicist [[Erwin Schrödinger]] published a brief book entitled ''[[What is Life? (Schrödinger)|What is Life?]]'', where he maintained that chromosomes contained what he called the "hereditary code-script" of life. He added: "But the term code-script is, of course, too narrow. The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power — or, to use another simile, they are architect's plan and builder's craft — in one." He conceived of these dual functional elements as being woven into the molecular structure of chromosomes. By understanding the exact molecular structure of the chromosomes one could hope to understand both the "architect's plan" and also how that plan was carried out through the "builder's craft." Three groups took up Schrödinger's challenge to work out the structure of the chromosomes and the question of how the segments of the chromosomes that were conceived to relate to specific traits could possibly do their jobs.
 
  
Just how the presence of specific features in the molecular structure of chromosomes could produce traits and behaviors in living organisms was unimaginable at the time. Because chemical dissection of DNA samples always yielded the same four nucleotides, the chemical composition of DNA appeared simple, perhaps even uniform. Organisms, on the other hand, are fantastically complex individually and widely diverse collectively. Geneticists did not speak of genes as conveyors of "information" in such words, but if they had, they would not have hesitated to quantify the amount of information that genes need to convey as vast. The idea that information might reside in a chemical in the same way that it exists in text, as a finite alphabet of letters arranged in a sequence of unlimited length, had not yet been conceived. It would emerge upon the discovery of DNA's structure, but few researchers imagined that DNA's structure had much to say about genetics.
+
Recombination allows chromosomes to exchange genetic information and produces new combinations of genes, which can be important for variability added into a population, and thus evolution, and can be important in the rapid evolution of new proteins.<ref>C. Pál, B. Papp, and M. Lercher, "An integrated view of protein evolution," ''Nat Rev Genet'' 7(5) (2006): 337&ndash;348. PMID 16619049.</ref> Genetic recombination can also be involved in DNA repair, particularly in the cell's response to double-strand breaks.<ref>M. O'Driscoll and P. Jeggo, "The role of double-strand break repair—insights from human genetics," ''Nat Rev Genet'' 7(1) (2006): 45&ndash;54. PMID 16369571.</ref>
  
===Discovery of the structure of DNA===
+
The most common form of chromosomal crossover is [[homologous recombination]], where the two chromosomes involved share very similar sequences. Non-homologous recombination can be damaging to cells, as it can produce [[chromosomal translocation]]s and genetic abnormalities. The recombination reaction is catalyzed by enzymes known as ''recombinases,'' such as [[RAD51]].<ref>S. Vispé and M. Defais, "Mammalian Rad51 protein: a RecA homologue with pleiotropic functions," ''Biochimie'' 79(9-10) (1997): 587-592. PMID 9466696.</ref>  The first step in recombination is a double-stranded break either caused by an [[endonuclease]] or damage to the DNA.<ref>M. J. Neale, and S. Keeney, "Clarifying the mechanics of DNA strand exchange in meiotic recombination," ''Nature'' 442 (7099) (2006): 153-158. PMID 2006.</ref> A series of steps catalyzed in part by the recombinase then leads to joining of the two helices by at least one [[Holliday junction]], in which a segment of a single strand in each helix is annealed to the complementary strand in the other helix. The Holliday junction is a tetrahedral junction structure that can be moved along the pair of chromosomes, swapping one strand for another. The recombination reaction is then halted by cleavage of the junction and re-ligation of the released DNA.<ref>M. Dickman, S. Ingleston, S. Sedelnikova, J. Rafferty, R. Lloyd, J. Grasby, and D. Hornby, "The RuvABC resolvasome," ''Eur J Biochem'' 269(22) (2002): 5492&ndash;5501. PMID 12423347.</ref>
In the 1950s, three groups made it their goal to determine the structure of DNA. The first group to start was at [[King's College London]]; it was led by [[Maurice Wilkins]] and later joined by [[Rosalind Franklin]]. Another group, consisting of [[Francis Crick]] and [[James D. Watson]], was at [[University of Cambridge|Cambridge]]. A third group, led by [[Linus Pauling]], was at [[Caltech]]. Crick and Watson built physical models using metal rods and balls, in which they incorporated the known chemical structures of the nucleotides, as well as the known position of the linkages joining one nucleotide to the next along the polymer. At King's College, Wilkins and Franklin examined [[crystallography|X-ray diffraction]] patterns of DNA fibers. Of the three groups, only the London group was able to produce good-quality diffraction patterns and thus sufficient quantitative data about the structure.
 
  
[[Image:DNA-labels.png|thumb|200px|The chemical structure of DNA]]
+
==Evolution of DNA metabolism==
 +
DNA contains the genetic information that allows all modern living things to function, grow, and reproduce. However, it is unclear how long in the 4-billion-year [[Timeline of evolution|history of life]] DNA has performed this function, as it has been proposed that the earliest forms of life may have used RNA as their genetic material.<ref>G. Joyce, "The antiquity of RNA-based evolution," ''Nature'' 418(6894) (2002): 214-221. PMID 12110897.</ref> RNA may have acted as the central part of early cell metabolism as it can both transmit genetic information and carry out [[catalysis]] as part of [[ribozyme]]s.<ref>R. Davenport, "Ribozymes. Making copies in the RNA world," ''Science'' 292(5520) (2001): 1278. PMID 11360970.</ref> This ancient [[RNA world hypothesis|RNA world]], where nucleic acid would have been used for both catalysis and genetics, may have influenced the development of the current genetic code based on four nucleotide bases. This would occur since the number of unique bases in such an organism is a trade-off between a small number of bases increasing replication accuracy and a large number of bases increasing the catalytic efficiency of ribozymes.<ref>E. Szathmáry, [https://www.pnas.org/doi/pdf/10.1073/pnas.89.7.2614 "What is the optimum size for the genetic alphabet?"] ''Proc Natl Acad Sci U S A'' 89(7) (1992): 2614&ndash;2618. PMID 1372984. Retrieved January 23, 2023.</ref>
  
====Helix structure====
+
Unfortunately, there is no direct evidence of ancient genetic systems, as recovery of DNA from most [[fossil]]s is impossible. This is because DNA will survive in the environment for less than one million years and slowly degrades into short fragments in solution.<ref>T. Lindahl, "Instability and decay of the primary structure of DNA," ''Nature'' 362(6422) (1993): 709&ndash;715. PMID 8469282.</ref> Although claims for older DNA have been made, most notably a report of the isolation of a viable bacterium from a salt crystal 250-million years old,<ref>R. Vreeland, W. Rosenzweig, and D. Powers, "Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal," ''Nature'' 407(6806) (2000): 897&ndash;900. PMID 11057666.</ref> these claims are controversial and have been disputed.<ref>M. Hebsgaard, M. Phillips, and E. Willerslev, "Geologically ancient DNA: fact or artefact?" ''Trends Microbiol'' 13(5) (2005): 212&ndash;220. PMID 15866038.</ref><ref>D. Nickle, G. Learn, M. Rain, J. Mullins, and J. Mittler, "Curiously modern DNA for a '250 million-year-old' bacterium," ''J Mol Evol'' 54(1) (2002): 134&ndash;137. PMID 11734907.</ref>
In 1948 Pauling discovered that many proteins included helical (see [[alpha helix]]) shapes. Pauling had deduced this structure from X-ray patterns. (Pauling was also later to suggest an incorrect three chain helical structure based on Astbury's data.)  Even in the initial diffraction data from DNA by Maurice Wilkins, it was evident that the structure involved helices. But this insight was only a beginning. There remained the questions of how many strands came together, whether this number was the same for every helix, whether the bases pointed toward the helical axis or away, and ultimately what were the explicit angles and coordinates of all the bonds and atoms. Such questions motivated the modeling efforts of Watson and Crick.
 
  
====Complementary nucleotides====
+
==Uses in technology==
In their modeling, Watson and Crick restricted themselves to what they saw as chemically and biologically reasonable. Still, the breadth of possibilities was very wide. A breakthrough occurred in 1952, when [[Erwin Chargaff]] visited Cambridge and inspired Crick with a description of experiments Chargaff had published in 1947. Chargaff had observed that the proportions of the four nucleotides vary between one DNA sample and the next, but that for particular pairs of nucleotides — adenine and thymine, guanine and cytosine — the two nucleotides are always present in equal proportions.
+
===Genetic engineering===
 +
Modern [[biology]] and [[biochemistry]] make intensive use of recombinant DNA technology. [[Recombinant DNA]] is a man-made DNA sequence that has been assembled from other DNA sequences. Such sequences can be [[transformation (genetics)|transformed]] into organisms in the form of [[plasmid]]s or, in the appropriate format, by using a [[viral vector]].<ref>S. P. Goff and P. Berg, "Construction of hybrid viruses containing SV40 and lambda phage DNA segments and their propagation in cultured monkey cells," ''Cell'' 9(4 Pt2) (1976): 695–705. PMID 189942.</ref> The [[genetic engineering|genetically modified]] organisms produced can be used to make products such as recombinant [[protein]]s, used in medical research,<ref>L. Houdebine, "Transgenic animal models in biomedical research," ''Methods Mol Biol'' 360 (2007): 163&ndash;202. PMID 17172731.</ref> or be grown in [[agriculture]].<ref>H. Daniell, and A. Dhingra, "Multigene engineering: dawn of an exciting new era in biotechnology," ''Curr Opin Biotechnol'' 13(2) (2002): 136&ndash;141. PMID 11950565.</ref><ref>D. Job, "Plant biotechnology in agriculture," ''Biochimie'' 84(11) (2002): 1105&ndash;1110. PMID 12595138.</ref> Recombinant DNA technology allows scientists to transplant a [[gene]] for a particular protein into rapidly reproducing [[bacteria]] to mass produce the protein. As a result of this technology, bacteria have been used to produce human [[insulin]] beginning in 1978.
  
====Watson and Crick's model====
+
===Forensics ===
[[Image:DNA Model Crick-Watson.jpg|thumb|200px|right|Crick and Watson DNA model built in 1953, currently on display at the [[National Science Museum]] in London.]]
+
[[Forensic science|Forensic scientists]] can use DNA in [[blood]], [[semen]], [[skin]], [[saliva]], or [[hair]] at a crime scene to identify a perpetrator. This process is called [[genetic fingerprinting]], or more accurately, DNA profiling. In DNA profiling, the lengths of variable sections of repetitive DNA, such as [[short tandem repeat]]s and [[minisatellite]]s, are compared between people. This method is usually an extremely reliable technique for identifying a criminal.<ref>A. Collins and N. Morton, [https://www.pnas.org/doi/pdf/10.1073/pnas.91.13.6007 "Likelihood ratios for DNA identification,"] ''Proc Natl Acad Sci U S A'' 91(13) (1994): 6007&ndash;6011. PMID 8016106. Retrieved January 23, 2023.</ref> However, identification can be complicated if the scene is contaminated with DNA from several people.<ref>B. Weir, C. Triggs, L. Starling, L. Stowell, K. Walsh, and J. Buckleton, "Interpreting DNA mixtures," ''J Forensic Sci'' 42(2) (1997): 213&ndash;222. PMID 9068179.</ref> DNA profiling was developed in 1984 by British geneticist Sir [[Alec Jeffreys]],<ref>A. Jeffreys, V. Wilson, and S. Thein, "Individual-specific 'fingerprints' of human DNA," ''Nature'' 316(6023) (1985): 76&ndash;79. PMID 2989708.</ref> and first used in forensic science to convict Colin Pitchfork in the 1988 [[Enderby murders]] case. Some criminal investigations have been solved when DNA from crime scenes has matched relatives of the guilty individual, rather than the individual himself or herself.<ref>S. Bhattacharya, "Killer convicted thanks to relative's DNA," ''Newscientist.com'', April 20, 2004. </ref>
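The length comparison at the heart of DNA profiling can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the repeat motif <code>GATA</code>, the sample sequences, and the function name <code>longest_repeat_run</code> are hypothetical, and a real profile compares many loci (and two alleles per locus) rather than a single repeat region.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    # Each match is one uninterrupted run of the motif, e.g. "GATAGATAGATA".
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical samples: the same flanking sequence around a variable repeat.
crime_scene = "CC" + "GATA" * 4 + "TT"
suspect_one = "CC" + "GATA" * 4 + "TT"
suspect_two = "CC" + "GATA" * 6 + "TT"

scene_length = longest_repeat_run(crime_scene, "GATA")
for name, sample in [("suspect_one", suspect_one), ("suspect_two", suspect_two)]:
    repeats = longest_repeat_run(sample, "GATA")
    verdict = "consistent" if repeats == scene_length else "excluded"
    print(name, repeats, verdict)
```

A single matching locus would be weak evidence on its own; the discriminating power of profiling comes from combining many independent loci.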
  
[[James D. Watson|Watson]] and [[Francis Crick|Crick]] had begun to contemplate double helical arrangements, but they lacked information about the amount of twist (pitch) and the distance between the two strands. [[Rosalind Franklin]] had to disclose some of her findings for the [[Medical Research Council]] and Crick saw this material through [[Max Perutz|Max Perutz's]] links to the MRC. Franklin's work confirmed a double helix that was on the outside of the molecule and also gave an insight into its symmetry, in particular that the two helical strands ran in opposite directions.
+
People convicted of certain types of crimes may be required to provide a sample of DNA for a database. This has helped investigators solve old cases where only a DNA sample was obtained from the scene. DNA profiling can also be used to identify victims of mass casualty incidents.
  
Watson and Crick were again greatly assisted by more of Franklin's data. This is controversial because Franklin's critical X-ray pattern was shown to Watson and Crick without Franklin's knowledge or permission. Wilkins showed the famous Photo 51 to Watson at his lab immediately after Watson had been unsuccessful in asking Franklin to collaborate to beat Pauling in finding the structure.
+
===Bioinformatics===
 +
[[Bioinformatics]] involves the manipulation, searching, and [[data mining]] of DNA sequence data. The development of techniques to store and search DNA sequences has led to widely applied advances in [[computer science]], especially [[string searching algorithm]]s, [[machine learning]], and [[database theory]].<ref>P. Baldi and S. Brunak, ''Bioinformatics: The Machine Learning Approach'' (MIT Press, 2001, ISBN 978-0262025065).</ref> String searching or matching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, were developed to search for specific sequences of nucleotides.<ref>D. Gusfield, ''Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology'' (Cambridge University Press 1997, ISBN 978-0521585194).</ref> In other applications such as [[text editor]]s, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour due to their small number of distinct characters. The related problem of [[sequence alignment]] aims to identify [[homology (biology)|homologous]] sequences and locate the specific [[mutation]]s that make them distinct.
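As a minimal sketch of what string searching means for nucleotide data, the following Python function reports every position at which a short motif occurs in a longer sequence. The function name <code>find_motif</code> and the example sequence are illustrative assumptions; the code relies on Python's built-in <code>str.find</code> and is not one of the specialised algorithms the cited texts describe.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every match of motif, overlaps included."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are not skipped.
        start = sequence.find(motif, start + 1)
    return positions


fragment = "GATATATGCATATACTT"
print(find_motif(fragment, "ATAT"))  # prints [1, 3, 9]
```

Because the DNA alphabet has only four letters, partial matches are frequent, which is why simple scanning of this kind tends to do more redundant comparison on DNA than on ordinary text.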
  
From the data in photograph 51 Watson and Crick were able to discern that not only was the distance between the two strands constant, but also to measure its exact value of 2 nanometres. The same photograph also gave them the 3.4 nanometre-per-10 bp "pitch" of the helix.
+
These techniques, especially [[multiple sequence alignment]], are used in studying [[phylogenetics|phylogenetic]] relationships and protein function.<ref>K. Sjölander, [https://academic.oup.com/bioinformatics/article/20/2/170/205067?login=false "Phylogenomic inference of protein molecular function: Advances and challenges,"] ''Bioinformatics'' 20(2) (2004): 170-179. PMID 14734307. Retrieved January 23, 2023.</ref> Data sets representing entire genomes' worth of DNA sequences, such as those produced by the [[Human Genome Project]], are difficult to use without annotations, which label the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein- or RNA-coding genes can be identified by [[gene finding]] algorithms, which allow researchers to predict the presence of particular [[gene product]]s in an organism even before they have been isolated experimentally.<ref name="Mount">D. M. Mount, ''Bioinformatics: Sequence and Genome Analysis,'' 2nd edition (Cold Spring, NY: Cold Spring Harbor Laboratory Press 2004, ISBN 0879697121).</ref>
  
The final insight came when Crick and Watson saw that a complementary pairing of the bases could provide an explanation for Chargaff's puzzling finding. However, the structure of the bases had been incorrectly guessed in the textbooks as the [[enol]] [[tautomer]] when they were more likely to be in the [[keto]] form. When [[Jerry Donohue]] pointed this fallacy out to Watson, Watson quickly realised that the pairs of adenine and thymine, and guanine and cytosine were almost identical in shape and so would provide equally sized 'rungs' between the two strands. With the base-pairing, Watson and Crick quickly converged upon a model, which they announced before Franklin herself had published any of her work.
+
===DNA nanotechnology===
 +
[[Image:DNA nanostructures.png|thumb|400px|The DNA structure at left (schematic shown) will self-assemble into the structure visualized by [[Atomic force microscope|atomic force microscopy]] at right. [[DNA nanotechnology]] is the field which seeks to design nanoscale structures using the [[molecular recognition]] properties of DNA molecules. Image from Strong, 2004. {{doi-inline|10.1371/journal.pbio.0020073}}]]
  
Franklin was two steps away from the solution. She had not guessed the base-pairing and had not appreciated the implications of the symmetry that she had described. However she had been working almost alone and did not have regular contact with a partner like Crick and Watson, and with other experts such as Jerry Donohue.  Her notebooks show that she was aware both of Jerry Donohue's work concerning tautomeric forms of bases (she had used the keto forms for three of the bases) and of Chargaff's work.
+
DNA nanotechnology uses the unique [[molecular recognition]] properties of DNA and other nucleic acids to create self-assembling branched DNA complexes with useful properties. DNA is thus used as a structural material rather than as a carrier of biological information. This has led to the creation of two-dimensional periodic lattices (both tile-based and using the "[[DNA origami]]" method) as well as three-dimensional structures in the shapes of [[Polyhedron|polyhedra]]. [[DNA machine|Nanomechanical devices]] and [[DNA computing|algorithmic self-assembly]] have also been demonstrated, and these DNA structures have been used to template the arrangement of other molecules such as [[Colloidal gold|gold nanoparticles]] and [[streptavidin]] proteins.
  
The disclosure of Franklin's data to Watson has angered some people who believe Franklin did not receive due credit at the time and that she might have discovered the structure on her own before Crick and Watson. In Crick and Watson's famous paper in Nature in 1953, they said that their work had been stimulated by the work of Wilkins and Franklin, whereas it had been the basis of their work. However they had agreed with Wilkins and Franklin that they all should publish papers in the same issue of Nature in support of the proposed structure.
+
===DNA and computation ===
 +
DNA was first used in computing to solve a small version of the directed [[Hamiltonian path problem]], an [[NP-complete]] problem.<ref>L. Adleman, "Molecular computation of solutions to combinatorial problems," ''Science'' 266(5187) (1994): 1021&ndash;1024. PMID 7973651.</ref> [[DNA computing]] is advantageous over electronic computers in power use, space use, and efficiency, due to its ability to compute in a highly parallel fashion. A number of other problems, including simulation of various [[abstract machine]]s, the [[boolean satisfiability problem]], and the bounded version of the [[traveling salesman problem]], have since been analysed using DNA computing.<ref>J. Parker, "Computing with DNA," ''EMBO Rep'' 4(1) (2003): 7&ndash;10. PMID 12524509.</ref> Due to its compactness, DNA also has a theoretical role in [[cryptography]].
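To make the directed Hamiltonian path problem concrete, the sketch below solves a tiny instance on an ordinary computer by generating every candidate ordering of the vertices and keeping only those in which each step follows an edge. This generate-and-filter strategy is meant only as a loose, sequential analogy to the parallel approach of DNA computing; the graph, the function name <code>hamiltonian_paths</code>, and the vertex labels are hypothetical and are not taken from the cited paper.

```python
from itertools import permutations


def hamiltonian_paths(vertices, edges, start, end):
    """Return all paths from start to end that visit every vertex exactly once."""
    edge_set = set(edges)
    inner = [v for v in vertices if v not in (start, end)]
    found = []
    # Generate every candidate ordering, then filter out the invalid ones.
    for middle in permutations(inner):
        path = (start, *middle, end)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            found.append(path)
    return found


# Hypothetical four-vertex directed graph.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(hamiltonian_paths(vertices, edges, start=0, end=3))  # prints [(0, 1, 2, 3)]
```

The number of candidate orderings grows factorially with the number of vertices, which is why the massive parallelism of molecular reactions was attractive for this kind of problem.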
  
===="Central Dogma"====
+
===History and anthropology===
Watson and Crick's model attracted great interest immediately upon its presentation. Arriving at their conclusion on [[February 21]] [[1953]], Watson and Crick made their first announcement on [[February 28]]. Their paper ''A Structure for Deoxyribose Nucleic Acid''<ref>Watson and Crick, 1953</ref> was published on April 25. In an influential presentation in 1957, Crick laid out the "[[Central Dogma]]", which foretold the relationship between DNA, RNA, and proteins, and articulated the "sequence hypothesis." A critical confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 in the form of the [[Meselson-Stahl experiment]]. Work by Crick and coworkers showed that the genetic code was based on non-overlapping triplets of bases, called codons, and [[Har Gobind Khorana]] and others deciphered the [[genetic code]] not long afterward. These findings represent the birth of [[molecular biology]].
+
Because DNA collects mutations over time, which are then inherited, it contains historical information and by comparing DNA sequences, geneticists can infer the evolutionary history of organisms, their [[phylogeny]].<ref>G. Wray, "Dating branches on the tree of life using DNA," ''Genome Biol'' 3(1) (2002). PMID 11806830. </ref> This field of [[phylogenetics]] is a powerful tool in [[evolutionary biology]]. If DNA sequences within a species are compared, [[population genetics|population geneticists]] can learn the history of particular populations. This can be used in studies ranging from [[ecological genetics]] to [[anthropology]]; for example, DNA evidence is being used to try to identify the [[Ten Lost Tribes of Israel]].<ref>Y. Kleiman, "The Cohanim/DNA Connection: The fascinating story of how DNA studies confirm an ancient biblical tradition," ''Aish.com'', January 13, 2000.</ref>
  
[[James D. Watson|Watson]], [[Francis Crick|Crick]], and [[Maurice Wilkins|Wilkins]] were awarded the 1962 [[Nobel Prize for Physiology or Medicine]] for discovering the molecular structure of DNA, by which time [[Rosalind Franklin|Franklin]] had died from cancer at 37. Nobel prizes are not awarded posthumously. Had she lived, the decision over whom to jointly award the prize would have been complicated, as the prize can be shared among a maximum of three people. Because their work could be considered chemistry, however, it is conceivable that [[Maurice Wilkins|Wilkins]] and [[Rosalind Franklin|Franklin]] could have been awarded the [[Nobel Prize for Chemistry]] instead. See Graeme Hunter's biography of Sir Lawrence Bragg for more information on how scientists were nominated for Nobel Prizes.
+
DNA has also been used to look at modern family relationships, such as establishing family relationships between the descendants of [[Sally Hemings]] and [[Thomas Jefferson]]. This usage is closely related to the use of DNA in criminal investigations detailed above.
  
==References==
+
==Notes==
===Citations===
 
 
<references/>
 
<references/>
  
===General references===
+
==References==
 +
* Alberts, B., A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walters. ''Molecular Biology of the Cell'' 4th edition. New York: Garland Science, 2002. ISBN 0815332181
 +
* Baldi, P., and S. Brunak. ''Bioinformatics: The Machine Learning Approach''. MIT Press, 2001. ISBN 978-0262025065
 +
* Berg, J., J. Tymoczko, and L. Stryer. ''Biochemistry''. W. H. Freeman and Company, 2002. ISBN 0716749556
 +
* Butler, J. ''Forensic DNA Typing''.  San Diego: Academic Press, 2001. ISBN 978-0121479510
 +
* Calladine, C. R., H. R. Drew, B. F. Luisi, and A. A. Travers. ''Understanding DNA''. Elsevier Academic Press, 2003. ISBN 978-0121550899
 +
* Clayton, J., and C. Dennis (eds.). ''50 Years of DNA''. Palgrave MacMillan Press, 2003. ISBN 978-1403914798
 +
* Gusfield, D. ''Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology''. Cambridge University Press, 1997. ISBN 978-0521585194
 +
* Judson, H. F. ''The Eighth Day of Creation: Makers of the Revolution in Biology''. Cold Spring Harbor Laboratory Press, 1996. ISBN 978-0879694784
 +
*Mount, D. M. ''Bioinformatics: Sequence and Genome Analysis''. 2nd edition. Cold Spring, NY: Cold Spring Harbor Laboratory Press 2004. ISBN 0879697121
 +
* Olby, R. ''The Path to The Double Helix: Discovery of DNA''. MacMillan, 1974. ISBN 978-0486681177
 +
* Ridley, M. ''Francis Crick: Discoverer of the Genetic Code (Eminent Lives)''. HarperCollins Publishers, 2006. ISBN 978-0060823337
 +
* Watson, J. D. and F. H. C. Crick. "A structure for deoxyribose nucleic acid." ''Nature'' 171 (1953): 737&ndash;738.
 +
* Watson, J. D. ''Avoid Boring People and Other Lessons From a Life in Science''. New York: Knopf, 2007. ISBN 0375412840
 +
* Watson, J. D. ''DNA: The Secret of Life'' New York: Alfred A. Knopf, 2003. ISBN 978-0375415463
 +
* Watson, J. D. ''The Double Helix: A Personal Account of the Discovery of the Structure of DNA''. New York: Norton, 1980. ISBN 978-0393950755
  
* Watson, James D. and Francis H.C. Crick. [http://www.nature.com/nature/dna50/watsoncrick.pdf A structure for Deoxyribose Nucleic Acid] (PDF). ''[[Nature (journal)|Nature]]'' 171, 737&ndash;738, [[25 April]] [[1953]].
+
==External links==
* Watson, James D. ''DNA: The Secret of Life''  ISBN 0375415467.
+
All links retrieved January 12, 2024.
* Watson, James D. [[The Double Helix|The Double Helix: A Personal Account of the Discovery of the Structure of DNA (Norton Critical Editions)]].  ISBN 0393950751
 
* Chomet, S. (Ed.), ''DNA Genesis of a Discovery'', Newman-Hemisphere Press, London, 1994.
 
* Delmonte, C.S. and Mann, L.R.B. [http://www.ias.ac.in/currsci/dec102003/1564.pdf Variety in DNA secondary structure]. Current Science, 85 (11), 1564&ndash;1570, 10 December 2003.
 
*Miller, Kenneth R., and Levin, Joseph. ''Biology''. Upper Saddle River, New Jersey: Prentice Hall, 2002.
 
  
<!-- Not sure if we need this long document as a reference ? If yes, please provide a reference; the PDF looks unpublished.
+
* [http://www.dnai.org/ DNA Interactive]
* Delmonte, C. S., http://www.notahelix.com/delmonte/new_struct_mol_biol.pdf
+
* [http://www.dnaftb.org/ DNA from the Beginning]
>
+
* [https://medlineplus.gov/genetics/understanding/basics/dna/ What is DNA?] ''Medline Plus''
 +
* [https://www.genome.gov/genetics-glossary/Deoxyribonucleic-Acid Deoxyribonucleic Acid (DNA)] ''National Human Genome Research Institute''
 +
* [https://www.nature.com/scitable/topicpage/introduction-what-is-dna-6579978/ Introduction: What Is DNA?] ''Scitable''
  
==External links==
+
{{Nucleic acids}}
*[http://www.dnahack.com/index.html DNA hack: The website for Amateur Genetic Engineering]
 
*[http://www.packer34.freeserve.co.uk/selectedTATAwebsites.htm First press stories on DNA]
 
*[http://en.wikipedia.org/wiki/Image:Rosalindfranklinsjokecard.jpg 'Death' of DNA Helix (Crystaline) joke funeral card].
 
*[http://www.nature.com/nature/dna50/archive.html Double helix: 50 years of DNA], [[Nature (journal)|Nature]].
 
*[http://www.genome.gov/10506367 U.S. National DNA Day] Watch videos and participate in real-time chat with top scientists
 
*[http://www.genome.gov/10506718 Genetic Education Modules for Teachers] ''DNA from the Beginning'' Study Guide
 
*[http://www.genome.gov/glossary.cfm Talking Glossary of Genetic Terms] In Spanish, too
 
*[http://osulibrary.oregonstate.edu/specialcollections/coll/pauling/dna/index.html Linus Pauling and the Race for DNA]
 
*Listen to Francis Crick and James Watson talking on the BBC in 1962, 1972, and 1974:http://www.bbc.co.uk/bbcfour/audiointerviews/profilepages/crickwatson1.shtml
 
*[http://news.bbc.co.uk/1/hi/sci/tech/2949629.stm 17 April, 2003, BBC News: Most ancient DNA ever?]
 
*[http://www.whatsnextnetwork.com/health/index.php?cat=61 Latest Advances In Gene Research]
 
*[http://www.dna-research.org DNA Research News]
 
*[http://www.dnai.org DNA Interactive] (requires [[Macromedia Flash]])
 
*[http://3dscience.com/3d_dna_models.asp Free 3d DNA model Images]
 
*[http://nist.rcsb.org/pdb/molecules/pdb23_1.html DNA: PDB molecule of the month]
 
*[http://www.fidelitysystems.com/Unlinked_DNA.html DNA under electron microscope]
 
*[http://www.ccrnp.ncifcrf.gov/~toms/LeftHanded.DNA.html Left-handed DNA Hall of Fame]
 
*[http://www.myfirstbookaboutdna.com My First Book About DNA] Designed for children to learn more about DNA.
 
*{{dmoz|Science/Biology/Biochemistry_and_Molecular_Biology/Biomolecules/Nucleic_Acids/|Nucleic Acids}}
 
*[http://www.zytologie-online.net/dna.php DNA Replication and Translation / Cell Biology]
 
  
{{credit|51121850}}
+
{{credit|DNA|179323032}}
 
[[Category:Life sciences]]
 
[[Category:Life sciences]]
 +
[[Category:Genetics]]
 +
[[Category:Molecular biology]]

Latest revision as of 07:36, 12 January 2024


The structure of part of a DNA double helix

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information.

Chemically, DNA is a long polymer of simple units called nucleotides, with a backbone made of sugars (deoxyribose) and phosphate groups joined by ester bonds. Attached to each sugar is one of four types of molecules called bases. It is the sequence of these four bases along the backbone that encodes information. This information is read using the genetic code, which specifies the sequence of the amino acids within proteins. The code is read by copying stretches of DNA into the related nucleic acid RNA, in a process called transcription. Most of these RNA molecules are used to synthesize proteins, but others are used directly in structures such as ribosomes and spliceosomes. RNA also serves as a genetic blueprint for certain viruses.
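The copying step can be illustrated with a short Python sketch. It assumes the input is written as the coding (sense) strand in the 5′ to 3′ direction, so that the RNA has the same sequence with uracil (U) in place of thymine (T); the function names and the example sequence are hypothetical, and the sketch ignores promoters, splicing, and every other biological detail.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA with the same sequence as the coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def split_codons(mrna: str) -> list[str]:
    """Split an RNA sequence into consecutive, non-overlapping base triplets."""
    usable = len(mrna) - len(mrna) % 3  # drop any incomplete trailing triplet
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


mrna = transcribe("ATGGCCTAA")
print(mrna)                # prints AUGGCCUAA
print(split_codons(mrna))  # prints ['AUG', 'GCC', 'UAA']
```

Each triplet produced by <code>split_codons</code> corresponds to one unit of the genetic code, which a lookup table would then map to an amino acid or a stop signal.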

Within cells, DNA is organized into structures called chromosomes. These chromosomes are duplicated before cells divide, in a process called DNA replication. Eukaryotic organisms such as animals, plants, and fungi store their DNA inside the cell nucleus, while in prokaryotes such as bacteria, which lack a cell nucleus, it is found in the cell's cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA, which helps control its interactions with other proteins and thereby control which genes are transcribed. Some eukaryotic cell organelles, mitochondria and chloroplasts, also contain DNA, giving rise to the endosymbiotic theory that these organelles may have arisen from prokaryotes in a symbiotic relationship.

The identification of DNA, combined with human creativity, has been of tremendous importance not only for understanding life but for practical applications in medicine, agriculture, and other areas. Technologies have been developed using recombinant DNA to mass produce medically important proteins, such as insulin, and have found application in agriculture to make plants with desirable qualities. Through understanding the alleles that one is carrying for particular genes, one can gain an understanding of the probability that one's offspring may inherit certain genetic disorders, or one's own predisposition for a particular disease. DNA technology is used in forensics, anthropology, and many other areas as well.

DNA and the biological processes centered on its activities (translation, transcription, replication, genetic recombination, and so forth) are amazing in their complexity and coordination. The presence of DNA also reflects on the unity of life, since organisms share nucleic acids as genetic blueprints and share a nearly universal genetic code. On the other hand, the discovery of DNA has at times led to an overemphasis on DNA to the point of believing that life can be totally explained by physico-chemical processes alone.

History

Francis Crick
James Watson in 2012

DNA was first isolated by the Swiss physician Friedrich Miescher who, in 1869, discovered a microscopic substance in the pus of discarded surgical bandages. As it resided in the nuclei of cells, he called it "nuclein."[1] In 1919, this discovery was followed by Phoebus Levene's identification of the base, sugar, and phosphate nucleotide unit.[2] Levene suggested that DNA consisted of a string of nucleotide units linked together through the phosphate groups. However, Levene thought the chain was short and the bases repeated in a fixed order. In 1937, William Astbury produced the first X-ray diffraction patterns that showed that DNA had a regular structure.[3]

In 1928, Frederick Griffith discovered that traits of the "smooth" form of the Pneumococcus bacteria could be transferred to the "rough" form of the same bacteria by mixing killed "smooth" bacteria with the live "rough" form.[4] This system provided the first clear suggestion that DNA carried genetic information, when Oswald Theodore Avery, along with coworkers Colin MacLeod and Maclyn McCarty, identified DNA as the transforming principle in 1943.[5] DNA's role in heredity was confirmed in 1952, when Alfred Hershey and Martha Chase, in the Hershey-Chase experiment, showed that DNA is the genetic material of the T2 phage.[6]

In 1953, based on X-ray diffraction images taken by Rosalind Franklin and the information that the bases were paired, James D. Watson and Francis Crick suggested what is now accepted as the first accurate model of DNA structure in the journal Nature.[7] Experimental evidence for Watson and Crick's model was published in a series of five articles in the same issue of Nature.[8] Of these, Franklin and Raymond Gosling's paper was the first publication of X-ray diffraction data that supported the Watson and Crick model.[9] This issue also contained an article on DNA structure by Maurice Wilkins and his colleagues.[10] In 1962, after Franklin's death, Watson, Crick, and Wilkins jointly received the Nobel Prize in Physiology or Medicine. However, speculation continues on who should have received credit for the discovery, as it was based on Franklin's data.

In an influential presentation in 1957, Crick laid out the "Central Dogma" of molecular biology, which foretold the relationship between DNA, RNA, and proteins, and articulated the "adaptor hypothesis". Final confirmation of the replication mechanism that was implied by the double-helical structure followed in 1958 through the Meselson-Stahl experiment.[11] Further work by Crick and coworkers showed that the genetic code was based on non-overlapping triplets of bases, called codons, allowing Har Gobind Khorana, Robert W. Holley, and Marshall Warren Nirenberg to decipher the genetic code.[12] These findings represent the birth of molecular biology.

Physical and chemical properties

The chemical structure of DNA.

DNA is a long polymer made from repeating units called nucleotides.[13][14] The DNA chain is 22 to 26 Ångströms wide (2.2 to 2.6 nanometres), and one nucleotide unit is 3.3 Ångströms (0.33 nanometres) long.[15] Although each individual repeating unit is very small, DNA polymers can be enormous molecules containing millions of nucleotides. For instance, the largest human chromosome, chromosome number 1, is 220 million base pairs long.[16]
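As a rough illustration of these scales, the sketch below multiplies the per-nucleotide length quoted above by the size given for chromosome 1. The constant and function names are hypothetical, and the calculation assumes an idealized, fully stretched-out helix rather than DNA as it is packed in a cell.

```python
# Illustrative sketch: contour length of a fully extended double helix.
# The figures come from the paragraph above; all names are hypothetical.
RISE_PER_BASE_PAIR_NM = 0.33            # length of one nucleotide unit, in nm
CHROMOSOME_1_BASE_PAIRS = 220_000_000   # size quoted for human chromosome 1


def contour_length_cm(base_pairs: int, rise_nm: float = RISE_PER_BASE_PAIR_NM) -> float:
    """Return the length, in centimetres, of a stretched-out helix."""
    length_nm = base_pairs * rise_nm
    return length_nm * 1e-7  # 1 nm = 1e-7 cm


if __name__ == "__main__":
    print(f"{contour_length_cm(CHROMOSOME_1_BASE_PAIRS):.2f} cm")
```

With these figures, a single molecule only about 2 nanometres wide would be roughly 7 centimetres long if laid out straight.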

In living organisms, DNA does not usually exist as a single molecule, but instead as a tightly associated pair of molecules.[7][17] These two long strands entwine like vines, in the shape of a double helix. The nucleotide repeats contain both the segment of the backbone of the molecule, which holds the chain together, and a base, which interacts with the other DNA strand in the helix. In general, a base linked to a sugar is called a nucleoside and a base linked to a sugar and one or more phosphate groups is called a nucleotide. If multiple nucleotides are linked together, as in DNA, this polymer is referred to as a polynucleotide.

The backbone of the DNA strand is made from alternating phosphate and sugar residues.[18] The sugar in DNA is 2-deoxyribose, which is a pentose (five-carbon) sugar. The sugars are joined together by phosphate groups that form phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These asymmetric bonds mean a strand of DNA has a direction. In a double helix, the direction of the nucleotides in one strand is opposite to their direction in the other strand. This arrangement of DNA strands is called antiparallel. The asymmetric ends of DNA strands are referred to as the 5′ (five prime) and 3′ (three prime) ends. One of the major differences between DNA and RNA is the sugar, with 2-deoxyribose being replaced by the alternative pentose sugar ribose in RNA.[17]

The DNA double helix is stabilized by hydrogen bonds between the bases attached to the two strands. The four bases found in DNA are adenine (abbreviated A), cytosine (C), guanine (G), and thymine (T). These four bases are shown below and are attached to the sugar/phosphate to form the complete nucleotide, as shown for adenosine monophosphate.

These bases are classified into two types: adenine and guanine are fused five- and six-membered heterocyclic compounds called purines, while cytosine and thymine are six-membered rings called pyrimidines.[17] A fifth base, the pyrimidine uracil (U), usually takes the place of thymine in RNA and differs from thymine by lacking a methyl group on its ring. Uracil is not usually found in DNA, occurring only as a breakdown product of cytosine, but a very rare exception to this rule is a bacterial virus called PBS1 that contains uracil in its DNA.[19] In contrast, following synthesis of certain RNA molecules, a significant number of the uracils are converted to thymines by the enzymatic addition of the missing methyl group. This occurs mostly on structural and enzymatic RNAs like transfer RNAs and ribosomal RNA.[20]

Major and minor grooves

Animation of the structure of a section of DNA. The bases lie horizontally between the two spiraling strands. Created from PDB 1D65.

The double helix is a right-handed spiral. As the DNA strands wind around each other, they leave gaps between each set of phosphate backbones, revealing the sides of the bases inside (see animation). There are two of these grooves twisting around the surface of the double helix: one groove, the major groove, is 22 Å wide and the other, the minor groove, is 12 Å wide.[21] The narrowness of the minor groove means that the edges of the bases are more accessible in the major groove. As a result, proteins like transcription factors that can bind to specific sequences in double-stranded DNA usually make contacts to the sides of the bases exposed in the major groove.[22]

Base pairing

At top, a GC base pair with three hydrogen bonds. At the bottom, AT base pair with two hydrogen bonds. Hydrogen bonds are shown as dashed lines.

Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing. Here, purines form hydrogen bonds to pyrimidines, with A bonding only to T, and C bonding only to G. This arrangement of two nucleotides binding together across the double helix is called a base pair. In a double helix, the two strands are also held together via forces generated by the hydrophobic effect and pi stacking, which are not influenced by the sequence of the DNA.[23] As hydrogen bonds are not covalent, they can be broken and rejoined relatively easily. The two strands of DNA in a double helix can therefore be pulled apart like a zipper, either by a mechanical force or high temperature.[24] As a result of this complementarity, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital in DNA replication. Indeed, this reversible and specific interaction between complementary base pairs is critical for all the functions of DNA in living organisms.[13]
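The pairing rule described above (A with T, C with G) can be sketched in a few lines of code. This is a minimal illustration only: the names are hypothetical, sequences are assumed to contain just the letters A, C, G and T, and the reversal step reflects the antiparallel arrangement of the two strands.

```python
# Illustrative sketch of complementary base pairing (A-T, C-G).
# Names are hypothetical; input is assumed to contain only A, C, G, T.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same order."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written 5' to 3'.

    The two strands are antiparallel, so the partner is read in the
    opposite direction to the input.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))
```

Because each base has exactly one partner, applying `reverse_complement` twice returns the original sequence, which is the sense in which each strand carries all the information of the double helix.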

The two types of base pairs form different numbers of hydrogen bonds, AT forming two hydrogen bonds, and GC forming three hydrogen bonds (see figures, left). The GC base pair is therefore stronger than the AT base pair. As a result, it is both the percentage of GC base pairs and the overall length of a DNA double helix that determine the strength of the association between the two strands of DNA. Long DNA helices with a high GC content have stronger-interacting strands, while short helices with high AT content have weaker-interacting strands.[25] Parts of the DNA double helix that need to separate easily, such as the TATAAT Pribnow box in bacterial promoters, tend to have sequences with a high AT content, making the strands easier to pull apart.[26] In the laboratory, the strength of this interaction can be measured by finding the temperature required to break the hydrogen bonds, their melting temperature (also called Tm value). When all the base pairs in a DNA double helix melt, the strands separate and exist in solution as two entirely independent molecules. These single-stranded DNA molecules have no single common shape, but some conformations are more stable than others.[27]
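The dependence of strand association on GC content and length can be illustrated by counting hydrogen bonds, two per AT pair and three per GC pair. The sketch below is a crude proxy under that assumption alone: it ignores base stacking and solution conditions, it does not predict a melting temperature, and its function names are hypothetical.

```python
# Illustrative sketch: GC content and hydrogen-bond count of a duplex,
# given one strand. Names are hypothetical; input is assumed to be A/C/G/T.
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def gc_fraction(strand: str) -> float:
    """Return the fraction of positions that are G or C."""
    seq = strand.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def hydrogen_bond_count(strand: str) -> int:
    """Return the total hydrogen bonds between this strand and its partner."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


if __name__ == "__main__":
    for seq in ("TATAAT", "GCGCGC"):
        print(seq, gc_fraction(seq), hydrogen_bond_count(seq))
```

For two six-base sequences, the AT-only Pribnow box TATAAT has 12 hydrogen bonds while GCGCGC has 18, in line with the text's point that AT-rich regions are easier to pull apart.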

Sense and antisense

A DNA sequence is called "sense" if its sequence is the same as that of a messenger RNA copy that is translated into protein. The sequence on the opposite strand is complementary to the sense sequence and is therefore called the "antisense" sequence. Since RNA polymerases work by making a complementary copy of their templates, it is this antisense strand that is the template for producing the sense messenger RNA. Both sense and antisense sequences can exist on different parts of the same strand of DNA (that is, both strands contain both sense and antisense sequences).
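The relationship between the two strands and the messenger RNA can be sketched as follows. The example assumes the antisense (template) strand is supplied written 5′ to 3′ and contains only A, C, G and T; the function name is hypothetical, and real transcription involves promoters, terminators and processing steps that are ignored here.

```python
# Illustrative sketch: copying an antisense (template) strand into mRNA.
# Names are hypothetical; the template is assumed to be written 5' to 3'.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_5_to_3: str) -> str:
    """Return the mRNA (5' to 3') complementary to the template strand.

    The template is read 3' to 5', so it is reversed before pairing.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3.upper()))


if __name__ == "__main__":
    sense = "ATGGCC"
    antisense = "GGCCAT"  # reverse complement of the sense sequence
    print(transcribe(antisense))
```

The resulting RNA has the same sequence as the sense strand with U in place of T, which is why the sense sequence is defined by reference to the messenger RNA.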

In both prokaryotes and eukaryotes, antisense RNA sequences are produced, but the functions of these RNAs are not entirely clear.[28] One proposal is that antisense RNAs are involved in regulating gene expression through RNA-RNA base pairing.[29]

A few DNA sequences in prokaryotes and eukaryotes, and more in plasmids and viruses, blur the distinction made above between sense and antisense strands by having overlapping genes.[30] In these cases, some DNA sequences do double duty, encoding one protein when read 5′ to 3′ along one strand, and a second protein when read in the opposite direction (still 5′ to 3′) along the other strand. In bacteria, this overlap may be involved in the regulation of gene transcription,[31] while in viruses, overlapping genes increase the amount of information that can be encoded within the small viral genome.[32] Another way of reducing genome size is seen in some viruses that contain linear or circular single-stranded DNA as their genetic material.[33][34]

Supercoiling

DNA can be twisted like a rope in a process called DNA supercoiling. With DNA in its "relaxed" state, a strand usually circles the axis of the double helix once every 10.4 base pairs, but if the DNA is twisted the strands become more tightly or more loosely wound.[35] If the DNA is twisted in the direction of the helix, this is positive supercoiling, and the bases are held more tightly together. If they are twisted in the opposite direction, this is negative supercoiling, and the bases come apart more easily.
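A simplified numerical picture of this, assuming only the figure of 10.4 base pairs per turn quoted above, compares the number of helical turns actually present in a segment with the number expected in the relaxed state. The names are hypothetical, and a full treatment would use linking number, twist and writhe rather than a single turn count.

```python
# Illustrative sketch: comparing a segment's helical turns with the relaxed state.
# Names are hypothetical; this is a simplification of real DNA topology.
import math

RELAXED_BP_PER_TURN = 10.4  # base pairs per helical turn in relaxed DNA


def relaxed_turns(base_pairs: int) -> float:
    """Return the number of helical turns expected in relaxed DNA."""
    return base_pairs / RELAXED_BP_PER_TURN


def winding_state(base_pairs: int, turns: float) -> str:
    """Classify a segment as relaxed, overwound or underwound."""
    expected = relaxed_turns(base_pairs)
    if math.isclose(turns, expected):
        return "relaxed"
    if turns > expected:
        return "overwound (positive)"
    return "underwound (negative)"


if __name__ == "__main__":
    print(relaxed_turns(5200))
    print(winding_state(5200, 480))
```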

In nature, most DNA has slight negative supercoiling that is introduced by enzymes called topoisomerases.[36] These enzymes are also needed to relieve the twisting stresses introduced into DNA strands during processes such as transcription and DNA replication.[37]

From left to right, the structures of A, B and Z DNA

Alternative double-helical structures

DNA exists in several possible conformations. The conformations so far identified are: A-DNA, B-DNA, C-DNA, D-DNA,[38] E-DNA,[39] H-DNA,[40] L-DNA,[38] P-DNA,[41] and Z-DNA.[18][42] However, only A-DNA, B-DNA, and Z-DNA have been observed in naturally occurring biological systems.

Which conformation DNA adopts depends on the sequence of the DNA, the amount and direction of supercoiling, chemical modifications of the bases, and also solution conditions, such as the concentration of metal ions and polyamines.[43] Of these three conformations, the "B" form described above is most common under the conditions found in cells.[44] The two alternative double-helical forms of DNA differ in their geometry and dimensions.

The A form is a wider right-handed spiral, with a shallow, wide minor groove and a narrower, deeper major groove. The A form occurs under non-physiological conditions in dehydrated samples of DNA, while in the cell it may be produced in hybrid pairings of DNA and RNA strands, as well as in enzyme-DNA complexes.[45][46] Segments of DNA where the bases have been chemically modified by methylation may undergo a larger change in conformation and adopt the Z form. Here, the strands turn about the helical axis in a left-handed spiral, the opposite of the more common B form.[47] These unusual structures can be recognized by specific Z-DNA binding proteins and may be involved in the regulation of transcription.[48]

Structure of a DNA quadruplex formed by telomere repeats. The conformation of the DNA backbone diverges significantly from the typical helical structure.

Quadruplex structures

At the ends of the linear chromosomes are specialized regions of DNA called telomeres. The main function of these regions is to allow the cell to replicate chromosome ends using the enzyme telomerase, as the enzymes that normally replicate DNA cannot copy the extreme 3′ ends of chromosomes.[49] As a result, if a chromosome lacked telomeres it would become shorter each time it was replicated. These specialized chromosome caps also help protect the DNA ends from exonucleases and stop the DNA repair systems in the cell from treating them as damage to be corrected.[50] In human cells, telomeres are usually lengths of single-stranded DNA containing several thousand repeats of a simple TTAGGG sequence.[51]

These guanine-rich sequences may stabilize chromosome ends by forming very unusual structures of stacked sets of four-base units, rather than the usual base pairs found in other DNA molecules. Here, four guanine bases form a flat plate and these flat four-base units then stack on top of each other, to form a stable G-quadruplex structure.[52] These structures are stabilized by hydrogen bonding between the edges of the bases and chelation of a metal ion in the centre of each four-base unit. The structure shown to the left is a top view of the quadruplex formed by a DNA sequence found in human telomere repeats. The single DNA strand forms a loop, with the sets of four bases stacking in a central quadruplex three plates deep. In the space at the center of the stacked bases are three chelated potassium ions.[53] Other structures can also be formed, with the central set of four bases coming from either a single strand folded around the bases, or several different parallel strands, each contributing one base to the central structure.

In addition to these stacked structures, telomeres also form large loop structures called telomere loops, or T-loops. Here, the single-stranded DNA curls around in a long circle stabilized by telomere-binding proteins.[54] At the very end of the T-loop, the single-stranded telomere DNA is held onto a region of double-stranded DNA by the telomere strand disrupting the double-helical DNA and base pairing to one of the two strands. This triple-stranded structure is called a displacement loop or D-loop.[52]

Chemical modifications

Structure of cytosine with and without the 5-methyl group. After deamination the 5-methylcytosine has the same structure as thymine

Base modifications

The expression of genes is influenced by the chromatin structure of a chromosome, and regions of heterochromatin (low or no gene expression) correlate with the methylation of cytosine. For example, cytosine methylation, to produce 5-methylcytosine, is important for X-chromosome inactivation.[55] The average level of methylation varies between organisms, with Caenorhabditis elegans lacking cytosine methylation, while vertebrates show higher levels, with up to 1% of their DNA containing 5-methylcytosine.[56] Despite the biological role of 5-methylcytosine, it is susceptible to spontaneous deamination, which leaves a thymine base; methylated cytosines are therefore mutation hotspots.[57] Other base modifications include adenine methylation in bacteria and the glycosylation of uracil to produce the "J-base" in kinetoplastids.[58][59]

DNA damage

Further information: Mutation
Benzopyrene, the major mutagen in tobacco smoke, in an adduct to DNA.

DNA can be damaged by many different sorts of mutagens. These include oxidizing agents, alkylating agents, and also high-energy electromagnetic radiation such as ultraviolet light and x-rays. The type of DNA damage produced depends on the type of mutagen. For example, UV light mostly damages DNA by producing thymine dimers, which are cross-links between adjacent pyrimidine bases in a DNA strand.[60] On the other hand, oxidants such as free radicals or hydrogen peroxide produce multiple forms of damage, including base modifications, particularly of guanosine, as well as double-strand breaks.[61] It has been estimated that in each human cell, about 500 bases suffer oxidative damage per day.[62][63] Of these oxidative lesions, the most dangerous are double-strand breaks, as these lesions are difficult to repair and can produce point mutations, insertions and deletions from the DNA sequence, as well as chromosomal translocations.[64]

Many mutagens intercalate into the space between two adjacent base pairs. Intercalators are mostly aromatic and planar molecules, and include ethidium, daunomycin, doxorubicin, and thalidomide. For an intercalator to fit between base pairs, the bases must separate, distorting the DNA strands by unwinding of the double helix. These structural changes inhibit both transcription and DNA replication, causing toxicity and mutations. As a result, DNA intercalators are often carcinogens, with benzopyrene diol epoxide, acridines, aflatoxin, and ethidium bromide being well-known examples.[65][66][67] Nevertheless, because they inhibit DNA transcription and replication, they are also used in chemotherapy to inhibit rapidly growing cancer cells.[68]

Overview of biological functions

DNA usually occurs as linear chromosomes in eukaryotes, and circular chromosomes in prokaryotes. The set of chromosomes in a cell makes up its genome. The human genome has approximately 3 billion base pairs of DNA arranged into 46 chromosomes.[69]

The information carried by DNA is held in the sequence of pieces of DNA called genes. Transmission of genetic information in genes is achieved via complementary base pairing. For example, in transcription, when a cell uses the information in a gene, the DNA sequence is copied into a complementary RNA sequence through the attraction between the DNA and the correct RNA nucleotides. Usually, this RNA copy is then used to make a matching protein sequence in a process called translation, which depends on the same interaction between RNA nucleotides. Alternatively, a cell may simply copy its genetic information in a process called DNA replication. The details of these functions are covered in other articles; here we focus on the interactions between DNA and other molecules that mediate the function of the genome.

Genome structure

Further information: Chromosome, Gene

Genomic DNA is located in the cell nucleus of eukaryotes, as well as small amounts in mitochondria and chloroplasts. In prokaryotes, the DNA is held within an irregularly shaped body in the cytoplasm called the nucleoid.[70]

The genetic information in a genome is held within genes. A gene is a unit of heredity and is a region of DNA that influences a particular characteristic in an organism. Genes contain an open reading frame that can be transcribed, as well as regulatory sequences such as promoters and enhancers, which control the expression of the open reading frame.

In many species, only a small fraction of the total sequence of the genome encodes protein. For example, only about 1.5% of the human genome consists of protein-coding exons, with over 50% of human DNA consisting of non-coding repetitive sequences.[71] The reasons for the presence of so much non-coding DNA in eukaryotic genomes and the extraordinary differences in genome size, or C-value, among species represent a long-standing puzzle known as the "C-value enigma."[72]

However, DNA sequences that do not code protein may still encode functional non-coding RNA molecules, which are involved in the regulation of gene expression.[73]

T7 RNA polymerase (blue) producing a mRNA (green) from a DNA template (orange). Created from PDB 1MSW.

Some non-coding DNA sequences play structural roles in chromosomes. Telomeres and centromeres typically contain few genes, but are important for the function and stability of chromosomes.[74] Pseudogenes, which are copies of genes that have been disabled by mutation, are an abundant form of non-coding DNA in humans.[75] These sequences are usually just molecular fossils, although they can occasionally serve as raw genetic material for the creation of new genes through the process of gene duplication and divergence.[76]

Transcription and translation

A gene is a sequence of DNA that contains genetic information and can influence the phenotype of an organism. Within a gene, the sequence of bases along a DNA strand defines a messenger RNA sequence, which then defines one or more protein sequences. The relationship between the nucleotide sequences of genes and the amino-acid sequences of proteins is determined by the rules of translation, known collectively as the genetic code. The genetic code consists of three-letter "words" called codons formed from a sequence of three nucleotides (e.g. ACT, CAG, TTT).

In transcription, the codons of a gene are copied into messenger RNA by RNA polymerase. This RNA copy is then decoded by a ribosome that reads the RNA sequence by base-pairing the messenger RNA to transfer RNA, which carries amino acids. Since there are 4 bases in 3-letter combinations, there are 64 possible codons (4³ combinations). These encode the twenty standard amino acids, giving most amino acids more than one possible codon. There are also three "stop" or "nonsense" codons signifying the end of the coding region; these are the TAA, TGA and TAG codons.
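The codon arithmetic above can be checked by enumerating every triplet. The sketch below, with hypothetical names, lists all three-letter combinations of the four DNA bases, sets aside the three stop codons named in the text, and splits a coding sequence into non-overlapping triplets up to the first stop codon. It deliberately omits the codon-to-amino-acid table and assumes the sequence is already in the correct reading frame.

```python
# Illustrative sketch: enumerating codons and reading a sequence in triplets.
# Names are hypothetical; the sequence is assumed to start in frame.
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TGA", "TAG"}

# Every ordered combination of three bases: 4 * 4 * 4 of them.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def split_codons(coding_sequence: str) -> list[str]:
    """Return successive triplets up to, but not including, the first stop codon."""
    seq = coding_sequence.upper()
    codons = []
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


if __name__ == "__main__":
    print(len(ALL_CODONS))
    print(split_codons("ATGGCCTAAGGG"))
```

Removing the three stop codons leaves 61 codons for twenty amino acids, which is why most amino acids have more than one codon.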

DNA replication. The double helix is unwound by a helicase and topoisomerase. Next, one DNA polymerase produces the leading strand copy. Another DNA polymerase binds to the lagging strand. This enzyme makes discontinuous segments (called Okazaki fragments) before DNA ligase joins them together.

Replication

Cell division is essential for an organism to grow, but when a cell divides it must replicate the DNA in its genome so that the two daughter cells have the same genetic information as their parent.

The double-stranded structure of DNA provides a simple mechanism for DNA replication. Here, the two strands are separated and then each strand's complementary DNA sequence is recreated by an enzyme called DNA polymerase. This enzyme makes the complementary strand by finding the correct base through complementary base pairing, and bonding it onto the original strand. As DNA polymerases can only extend a DNA strand in a 5′ to 3′ direction, different mechanisms are used to copy the antiparallel strands of the double helix.[77] In this way, the base on the old strand dictates which base appears on the new strand, and the cell ends up with a perfect copy of its DNA.

Interactions with proteins

All the functions of DNA depend on interactions with proteins. These protein interactions can be non-specific, or the protein can bind specifically to a single DNA sequence. Enzymes can also bind to DNA and of these, the polymerases that copy the DNA base sequence in transcription and DNA replication are particularly important.

DNA-binding proteins

Interaction of DNA with histones (shown in white, top). These proteins' basic amino acids (below left, blue) bind to the acidic phosphate groups on DNA (below right, red).

Structural proteins that bind DNA are well-understood examples of non-specific DNA-protein interactions. Within chromosomes, DNA is held in complexes with structural proteins. These proteins organize the DNA into a compact structure called chromatin. In eukaryotes, this structure involves DNA binding to a complex of small basic proteins called histones, while in prokaryotes multiple types of proteins are involved.[78][79] The histones form a disk-shaped complex called a nucleosome, which contains two complete turns of double-stranded DNA wrapped around its surface. These non-specific interactions are formed through basic residues in the histones making ionic bonds to the acidic sugar-phosphate backbone of the DNA, and are therefore largely independent of the base sequence.[80] Chemical modifications of these basic amino acid residues include methylation, phosphorylation, and acetylation.[81] These chemical changes alter the strength of the interaction between the DNA and the histones, making the DNA more or less accessible to transcription factors and changing the rate of transcription.[82] Other non-specific DNA-binding proteins found in chromatin include the high-mobility group proteins, which bind preferentially to bent or distorted DNA.[83] These proteins are important in bending arrays of nucleosomes and arranging them into more complex chromatin structures.[84]

A distinct group of DNA-binding proteins are the single-stranded-DNA-binding proteins that specifically bind single-stranded DNA. In humans, replication protein A is the best-characterized member of this family and is essential for most processes where the double helix is separated, including DNA replication, recombination, and DNA repair.[85] These binding proteins seem to stabilize single-stranded DNA and protect it from forming stem loops or being degraded by nucleases.

The lambda repressor helix-turn-helix transcription factor bound to its DNA target[86]

In contrast, other proteins have evolved to specifically bind particular DNA sequences. The most intensively studied of these are the various classes of transcription factors, which are proteins that regulate transcription. Each of these proteins binds to one particular set of DNA sequences and thereby activates or inhibits the transcription of genes with these sequences close to their promoters. The transcription factors do this in two ways. Firstly, they can bind the RNA polymerase responsible for transcription, either directly or through other mediator proteins; this locates the polymerase at the promoter and allows it to begin transcription.[87] Alternatively, transcription factors can bind enzymes that modify the histones at the promoter; this will change the accessibility of the DNA template to the polymerase.[88]

As these DNA targets can occur throughout an organism's genome, changes in the activity of one type of transcription factor can affect thousands of genes.[89] Consequently, these proteins are often the targets of the signal transduction processes that mediate responses to environmental changes or cellular differentiation and development. The specificity of these transcription factors' interactions with DNA comes from the proteins making multiple contacts to the edges of the DNA bases, allowing them to "read" the DNA sequence. Most of these base interactions are made in the major groove, where the bases are most accessible.[90]

The restriction enzyme EcoRV (green) in a complex with its substrate DNA[91]

DNA-modifying enzymes

Nucleases and ligases

Nucleases are enzymes that cut DNA strands by catalyzing the hydrolysis of the phosphodiester bonds. Nucleases that hydrolyse nucleotides from the ends of DNA strands are called exonucleases, while endonucleases cut within strands. The most frequently used nucleases in molecular biology are the restriction endonucleases, which cut DNA at specific sequences. For instance, the EcoRV enzyme shown to the left recognizes the 6-base sequence 5′-GAT|ATC-3′ and makes a cut at the vertical line.
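The sequence-specific cutting described for EcoRV can be mimicked with simple string matching. The sketch below is an illustration only: the function and constant names are hypothetical, it models one strand of a linear molecule written 5′ to 3′, and it ignores everything about the enzyme other than its recognition sequence and cut position.

```python
# Illustrative sketch: cutting a linear sequence at every EcoRV site (GAT|ATC).
# Names are hypothetical; only one strand, written 5' to 3', is modelled.
ECORV_SITE = "GATATC"
ECORV_CUT_OFFSET = 3  # the cut falls between GAT and ATC


def digest(sequence: str, site: str = ECORV_SITE, cut_offset: int = ECORV_CUT_OFFSET) -> list[str]:
    """Return the fragments produced by cutting at every occurrence of the site."""
    seq = sequence.upper()
    fragments = []
    fragment_start = 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(seq[fragment_start:cut])
        fragment_start = cut
        position = seq.find(site, position + 1)
    fragments.append(seq[fragment_start:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    print(digest("AAGATATCCC"))
```

The recognition sequence reads the same 5′ to 3′ on both strands, so in this simplified picture the cut falls at the same position on each strand.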

In nature, these enzymes protect bacteria against phage infection by digesting the phage DNA when it enters the bacterial cell, acting as part of the restriction modification system.[92] In technology, these sequence-specific nucleases are used in molecular cloning and DNA fingerprinting.

Enzymes called DNA ligases can rejoin cut or broken DNA strands, using the energy from either adenosine triphosphate or nicotinamide adenine dinucleotide.[93] Ligases are particularly important in lagging strand DNA replication, as they join together the short segments of DNA produced at the replication fork into a complete copy of the DNA template. They are also used in DNA repair and genetic recombination.[93]

Topoisomerases and helicases

Topoisomerases are enzymes with both nuclease and ligase activity. These proteins change the amount of supercoiling in DNA. Some of these enzymes work by cutting the DNA helix and allowing one section to rotate, thereby reducing its level of supercoiling; the enzyme then seals the DNA break.[36] Other types of these enzymes are capable of cutting one DNA helix and then passing a second strand of DNA through this break, before rejoining the helix.[94] Topoisomerases are required for many processes involving DNA, such as DNA replication and transcription.[37]

Helicases are proteins that are a type of molecular motor. They use the chemical energy in nucleoside triphosphates, predominantly ATP, to break hydrogen bonds between bases and unwind the DNA double helix into single strands.[95] These enzymes are essential for most processes where enzymes need to access the DNA bases.

Polymerases

Polymerases are enzymes that synthesise polynucleotide chains from nucleoside triphosphates. They function by adding nucleotides onto the 3′ hydroxyl group of the previous nucleotide in the DNA strand. As a consequence, all polymerases work in a 5′ to 3′ direction.[96] In the active site of these enzymes, the nucleoside triphosphate substrate base-pairs to a single-stranded polynucleotide template: this allows polymerases to accurately synthesise the complementary strand of this template. Polymerases are classified according to the type of template that they use.

In DNA replication, a DNA-dependent DNA polymerase makes a DNA copy of a DNA sequence. Accuracy is vital in this process, so many of these polymerases have a proofreading activity. Here, the polymerase recognizes the occasional mistakes in the synthesis reaction by the lack of base pairing between the mismatched nucleotides. If a mismatch is detected, a 3′ to 5′ exonuclease activity is activated and the incorrect base removed.[97] In most organisms, DNA polymerases function in a large complex called the replisome that contains multiple accessory subunits, such as the DNA clamp or helicases.[98]

RNA-dependent DNA polymerases are a specialized class of polymerases that copy the sequence of an RNA strand into DNA. They include reverse transcriptase, which is a viral enzyme involved in the infection of cells by retroviruses, and telomerase, which is required for the replication of telomeres.[99][49] Telomerase is an unusual polymerase because it contains its own RNA template as part of its structure.[50]

Transcription is carried out by a DNA-dependent RNA polymerase that copies the sequence of a DNA strand into RNA. To begin transcribing a gene, the RNA polymerase binds to a sequence of DNA called a promoter and separates the DNA strands. It then copies the gene sequence into a messenger RNA transcript until it reaches a region of DNA called the terminator, where it halts and detaches from the DNA. As with human DNA-dependent DNA polymerases, RNA polymerase II, the enzyme that transcribes most of the genes in the human genome, operates as part of a large protein complex with multiple regulatory and accessory subunits.[100]

Genetic recombination

Recombination involves the breakage and rejoining of two chromosomes (M and F) to produce two re-arranged chromosomes (C1 and C2).
Structure of the Holliday junction intermediate in genetic recombination. The four separate DNA strands are coloured red, blue, green and yellow.[101]
Further information: Genetic recombination

A DNA helix usually does not interact with other segments of DNA, and in human cells the different chromosomes even occupy separate areas in the nucleus called "chromosome territories."[102] This physical separation of different chromosomes is important for the ability of DNA to function as a stable repository for information, as one of the few times chromosomes interact is during chromosomal crossover when they recombine. Chromosomal crossover is when two DNA helices break, swap a section and then rejoin.

Recombination allows chromosomes to exchange genetic information and produces new combinations of genes, which can add variability to a population and thus contribute to evolution, and can be important in the rapid evolution of new proteins.[103] Genetic recombination can also be involved in DNA repair, particularly in the cell's response to double-strand breaks.[104]

The most common form of chromosomal crossover is homologous recombination, where the two chromosomes involved share very similar sequences. Non-homologous recombination can be damaging to cells, as it can produce chromosomal translocations and genetic abnormalities. The recombination reaction is catalyzed by enzymes known as recombinases, such as RAD51.[105] The first step in recombination is a double-stranded break either caused by an endonuclease or damage to the DNA.[106] A series of steps catalyzed in part by the recombinase then leads to joining of the two helices by at least one Holliday junction, in which a segment of a single strand in each helix is annealed to the complementary strand in the other helix. The Holliday junction is a tetrahedral junction structure that can be moved along the pair of chromosomes, swapping one strand for another. The recombination reaction is then halted by cleavage of the junction and re-ligation of the released DNA.[107]

Evolution of DNA metabolism

DNA contains the genetic information that allows all modern living things to function, grow, and reproduce. However, it is unclear how long in the 4-billion-year history of life DNA has performed this function, as it has been proposed that the earliest forms of life may have used RNA as their genetic material.[108] RNA may have acted as the central part of early cell metabolism as it can both transmit genetic information and carry out catalysis as part of ribozymes.[109] This ancient RNA world, where nucleic acid would have been used for both catalysis and genetics, may have influenced the development of the current genetic code based on four nucleotide bases. This would occur since the number of unique bases in such an organism is a trade-off between a small number of bases increasing replication accuracy and a large number of bases increasing the catalytic efficiency of ribozymes.[110]

Unfortunately, there is no direct evidence of ancient genetic systems, as recovery of DNA from most fossils is impossible. This is because DNA will survive in the environment for less than one million years and slowly degrades into short fragments in solution.[111] Although claims for older DNA have been made, most notably a report of the isolation of a viable bacterium from a salt crystal 250 million years old,[112] these claims are controversial and have been disputed.[113][114]

Uses in technology

Genetic engineering

Modern biology and biochemistry make intensive use of recombinant DNA technology. Recombinant DNA is a man-made DNA sequence that has been assembled from other DNA sequences. Such sequences can be transformed into organisms in the form of plasmids or, in the appropriate format, by using a viral vector.[115] The genetically modified organisms produced can be used to make products such as recombinant proteins, used in medical research,[116] or be grown in agriculture.[117][118] Recombinant DNA technology allows scientists to transplant a gene for a particular protein into rapidly reproducing bacteria to mass-produce the protein. As a result of this technology, bacteria have been used to produce human insulin beginning in 1978.

Forensics

Forensic scientists can use DNA in blood, semen, skin, saliva, or hair at a crime scene to identify a perpetrator. This process is called genetic fingerprinting, or more accurately, DNA profiling. In DNA profiling, the lengths of variable sections of repetitive DNA, such as short tandem repeats and minisatellites, are compared between people. This method is usually an extremely reliable technique for identifying a criminal.[119] However, identification can be complicated if the scene is contaminated with DNA from several people.[120] DNA profiling was developed in 1984 by British geneticist Sir Alec Jeffreys,[121] and first used in forensic science to convict Colin Pitchfork in the 1988 Enderby murders case. Some criminal investigations have been solved when DNA from crime scenes has matched relatives of the guilty individual, rather than the individual himself or herself.[122]
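The comparison of repeat lengths can be sketched in code. The example below is a simplified illustration, not a forensic method: it assumes each locus is described by a single repeat count (real profiles record two alleles per locus and rely on statistical interpretation), and the names `longest_tandem_run` and `profiles_match`, the locus labels, and the sequences are hypothetical.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Count consecutive copies of the motif beginning at this position.
        while sequence.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
    return best


def profiles_match(profile_a: dict[str, int], profile_b: dict[str, int]) -> bool:
    """Two toy profiles match when every locus has the same repeat count."""
    return profile_a == profile_b


# Hypothetical sequences for one locus with the repeat unit GATA.
scene_sample = "CC" + "GATA" * 5 + "TT"
suspect_one = "CC" + "GATA" * 5 + "TT"
suspect_two = "CC" + "GATA" * 7 + "TT"

scene = {"locus_1": longest_tandem_run(scene_sample, "GATA")}
first = {"locus_1": longest_tandem_run(suspect_one, "GATA")}
second = {"locus_1": longest_tandem_run(suspect_two, "GATA")}

print(profiles_match(scene, first))   # True
print(profiles_match(scene, second))  # False
```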

People convicted of certain types of crimes may be required to provide a sample of DNA for a database. This has helped investigators solve old cases where only a DNA sample was obtained from the scene. DNA profiling can also be used to identify victims of mass casualty incidents.

Bioinformatics

Bioinformatics involves the manipulation, searching, and data mining of DNA sequence data. The development of techniques to store and search DNA sequences has led to widely applied advances in computer science, especially string searching algorithms, machine learning, and database theory.[123] String searching or matching algorithms, which find an occurrence of a sequence of letters inside a larger sequence of letters, were developed to search for specific sequences of nucleotides.[124] In other applications such as text editors, even simple algorithms for this problem usually suffice, but DNA sequences cause these algorithms to exhibit near-worst-case behaviour due to their small number of distinct characters. The related problem of sequence alignment aims to identify homologous sequences and locate the specific mutations that make them distinct.
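A minimal sketch of the simplest string searching approach shows why a small alphabet can be costly. The function below is an illustrative naive search, not one of the specialised algorithms used in practice; the name `naive_search` and the comparison counter are assumptions added for demonstration.

```python
def naive_search(text: str, pattern: str) -> tuple[list[int], int]:
    """Return the start positions of pattern in text and the number of
    single-character comparisons the naive scan performed."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    hits = []
    comparisons = 0
    for start in range(len(text) - len(pattern) + 1):
        matched = 0
        # Compare character by character until a mismatch or a full match.
        while matched < len(pattern):
            comparisons += 1
            if text[start + matched] != pattern[matched]:
                break
            matched += 1
        if matched == len(pattern):
            hits.append(start)
    return hits, comparisons


print(naive_search("GATTACAGATTACA", "TTAC")[0])  # [2, 9]

# A repetitive sequence forces a nearly full comparison at every position:
# 9 start positions, 4 comparisons each, and no match is found.
print(naive_search("A" * 12, "AAAT"))  # ([], 36)
```

With only four letters available, long partial matches are far more common in DNA than in ordinary text, so the inner loop rarely exits early.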

These techniques, especially multiple sequence alignment, are used in studying phylogenetic relationships and protein function.[125] Data sets representing entire genomes' worth of DNA sequences, such as those produced by the Human Genome Project, are difficult to use without annotations, which label the locations of genes and regulatory elements on each chromosome. Regions of DNA sequence that have the characteristic patterns associated with protein- or RNA-coding genes can be identified by gene finding algorithms, which allow researchers to predict the presence of particular gene products in an organism even before they have been isolated experimentally.[126]
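One of the simplest characteristic patterns of a protein-coding region is an open reading frame: a start codon followed, in the same reading frame, by a stop codon. The sketch below scans only the forward strand and is an assumption-laden toy; actual gene finding algorithms use statistical models and many additional signals. The names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand open reading
    frames, with end exclusive and the stop codon included."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate frame
            elif start is not None and codon in STOP_CODONS:
                # Keep the frame only if enough codons precede the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])  # 2 14 ATGAAATTTTAG
```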

DNA nanotechnology

The DNA structure at left (schematic shown) will self-assemble into the structure visualized by atomic force microscopy at right. DNA nanotechnology is the field which seeks to design nanoscale structures using the molecular recognition properties of DNA molecules. Image from Strong, 2004.

DNA nanotechnology uses the unique molecular recognition properties of DNA and other nucleic acids to create self-assembling branched DNA complexes with useful properties. DNA is thus used as a structural material rather than as a carrier of biological information. This has led to the creation of two-dimensional periodic lattices (both tile-based and made with the "DNA origami" method) as well as three-dimensional structures in the shapes of polyhedra. Nanomechanical devices and algorithmic self-assembly have also been demonstrated, and these DNA structures have been used to template the arrangement of other molecules such as gold nanoparticles and streptavidin proteins.

DNA and computation

DNA was first used in computing to solve a small version of the directed Hamiltonian path problem, an NP-complete problem.[127] DNA computing is advantageous over electronic computers in power use, space use, and efficiency, due to its ability to compute in a highly parallel fashion. A number of other problems, including simulation of various abstract machines, the boolean satisfiability problem, and the bounded version of the traveling salesman problem, have since been analysed using DNA computing.[128] Due to its compactness, DNA also has a theoretical role in cryptography.
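The directed Hamiltonian path problem asks whether a path exists that starts at one given vertex, ends at another, and visits every vertex exactly once. The sketch below is a conventional, sequential generate-and-filter search on an electronic computer, offered only to make the problem concrete; it does not model the DNA chemistry, in which candidate paths are formed in parallel. The function name, the example graph, and the assumption that the start and end vertices differ are all illustrative.

```python
from itertools import permutations


def hamiltonian_paths(vertices, edges, start, end):
    """Return every directed path from start to end that visits each
    vertex exactly once (assumes start and end are different vertices)."""
    edge_set = set(edges)
    middle = [v for v in vertices if v not in (start, end)]
    found = []
    # Generate every ordering of the intermediate vertices, then keep
    # only those in which each consecutive step is a real edge.
    for order in permutations(middle):
        path = (start, *order, end)
        if all((a, b) in edge_set for a, b in zip(path, path[1:])):
            found.append(path)
    return found


graph_vertices = [0, 1, 2, 3]
graph_edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(hamiltonian_paths(graph_vertices, graph_edges, 0, 3))  # [(0, 1, 2, 3)]
```

The number of candidate orderings grows factorially with the number of vertices, which is why massive parallelism is attractive for problems of this kind.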

History and anthropology

Because DNA collects mutations over time, which are then inherited, it contains historical information. By comparing DNA sequences, geneticists can infer the evolutionary history of organisms, their phylogeny.[129] This field of phylogenetics is a powerful tool in evolutionary biology. If DNA sequences within a species are compared, population geneticists can learn the history of particular populations. This can be used in studies ranging from ecological genetics to anthropology; for example, DNA evidence is being used to try to identify the Ten Lost Tribes of Israel.[130]
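A very simple way to compare sequences is to count the fraction of positions at which two already-aligned sequences differ, on the assumption that fewer differences suggest a more recent common ancestor. The following is a minimal sketch under that assumption; real phylogenetic methods use explicit models of mutation, and the species labels and sequences here are invented for illustration.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal in length, and non-empty")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


def closest_pair(sequences: dict[str, str]) -> tuple[str, str]:
    """Return the names of the two sequences with the smallest distance."""
    return min(
        combinations(sequences, 2),
        key=lambda pair: p_distance(sequences[pair[0]], sequences[pair[1]]),
    )


samples = {
    "species_a": "ACGTACGT",
    "species_b": "ACGTACGA",
    "species_c": "TCGAACCA",
}
print(p_distance(samples["species_a"], samples["species_b"]))  # 0.125
print(closest_pair(samples))  # ('species_a', 'species_b')
```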

DNA has also been used to look at modern family relationships, such as establishing family relationships between the descendants of Sally Hemings and Thomas Jefferson. This usage is closely related to the use of DNA in criminal investigations detailed above.

Notes

  1. R. Dahm, "Friedrich Miescher and the discovery of DNA," Dev Biol 278 (2005): 274-88. PMID 15680349.
  2. P. Levene, "The structure of yeast nucleic acid," J Biol Chem 40(2) (1919):415-424.
  3. W. Astbury, "Nucleic acid," Symp. Soc. Exp. Biol. 1 (1947).
  4. M. G. Lorenz and W. Wackernagel, "Bacterial gene transfer by natural genetic transformation in the environment," Microbiol. Rev. 58 (1994): 563–602. PMID 7968924.
  5. O. Avery, C. MacLeod, and M. McCarty, "Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III," J Exp Med 79 (1944): 137-158.
  6. A. Hershey and M. Chase, "Independent functions of viral protein and nucleic acid in growth of bacteriophage," J Gen Physiol 36(1952): 39-56. PMID 12981234.
  7. 7.0 7.1 J. D. Watson and F. H. C. Crick, "Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid," Nature 171 (1953): 737-738.
  8. Nature Archives, "Double helix of DNA: 50 Years," Nature (2003).
  9. R. Franklin, and R. G. Gosling, "Molecular configuration in sodium thymonucleate," Nature 171 (1953):740-741.
  10. M. H. F. Wilkins, A. R. Stokes, and H. R. Wilson, "Molecular structure of deoxypentose nucleic acids," Nature 171 (1953): 738-740.
  11. M. Meselson, and F. Stahl, "The replication of DNA in Escherichia coli," Proc Natl Acad Sci U S A 44 (1958): 671-682. PMID 16590258.
  12. Nobel Foundation, "The Nobel Prize in Physiology or Medicine 1968," Nobelprize.org. Retrieved January 23, 2023.
  13. 13.0 13.1 B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walter, Molecular Biology of the Cell, 4th edition (New York: Garland Science, 2002, ISBN 0815332181).
  14. J. Butler, Forensic DNA Typing (San Diego: Academic Press, 2001, ISBN 9780121479510).
  15. M. Mandelkern, J. Elias, D. Eden, and D. Crothers, "The dimensions of DNA in solution," J Mol Biol 152 (1981): 153–161.
  16. S. Gregory, et al., "The DNA sequence and biological annotation of human chromosome 1," Nature 441(7091) (2006): 315-321. PMID 16710414.
  17. 17.0 17.1 17.2 J. Berg, J. Tymoczko, and L. Stryer, Biochemistry (W. H. Freeman and Company, 2002, ISBN 0716749556).
  18. 18.0 18.1 A. Ghosh, and M. Bansal, "A glossary of DNA structures from A to Z," Acta Crystallogr D Biol Crystallogr 59 (2003): 620–626. PMID 12657780.
  19. I. Takahashi, and J. Marmur, "Replacement of thymidylic acid by deoxyuridylic acid in the deoxyribonucleic acid of a transducing phage for Bacillus subtilis," Nature 197 (1963): 794–795. PMID 13980287.
  20. P. Agris, "Decoding the genome: a modified view," Nucleic Acids Res 32 (2004): 223–238. PMID 14715921.
  21. R. Wing, H. Drew, T. Takano, C. Broka, S. Tanaka, K. Itakura, and R. Dickerson, "Crystal structure analysis of a complete turn of B-DNA," Nature 287 (1980): 755–758. PMID 7432492.
  22. C. Pabo, and R. Sauer, "Protein-DNA recognition," Annu Rev Biochem 53 (1984): 293–321. PMID 6236744.
  23. P. Ponnuswamy, and M. Gromiha, "On the conformational stability of oligonucleotide duplexes and tRNA molecules," J Theor Biol 169(4): 419–432. PMID 7526075.
  24. H. Clausen-Schaumann, M. Rief, C. Tolksdorf, and H. Gaub, "Mechanical stability of single DNA molecules," Biophys J 78 (2000, issue 4): 1997–2007. PMID 10733978.
  25. T. Chalikian, J. Völker, G. Plum, and K. Breslauer, "A more unified picture for the thermodynamics of nucleic acid duplex melting: a characterization by calorimetric and volumetric techniques," Proc Natl Acad Sci U S A 96(14) (1999): 7853–7858. PMID 10393911.
  26. P. deHaseth and J. Helmann, "Open complex formation by Escherichia coli RNA polymerase: the mechanism of polymerase-induced strand separation of double helical DNA," Mol Microbiol 16(5) (1995): 817–824. PMID 7476180.
  27. J. Isaksson, S. Acharya, J. Barman, P. Cheruku, and J. Chattopadhyaya, "Single-stranded adenine-rich DNA and RNA retain structural characteristics of their respective double-stranded conformations and show directional differences in stacking pattern," Biochemistry 43(51) (2004): 15996–16010. PMID 15609994.
  28. A. Hüttenhofer, P. Schattner, and N. Polacek, "Non-coding RNAs: hope or hype?" Trends Genet 21(5) (2005): 289–297. PMID 15851066.
  29. S. Munroe, "Diversity of antisense regulation in eukaryotes: multiple mechanisms, emerging patterns," J Cell Biochem 93(4) (2004): 664–671. PMID 15389973.
  30. I. Makalowska, C. Lin, and W. Makalowski, "Overlapping genes in vertebrate genomes," Comput Biol Chem 29 (1) (2005): 1-12. PMID 15680581.
  31. Z. Johnson and S. Chisholm, "Properties of overlapping genes are conserved across microbial genomes," Genome Res 14(11) (2004): 2268–2272. PMID 15520290.
  32. R. Lamb and C. Horvath, "Diversity of coding strategies in influenza viruses," Trends Genet 7(8) (1991): 261–266. PMID 1771674.
  33. J. Davies and J. Stanley, "Geminivirus genes and vectors," Trends Genet 5(3) (1989): 77–81. PMID 2660364.
  34. K. Berns, "Parvovirus replication," Microbiol Rev 54(3) (1990): 316–329. PMID 2215424.
  35. C. Benham and S. Mielke, "DNA mechanics," Annu Rev Biomed Eng 7 (2005): 21–53. PMID 16004565.
  36. 36.0 36.1 J. Champoux, "DNA topoisomerases: structure, function, and mechanism," Annu Rev Biochem 70 (2001): 369–413. PMID 11395412.
  37. 37.0 37.1 J. Wang, "Cellular roles of DNA topoisomerases: a molecular perspective," Nat Rev Mol Cell Biol 3(6) (2002): 430–440. PMID 12042765.
  38. 38.0 38.1 G. Hayashi, M. Hagihara, and K. Nakatani, "Application of L-DNA as a molecular tag," Nucleic Acids Symp Ser (Oxf) 49 (2005): 261–262. PMID 17150733.
  39. J. M. Vargason, B. F. Eichman, and P. S. Ho, "The extended and eccentric E-DNA structure induced by cytosine methylation or bromination," Nature Structural Biology 7 (2000): 758-761. PMID 10966645.
  40. G. Wang and K. M. Vasquez, "Non-B DNA structure-induced genetic instability," Mutat Res 598(1-2) (2006): 103-119. PMID 16516932.
  41. Allemand, et al, "Stretched and overwound DNA forms a Pauling-like structure with exposed bases," PNAS 24(1998): 14152-14157. PMID 9826669.
  42. E. Palecek, "Local supercoil-stabilized DNA structures," Critical Reviews in Biochemistry and Molecular Biology 26(2) (1991): 151-226. PMID 1914495.
  43. H. Basu, B. Feuerstein, D. Zarling, R. Shafer, and L. Marton, "Recognition of Z-RNA and Z-DNA determinants by polyamines in solution: experimental and theoretical studies," J Biomol Struct Dyn 6 (2) (1988): 299-309. PMID 2482766.
  44. A. G. Leslie, S. Arnott, R. Chandrasekaran, and R. L. Ratliff, "Polymorphism of DNA double helices," J. Mol. Biol. 143(1) (1980): 49–72. PMID 7441761.
  45. M. Wahl and M. Sundaralingam, "Crystal structures of A-DNA duplexes," Biopolymers 44(1) (1997): 45-63. PMID 9097733.
  46. X. J. Lu, Z. Shakked, and W. K. Olson, "A-form conformational motifs in ligand-bound DNA structures," J. Mol. Biol. 300(4) (2000): 819-840. PMID.
  47. S. Rothenburg, F. Koch-Nolte, and F. Haag, "DNA methylation and Z-DNA formation as mediators of quantitative differences in the expression of alleles," Immunol Rev 184 (2001): 286-298. PMID 12086319.
  48. D. Oh, Y. Kim, and A. Rich, "Z-DNA-binding proteins can act as potent effectors of gene expression in vivo," Proc. Natl. Acad. Sci. U.S.A. 99(26) (2002): 16666-16671. PMID 12486233.
  49. 49.0 49.1 C. Greider, and E. Blackburn, "Identification of a specific telomere terminal transferase activity in Tetrahymena extracts," Cell 43(2 pt 1) (1985): 405-413. PMID 3907856.
  50. 50.0 50.1 C. Nugent and V. Lundblad, "The telomerase reverse transcriptase: components and regulation," Genes Dev 12(8) (1998): 1073-1085. PMID 9553037. Retrieved January 23, 2023.
  51. W. Wright, V. Tesmer, K. Huffman, S. Levene, and J. Shay, "Normal human chromosomes have long G-rich telomeric overhangs at one end," Genes Dev 11(21) (1997): 2801-2809. PMID 9353250. Retrieved January 23, 2023.
  52. 52.0 52.1 S. Burge, G. Parkinson, P. Hazel, A. Todd, and S. Neidle, "Quadruplex DNA: sequence, topology and structure," Nucleic Acids Res 34(19) (2006): 5402-5415. PMID 17012276.
  53. G. Parkinson, M. Lee, and S. Neidle, "Crystal structure of parallel quadruplexes from human telomeric DNA," Nature 417(6891) (2002): 876-880. PMID 12050675.
  54. J. Griffith, L. Comeau, S. Rosenfield, R. Stansel, A. Bianchi, H. Moss and T. de Lange, "Mammalian telomeres end in a large duplex loop," Cell 97(4) (1999): 503-514. PMID 10338214.
  55. R. Klose, and A. Bird, "Genomic DNA methylation: the mark and its mediators," Trends Biochem Sci 31(2) (2006): 89-97. PMID 16403636.
  56. A. Bird, "DNA methylation patterns and epigenetic memory," Genes Dev 16(1) (2002): 6-21. PMID 11782440.
  57. C. Walsh, and G. Xu, "Cytosine methylation and DNA repair," Curr Top Microbiol Immunol 301 (2006): 283-315. PMID 16570853.
  58. D. Ratel, J. Ravanat, F. Berger, and D. Wion, "N6-methyladenine: the other methylated base of DNA," Bioessays 28(3) (2006): 309-315. PMID 16479578.
  59. J. Gommers-Ampt, F. Van Leeuwen, A. de Beer, J. Vliegenthart, M. Dizdaroglu, J. Kowalak, P. Crain, and P. Borst, "Beta-D-glucosyl-hydroxymethyluracil: a novel modified base present in the DNA of the parasitic protozoan T. brucei," Cell 75(6) (1993): 1129-1136. PMID 8261512.
  60. T. Douki, A. Reynaud-Angelin, J. Cadet, and E. Sage, "Bipyrimidine photoproducts rather than oxidative lesions are the main type of DNA damage involved in the genotoxic effect of solar UVA radiation," Biochemistry 42(30) (2003): 9221-9226. PMID 12885257.
  61. J. Cadet, T. Delatour, T. Douki, D. Gasparutto, J. Pouget, J. Ravanat, and S. Sauvaigo, "Hydroxyl radicals and DNA base damage," Mutat Res 424(1-2) (1999): 9-21. PMID 10064846.
  62. M. Shigenaga, C. Gimeno, and B. Ames, "Urinary 8-hydroxy-2′-deoxyguanosine as a biological marker of in vivo oxidative DNA damage," Proc Natl Acad Sci U S A 86(24) (1989): 9697-9701. PMID 2602371.
  63. R. Cathcart, E. Schwiers, R. Saul, and B. Ames, "Thymine glycol and thymidine glycol in human and rat urine: A possible assay for oxidative DNA damage," Proc Natl Acad Sci U S A 81(18) (1984): 5633-5637. PMID 6592579.
  64. K. Valerie and L. Povirk, "Regulation and mechanisms of mammalian double-strand break repair," Oncogene 22(37) (2003): 5792-5812. PMID 12947387.
  65. L. Ferguson and W. Denny, "The genetic toxicology of acridines," Mutat Res 258(2) (1991): 123-160. PMID 1881402.
  66. A. Jeffrey, "DNA modification by chemical carcinogens," Pharmacol Ther 28(2) (1985): 237–272. PMID 3936066.
  67. T. Stephens, C. Bunde, and B. Fillmore, "Mechanism of action in thalidomide teratogenesis," Biochem Pharmacol 59(12) (2000): 1489–1499. PMID 10799645.
  68. M. Braña, M. Cacho, A. Gradillas, B. Pascual-Teresa, and A. Ramos, "Intercalators as anticancer drugs," Curr Pharm Des 7(17) (2001): 1745–1780. PMID 11562309.
  69. J. Venter, et al., "The sequence of the human genome," Science 291(5507) (2001): 1304–1351. PMID 11181995.
  70. M. Thanbichler, S. Wang, and L. Shapiro, "The bacterial nucleoid: a highly organized and dynamic structure," J Cell Biochem 96(3) (2005): 506–521. PMID 15988757.
  71. T. Wolfsberg, J. McEntyre, and G. Schuler, "Guide to the draft human genome," Nature 409(6822) (2001): 824–826. PMID 11236998.
  72. T. Gregory, "The C-value enigma in plants and animals: a review of parallels and an appeal for partnership," Ann Bot (Lond) 95(1) (2005): 133–146. PMID 15596463. Retrieved January 23, 2023.
  73. The ENCODE Project Consortium, "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project," Nature 447(7146) (2007): 799-816. doi:10.1038/nature05874
  74. A. Pidoux, and R. Allshire, "The role of heterochromatin in centromere function," Philos Trans R Soc Lond B Biol Sci 360(1455) (2005): 569–579. PMID 15905142.
  75. P. Harrison, H. Hegyi, S. Balasubramanian, N. Luscombe, P. Bertone, N. Echols, T. Johnson, and M. Gerstein, "Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22," Genome Res 12(2) (2002): 272–280. PMID 11827946. Retrieved January 23, 2023.
  76. P. Harrison, and M. Gerstein, "Studying genomes through the aeons: protein families, pseudogenes and proteome evolution," J Mol Biol 318(5) (2002): 1155–1174. PMID 12083509.
  77. M. M. Albà, "Replicative DNA polymerases," Genome Biol 2(1) (2001). PMID 11178285.
  78. K. Sandman, S. Pereira, and J. Reeve, "Diversity of prokaryotic chromosomal proteins and the origin of the nucleosome," Cell Mol Life Sci 54(12) (1998): 1350–1364. PMID 9893710.
  79. R. T. Dame, "The role of nucleoid-associated proteins in the organization and compaction of bacterial chromatin," Mol. Microbiol. 56(4) (2005): 858-870. PMID 15853876.
  80. K. Luger, A. Mäder, R. Richmond, D. Sargent, and T. Richmond, "Crystal structure of the nucleosome core particle at 2.8 A resolution," Nature 389(6648) (1997): 251–260. PMID 9305837.
  81. T. Jenuwein, and C. Allis, "Translating the histone code," Science 293(5532) (2001): 1074-1080. PMID 11498575.
  82. T. Ito, "Nucleosome assembly and remodelling," Curr Top Microbiol Immunol 274(2003): 1–22. PMID 12596902.
  83. J. Thomas, "HMG1 and 2: architectural DNA-binding proteins," Biochem Soc Trans 29(4) (2001): 395–401. PMID 11497996.
  84. R. Grosschedl, K. Giese, and J. Pagel, "HMG domain proteins: architectural elements in the assembly of nucleoprotein structures," Trends Genet 10(3) (1994): 94–100. PMID 8178371.
  85. C. Iftode, Y. Daniely, and J. Borowiec, "Replication protein A (RPA): the eukaryotic SSB," Crit Rev Biochem Mol Biol 34(3) (1999): 141–180. PMID 10473346.
  86. Created from PDB 1LMB Retrieved January 23, 2023.
  87. L. Myers, and R. Kornberg, "Mediator of transcriptional regulation," Annu Rev Biochem 69 (2000): 729–749. PMID 10966474.
  88. B. Spiegelman, and R. Heinrich, "Biological control through regulated transcriptional coactivators," Cell 119(2) (2004): 157-167. PMID 15479634.
  89. Z. Li, S. Van Calcar, C. Qu, W. Cavenee, M. Zhang, and B. Ren, "A global transcriptional regulatory role for c-Myc in Burkitt's lymphoma cells," Proc Natl Acad Sci U S A 100(14) (2003): 8164–8169. PMID 12808131. Retrieved November 29, 2017.
  90. C. Pabo, and R. Sauer, "Protein-DNA recognition," Annu Rev Biochem 53 (1984): 293–321. PMID 6236744.
  91. Created from PDB 1RVA. Retrieved January 23, 2023.
  92. T. Bickle, and D. Krüger, "Biology of DNA restriction," Microbiol Rev 57(2) (1993): 434–450. PMID 8336674.
  93. 93.0 93.1 A. J. Doherty, and S. W. Suh, "Structural and mechanistic conservation in DNA ligases," Nucleic Acids Res 28(21) (2000): 4051–4058. PMID 11058099.
  94. A. Schoeffler, and J. Berger, "Recent advances in understanding structure-function relationships in the type II topoisomerase mechanism," Biochem Soc Trans 33(6) (2005): 1465–1470. PMID 16246147.
  95. N. Tuteja, and R. Tuteja, "Unraveling DNA helicases. Motif, structure, mechanism and function," Eur J Biochem 271(10) (2004): 1849–1863. PMID 15128295.
  96. C. Joyce and T. Steitz, "Polymerase structures and function: variations on a theme?" J Bacteriol 177(11) (1995): 6321–6329. PMID 7592405.
  97. U. Hubscher, G. Maga, and S. Spadari, "Eukaryotic DNA polymerases," Annu Rev Biochem 71 (2002): 133–163. PMID 12045093.
  98. A. Johnson, and M. O'Donnell, "Cellular DNA replicases: components and dynamics at the replication fork," Annu Rev Biochem 74 (2005): 283–315. PMID 15952889.
  99. L. Tarrago-Litvak, M. Andréola, G. Nevinsky, L. Sarih-Cottin, and S. Litvak, "The reverse transcriptase of HIV-1: from enzymology to therapeutic intervention," FASEB J 8(8) (1994): 497–503. PMID 7514143.
  100. E. Martinez, "Multi-protein complexes in eukaryotic gene transcription," Plant Mol Biol 50(6) (2002): 925–947. PMID 12516863.
  101. Created from PDB 1M6G. Retrieved January 23, 2023.
  102. T. Cremer and C. Cremer, "Chromosome territories, nuclear architecture and gene regulation in mammalian cells," Nat Rev Genet 2(4) (2001): 292–301. PMID 11283701.
  103. C. Pál, B. Papp, and M. Lercher, "An integrated view of protein evolution," Nat Rev Genet 7(5) (2006): 337–348. PMID 16619049.
  104. M. O'Driscoll and P. Jeggo, "The role of double-strand break repair—insights from human genetics," Nat Rev Genet 7(1) (2006): 45–54. PMID 16369571.
  105. S. Vispé and M. Defais, "Mammalian Rad51 protein: a RecA homologue with pleiotropic functions," Biochimie 79(9-10) (1997): 587-592. PMID 9466696.
  106. M. J. Neale, and S. Keeney, "Clarifying the mechanics of DNA strand exchange in meiotic recombination," Nature 442 (7099) (2006): 153-158. PMID 2006.
  107. M. Dickman, S. Ingleston, S. Sedelnikova, J. Rafferty, R. Lloyd, J. Grasby, and D. Hornby, "The RuvABC resolvasome," Eur J Biochem 269(22) (2002): 5492–5501. PMID 12423347.
  108. G. Joyce, "The antiquity of RNA-based evolution," Nature 418(6894) (2002): 214-221. PMID 12110897.
  109. R. Davenport, "Ribozymes. Making copies in the RNA world," Science 292(5520) (2001): 1278. PMID 11360970.
  110. E. Szathmáry, "What is the optimum size for the genetic alphabet?" Proc Natl Acad Sci U S A 89(7) (1992): 2614–2618. PMID 1372984. Retrieved January 23, 2023.
  111. T. Lindahl, "Instability and decay of the primary structure of DNA," Nature 362(6422) (1993): 709–715. PMID 8469282.
  112. R. Vreeland, W. Rosenzweig, and D. Powers, "Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal," Nature 407(6806) (2000): 897–900. PMID 11057666.
  113. M. Hebsgaard, M. Phillips, and E. Willerslev, "Geologically ancient DNA: fact or artefact?" Trends Microbiol 13(5) (2005): 212–220. PMID 15866038.
  114. D. Nickle, G. Learn, M. Rain, J. Mullins, and J. Mittler, "Curiously modern DNA for a '250 million-year-old' bacterium," J Mol Evol 54(1) (2002): 134–137. PMID 11734907.
  115. S. P. Goff and P. Berg, "Construction of hybrid viruses containing SV40 and lambda phage DNA segments and their propagation in cultured monkey cells," Cell 9(4 Pt2) (1976): 695–705. PMID 189942.
  116. L. Houdebine, "Transgenic animal models in biomedical research," Methods Mol Biol 360 (2007): 163–202. PMID 17172731.
  117. H. Daniell, and A. Dhingra, "Multigene engineering: dawn of an exciting new era in biotechnology," Curr Opin Biotechnol 13(2) (2002): 136–141. PMID 11950565.
  118. D. Job, "Plant biotechnology in agriculture," Biochimie 84(11) (2002): 1105–1110. PMID 12595138.
  119. A. Collins and N. Morton, "Likelihood ratios for DNA identification," Proc Natl Acad Sci U S A 91(13) (1994): 6007–6011. PMID 8016106. Retrieved January 23, 2023.
  120. B. Weir, C. Triggs, L. Starling, L. Stowell, K. Walsh, and J. Buckleton, "Interpreting DNA mixtures," J Forensic Sci 42(2) (1997): 213–222. PMID 9068179.
  121. A. Jeffreys, V. Wilson, and S. Thein, "Individual-specific 'fingerprints' of human DNA," Nature 316(6023) (1985): 76–79. PMID 2989708.
  122. S. Bhattacharya, "Killer convicted thanks to relative's DNA," Newscientist.com, April 20, 2004.
  123. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach (MIT Press, 2001, ISBN 978-0262025065).
  124. D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (Cambridge University Press 1997, ISBN 978-0521585194).
  125. K. Sjölander, "Phylogenomic inference of protein molecular function: Advances and challenges," Bioinformatics 20(2) (2004): 170-179. PMID 14734307. Retrieved January 23, 2023.
  126. D. M. Mount, Bioinformatics: Sequence and Genome Analysis, 2nd edition (Cold Spring, NY: Cold Spring Harbor Laboratory Press 2004, ISBN 0879697121).
  127. L. Adleman, "Molecular computation of solutions to combinatorial problems," Science 266(5187) (1994): 1021–1024. PMID 7973651.
  128. J. Parker, "Computing with DNA," EMBO Rep 4(1) (2003): 7–10. PMID 12524509.
  129. G. Wray, "Dating branches on the tree of life using DNA," Genome Biol 3(1) (2002). PMID 11806830.
  130. Y. Kleiman, "The Cohanim/DNA Connection: The fascinating story of how DNA studies confirm an ancient biblical tradition," Aish.com, January 13, 2000.

References

  • Alberts, B., A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walter. Molecular Biology of the Cell. 4th edition. New York: Garland Science, 2002. ISBN 0815332181
  • Baldi, P., and S. Brunak. Bioinformatics: The Machine Learning Approach. MIT Press, 2001. ISBN 978-0262025065
  • Berg, J., J. Tymoczko, and L. Stryer. Biochemistry. W. H. Freeman and Company, 2002. ISBN 0716749556
  • Butler, J. Forensic DNA Typing. San Diego: Academic Press, 2001. ISBN 978-0121479510
  • Calladine, C. R., H. R. Drew, B. F. Luisi, and A. A. Travers. Understanding DNA. Elsevier Academic Press, 2003. ISBN 978-0121550899
  • Clayton, J., and C. Dennis (eds.). 50 Years of DNA. Palgrave MacMillan Press, 2003. ISBN 978-1403914798
  • Gusfield, D. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997. ISBN 978-0521585194
  • Judson, H. F. The Eighth Day of Creation: Makers of the Revolution in Biology. Cold Spring Harbor Laboratory Press, 1996. ISBN 978-0879694784
  • Mount, D. M. Bioinformatics: Sequence and Genome Analysis. 2nd edition. Cold Spring, NY: Cold Spring Harbor Laboratory Press 2004. ISBN 0879697121
  • Olby, R. The Path to The Double Helix: Discovery of DNA. MacMillan, 1974. ISBN 978-0486681177
  • Ridley, M. Francis Crick: Discoverer of the Genetic Code (Eminent Lives). HarperCollins Publishers, 2006. ISBN 978-0060823337
  • Watson, J. D. and F. H. C. Crick. "A structure for deoxyribose nucleic acid." Nature 171 (1953): 737–738.
  • Watson, J. D. Avoid Boring People and Other Lessons From a Life in Science. New York: Knopf, 2007. ISBN 0375412840
  • Watson, J. D. DNA: The Secret of Life. New York: Alfred A. Knopf, 2003. ISBN 978-0375415463
  • Watson, J. D. The Double Helix: A Personal Account of the Discovery of the Structure of DNA. New York: Norton, 1980. ISBN 978-0393950755

Nucleic acids
Nucleobases: Adenine - Thymine - Uracil - Guanine - Cytosine - Purine - Pyrimidine
Nucleosides: Adenosine - Uridine - Guanosine - Cytidine - Deoxyadenosine - Thymidine - Deoxyguanosine - Deoxycytidine
Nucleotides: AMP - UMP - GMP - CMP - ADP - UDP - GDP - CDP - ATP - UTP - GTP - CTP - cAMP - cGMP
Deoxynucleotides: dAMP - dTMP - dUMP - dGMP - dCMP - dADP - dTDP - dUDP - dGDP - dCDP - dATP - dTTP - dUTP - dGTP - dCTP
Nucleic acids: DNA - RNA - LNA - PNA - mRNA - ncRNA - miRNA - rRNA - siRNA - tRNA - mtDNA - Oligonucleotide

Credits

New World Encyclopedia writers and editors rewrote and completed the Wikipedia article in accordance with New World Encyclopedia standards. This article abides by terms of the Creative Commons CC-by-sa 3.0 License (CC-by-sa), which may be used and disseminated with proper attribution. Credit is due under the terms of this license that can reference both the New World Encyclopedia contributors and the selfless volunteer contributors of the Wikimedia Foundation. To cite this article click here for a list of acceptable citing formats. The history of earlier contributions by wikipedians is accessible to researchers here:

The history of this article since it was imported to New World Encyclopedia:

Note: Some restrictions may apply to use of individual images which are separately licensed.