The human genome is the genome of Homo sapiens; that is, the hereditary information that genetically characterizes human beings, as encoded in the DNA of one set of the 23 chromosome pairs found in somatic cells. Twenty-two of these are autosomal chromosome pairs, while the remaining pair is sex-determining. As the complete genetic sequence of one of the two sets of chromosomes, the human genome includes both the genes and the non-coding sequences of DNA.
The Human Genome Project produced a reference sequence of the human genome, which is used worldwide in the biomedical sciences. The haploid human genome comprises just over 3 billion DNA base pairs and has a data size of approximately 750 megabytes (Overbye 2007). It contains an estimated 20,000 to 25,000 protein-coding genes, far fewer than had been expected before its sequencing (IHGSC 2004). In fact, only about 1.5 percent of the genome codes for proteins, while the rest consists of RNA genes, regulatory sequences, introns, and (controversially) "junk" DNA (IHGSC 2001).
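The 750-megabyte figure can be reproduced with simple arithmetic. The following is a minimal sketch in Python, assuming that each base is stored in 2 bits (enough to distinguish the four bases) and using the rounded figure of 3 billion base pairs; the 2-bit packing is an illustrative assumption, not something the cited source specifies.

```python
# Rough storage estimate for a haploid genome.
# Assumption: each base (A, C, G, T) is packed into 2 bits; real file
# formats add overhead or compress differently.
BITS_PER_BASE = 2


def genome_size_megabytes(base_pairs: int) -> float:
    """Return the packed size in megabytes (1 MB = 10**6 bytes)."""
    total_bits = base_pairs * BITS_PER_BASE
    total_bytes = total_bits / 8
    return total_bytes / 1_000_000


size_mb = genome_size_megabytes(3_000_000_000)
print(f"{size_mb:.0f} MB")  # prints "750 MB"
```

Because the genome is "just over" 3 billion base pairs, the real packed size would be slightly larger than this rounded estimate.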
The tremendous breakthrough in resolving the genomes of many species, including humans, has been of great value in understanding organisms and their connectedness over time. However, this does not imply that mapping every gene that makes up a person will allow one to explain that person. In addition to the importance of environmental factors, various religious perspectives hold that life cannot be explained by physico-chemical processes alone and that human beings are more than just physical beings, possessing also a spiritual aspect.
Understanding the human genome is helpful in understanding and working toward a resolution of genetic diseases. Some attention also must be given to lifestyle choices and environmental factors, since they can contribute to genetic damage within one's own cells, such as through exposure to harmful chemicals or radiation, drug use, or infection with a pathogen. Recently, an active area of research has been epigenetics, including to what extent DNA can be modified or imprinted by one's experiences, such as via diet, smoking, or obesity (Leake 2008).
There are 24 distinct human chromosomes: 22 autosomal chromosomes, plus the sex-determining X and Y chromosomes. Chromosomes 1–22 are numbered roughly in order of decreasing size. Somatic cells usually have 23 chromosome pairs: one copy of chromosomes 1–22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46 chromosomes.
There are an estimated 20,000 to 25,000 human protein-coding genes (IHGSC 2004). The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene-finding methods have improved, and it could drop further.
Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome (entire complement of proteins expressed by a genome) is thought to be much larger than those of the aforementioned organisms. In addition, most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.
Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which seem to be correlated with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood. In addition to protein coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.
The human genome has many different regulatory sequences that are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies.
Identification of regulatory sequences relies in part on the concept of evolutionary conservation. The evolutionary split between the human and mouse lineages, for example, is considered to have occurred 70 to 90 million years ago (Nei et al. 2001). Non-coding sequences that computer comparisons show to have remained conserved over such a span are therefore likely to be important for functions such as gene regulation (Loots et al. 2000).
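The idea of scanning two genomes for conserved stretches can be illustrated with a sliding-window identity score. This is only a sketch, not the method used in the cited studies: it assumes the two sequences have already been aligned to equal length, and the function names, window size, threshold, and toy sequences are all hypothetical choices made for illustration.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Fraction of matching positions in each sliding window of two
    pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y)
        scores.append(matches / window)
    return scores


def conserved_windows(seq_a: str, seq_b: str,
                      window: int = 5, threshold: float = 0.8) -> list[int]:
    """Start positions of windows whose identity meets the threshold."""
    scores = window_identity(seq_a, seq_b, window)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy, made-up sequences (not real human or mouse data).
human = "ACGTACGTTTGACA"
mouse = "ACGTACCATAGACA"
print(conserved_windows(human, mouse))
```

In practice, windows that score highly but lie outside known protein-coding exons would be the candidates for regulatory function; real pipelines also have to handle alignment gaps and much longer windows.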
Another comparative genomic approach to locating regulatory sequences in humans is sequencing the genome of the puffer fish. These vertebrates have essentially the same genes and regulatory sequences as humans, but with only one-eighth the "junk" DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory sequences (Meunier 2001).
Protein-coding sequences (specifically, coding exons) comprise less than 1.5 percent of the human genome (IHGSC 2001). Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA whose function, if any, remains unknown. These regions in fact make up the vast majority, by some estimates 97 percent, of the human genome. Much of this is composed of:
- Tandem repeats
- Satellite DNA
- Interspersed repeats
- DNA transposons
However, there is also a large amount of sequence that does not fall under any known classification.
Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There are, however, a variety of emerging indications that many sequences within these regions are likely functional, but in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is, in fact, transcribed into RNA (Claverie 2005), which raises the possibility that the resulting transcripts may have some unknown function. Also, the evolutionary conservation across mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown (MGSC 2002). The investigation of the vast quantity of sequence in the human genome whose function remains unknown is currently a major avenue of scientific inquiry (ENCODE 2007).
The mitochondria of human beings also contain genetic material within their membranes, separate and distinct from the nuclear DNA. Generally, the term "human genome" refers only to the information on chromosomal DNA. Thus, the genes in the mitochondrial DNA are not considered part of the human genome, although they may be referred to collectively as the "mitochondrial genome."
The human mitochondrial genome, while usually not included when referring to the "human genome," is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent.
Because mitochondria lack a system for checking for copying errors, mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. Its roughly 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia or Polynesians from southeastern Asia. It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage (Wright 2019).
Most studies of human genetic variation have focused on single nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur on average somewhere between every 1 in 100 and 1 in 1,000 base pairs in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "all human beings, genetically, are 99.9 percent the same" (Clinton 2000), although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation (Redon et al. 2006). A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
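The link between SNP density and the "99.9 percent" figure is a matter of arithmetic. The sketch below makes two simplifying assumptions that are not in the sources: it treats the SNP density as the per-base difference rate between two genomes, and it applies that rate to the rounded whole-genome size of 3 billion base pairs rather than to the euchromatic portion alone. The function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # rounded haploid genome size (assumption)


def snp_summary(bp_per_snp: int, genome_bp: int = GENOME_BP) -> tuple[int, float]:
    """Return (expected variant sites, percent identical) for a density
    of one SNP per `bp_per_snp` base pairs."""
    expected_sites = genome_bp // bp_per_snp
    percent_identical = 100 * (1 - 1 / bp_per_snp)
    return expected_sites, percent_identical


for density in (100, 1_000):
    sites, identity = snp_summary(density)
    print(f"1 SNP per {density:,} bp: ~{sites:,} sites, {identity:.1f}% identical")
```

At one SNP per 1,000 base pairs this gives about 3 million variant sites and 99.9 percent identity; at one per 100 it gives about 30 million sites and 99.0 percent. Neither figure accounts for copy number variation, which is one reason geneticists qualify the popular statement.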
The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.
Most gross genomic mutations in germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause-and-effect relationship between aneuploidy and cancer has not been established.
Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, and so on). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet).
With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene, and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known. Disease-causing mutations in specific genes are usually severe in terms of gene function and are fortunately rare; genetic disorders are therefore individually rare as well. However, since there are many genes that can vary to cause genetic disorders, in aggregate they make up a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified, with over 3,000 such disorders annotated in the OMIM database (OMIM).
Studies of genetic disorders are often performed by means of family-based studies. In some instances, population-based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French Canada, Utah, Sardinia, and so forth. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counseled on the consequences, the probability that a condition will be inherited, and how to avoid or ameliorate it in their offspring.
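For a single-gene recessive disorder such as cystic fibrosis, the inheritance probability used in counseling can be worked out by enumerating the allele combinations, as in a Punnett square. The following Python sketch assumes simple Mendelian inheritance of one gene with two alleles; the allele labels ("A" for the normal allele, "a" for the disease allele) and the function name are hypothetical, and real counseling also weighs carrier frequency, penetrance, and test accuracy.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Genotype probabilities for one gene when each parent passes on
    one of their two alleles at random (simple Mendelian model)."""
    counts: dict[str, int] = {}
    for a, b in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype.
        genotype = "".join(sorted(a + b))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(parent1) * len(parent2)
    return {g: Fraction(n, total) for g, n in counts.items()}


# Two unaffected carrier parents, each with one disease allele "a".
probs = offspring_genotypes("Aa", "Aa")
print("affected (aa):", probs.get("aa", Fraction(0)))
print("carrier (Aa):", probs.get("Aa", Fraction(0)))
```

Under these assumptions, each child of two carriers has a one-in-four chance of being affected and a one-in-two chance of being a carrier.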
As noted above, there are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, that is, has little or no detectable effect on the physiology of the individual. Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.
With the advent of the Human Genome Project and the International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, and so forth. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders per se, as their causes are complex, involving many different genetic and environmental factors. Thus, there may be disagreement in particular cases as to whether a specific medical condition should be termed a genetic disorder.
Of course, because human beings are not just physical but also mental, social, and spiritual in nature, many factors interplay with genetic disorders, not just physical ones. A person who leads an unhealthy life, physically or spiritually, either by choice or ignorance, can contribute to the genetic damage within his or her own cells. Damage to germ cells can be passed down to one's descendants in the form of mutations or chromosomal disorders. For example, a person may be exposed to harmful chemicals or radiation, perhaps as a result of warfare or careless disposal of radioactive materials (environmental pollution). A person may engage in careless or promiscuous sex and become infected with a pathogen that can lead to genetic damage. Drug use is another correlate of genetic damage. Sometimes a person may act conscientiously, yet be harmed because of societal failure. An example of this is the use of thalidomide, a prescribed drug that later was found to cause birth defects when taken during pregnancy.
Similarly, a person's actions can impact the expression of certain genetic disorders. For example, phenylketonuria (PKU) is a genetic disorder characterized by a deficiency in the enzyme phenylalanine hydroxylase (PAH), which is necessary to metabolize the amino acid phenylalanine to tyrosine. However, PKU can be controlled by diet. A diet low in phenylalanine and high in tyrosine can bring about a nearly total cure.
Comparative genomics studies of mammalian genomes suggest that approximately 5 percent of the human genome has been conserved by evolution since the divergence of those species approximately 200 million years ago, and that this conserved portion contains the vast majority of genes (MGSC 2002; ENCODE 2007). Intriguingly, since genes and known regulatory sequences probably comprise less than 2 percent of the genome, this suggests that there may be more unknown functional sequence than known functional sequence.
A smaller, yet still large, fraction of human genes seems to be shared among most known vertebrates. The chimpanzee genome is 95 percent identical to the human genome. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (CSAC 2005; Olson and Varki 2003).
References
- Claverie, J. 2005. Fewer genes, more noncoding RNA. Science 309(5740): 1529–30. Retrieved September 18, 2020.
- Chimpanzee Sequencing and Analysis Consortium (CSAC). 2005. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055): 69–87. Retrieved September 18, 2020.
- Clinton, W.J. 2000. 2000 State of the Union address (January 27, 2000). Retrieved September 18, 2020.
- ENCODE Project Consortium. 2007. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799–816. Retrieved September 18, 2020.
- International Human Genome Sequencing Consortium (IHGSC). 2001. Initial sequencing and analysis of the human genome. Nature 409(6822): 860–921. Retrieved September 18, 2020.
- International Human Genome Sequencing Consortium (IHGSC). 2004. Finishing the euchromatic sequence of the human genome. Nature 431(7011): 931–945. Retrieved September 18, 2020.
- Lindblad-Toh, K., C.M. Wade, T.S. Mikkelsen, et al. 2005. Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature 438(7069): 803–19. Retrieved September 18, 2020.
- Loots, G., R. Locksley, C. Blankespoor, Z. Wang, W. Miller, E. Rubin, and K. Frazer. 2000. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288(5463): 136–140. Retrieved September 18, 2020.
- Meunier, M. 2001. Genoscope and Whitehead announce a high sequence coverage of the Tetraodon nigroviridis genome. Genoscope. Retrieved September 18, 2020.
- Mouse Genome Sequencing Consortium (MGSC). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915): 520–62. Retrieved September 18, 2020.
- Nei, M., P. Xu, and G. Glazko. 2001. Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms. Proc Natl Acad Sci U S A 98(5): 2497–2502. Retrieved September 18, 2020.
- Olson, M., and A. Varki. 2003. Sequencing the chimpanzee genome: Insights into human evolution and disease. Nat Rev Genet 4(1): 20–28. Retrieved September 18, 2020.
- Online Mendelian Inheritance in Man (OMIM). OMIM Disorders. Online Mendelian Inheritance in Man. Retrieved September 18, 2020.
- Overbye, D. 2007. Human DNA, the ultimate spot for secret messages (Are some there now?) New York Times June 26, 2007. Retrieved September 18, 2020.
- Redon, R., S. Ishikawa, K.R. Fitch, L. Feuk, et al. 2006. Global variation in copy number in the human genome. Nature 444: 444–454. Retrieved September 18, 2020.
- Wright, Joseph. 2019. Gene Control. ISBN 978-1788821940
All links retrieved September 18, 2020.
- The National Human Genome Research Institute.
- National Library of Medicine human genome viewer.
- UCSC Genome Browser.
New World Encyclopedia writers and editors rewrote and completed the Wikipedia article in accordance with New World Encyclopedia standards. This article abides by terms of the Creative Commons CC-by-sa 3.0 License (CC-by-sa), which may be used and disseminated with proper attribution. Credit is due under the terms of this license that can reference both the New World Encyclopedia contributors and the selfless volunteer contributors of the Wikimedia Foundation.