Concept


Each organism has a defining set of chromosomes that contain all of its genetic information. The human genome, for example, is the set of genetic information encoded in 46 chromosomes found in the nucleus of each cell. The chromosomes are organized into 23 pairs; one chromosome of each pair is inherited from the mother and one from the father. One pair of chromosomes, X and Y, determines sex; the other 22 pairs are called autosomes. So, it is fruitful to think of the human genome as being made up of very long DNA molecules, one corresponding to each chromosome. Arrayed along these molecules are an estimated 60,000 genes. The object of the Human Genome Project is to determine the entire nucleotide sequence of each of these DNA molecules, along with the location and identity of each of the human genes. To a great extent, sequencing the human genome has relied on automated machines that sequence the DNA and computer programs that search for and identify genes. A rough draft of the human genome was completed in the summer of 2000.

Animation


Hi, I’m Jim Watson. You may remember me from concept 19. In 1953, Francis Crick and I discovered the structure of DNA. Prior to having a lot of genomic sequences, "finding" a gene was difficult and involved looking for markers to fix a gene's position on a chromosome. The earliest chromosome maps used visible banding patterns, from stains, as markers. However, there are millions of nucleotides and thousands of genes within each band. As the first director of the government-sponsored Human Genome Project, one of my goals was to put more markers on these chromosomes. These markers would be based on unique DNA sequences. We tried to find markers about 150,000 nucleotides from each other. Since an average gene is around 10,000 nucleotides, it's much easier to find a gene within 150,000 nucleotides than within a chromosome's worth of DNA. Ultimately, we generated a map with 30,000 distinct markers. Each marker was a unique 100 to 200 base pair sequence within the genome.
Hi, I’m Craig Venter. When I was at the National Institutes of Health (NIH) with Jim Watson, I was also interested in locating human genes. Genes make up only about 3% of our genome, and I figured out a faster and cheaper method of finding them. I first used my method to find genes expressed in the human brain. I used brain cDNAs that others had stored in a phage library. The library was made by first extracting mRNAs from human brain tissue. The mRNAs are reverse transcribed, and the cDNAs are stored as a library of clones in phage particles. We used the polymerase chain reaction (PCR) to generate small 150 to 400 base pair DNA segments from these cDNAs. A non-specific primer was used in these PCR reactions. Starting from the primer, Taq polymerase made short copies of the brain cDNAs. I called the resulting short sequences Expressed Sequence Tags (ESTs). They vary in length, and are all unique. Each EST identifies a gene expressed in the brain, because it is derived from brain mRNA.
We sequenced 2,375 ESTs from the human brain, and compared the sequences with genes already in public databases. Only 17% of our ESTs matched previously known gene sequences. Some of the ESTs in this 17% matched up with known genes expressed throughout the body, like beta-actin. Others were more specific to brain tissue, like the big-brain gene from Drosophila. Most excitingly, 83% of the ESTs represented previously unknown genes! We continued making ESTs, and, in a few years, tagged over 30,000 new genes. To some, this is a treasure trove of information. Several biotech companies, such as Human Genome Sciences and Incyte Genomics, were started to capitalize on this gene bank.
It seemed to me that Craig and his industry buddies just wanted to find genes they could patent. I was interested in the entire genome and how it all works. EST libraries are fine, but genes that are not highly expressed will not be represented. Also, there won't be any upstream control regions in an EST library. To get information from the entire genome, we had to break it down into smaller manageable pieces. First, each chromosome is cut with rare-cutting restriction enzymes to generate pieces approximately 150,000 base pairs in length. These fragments are cloned into bacterial artificial chromosomes (BACs). We cut copies of the same chromosome with different enzymes to make sure we had overlapping fragments representing the entire chromosome. Our physical markers identified some of the BACs. BACs lacking markers were further mapped using more common restriction enzymes. These restriction enzyme sites, also markers, are used to identify and line up the BACs. The BACs have now been assembled in an orderly manner. After ordering, we selected the minimal number of BACs needed to span the entire chromosome. However, BACs are still too big for automated sequencing. We repeated the breakdown process, and randomly broke each BAC into pieces about 1,500 base pairs long.
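Selecting the minimal number of overlapping BACs that still span a chromosome can be pictured as an interval-covering problem. The Python sketch below is only an illustration of that idea, not the procedure the Human Genome Project actually used. The function name minimal_tiling_path, the clone names, and the coordinates (in thousands of base pairs) are all hypothetical.

```python
def minimal_tiling_path(clones, chromosome_length):
    """Pick the fewest clones whose intervals cover [0, chromosome_length).

    clones: list of (name, start, end) tuples, end exclusive (hypothetical map data).
    Returns the chosen clone names in order; raises ValueError if there is a gap.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = 0  # everything to the left of this position is already covered
    i = 0
    while covered < chromosome_length:
        best = None
        # Among clones that start at or before the covered edge,
        # keep the one that reaches farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Five made-up BACs of roughly 150 kb each on a 450 kb stretch of chromosome.
bacs = [("A", 0, 150), ("B", 100, 250), ("C", 120, 300),
        ("D", 240, 400), ("E", 290, 450)]
print(minimal_tiling_path(bacs, 450))  # prints ['A', 'C', 'E']
```

In this toy map, clones B and D are redundant because A, C, and E already overlap end to end, so only three of the five BACs would go on to be broken down and sequenced.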
Each piece was subcloned into phage. For simplicity, we are only showing two copies. Again, we made sure to get overlapping pieces. One end of each subclone was sequenced in the automated sequencer. It wasn't really necessary to sequence the entire piece, because the sequenced end of one subclone will overlap the unknown end of another. When we finished sequencing, the subclones from each BAC were lined up in the proper order by aligning matching sequences. Using the sequences and the marker maps, all the subclones were assembled into BACs, and the BACs were assembled into chromosomes. When each chromosome was assembled, we had sequenced the entire human genome!
Hi, I’m back. After my EST technology was commercialized, it didn't take long to make a number of good expression libraries. I began to look for other projects. I thought the federally funded Human Genome Project was plodding along too slowly so I came up with – you guessed it – a cheaper and faster method. This method, which I called shotgun sequencing, starts by making random, overlapping DNA pieces of about 2,000 base pairs. This was done by physically forcing a solution of DNA through a syringe. We also made 10,000 base pair pieces. All the pieces were stored individually in bacterial plasmids. The ends of each piece were sequenced in the automated sequencer. As you learned before, when many pieces overlap each other, it isn't necessary to sequence the entire piece. Then, we entered the data into a computer program developed specifically for ordering all the pieces. The program assembled the entire genome by matching overlapping sequences. With shotgun sequencing, there is no need to bother with markers or assembling and subcloning BACs. I first tested this method by sequencing the 1.8 million base pair genome of the bacterium Haemophilus influenzae in 1995 after I had left NIH.
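The idea of assembling a sequence by matching overlapping pieces can be shown with a toy example. The Python sketch below is a simple greedy merge written for illustration; it is not the assembly program described above, and it ignores sequencing errors, reverse strands, and paired ends. The function names and the three short reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until none remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps; the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three made-up overlapping reads, given out of order.
print(greedy_assemble(["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]))
```

Run as written, the three reads collapse into the single contig ATGGCGTACGTTAGC. Real genomes need far more careful handling, especially where sequences repeat.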
Then, in 1999, as a dry run for the human genome, my own company, Celera, sequenced and assembled the Drosophila melanogaster fruit fly genome in collaboration with the Drosophila Genome Project. On June 26, 2000, the current director of the Human Genome Project, Francis Collins, and I announced at the White House that we had both completed sequencing a rough draft of the human genome. The next step is to figure out which parts of the sequence encode genes and how many genes exist. Despite this big announcement, the complete sequence of our genome isn’t really finished. Both methods have left small holes in the sequence that will be patched in the next few years. Some of these holes may never be completely resolved because they occur near centromeres that contain multiple repeated sequences. As you can see below, pieces with repeated sequences can be lined up in several ways. Only one will be correct. Also, the hard work of understanding what all the 3.2 billion base pairs of DNA represent and the kinds of information encoded into our sequences has really just begun.
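The problem with repeated sequences can be made concrete with a small experiment. The Python sketch below builds two different made-up sequences that share the same repeat and compares the sets of short pieces each one produces. The repeat, the flanking sequences, and the function name read_set are hypothetical, and the reads are idealised (error-free, every position sampled).

```python
def read_set(genome, k):
    """Return every length-k substring of genome (idealised, error-free reads)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


# Hypothetical sequences, invented for illustration.
REPEAT = "TTAGGGTTAGGG"  # 12-base repeat that occurs three times
LEFT, MID_1, MID_2, RIGHT = "AAC", "CCGT", "GGAT", "TCA"

# Two layouts that differ only in the order of the unique middle pieces.
genome_a = LEFT + REPEAT + MID_1 + REPEAT + MID_2 + REPEAT + RIGHT
genome_b = LEFT + REPEAT + MID_2 + REPEAT + MID_1 + REPEAT + RIGHT

short_k = len(REPEAT) + 1  # too short to reach unique sequence on both sides of a repeat
long_k = len(REPEAT) + 2   # just long enough to span a repeat plus one base on each side

print(genome_a != genome_b)                                      # the layouts differ
print(read_set(genome_a, short_k) == read_set(genome_b, short_k))  # short reads cannot tell them apart
print(read_set(genome_a, long_k) == read_set(genome_b, long_k))    # longer reads can
```

With the short reads, both layouts yield exactly the same pieces, so overlap matching alone cannot say which order is correct. Only pieces long enough to cross a whole repeat and touch unique sequence on both sides distinguish the two.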

Gallery


James Watson, President, Cold Spring Harbor Laboratory, 1993.
Craig Venter, cofounder of Celera Genomics Corporation.

Audio/Video


Audio Glossary

Human Genome Project, cDNA library, Genetic map, Genetic marker, Genome, Human artificial chromosome (HAC), Physical map, Polymerase chain reaction (PCR), Primer

Video Interviews

Francis S. Collins

Dr. Francis Collins is the Director of the National Human Genome Research Institute (NHGRI). The video, courtesy of NHGRI, comes from the June 2000 press conference announcing the completion of a rough draft of the human genome.

Clip 1 (0:27)
Initial objections to the Human Genome Project.

Clip 2 (1:09)
The NHGRI's sequencing strategy.

Clip 3 (0:38)
How many genes do we have?

Clip 4 (0:52)
Using the Project's data to find genetic variations among people.

Clip 5 (0:47)
What we need to do to understand our genome.

Clip 6 (1:20)
Medical advances that will spring from the Project.

Clip 7 (0:43)
Your genome fits on a DVD, or you can access it from the Internet.

J. Craig Venter

Dr. Venter is the founder, president, and chief scientific officer of Celera Genomics.

Clip 1 (0:34)
Venter's experiences as a medic in the Vietnam War.

Clip 2 (0:42)
Origins of the EST project.

Clip 3 (0:54)
What are ESTs used for?

Clip 4 (0:43)
Shotgun sequencing.

Clip 5 (1:13)
Problems using shotgun method on Drosophila.

Clip 6 (1:06)
How much computing power is needed to crunch genome data?

Clip 7 (1:09)
Venter's views on patenting.

Biography


 

Craig Venter and Francis Collins represent, respectively, the commercial and the federally funded efforts to sequence the human genome.

JOHN CRAIG VENTER (1946-)

J. Craig Venter began the race to sequence the human genome when he unexpectedly announced to a room full of genome researchers that they could just quit now, thank you, because his company would finish the job. People who like him say he never filters his thoughts and he shoots from the hip. Others have been less diplomatic, calling him an egomaniac, an idiot, and a shallow man.

John Craig Venter was born on October 14, 1946 in Salt Lake City, the youngest son of an excommunicated Mormon who drank too much, smoked too much, and died at 59. The family moved to a working class suburb south of San Francisco and lived in a house next to the train tracks. Venter enjoyed playing chicken with the trains and surfing the chilly waves in nearby Half Moon Bay.

In high school, Venter excelled in shop class. After graduating, he moved to Newport Beach to surf warmer waves, and then enlisted in the Navy during the Vietnam War. Detecting more intelligence in him than his high school record indicated, the Navy trained him as a medical corpsman and shipped him to the Da Nang hospital.

"I was there during the Tet offensive," he said. "I got introduced to medicine in probably the toughest way possible. I just got fascinated with the lack of knowledge we had and had a desire to do something more."

After finishing his tour - which included two stints in the brig for disobeying orders - Venter went to the University of California, San Diego to become a doctor. He was deflected from that path by a class with Gordon Sato and a project with Nate Kaplan. "I got so fascinated with science," he said, "I decided to heck with medical school."

Venter breezed through his undergraduate and graduate schooling in six years, worked at the State University of New York, Buffalo, and was recruited to the National Institutes of Health in 1984.

In the early 1990s, Venter developed the EST method of finding genes, and promoted it as cheaper and faster than the Human Genome Project that was just getting started. Project administrators disagreed, but in the meantime, the NIH decided to patent Venter's gene fragments. The Patent Office eventually rejected the patents, but the applications sparked an international controversy over patenting genes whose functions were still unknown. The Human Genome Project's director, James Watson, opposed patenting and quit. Venter left NIH to form his own non-profit institute, The Institute for Genomic Research (TIGR).

Venter continued EST work at TIGR, but also began thinking about sequencing entire genomes. Again, he came up with a cheaper and faster method: whole genome shotgun sequencing. He applied for an NIH grant to use the method on Haemophilus influenzae, but started the project before the funding decision was returned. When the genome was nearly complete, NIH rejected his proposal, saying the method would not work.

As he turned his focus to the human genome, Venter left TIGR and started the for-profit company Celera, a division of PE Biosystems, the company that makes the latest and greatest sequencing machines. Using these machines, and the world's largest civilian supercomputer, Venter finished assembling the human genome in just three years.

Venter lives with his wife - Claire Fraser, president and director of TIGR - outside Washington, D.C., where he keeps his table saws in the garage, safely away from his new Porsche. Venter relaxes by sailing his 80-foot yacht, The Sorcerer, across the Atlantic (www.tigr.org/journey/).

FRANCIS COLLINS (1950-)

If sequencing the human genome is the Holy Grail of biology, then Francis Collins is its King Arthur. Collins has overseen the mapping, the sequencing, and the funding of biology's first "big science" project as the Director of the National Human Genome Research Institute since 1993.

For someone so intimately connected with the hottest topic in biology, Francis Collins oddly had no interest in biology as he grew up on a farm in the Shenandoah Valley of Virginia. Both parents were involved in the arts - his father was a drama professor at Mary Baldwin College - and produced plays on the stage they built on the farm.

Collins's mind was elsewhere and frequently filled with numbers as he contemplated the infinite outcomes of dividing by zero. In high school, his mathematical interests turned to chemistry, but biology held no appeal. "There didn't seem to be any logic to it - all we did was dissect things and memorize body parts," he said in an interview with Arts and Sciences Magazine.

Collins entered the University of Virginia as an Echols Scholar after graduating from high school at 16. He played the guitar too much in the first year, but afterward became "one of those science nerds you would not enjoy being in class with." In 1970, he left Virginia with a degree in chemistry, and headed to Yale for graduate school.

There, Collins finally learned that biology could be logical when he was "blown away" by a course in molecular biology. That discovery, combined with a drive to do something more obviously meaningful than theoretical physics, led Collins to medical school at UNC-Chapel Hill after he completed his doctorate. He then returned to Yale for a post-doc in human genetics.

At Yale, Collins began working on ways to search the genome for genes that cause human disease. He continued this work, which he dubbed "positional cloning," after moving to the University of Michigan as a professor in 1984. Five years later, Collins had his first big success with the method when he pinpointed the gene that causes cystic fibrosis. He continues to search for disease genes at NIH, and he pastes a new sticker onto the back of his motorcycle helmet every time he finds one.

Links

National Human Genome Research Institute

The latest news from the National Human Genome Research Institute, the organization that directed the publicly funded sequencing effort.

Celera

The company that directed the commercial venture to sequence the human genome.

Homo sapiens Genome Viewer

This NCBI (National Center for Biotechnology Information) viewer has up-to-date information on the location of genes, markers, and BAC clones on the human genome.

Ethical, Legal, and Social Issues

Three percent of the Genome Project's budget is spent researching the ethical, legal, and social issues surrounding the Project. This site from the Department of Energy describes some of these issues, including gene testing, patenting, and behavioral genetics.

Online Mendelian Inheritance in Man

This database is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere.

Bibliography

  • Arts and Sciences Magazine, Summer 2000, Tracking the Man Behind the Map, http://www.virginia.edu/artsandsciences/alumni/magazine/summer00/tracking.htm

  • Trivedi, B.P., Sequencing the Genome, http://www.celera.com/celerascience/news/articles/06_00/sequence_primer.cfm#citations

  • Adams, M.D., Dubnick, M., et al., 1992, Sequence identification of 2,375 human brain genes, Nature, 355: 632-634.

  • Adams, M.D., Kelley, J.M., et al., 1991, Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project, Science, 252: 1651-1656.

  • Beardsley, T., 1998, Profile: Where Science and Religion Meet, Scientific American, February, 28-29.

  • Deloukas, P., Schuler, G.D., et al., 1998, A Physical Map of 30,000 Human Genes, Science, 282: 744-746.

  • Fleischmann, R.D., Adams, M.D. et al., 1995, Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd, Science, 269: 496-512.

  • Nash, J.M., 1994, Riding the DNA Trail: Francis Collins leads an international drive to track down all the genes and take their measure, Time, 143.

  • Papadopoulos, N., Nicolaides, N.C., et al., 1994, Mutation of a mutL Homolog in Hereditary Colon Cancer, Science, 263: 1625-1629.

  • Roberts, L., 1991, Gambling on a Shortcut to Genome Sequencing, Science, 252: 1618-1619.

  • Rowe, P.M., 1995, Patenting Genes: J. Craig Venter and the Human Genome Project, Molecular Medicine Today, 1: 12-14.

  • Venter, J.C., Adams, M.D., et al., 1998, Shotgun Sequencing of the Human Genome, Science, 280: 1540-1542.

  • Venter, J.C., Smith, H.O., Hood, L., 1996, A New Strategy for Genome Sequencing, Nature, 381: 364-366.

  • Wicking, C. and Williamson, B., 1991, From linked marker to gene, Trends in Genetics, 7: 288-293.

  • Cooke, R., Man in Love with Magic of Genetics, Newsday, June 27, 2000, 15.

  • Preston, R., Profiles: The Genome Warrior, The New Yorker, June 12, 2000, 66-83.

  • Unger, M., Venter, Celera Built for Speed, The New York Times, June 27, 2000, 14.

  • Wade, N., Craig Venter: A Maverick Making Waves, The New York Times, June 27, 2000.

Glossary


Human Genome Project -
cDNA library -
Genetic map - (Also known as a linkage map) a chromosome map of a species that shows the position of its known genes and/or markers relative to each other, rather than as specific physical points on each chromosome.
Genetic marker - A segment of DNA with an identifiable physical location on a chromosome and whose inheritance can be followed. A marker can be a gene, or it can be some section of DNA with no known function. Because DNA segments that lie near each other on a chromosome tend to be inherited together, markers are often used as indirect ways of tracking the inheritance pattern of a gene that has not yet been identified, but whose approximate location is known.
Genome - All the DNA contained in an organism or a cell, which includes both the chromosomes within the nucleus and the DNA in mitochondria.
Human artificial chromosome (HAC) -
Physical map - A chromosome map of a species that shows the specific physical locations of its genes and/or markers on each chromosome. Physical maps are particularly important when searching for disease genes by positional cloning strategies and for DNA sequencing.
Polymerase chain reaction (PCR) -
Primer - A short oligonucleotide sequence used in a polymerase chain reaction.
