DNA sequencing is the process of determining the precise order of the nucleotides (the letters A, C, G, and T) strung together in a DNA molecule. In the 1990s, the prospect of sequencing the entire human genome presented a tremendous scientific challenge: the human genome contains more than 3 billion letters, and the DNA sequencing methods of the time were slow and expensive. Completing the first human genome sequence in 2003 ultimately took more than a decade, about a billion dollars, and the efforts of international scientists at 20 large sequencing centers around the world.
One key to the phenomenal success of the Human Genome Project was ongoing technological innovation. Scientists and engineers developed powerful computational software for assembling sequence data that could keep pace with new DNA sequencing technologies. Today, faster and less costly DNA sequencing serves as a powerful tool for diagnosing disease, and these tools continue to improve. Next-generation sequencing machines can now sequence an entire human genome in a few days, a capability that has inspired a flood of new projects aimed at sequencing the genomes of thousands of individual humans and a broad range of animal and plant species. Soon, having your whole genome sequenced will cost $1,000 or less, no more than many other common medical tests.
Given today’s increasingly rapid accumulation of DNA sequence data, how will we manage to interpret and store the extraordinary amounts being produced? Although sequencing an individual human genome may take just a few days, examining the mountains of data from a single genome can take researchers weeks and cost thousands of dollars. In addition, genomic data must be combined with information from a patient’s medical records. Biologists, computer scientists, and statisticians are working tirelessly to make DNA data management, storage, and analysis more efficient.
Book image courtesy of Darryl Leja, NHGRI