You may know that the cost to sequence a human genome is dropping, but you probably have no idea how fast that price is coming down. The National Human Genome Research Institute, part of the US National Institutes of Health, has compiled extensive data on the costs of sequencing DNA over the past decade and used that information to create two truly jaw-dropping graphs. NHGRI's research shows that not only are sequencing costs plummeting, they are outstripping the exponential curves of Moore's Law. By a big margin. You have to see this information to really understand the changes that have occurred. Check out the original NHGRI graphs below. With costs falling so quickly we will soon be able to afford to produce a monumental flood of DNA data. The question is, will we know what to do with it once it arrives?
The Cost per Megabase (million base pairs) graph reflects the production costs of generating raw, unassembled sequence data. For the Cost per Genome graph, NHGRI considered a 3000 Mb genome (i.e., human-sized) with the level of redundancy necessary to assemble the sequence in its entirety. Both graphs show some amazing drop-offs.
Keep in mind that these graphs use a logarithmic scale on the Y-axis, so the steady decline at the beginning of each graph represents exponential change in the field. From 2001 to 2007, first generation techniques (dideoxy chain termination or 'Sanger' sequencing) for sequencing DNA were already following an exponential curve. Starting around January of 2008, however, things go nuts. That's the time when, according to the NHGRI, a significant part of production switched to second generation techniques. Look how quickly costs plummet. We've discussed before how retail costs for genome sequencing are dropping thanks to efforts from companies like Complete Genomics and Illumina. What these graphs show is that those retail prices reflect more than the genius of these companies - it's a general crash of prices in the industry as a whole. Things are likely to continue in this fashion. We've already seen newer sequencing technologies start to emerge, and there are institutions all over the world that are dedicated to pursuing genetic information in all its forms.
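To make the point about the logarithmic axis concrete, here is a minimal Python sketch. The starting cost and the two-year halving period are hypothetical values chosen for illustration (two years is the commonly quoted Moore's Law pace); none of the numbers are NHGRI data. It shows that a cost which halves on a fixed schedule drops by the same amount on a log scale every year, which is why exponential decline looks like a straight line on these graphs.

```python
import math

def halving_series(start_cost, halving_years, years):
    """Cost at each whole year if the cost halves every `halving_years` years.

    All inputs are hypothetical illustration values, not NHGRI figures.
    """
    return [start_cost * 0.5 ** (t / halving_years) for t in range(years + 1)]

# Hypothetical example: start at 1000 (arbitrary units), halve every 2 years.
costs = halving_series(1000.0, 2.0, 6)

# On a log10 axis, each year's step is the same size: a straight line.
logs = [math.log10(c) for c in costs]
steps = [later - earlier for earlier, later in zip(logs, logs[1:])]

for year, (cost, log_cost) in enumerate(zip(costs, logs)):
    print(f"year {year}: cost = {cost:8.2f}, log10(cost) = {log_cost:.4f}")
```

Anything that falls faster than a straight line on such a plot, as the post-2008 portion of the NHGRI graphs does, is dropping faster than a fixed halving schedule would predict.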
Just so you know, there's no sleight of hand here. NHGRI's explanation of its cost calculations reveals they did a very thorough job, especially when parsing what should be included in production costs and what shouldn't. You can see all the details on their site.
**UPDATE 3.6.11 To answer questions in the comments section, here are the values used to calculate genome costs:
"The following 'sequence coverage' values were used in calculating the cost per genome:
Sanger-based sequencing (average read length=500-600 bases): 6-fold coverage
454 sequencing (average read length=300-400 bases): 10-fold coverage
Illumina and SOLiD sequencing (average read length=50-100 bases): 30-fold coverage"
We have to keep in mind for these calculations that "...the 'Cost per Genome' graph was generated using the same underlying data as that used to generate the 'Cost per Megabase of DNA Sequence' graph; the former thus reflects an estimate of the cost of sequencing a human-sized genome rather than the actual costs for specific genome-sequencing projects." (Emphasis mine). We know that companies like Complete Genomics are offering genomes for significantly less than $30k (the estimate in the graph). Yet the general cost for sequencing 'a genome' is still averaging around $30k according to the NHGRI estimates. Companies that can beat this price in retail are 'ahead of the curve' so to speak (which is part of the reason we like them).**
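For readers who want to see how the coverage values feed into a per-genome figure, here is a rough Python sketch. It assumes the cost per genome is simply cost per megabase × genome size × fold coverage, which is our reading of the description above rather than NHGRI's published method. The function names and the example cost per megabase are hypothetical; only the 3000 Mb genome size, the coverage values, and the roughly $30k estimate come from the text.

```python
# Genome size and coverage values as quoted from NHGRI above.
GENOME_SIZE_MB = 3000
COVERAGE = {
    "sanger": 6,           # average read length 500-600 bases
    "454": 10,             # average read length 300-400 bases
    "illumina_solid": 30,  # average read length 50-100 bases
}

def cost_per_genome(cost_per_mb, platform):
    """Estimated genome cost, ASSUMING cost = cost/Mb x genome size x coverage."""
    return cost_per_mb * GENOME_SIZE_MB * COVERAGE[platform]

def implied_cost_per_mb(genome_cost, platform):
    """Invert the same assumed formula to get the cost per raw megabase."""
    return genome_cost / (GENOME_SIZE_MB * COVERAGE[platform])

# A hypothetical $1 per Mb on Sanger sequencing at 6-fold coverage:
print(cost_per_genome(1.0, "sanger"))

# What a ~$30k genome at 30-fold coverage implies per raw megabase:
print(round(implied_cost_per_mb(30000, "illumina_solid"), 2))
```

Under this assumption, 30-fold coverage means generating 90,000 Mb of raw sequence for one human-sized genome, so a $30k genome works out to roughly a third of a dollar per raw megabase.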
One of the costs that NHGRI (rightfully) doesn't include in production is the massive research investment the industry needs to fund to make any of this DNA data worthwhile. With the falling costs of sequencing we will have enormous amounts of raw data, but still very little understanding of what it means. As Daniel MacArthur (lately of Genetic Futures, now of Wired) points out in his discussion, production is outstripping research. What does the 897324989th base pair of your genome do? If you don't know, why do you care if it reads A, C, T, or G? As we've seen with retail sequencing of single nucleotide polymorphisms (SNPs), giving a customer a look at parts of their DNA can be fun, but it's not particularly enlightening.
With falling genome prices we should be able to perform ever larger studies to correlate genes with medical histories. Already we've seen insurance companies and universities assemble large stores of genetic information in preparation for the days when such research will be financially possible. The stage is set for a great revolution in genetics fueled by plummeting sequencing prices. One day soon we should have an understanding of our genomes such that getting everyone sequenced will make medical sense. But that day hasn't arrived yet. As we look at these amazing graphs we should keep in mind that falling prices are simply the first step in generating the future of medicine that genetics has promised us. The best science is still ahead of us. Get ready for it.
[image credits: NHGRI at NIH]
[citation: Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: http://www.genome.gov/sequencingcosts/. Accessed March 1, 2011.]