Leveraging the Crowd to Understand Your Genome
Earlier this week Peter Aldhous of New Scientist magazine recounted an unusual experience with DTC genomics provider deCODE Genetics. In reviewing his genetic data on the deCODEme website, Aldhous uncovered what appeared to be significant and bizarre errors in his mitochondrial DNA. Aldhous turned to Blaine Bettinger, The Genetic Genealogist, for help in diagnosing the problem. Bettinger’s response: “This is a strange question, but are you sure this is Homo sapiens?”
Aldhous, Bettinger and deCODE investigated the problem and ultimately determined that the “errors” in the mitochondrial DNA were actually being introduced by a bug in the deCODEme software interface that allows users to browse their data. (Aldhous carefully points out that the software glitch was a rare one and that it did not seem to affect deCODEme’s disease-risk summaries or analysis.)
More than a simple software error, Aldhous’s experience highlights the complexity inherent in consumer genomes. Translating an individual’s saliva sample into a description of genetically influenced traits and risks is a multi-stage process with potential for error at every step in the chain. Or, as Daniel MacArthur of Genetic Future cleverly puts it, “There’s many a slip ‘twixt spit and SNP.”
Know thyself, but how?
Both Aldhous and MacArthur recognize the larger significance and offer sensible guidance on a problem that seems certain to become more prevalent as the number of personal genomics customers and patients increases. Aldhous argues that “meticulous bug-checking will be needed to ensure that health IT delivers on its promise of improving clinical decision-making and reducing human errors.” Error-free health IT is surely a quixotic goal, but Aldhous is quick to recognize that even if it were to be attained, it would be insufficient. Even a genome perfectly sequenced and displayed, without errors introduced by human or computer, is rife with errors of another sort: genetic mutations in the form of single nucleotide polymorphisms (SNPs), as well as other copying and structural changes, the majority of which have no clinical significance whatsoever. This is what Zac Kohane of Harvard Medical School has termed “the incidentalome.”
When errors abound, and from so many possible sources, what is the individual to do? MacArthur suggests a genomic adaptation of the ancient Greek aphorism “Know Thyself”:
…rather than being a passive recipient of genetic forecasts, dig into your data and see if it makes sense, and keep asking questions until it does. In addition to making it more likely that you’ll pick up any errors in your results, you’ll also develop a much deeper understanding both of the nature of genetics and of your own genome.
While progress in genomic research and the falling cost of genomic sequencing daily bring raw genomic data within reach of increasing numbers of individuals, comparatively few have the time, inclination or ability to dig as deeply into their own genomes as MacArthur suggests—to know themselves by themselves.
The logical, and traditional, source of guidance in this area remains medical professionals. But with a well-recognized deficiency in genetic understanding by many general practitioners, and a paucity of specially trained genetic counselors, DTC genomics companies (including deCODE) have emerged to provide consumers access to and interpretation of their genomic data.
Aldhous’s experience—when his own genetic data threw him an apparent curveball, he consulted a genetic genealogist-lawyer-blogger and bioinformatics experts, and not his physician—demonstrates that these traditional sources of guidance may come to represent a complementary part in a much larger and varied infrastructure available to individuals seeking personal genomic understanding.
A Future of Shared Genomes?
Beyond physicians and genetic counselors, and even beyond the emerging DTC market with its proprietary bioinformatics (that led Peter Aldhous to briefly wonder if he was “the product of some twisted genetic experiment”), there is the crowd.
I wrote last week that the primary difference between the concepts of crowd-sourcing and open-source in genomic research has to do with data availability:
“Crowd-sourcing” refers to using a large, often varied or undefined group or population to undertake a defined task. In the case of genomic research, this might entail using web-driven or other distributed modes of interaction to identify research populations, recruit participants and, ultimately, collect the data necessary to produce meaningful scientific research.
“Open-source,” although similar, means something different. It refers to the public accessibility of data, traditionally the source code for a particular piece of software. In genomics, this means public access to research data—whether collected through crowd-sourcing or other means. Those data can then be used by individuals, by companies or by scientists for whatever purposes they desire.
One of those purposes? Error-checking. Whether it’s verifying that you’re a member of the species Homo sapiens or attempting to determine the significance of a particular SNP, opening up your genome to public analysis is a novel but potentially powerful way to better know your genomic self.
It’s what Peter Aldhous did when he took his genetic data and handed it to the Genetic Genealogist. It’s an idea that personal genomics companies (if not yet a majority of their customers) have begun to embrace in the form of genomic data-sharing features, including the development of Illumina’s highly publicized iPhone app. In a much more expansive way it is one of the motivating principles and identifying features of the Personal Genome Project.
It’s difficult to say whether open-source genomics resources, especially for the interpretation and validation of individual genomic data, will develop and, if so, exactly what shape they will take. Still, some early clues are arriving. The Wikipedia-styled SNPedia, a free, publicly editable database of SNPs and their effects, provides genomic data generated by the (also free) interpretive tool Promethease from 34 public genomes, including the Experimental Man and the PGP-10. The Personal Genome Project, which is enrolling its next 100 participants and has over 15,000 potential participants in its enrollment pipeline, is also developing an open-access interpretive tool, Trait-o-matic, that will be used to analyze the genomic and phenotypic data supplied by its participants. (The raw data itself is made available using the Creative Commons CC0 universal waiver.)
From unraveling bioinformatics errors, as Aldhous did, to adjusting medications, to uncovering unknown genetic variants, the upside of utilizing an open-access approach to personalized genomic interpretation is the ability to allow an untold number of eyes to comb over your data in search of something important (or perhaps just interesting). It seems highly improbable that any combination of DTC genomics companies and open-source genomics resources will ever completely supplant a one-on-one consultation with a trained medical professional, particularly where clinical genetic guidance is required. And concerns over privacy and misuse of data may inhibit many from sharing their own genomic data, at least at present. But there appears to be a significant role for open-source genomics resources to play in the continuing expansion and democratization of personal genomic inquiry.
Comments
9 Responses to “Leveraging the Crowd to Understand Your Genome”
Trackbacks
[...] The article was mentioned by GenomeWeb – “You Are Human, Right?” – and there are two extremely good blog posts about the article and the situation by Daniel MacArthur at Genetic Future – “There’s many a slip ‘twixt spit and SNP: errors in personal genomics data” and Dan Vorhaus at Genomics Law Report – “Leveraging the Crowd to Understand Your Genome.” [...]
-
[...] is always nice to have a bit of genomics in Scientia; there is a very interesting article over at Genomics Law Report. Dan Vorhause, inspired by an odd error by a personal genomics company, [...]





Dang you’re just a few hours fast! SNPedia now includes all of the Trait-o-matic profiles.
http://www.snpedia.com/files/promethease/outputs/traitomatic-ns-8.html
http://www.snpedia.com/files/promethease/outputs/traitomatic-ns-12.html
Your readers may also be interested in
http://www.snpedia.com/files/promethease/outputs/cloud.html
I’ll be presenting this in just a few days at
http://askja.gene.le.ac.uk/hgv2009/
Dan,
People don’t know what high blood pressure is, nor do they understand what the bad cholesterol means. How in the heck do we expect the average person to “dig deep” into their genomic data? It is with pure lack of insight that one would think of such a thing occurring in the next 10 years. Most people will blindly trust the interpreter’s interpretation.
Which puts more onus on the interpreter to take legal responsibility for their work, rather than try to escape it with SB482……
-steve
http://www.thegenesherpa.blogspot.com
I agree with you Steve, as I indicated in the post:
But the interpretations of traditional interpreters (medical professionals and, more recently, DTC) have their own limitations. You’ve pointed out the limitations of the interpretations provided by DTC genomic service providers many times, and I think that most would agree that many medical professionals suffer from limitations of their own including, at least at the moment, a dearth of qualified individuals.
So I’m not exactly sure what your objection is. If most people will “blindly trust” whatever interpretation they receive then is it wrong to seek out or attempt to develop additional interpretive resources?
If I make my data available through SNPedia or the PGP, the odds are good that most people that could view that data won’t be able to offer me any real insight. But some qualified people may take a look and offer valuable interpretive guidance, and in unexpected or novel ways, which I think is the real point here.
Thanks.
- Dan
Dan,
I suspect that most people will have neither the time nor the inclination to dig deep into their genomic data, unless they are compelled to by results that suggest something is seriously wrong. However, you are right that the DTC companies and the scientific establishment have barely scratched the surface of interpreting all of the data available for an individual. Thus we have a huge gap right now between the availability of raw and mostly meaningless data and what it means for me and you. This gap only gets bigger as technology plunges ahead to create ever more raw information and our means to interpret it and understand it moves slowly and in fits and starts.
This is why I believe a comprehensive, well-funded effort is necessary to systematically and speedily make sense of all of this information: a Personalized Health Project along the lines of the Human Genome Project in terms of funding and focus.
As the Experimental Man I have over 6,000 traits annotated on SNPedia’s Promethease program — which I have downloaded onto my iPhone for easy reference. But I have very little idea what this confusing and contradictory hodge-podge of self-information is telling me. Therefore, I largely ignore it, with a big shrug of my shoulders.
– David
David -
I fully agree that one of the most pressing needs is not necessarily more data but the ability to make sense of already available data. However, somewhat paradoxically, it seems to me that one of the crucial elements needed to make sense of the information we already have is…more information. Understanding the significance of all of your SNPs – including more accurate estimations of their phenotypic effect, if any – will require further study.
There are a number of efforts, both private and public, that are aimed at improving interpretation and understanding of the genomic information we already have, including some (PGP, 1000 Genomes, etc.) that are doing so by generating more data. But you seem to be talking about something on a grander scale, a la the Human Genome Project.
The only near-term hope for a project like that, at least to my knowledge, would be a revival of the Genomics and Personalized Medicine Act. The funding proposed for that bill in prior drafts is substantial (although not quite at the level of the HGP) and, at least in the most recently introduced version (2008 in the House: http://www.govtrack.us/congress/billtext.xpd?bill=h110-6498), there is plenty of flexibility to tackle what you are proposing.
Although the appointment of Francis Collins to head the NIH is certainly a positive sign, unfortunately, while issues such as “health care” and “the economy” continue to dominate the attention of Congress, I’m doubtful that the GPMA (or any similarly sized federal project) has much of an immediate shot at passage.
Dan,
Totally agree. We need more resources comparing this data. Ideally in a controlled and scientifically valid environment. Crowd sourcing needs to be run by someone who knows what they are doing and who has scientific gravitas. The problem with Patients Like Me is that the data is “suspect” a lot of the time. Professional study and comparative genomics design are needed here. Which is why Francis would do well to bring Kari back from Hell to help him run an Iceland here……In the end though, we need more Sherpas……
-Steve
http://www.thegenesherpa.blogspot.com
Steve: Fully agree that we need more sherpas. The question is, can only doctors be sherpas?
No. It is not the realm of just doctors. But the road to be a Sherpa requires acceptance of the importance of your roll and acceptance of third party verification of what you are doing.
I wouldn’t climb with a Sherpa who has only “read” about the climb. Would you?
-Steve
I meant role….