New Fuel for the Genomic Privacy Debate
The growth of prominent genomics research and direct-to-consumer (DTC) commercial services that combine genomic data with phenotypic data, environmental data and personal health surveys continues to spur debate over the appropriate privacy safeguards and expectations for individuals who participate in such research or enroll in such services. From large-scale genomic research projects such as the Coriell Personalized Medicine Collaborative and the UK Biobank to popular DTC genomics services such as those offered by Navigenics, many influential players in the public genomics space continue to strongly emphasize their commitment to absolute data privacy. Prominent skeptics, including geneticist George Church and lawyer and ethicist Hank Greely, argue that any such privacy promise is impossible to keep because of the inherent nature of genomic data, particularly when paired with phenotypic data or other potentially personally identifying information.
Two recent developments may add further fuel to this debate. First, California recently issued a report on the first five months of results from a new state law (effective January 1, 2009) requiring health care organizations in California to report breaches in the security of personally identifiable health information. In publishing the report, the California Department of Public Health was surprised by the high volume of reports and confirmed 116 privacy breaches during the five-month period, most of which were inadvertent. Given the early results, the agency expects the number of reported breaches to increase dramatically as organizations become more familiar with their reporting obligations.
Second, two Carnegie Mellon researchers recently published research that calls into question the commonly accepted practice of using Social Security numbers as a presumably anonymous identifier in user data sets. Previously, much of the concern surrounding this practice centered on ensuring that careless data security practices did not result in the misappropriation of Social Security numbers. But the Carnegie Mellon report identified a new concern: patterns in the assignment of Social Security numbers that permitted the researchers to use publicly available information to accurately predict the Social Security numbers of 8.5 percent of people born after 1988 in no more than 1,000 guesses, a feat easily within the capability of many automated data intrusion programs.
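To put the 8.5 percent figure in perspective, the short Python sketch below compares it with blind guessing. It rests on a simplifying assumption that is not part of the Carnegie Mellon report: that an attacker with no knowledge of assignment patterns picks distinct guesses uniformly from all one billion nine-digit strings. The names used here (SSN_SPACE, uniform_hit_rate and so on) are illustrative only.

```python
# Assumption: without pattern knowledge, an attacker guesses uniformly among
# all 9-digit strings. Not every such string is a valid number, so this is a
# rough baseline rather than a precise model.
SSN_SPACE = 10 ** 9          # count of 9-digit strings
GUESSES = 1000               # attempts allowed per person, per the report
REPORTED_HIT_RATE = 0.085    # 8.5 percent, for people born after 1988


def uniform_hit_rate(guesses: int, space: int) -> float:
    """Chance that one of `guesses` distinct uniform guesses matches a single target."""
    return guesses / space


baseline = uniform_hit_rate(GUESSES, SSN_SPACE)
advantage = REPORTED_HIT_RATE / baseline

print(f"blind guessing: {baseline:.6%}")
print(f"reported:       {REPORTED_HIT_RATE:.1%}")
print(f"advantage:      roughly {advantage:,.0f}x")
```

Under that assumption, the reported success rate is tens of thousands of times higher than chance, which is why the finding is framed as a property of the numbers themselves rather than of sloppy handling.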
These recent developments illustrate the difficult and multifaceted nature of preserving privacy in an increasingly wired world, both in genomics and in the much broader context of medical and other personal information. While the most common source of problems continues to be human error (the failure to carefully and systematically follow practices and procedures that promote security in handling data), there is an underlying and growing threat to data privacy that depends not on human error but on the very nature of the data itself. In the case of human genomics, the declining cost of genomic sequencing and the rising demand for large-scale databases combining genomic and phenotypic information are driving a proliferation of potentially sensitive data. With at least 19 states providing specific penalties for genetic privacy violations, governments, academics and corporations alike would be well advised to take a critical look at their privacy policies to ensure that they can and do live up to their promises.













