Today’s CNET story, “AOL, Netflix and the end of open access to research data,” describes how two large, supposedly “anonymized” databases have been re-identified, putting at risk the privacy of everyone in them. This is yet another example of why “anonymized” data is a myth, and it reinforces the need to avoid releasing large datasets of medical records, even if they are supposedly “de-identified.”
The first incident involves Netflix’s 2006 release of movie ratings from roughly 500,000 subscribers. To protect its subscribers’ privacy, Netflix carefully removed all personal information before the release. The company offered $1 million to anyone who could develop an algorithm that would improve its movie recommendation system, a worthy goal. However, this week researchers announced that they had successfully re-identified subscribers in the data using publicly available information.
A similar scenario occurred when AOL publicly released “de-identified” search data for roughly 650,000 of its users. Some were re-identified within days.
The lesson is simple: THERE IS NO SUCH THING AS ANONYMIZED DATA. To some extent, it can always be re-identified, because what remains after names are removed can be matched against information available elsewhere. For those who are interested in more details, computer scientist Dr. Latanya Sweeney’s Data Privacy Lab at Carnegie Mellon University has been studying this issue for years and developing the theory needed to understand it.
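To make the idea concrete, here is a minimal sketch of how matching against outside information can work. It is an illustration only, not the method the researchers used: the data is invented, and the names `released`, `public`, `link`, and `min_overlap` are all hypothetical.

```python
# Toy, made-up data. "released" stands in for a dataset with names stripped;
# "public" stands in for information people have posted under their own names.
released = {
    "user_17": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)},
    "user_42": {("Movie A", 3), ("Movie D", 1), ("Movie E", 5)},
}
public = {
    "Alice Example": {("Movie A", 5), ("Movie C", 4)},
    "Bob Example": {("Movie D", 1), ("Movie E", 5)},
}


def link(released, public, min_overlap=2):
    """Return {name: pseudonym} where exactly one pseudonym shares
    at least min_overlap (title, rating) pairs with the named person."""
    matches = {}
    for name, known in public.items():
        candidates = [
            pseudonym
            for pseudonym, ratings in released.items()
            if len(known & ratings) >= min_overlap
        ]
        # Only claim a match when it is unambiguous.
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


print(link(released, public))
```

In this toy case two known ratings are enough to single out each person, and once a pseudonym is tied to a name, everything else in that person's record is exposed too.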
So what are the implications for medical data? As previously described in this space (Protecting Privacy While Searching Health Record Banks), each person’s complete health records need to be stored in a central location with all access under the control of that individual (or whomever they designate). To provide the tremendous research benefits available from searching this data, queries should be submitted to health record banks, but NO DATA SHOULD EVER BE RELEASED. Instead, the result of a query would be a count of the number of matches and a carefully controlled demographic summary. In this way, re-identification is prevented because no individual records ever leave the bank. This allows all of us to have the fruits of medical research WITHOUT having to give up our privacy.
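As a rough sketch of what such a count-only query might look like, consider the following. Everything here is an assumption made for illustration: the record fields, the `run_query` function, and in particular the `MIN_CELL` threshold, which is one possible reading of “carefully controlled” and is not something specified above.

```python
from collections import Counter

MIN_CELL = 5  # assumed threshold: groups smaller than this are not reported


def age_band(age):
    """Map an exact age to a ten-year band such as '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def run_query(records, predicate, min_cell=MIN_CELL):
    """Return only a match count and an age-band summary, never the records."""
    matched = [r for r in records if predicate(r)]
    if len(matched) < min_cell:
        # Too few matches: report nothing that could point to individuals.
        return {"count": None, "summary": {}}
    bands = Counter(age_band(r["age"]) for r in matched)
    summary = {band: n for band, n in bands.items() if n >= min_cell}
    return {"count": len(matched), "summary": summary}


# Invented records, held inside the hypothetical health record bank.
records = [
    {"age": 40 + i % 20, "diagnosis": "diabetes" if i % 2 == 0 else "asthma"}
    for i in range(40)
]

print(run_query(records, lambda r: r["diagnosis"] == "diabetes"))
print(run_query(records, lambda r: r["age"] == 41))
```

The researcher sees how many people match and how they are distributed by age band, while a query that matches only a handful of people returns nothing at all.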
Let’s hope Netflix and AOL have learned their lesson and that other organizations — especially health care institutions — are paying close attention.