Archive for November, 2007

The Myth of Anonymized Data

Friday, November 30th, 2007

Today’s CNET story, “AOL, Netflix and the end of open access to research data”, describes how two large so-called “anonymized” databases have been re-identified, compromising the privacy of everyone in them. This provides yet another example of why “anonymized” data is a myth — and reinforces the need to avoid the release of large datasets of medical records, even if they are supposedly “de-identified.”

The first incident described involves the release of 500,000 people’s movie ratings by Netflix in 2006. To protect the privacy of its subscribers, Netflix carefully removed all personal information. It offered $1 million to anyone who could develop an algorithm that would improve its movie recommendation system, a worthy goal. However, this week researchers announced that they had successfully re-identified the data using publicly available information.

A similar scenario occurred in 2006 when AOL publicly released “de-identified” search data for roughly 650,000 of its users. Some were re-identified within days.

The lesson is simple: THERE IS NO SUCH THING AS ANONYMIZED DATA. To some extent, it can always be re-identified. For those who are interested in more details, computer scientist Dr. Latanya Sweeney’s Data Privacy Lab at Carnegie Mellon University has been studying this issue for years and developing the theory needed to understand it.

So what are the implications for medical data? As previously described in this space (Protecting Privacy While Searching Health Record Banks), each person’s complete health records need to be stored in a central location with all access under the control of that individual (or whomever they designate). To provide the tremendous research benefits available from searching this data, queries should be submitted to health record banks, but NO DATA SHOULD EVER BE RELEASED. Instead, the result of a query would be a count of the number of matches and a carefully controlled demographic summary. In this way, re-identification is prevented since no actual data is available. This allows all of us to have the fruits of medical research WITHOUT having to give up our privacy.
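To make the count-only idea concrete, here is a minimal sketch in Python of what such a query interface might look like. Everything in it is an assumption made for illustration: the `Record` layout, the `run_query` function, the 20-year age bands, and the `MIN_CELL` threshold are all hypothetical, and withholding small counts is just one possible reading of a "carefully controlled" summary, not a description of any actual health record bank.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable

MIN_CELL = 5  # assumed threshold; counts below it are withheld


@dataclass(frozen=True)
class Record:
    """Hypothetical, heavily simplified patient record."""
    age: int
    sex: str
    diagnoses: frozenset


def age_band(age: int) -> str:
    """Coarsen an exact age into a 20-year band, e.g. 45 -> '40-59'."""
    low = (age // 20) * 20
    return f"{low}-{low + 19}"


def run_query(records: Iterable[Record],
              predicate: Callable[[Record], bool]) -> dict:
    """Return only a match count and a coarse demographic summary."""
    matches = [r for r in records if predicate(r)]
    if len(matches) < MIN_CELL:
        # Too few matches: report nothing rather than a revealing small number.
        return {"count": None, "summary": {}}
    cells = Counter((age_band(r.age), r.sex) for r in matches)
    # Drop any demographic cell that is itself too small to report.
    summary = {cell: n for cell, n in cells.items() if n >= MIN_CELL}
    return {"count": len(matches), "summary": summary}


# Made-up data: six women aged 45 and two men aged 70, all with diabetes.
bank = ([Record(45, "F", frozenset({"diabetes"}))] * 6
        + [Record(70, "M", frozenset({"diabetes"}))] * 2)
result = run_query(bank, lambda r: "diabetes" in r.diagnoses)
print(result)  # {'count': 8, 'summary': {('40-59', 'F'): 6}}
```

The researcher sees only the total and the one demographic cell large enough to report; no individual record ever leaves the function.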

Let’s hope Netflix and AOL have learned their lesson and that other organizations — especially health care institutions — are paying close attention.

First Quantitative Study of Health Information Infrastructure Workforce

Saturday, November 17th, 2007

One of the key unanswered questions about health information infrastructure over the past several years has been, “Do we have enough trained people to build it?” Over the past year, I’ve been privileged to have the opportunity to serve as the principal investigator of a research project sponsored by the U.S. Department of Health and Human Services (Office of the Assistant Secretary for Planning and Evaluation) to begin to address this question. This work represents the first attempt to quantify the workforce requirements for building the health information infrastructure in the U.S. A presentation summarizing the final results was given to the American Health Information Community (AHIC) Electronic Health Record work group in late September, and the complete final report has recently been posted. Here is the Executive Summary:

Nationwide Health Information Network (NHIN) Workforce Study

Executive Summary

For the past several years, the nation has been working to improve health care through the widespread implementation of electronic health records. One clear prerequisite for accomplishing this goal is the availability of a trained workforce to implement the developing Nationwide Health Information Network (NHIN). While it is generally acknowledged that the nation does not have a sufficient number of trained specialists for this purpose, no prior studies have produced any quantitative estimates of the workforce requirements. Accordingly, the current research was designed to further our understanding of NHIN workforce issues by collecting, assessing, and analyzing existing knowledge and data in this domain with the objective of producing an initial estimate of the number of people needed.

This study gathered information through a series of four focus groups, five site visits, and direct communications with health information technology (HIT) vendors. The anticipated NHIN work was divided into three separate categories of activities for the purpose of assessing workforce:

  • electronic health records (EHRs) in physician offices;
  • EHRs in hospitals and other health care institutions; and
  • the health information infrastructure (HII) required in communities to link the various sources of records so that each patient’s complete electronic record could be available.

Assuming a 5-year time frame for NHIN implementation, results indicated that 7,600 (+/- 3,700) specialists are needed for installation of EHRs for the approximately 400,000 practicing physicians who do not have them already. For the hospitals needing EHRs (about 4,000), approximately 28,600 specialists are needed. Finally, about 420 people are needed to build the HII systems in communities to interconnect all these other systems. These data represent the first ever quantitative estimates of the workforce needed to implement the NHIN.
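As a quick check on the scale of these figures, the arithmetic can be worked through in a few lines of Python. The inputs are exactly the numbers quoted in the paragraph above; the combined total is a simple sum that assumes the three categories do not overlap, which the summary does not state, and the function name `workforce_summary` is just an illustrative label.

```python
def workforce_summary() -> dict:
    """Derive totals and ratios from the figures quoted in the summary."""
    physician_specialists = 7_600      # central estimate
    physician_margin = 3_700           # the quoted +/- range
    physicians_without_ehr = 400_000   # approximate
    hospital_specialists = 28_600      # approximate
    hospitals_needing_ehr = 4_000      # approximate
    community_specialists = 420        # approximate

    return {
        # Assumes no overlap between the three categories.
        "total_specialists": (physician_specialists
                              + hospital_specialists
                              + community_specialists),
        "physician_range": (physician_specialists - physician_margin,
                            physician_specialists + physician_margin),
        "specialists_per_1000_physicians":
            physician_specialists * 1000 / physicians_without_ehr,
        "specialists_per_hospital":
            hospital_specialists / hospitals_needing_ehr,
    }


for name, value in workforce_summary().items():
    print(f"{name}: {value}")
```

Under these assumptions the combined estimate is 36,620 specialists, the physician-office figure ranges from 3,900 to 11,300, and the ratios work out to 19 specialists per 1,000 physicians and about 7 per hospital.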

These estimates should be considered preliminary and imprecise as they are based on a very small number of reports: eight for physician EHRs, four for hospitals (no data were available for other types of health care institutions), and two for communities. Furthermore, since all reported data were retrospective, the various estimates are based on information collected inconsistently at different times and under varying circumstances. Insufficient information was available to characterize meaningfully the different types of personnel needed, although at least 15 different job titles were identified and defined. There was also inadequate information to allow workforce estimates for different architectures for the three major activities, despite general agreement from the expert panels that differences in architecture may have a significant impact on the personnel needs. Similarly, there were not enough data to assess or categorize the impact of size of practice or institution on workforce. However, there were some indications that the personnel requirements per physician are higher for smaller physician offices (three physicians or fewer). Also, the workforce data relate only to installation of systems; ongoing support and maintenance were specifically excluded. Finally, it is notable that there are no available data about the current number of specialists working in the three areas, so it is not clear whether these estimates indicate a shortage of personnel.

Further research is needed to confirm and refine these estimates, as well as overcome the limitations of these results. Nevertheless, these first-ever quantitative estimates of the workforce needed for NHIN implementation will inform such additional studies, lead to an improved understanding of this important domain, and ultimately help ensure that adequate numbers of personnel are available for this critical work.