Archive for December, 2006

Protecting Privacy While Searching Health Record Banks

Sunday, December 10th, 2006

The Value of Health Record Bank Information

Searching electronic health information in health record banks could be incredibly valuable for medical research and public health. Imagine what we might discover if we could rapidly and easily examine the medical records of many thousands of patients across the nation with a specific type of heart disease or cancer to determine which therapies are most effective! Today, such studies take years and cost millions of dollars, yet include only relatively small numbers of subjects.

Health record bank information could also be invaluable for protecting public health. For example, during the anthrax attacks of fall 2001, there were seven cases of skin (cutaneous) anthrax in the New York City area in the two weeks BEFORE the “first” case was detected in Florida (see Lipton E, Johnson K: The Anthrax Trail: Tracking Bioterror’s Tangled Course. New York Times, Section A, p. 1, 12/26/2001). Monitoring an electronic health record bank for such unusual events could have detected those earlier cases, raising the alarm sooner and saving lives (and money).
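As a rough illustration of what "monitoring for unusual events" might mean in practice, here is a minimal sketch of a threshold alert. It assumes, hypothetically, that weekly case counts for a condition in a region are already aggregated inside the banks, and it uses an illustrative rule of "more than three standard deviations above the historical mean"; the function name and inputs are our own, not part of any real surveillance system. For a condition as rare as anthrax the baseline is essentially zero, so a single case is enough to trigger an alert.

```python
from statistics import mean, pstdev


def is_unusual(history: list[int], current: int, k: float = 3.0) -> bool:
    """Flag `current` if it exceeds mean + k * (population) std dev of `history`.

    `history` holds past weekly case counts for one condition in one region
    (hypothetical inputs). When the baseline is all zeros, the threshold is 0,
    so a single case is enough to trigger an alert.
    """
    threshold = mean(history) + k * pstdev(history)
    return current > threshold


# Hypothetical data: a year with no cases, then a week with seven.
print(is_unusual([0] * 52, 7))  # True
```

Real surveillance systems use more careful statistics (seasonality, reporting delays), but the principle is the same: only counts need to leave the banks for an alarm to be raised.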

While the benefits of such searching are clear, all of us have a legitimate and realistic fear that such activities could seriously compromise the privacy of our sensitive medical information. So is it somehow possible for all of us to benefit from the knowledge that could be extracted from health record banks without having to compromise the privacy of our personal medical information? The answer is “yes” – and in this posting I will describe one approach to accomplishing this.

How Health Record Bank Searching Could Work

Imagine a system of health record banks across the country, with each person having their complete electronic health records stored in the bank of their choice. You control all access to your records, and have given permission for your information to be used for research and public health – as long as your information is not released as part of that use. How would a medical researcher utilize this data?

A query to the health record banks would look something like this: “How many patients are between age 45 and 54, more than 20% above their ideal weight, have ever had an abnormally high blood sugar, and had a blood pressure reading more than 10% above normal in the last 90 days?” This would be sent to all the health record banks (through a coordinating entity) and each bank would produce two results: 1) a count of the number of patients matching those characteristics; 2) some demographic data about those patients (e.g. percentage male/female). The results from all the health record banks would be combined by the coordinating entity and delivered to the researcher.
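The query flow above can be sketched in a few lines. This is a minimal, hypothetical model, not a specification: the `Patient` fields, the `research_consent` flag, and the functions `matches`, `bank_answer`, and `coordinate` are all invented for illustration. Each bank evaluates the example query locally over consenting patients and returns only aggregate numbers; the coordinating entity sums those aggregates. The small alterations to the counts described in the next paragraph are deliberately left out here.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Patient:
    # Hypothetical, highly simplified record kept inside one bank.
    age: int
    sex: str  # "M" or "F"
    percent_over_ideal_weight: float
    ever_high_blood_sugar: bool
    # (date of reading, percent above normal) for each blood pressure reading
    bp_readings: list[tuple[date, float]] = field(default_factory=list)
    research_consent: bool = False


def matches(p: Patient, today: date) -> bool:
    """The example query from the text, written as a predicate."""
    cutoff = today - timedelta(days=90)
    return (
        45 <= p.age <= 54
        and p.percent_over_ideal_weight > 20
        and p.ever_high_blood_sugar
        and any(d >= cutoff and pct > 10 for d, pct in p.bp_readings)
    )


def bank_answer(patients: list[Patient], today: date) -> dict[str, int]:
    """Runs inside one bank: only aggregate numbers leave the bank."""
    hits = [p for p in patients if p.research_consent and matches(p, today)]
    return {
        "count": len(hits),
        "male": sum(p.sex == "M" for p in hits),
        "female": sum(p.sex == "F" for p in hits),
    }


def coordinate(banks: list[list[Patient]], today: date) -> dict[str, float]:
    """The coordinating entity sums the per-bank aggregates for the researcher."""
    totals = {"count": 0, "male": 0, "female": 0}
    for patients in banks:
        for key, value in bank_answer(patients, today).items():
            totals[key] += value
    n = totals["count"]
    return {
        "count": n,
        "percent_male": 100 * totals["male"] / n if n else 0.0,
        "percent_female": 100 * totals["female"] / n if n else 0.0,
    }
```

The key design point is that `bank_answer` never returns a `Patient`; patients who have not consented are skipped before any counting happens.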

Note that in this process no one’s individual information has been released. Small alterations would be made to the counts and demographic outputs to ensure that no individual could be identified indirectly by comparing the results of successive queries (e.g., two queries whose counts differ by exactly one). This procedure, known as statistical disclosure control, is already applied very effectively to data from the U.S. Census Bureau for the same reason.
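The post does not specify which disclosure-control method would be used. As one illustration, here is unbiased random rounding, a classic technique in which every published count is rounded to a neighbouring multiple of a base (5 here, a hypothetical choice). A count with remainder r rounds up with probability r / base and down otherwise, so the published value is correct on average but is always a multiple of the base, and two queries whose true counts differ by one will often produce identical outputs. The function name `random_round` is our own.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5, rng: Optional[random.Random] = None) -> int:
    """Unbiased random rounding of a count to a multiple of `base`.

    If count % base == r, round up with probability r / base, otherwise down,
    so the expected published value equals the true count.
    """
    rng = rng or random.Random()
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(2006)
print([random_round(c, rng=rng) for c in (41, 42)])  # each value is 40 or 45
```

Because each call draws fresh randomness, a researcher who repeated an identical query many times could average the outputs back toward the true count; a real system would need to guard against that, for example by returning the same rounded value whenever the same query is repeated.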

Recruiting Volunteers for Clinical Trials

If the researcher were trying to recruit volunteers for a clinical trial, a message could be delivered to the patients who match the desired characteristics. The message would explain the clinical trial and the advantages and disadvantages of participating, and would provide information about how to contact the researcher. Any further inquiry would be up to the patient, who would be under no obligation to respond. Note that the researchers would not know to whom their message was sent; they would have only an approximate count of the number of recipients.
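A minimal sketch of the delivery step, under hypothetical assumptions: each bank account has a consent flag and an inbox, the trial criteria arrive as a predicate, and the bank reports the number of recipients rounded to the nearest multiple of 10. The `Account` class and `deliver_invitation` function are invented for illustration. The message goes to matching patients, and the only thing returned to the researcher is the rounded count.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Account:
    # Hypothetical health record bank account.
    patient_id: str
    research_consent: bool
    conditions: set[str] = field(default_factory=set)
    inbox: list[str] = field(default_factory=list)


def deliver_invitation(
    accounts: list[Account],
    criteria: Callable[[Account], bool],
    message: str,
    base: int = 10,
) -> int:
    """Runs inside the bank: puts `message` in the inbox of every consenting,
    matching patient and returns only an approximate recipient count."""
    delivered = 0
    for account in accounts:
        if account.research_consent and criteria(account):
            account.inbox.append(message)
            delivered += 1
    # Round half up to the nearest multiple of `base`; no patient ids are returned.
    return (delivered + base // 2) // base * base
```

Any reply is initiated by the patient, so the researcher learns who is interested only when a patient chooses to make contact.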

If, after the first query, the researcher wanted to know more about this particular patient population (such as what medications they are taking), subsequent queries with additional “matching elements” could be submitted.
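A follow-up query of this kind might be sketched as follows, again with hypothetical names and data shapes (records as dictionaries with `"age"` and `"medications"` keys). `refine` adds a matching element to an existing query, and `medication_breakdown` answers "what medications are they taking?" with counts only. As the disclosure-control step we substitute simple small-cell suppression (counts below a hypothetical `min_cell` of 10 are withheld), which is a different technique from the rounding discussed earlier.

```python
from collections import Counter
from typing import Callable

Record = dict  # hypothetical: {"age": int, "medications": [str, ...], ...}
Criteria = Callable[[Record], bool]


def refine(criteria: Criteria, extra: Criteria) -> Criteria:
    """Build a follow-up query: the original criteria AND an extra matching element."""
    return lambda record: criteria(record) and extra(record)


def medication_breakdown(records: list[Record], criteria: Criteria, min_cell: int = 10) -> dict[str, int]:
    """Count matching patients by medication, suppressing any count below min_cell."""
    tally: Counter = Counter()
    for record in records:
        if criteria(record):
            tally.update(set(record["medications"]))  # count each drug once per patient
    return {drug: n for drug, n in tally.items() if n >= min_cell}
```

Suppressing small cells keeps a rare medication from pointing to one or two identifiable patients.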

Why This Approach Protects Privacy

This methodology allows researchers to get the information needed for various types of studies without releasing any medical information about individuals. It also eliminates the problem inherent in releasing so-called “de-identified” subsets of data, which can often be “re-identified” by linking them to other datasets (see L. Sweeney, “k-anonymity: a model for protecting privacy,” International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 2002, pp. 557-570). The risk of such re-identification can be made low, but it is never zero. The system described here avoids even that small risk.
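To make the re-identification problem concrete, the sketch below computes k in the sense of Sweeney's k-anonymity: the size of the smallest group of records that share the same values of their quasi-identifiers, meaning fields that are not names but can be matched against other datasets. The choice of ZIP code, birth date, and sex as quasi-identifiers, the column names, and the sample rows are hypothetical. A result of k == 1 means at least one "de-identified" record is unique and could be linked back to a person.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")  # hypothetical column names


def k_anonymity(records: list[dict], quasi_identifiers=QUASI_IDENTIFIERS) -> int:
    """Return the size of the smallest group of records sharing identical
    quasi-identifier values (records must be non-empty)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


rows = [
    {"zip": "02139", "birth_date": "1960-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1960-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02138", "birth_date": "1945-01-02", "sex": "M", "diagnosis": "diabetes"},
]
print(k_anonymity(rows))  # 1: the third row is unique
```

Raising k (by generalizing or suppressing fields) lowers the risk but, as noted above, never eliminates it; returning only aggregate counts sidesteps the release of row-level data altogether.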

Sharing the Benefits with the Owners of the Information

Finally, I believe that the value of the data should be shared with the patients who own it (i.e. you). Those who wish to submit queries should pay fees to do so, and patients who allow their data to be searched in this way should receive the majority of the revenue generated from those searches. In this way, your “deposits” of medical information in your health record bank account can earn “interest.” This is similar to the way grocery store chains compensate you with price discounts for sharing your purchasing information via “affinity cards.”
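The arithmetic of such a scheme could look like the following sketch. Everything here is a hypothetical assumption for illustration: patients receive 60% of each query fee (any fraction above one half would satisfy "the majority"), that pool is split equally among all consenting patients, payments are rounded down to whole cents, and the remainder goes to the bank or coordinating entity. The function name `split_fee` is our own.

```python
from decimal import Decimal, ROUND_DOWN


def split_fee(fee: Decimal, consenting_patients: int,
              patient_fraction: Decimal = Decimal("0.60")) -> tuple[Decimal, Decimal]:
    """Split one query fee; returns (payment per patient, operator share)."""
    if not Decimal("0.5") < patient_fraction <= 1:
        raise ValueError("patients should receive the majority of the revenue")
    if consenting_patients <= 0:
        raise ValueError("need at least one consenting patient")
    pool = fee * patient_fraction
    # Round each payment down to whole cents; leftover fractions stay with the operator.
    per_patient = (pool / consenting_patients).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    operator_share = fee - per_patient * consenting_patients
    return per_patient, operator_share


print(split_fee(Decimal("1000"), 100))  # (Decimal('6.00'), Decimal('400.00'))
```

Individual payments per query would be small, but, like interest, they would accumulate across the many searches an account participates in.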

Of course, participation in such searching should be voluntary, and no one should be forced to allow their data to be used this way without their consent.


By allowing searching with patient consent while limiting the results of such searches to counts and basic demographic information, privacy can be protected. Patients would also receive fair compensation for the value of their information through sharing of the revenue from search fees. In this way, all of us can simultaneously retain the privacy of our sensitive medical information while we collectively enjoy the benefits from knowledge gained through population-based analysis.