Preventing Large-scale Losses of Medical Data

May 15th, 2016

Has your personal information been stolen by hackers? Mine has – at least twice. My records were part of the 80 million record Anthem breach in 2015. My information was also included in the 22 million record breach at the Office of Personnel Management. While I appreciate having my online identity monitored and protected by two separate services as a result of these events, wouldn’t it be better if there were a way to prevent them?

Good news – there is a way! In a paper published last month in the Journal of Biomedical Informatics, I describe the new personal grid architecture that prevents large-scale data losses like these. Here’s how it works: Instead of storing all the records in a single database, each person’s record is stored in a separate file, separately encrypted, with its own complex password. In this way, even if a hacker were somehow able to take all the files, it would be necessary to break through strong encryption (a huge effort!) to get access to just a single record. That same difficult process would be needed to access each and every additional record. The incentive for the hacker is thus removed as the work needed to gain access far outweighs the potential value of each record.
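To make the idea concrete, here is a minimal Python sketch of "one key per record." Everything in it is an assumption for illustration: the patient IDs, the record contents, and the function names are hypothetical. The cipher is a toy built from the standard library's SHA-256 and HMAC so that the example runs without extra packages. It is not the implementation described in the paper and is not suitable for real records; an actual system would use a vetted encryption library.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (illustration only): SHA-256 over key + nonce + counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_record(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt one record under its own key; returns nonce + ciphertext + tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt_record(key: bytes, blob: bytes) -> bytes:
    """Decrypt one record; raises ValueError if the key does not match."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted record")
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Hypothetical records; in a personal grid each would live in its own file.
records = {
    "patient-001": b"diabetes; high blood pressure",
    "patient-002": b"penicillin allergy",
}

# A separate random 256-bit key for every record.
keys = {pid: secrets.token_bytes(32) for pid in records}
vault = {pid: encrypt_record(keys[pid], data) for pid, data in records.items()}
```

The point is structural rather than cryptographic: the key that opens patient-001's record does nothing for patient-002's, so an attacker who copies the whole vault still faces a separate break for every record.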

So – problem solved … almost. As with so many things, there is no “free lunch.” While organizing data this way does eliminate the potential for loss of the entire dataset all at once, it creates a new problem: slow searching. Much of the value of medical information comes from searching across many patients – for example, to find patients who need a flu shot or are eligible to participate in a particular clinical trial. Regular (relational) databases have indexes that store “pre-searched” information so that, much like the index of a book, finding a particular item just requires a quick lookup in the index (or multiple indexes). But a personal grid cannot have such indexes because they can be used to reconstruct most (if not all) of the original data for all the patients – thereby creating the same vulnerability to loss of all the data that was eliminated by separately storing each record.

Without indexes, searching involves the slow step-by-step process of retrieving each record, decrypting it, then determining if whatever we are searching for is present. This type of search is called sequential, and much of the work in the field of computer science is devoted to finding ways to avoid it (such as the indexes used by relational databases) because it is very slow, even with fast computers.

While the problem of slow searching in the personal grid can’t be totally eliminated, it can be greatly reduced so that this new architecture is feasible for everyday use. Instead of searching sequentially with one computer, we can divide this task among a large number of machines working in parallel. Because each record is checked independently of the others, 1,000 machines can complete the search roughly 1,000 times faster, which allows searches of multimillion record personal grid databases in about an hour, fast enough for any purpose when searching across medical records of a population.
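Here is a rough Python sketch of that division of labor. It is only an illustration under stated assumptions: threads stand in for separate machines, the records are shown already decrypted, and the names (`scan_shard`, `parallel_search`, `estimated_hours`) and the timing figures are hypothetical rather than taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(shard, predicate):
    """Sequentially check every record in one shard (one machine's share).

    In a personal grid, each record would be decrypted here before the check.
    """
    return [pid for pid, record in shard if predicate(record)]


def parallel_search(records, predicate, workers):
    """Split the records into `workers` shards and scan the shards concurrently."""
    items = list(records.items())
    shards = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan_shard, shards, [predicate] * workers)
    return sorted(pid for hits in results for pid in hits)


def estimated_hours(num_records, seconds_per_record, machines):
    """Idealized wall-clock estimate, assuming the work divides evenly."""
    return num_records * seconds_per_record / machines / 3600


# Hypothetical records: patient ID -> whether a flu shot is on file.
records = {f"patient-{i:03d}": {"flu_shot": i % 3 == 0} for i in range(12)}
needs_flu_shot = parallel_search(records, lambda r: not r["flu_shot"], workers=4)

# Assumed figures: 5 million records at half a second each.
one_machine = estimated_hours(5_000_000, 0.5, machines=1)       # about 694 hours
thousand_machines = estimated_hours(5_000_000, 0.5, machines=1000)  # about 0.7 hours
```

Under those assumed figures, a search that would tie up a single machine for about a month finishes in under an hour on 1,000 machines. The estimate ignores the overhead of starting the machines and collecting their results.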

Where can we get 1,000 servers that can be used for this? It turns out this is relatively easy and inexpensive in today’s cloud computing environments. Cloud computing services are specifically designed to be able to allocate large numbers of servers on short notice to a computationally difficult problem. So the availability of cloud computing makes the personal grid architecture, and the high level of security it provides, practical and feasible today.

Who’s using the personal grid? No one at the moment, since the idea is new. I’m hoping that we’ll soon see some implementations, which seems likely given the huge costs involved in mitigating large-scale medical information breaches. Senior healthcare executives responsible for protecting our information should be very interested in this new ultra-secure approach to information architecture that removes the hugely expensive potential risk of total data loss.

Finally, I want to mention that the personal grid addresses one of the key objections people have to centralized repositories of medical records – namely, that a hacker might break in and take all the data at once. Readers of this blog know that I’m a long-time advocate for such repositories in communities (known as health record banks), with the records controlled by patients. With the personal grid architecture, worries about losing all the records to a hacker are eliminated. This means that health record banks can not only overcome the obstacles of privacy, stakeholder cooperation, and financial sustainability, but also provide the security needed so that all of us can be confident that our information is truly protected from unauthorized use.

The five-minute narrated slideshow posted below has more details.

Lessons Learned from a Health Record Bank Startup

March 5th, 2014

In 2010, I worked with a wonderful group of people to start a health record bank in Phoenix. While the effort was not successful, primarily due to undercapitalization, much was learned from the experience. The details of the effort and the lessons from it have now been published. For anyone thinking about starting a health record bank, this will be very interesting reading.

Owning Our Own Personal Information

June 10th, 2013

A recent article (“Fixing the Digital Economy”) by Jaron Lanier in the New York Times describes how we can fix our current online privacy problems by owning and controlling our personal information. I agree and am very pleased to see this idea being advocated.

As we move into the information age, it is very clear that information truly has value. This is particularly true of our personal information – what we buy, where we go, who we communicate with (via phone, email, text, etc.), and which web sites we browse, not to mention our medical and financial records. At present, we’re in what might be termed the “Wild West” of personal information, where whoever has our data can (in general) use it for whatever they wish. But unlike the cattle rustlers of old, who deprived ranchers of their cattle, those who appropriate our personal information only copy it, leaving the original with us. Because of this, perhaps we haven’t been sensitive enough to the problem.

When I ask people about health information privacy, the concerns seem to fall into two categories. First, there are those who just don’t want their information released because of embarrassment or possible negative consequences for their employability. But, in my experience, an even larger group is concerned because they know their information is being aggregated and sold with no compensation being returned. This seems fundamentally unfair and unreasonable – much like financial banks loaning our collective deposits and returning no interest. This sense of injustice is widespread and permeates discussions of personal privacy.

In this information age, it seems to me that we should own our personal information – just like any other property. Others should only be allowed to use it with our permission – and only for the purpose and time period we specify. If the information is being used to benefit us, then we’d likely approve and perhaps accept the benefit alone as compensation. But if it’s being used to enrich others, then it seems reasonable that some of the financial returns should accrue back to us.

This is not a new concept. In a 1996 article entitled “Markets and Privacy,” Laudon proposed that personal information should be the property of each person. Furthermore, he presented persuasive arguments that doing this would actually increase economic activity and create huge new personal information markets. The oft-cited concerns that imposing consent requirements for the use of personal information would diminish the value of the information aggregator franchises (e.g., Google, Facebook) are overblown and largely incorrect. While it would require substantial changes, it would actually be a huge net benefit for them. Lanier makes this same point in his New York Times article.

Finally, personal ownership and control of medical records is essential to the development of a feasible and sustainable health information infrastructure using health record banks, as has been described in this space before. This is an urgent and important problem for the nation.

So what do you think? Do you think you should be able to control the use of your personal information? If so, what can/should we do to promote this?

Now is the Time for Health Record Banks

June 6th, 2013

In a news report of a recent interview, I describe the current failure of health information exchanges (HIEs) as well as the potential to transform these efforts into a successful health information infrastructure with health record banks. I’ll be giving a presentation detailing my vision for health record banks at the upcoming Digital Healthcare Conference on June 10th in Madison, WI.

The Myth of Distributed Healthcare Queries (or Big Data Gone Bad)

May 21st, 2013

Every day we’re hearing more and more about the exciting new discoveries that are possible with “big data,” including in healthcare. Indeed, a number of activities have been organized to “federate” or connect multiple healthcare databases to facilitate addressing critical research and policy questions: Query Health (Office of the National Coordinator at HHS), Harvard’s i2b2 project, and the FDA Sentinel System (New Eng J Med 364:498-9, 2011). These systems all send healthcare queries to multiple databases and then aggregate the resultant counts into an overall result.

Unfortunately, the results of such queries are PROVABLY INCORRECT. This type of distributed database architecture violates a basic computer science principle: Distributed databases only produce correct query results if the data in each node is independent (Weber G: Federated queries of clinical data repositories: the sum of the parts does not equal the whole. J Am Med Informatics Assn 2013). Of course this sounds like (and is) technical jargon, so let me decode and explain it.

The easiest way to understand this concept is with a simple example. Let’s say you want to know how many patients have both diabetes and high blood pressure. You send a query to multiple databases saying “Tell me how many patients you have with both diabetes and high blood pressure.” The problem is that you don’t know whether or to what extent data about the same patient appears in multiple databases. Each database only reports the count of the number of patients that satisfy the query, but there’s no identification information included. So if John Brown, who has both diabetes and high blood pressure, has been seen in two different institutions with databases, then John Brown will be counted twice. If Mary Jones, who also has diabetes and high blood pressure, has been seen by two different institutions, but one only recorded the fact that she has diabetes and the other that she has high blood pressure, she won’t be counted at all (even though she should be).

So, as you can see, this method of querying multiple databases and adding up the counts results in both over- and under-counting – i.e., INCORRECT results. And the errors can be quite substantial, with large and unpredictable mistakes. This is because most patients receive their medical care in multiple places, leaving data at each. The medical records in one location may be complete (possibly leading to over-counting) or incomplete (possibly leading to under-counting), with no way to know in advance whether these two types of errors will balance or not for any specific query.
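The John Brown and Mary Jones example can be worked through in a few lines of Python. This is a sketch only: the two "sites" are small hypothetical dictionaries and the function names are made up for illustration. It compares the sum of per-site counts with the count obtained after each patient's records are combined.

```python
QUERY = {"diabetes", "high blood pressure"}


def local_count(site):
    """What one site reports: a bare count with no patient identifiers."""
    return sum(1 for conditions in site.values() if QUERY <= conditions)


def federated_count(sites):
    """Add up the counts reported by each site."""
    return sum(local_count(site) for site in sites)


def merged_count(sites):
    """Count after combining each patient's records from all sites."""
    merged = {}
    for site in sites:
        for patient, conditions in site.items():
            merged.setdefault(patient, set()).update(conditions)
    return sum(1 for conditions in merged.values() if QUERY <= conditions)


# John Brown has complete records at both sites.
john_a = {"John Brown": {"diabetes", "high blood pressure"}}
john_b = {"John Brown": {"diabetes", "high blood pressure"}}

# Mary Jones has one condition recorded at each site.
mary_a = {"Mary Jones": {"diabetes"}}
mary_b = {"Mary Jones": {"high blood pressure"}}

over = [john_a, john_b]    # federated 2, actual 1
under = [mary_a, mary_b]   # federated 0, actual 1
both = [{**john_a, **mary_a}, {**john_b, **mary_b}]  # federated 2, actual 2

for name, sites in [("over", over), ("under", under), ("both", both)]:
    print(name, federated_count(sites), merged_count(sites))
```

In the last case the federated total happens to match the real one, but only because one over-count and one under-count cancel. Nothing in the counts themselves reveals whether that has happened.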

So how does the concept of “independence” fit in? In this case, one database is independent of another if all the data for each patient is in one and only one database. In other words, no patient has data in multiple places, so the data in each database is “independent” of all the others. In such a case, where you know that all of each patient’s data is in one and only one database, a query to multiple databases will produce correct results. This is because the decision about whether a given patient meets the conditions of the query is made based on complete data, and each patient’s data is only considered once.

Does this mean that all these systems mentioned above are useless and should be discarded? Not necessarily. The results of queries to these distributed (but not independent) systems may still occasionally provide some helpful insights into medical phenomena. In a few cases, the possibility of over- and under-counting may not be critically important. For example, if we are looking for events that should not be occurring at all (like administering penicillin to patients who are allergic to it), any such events that are found are significant. But we must always realize that the quantitative query results are not accurate (or necessarily even close). Otherwise, we may draw potentially dangerous conclusions, such as finding a disease outbreak when none exists, or misallocating resources based on artificially high or low estimates of the number of people affected by a given disease – in other words, “big data” gone bad.

Finally, how can we avoid these problems? The best approach to avoiding incorrect query results from distributed medical information systems is to compile a comprehensive copy of all the records for each patient from all sources in a single place (but not necessarily the same place for everyone). The institutions that do this are known as health record banks, patient-controlled repositories of electronic health records. This totally avoids the potential for erroneous counts in response to distributed queries. With health record banks, we could actually have an effective and efficient system for aggregating patient information for research, policy, and public health.

In addition, health record banks provide many other benefits, most importantly to patient care. The availability of comprehensive electronic patient information when and where needed can both improve care and reduce costs. With health record banks, such information is available and ready to be retrieved by providers with a single query. Leaving patient information where it’s created and putting it together when needed is both inefficient and prone to error (Lapsia V, Lamb K, and Yasnoff WA. Where should electronic records for patients be stored? Int J Med Informatics 81(12):821-7, 2012).

So why are we continuing to invest in a “federated” architecture for health information infrastructure that doesn’t work? It’s time for the health IT community to shift its efforts to building health record banks – for both patient care and research.

The Future of Health Information Infrastructure

April 22nd, 2013

In a recent interview with David Beyer of Patients Know Best, I discuss the future of health information infrastructure. In it, I describe the overall goal of HII (universal availability of comprehensive electronic patient information when and where needed) and the two key tasks needed to accomplish this goal (universal EHR adoption and a mechanism for aggregating each patient’s records). I explain how and why the current approach using “health information exchange” (HIE) that leaves records with the providers that created them is not working. Finally, I describe in some detail how the use of health record banks, community-based repositories of electronic patient records with access controlled by the patients themselves, can solve the problems of privacy, stakeholder cooperation, assuring all-electronic, standardized records, and financial sustainability that have hampered efforts to develop HII so far.

This information was also presented in an earlier Viewpoint article, “Putting Health IT on the Path to Success” in the Journal of the American Medical Association (subscription required for full access).

HIEs are Failing

January 14th, 2013

My guest column posted today at NHINWatch describes the evidence — now compelling — that our efforts to build a nationwide system of health information exchanges (HIEs) are failing. Health record banks are a feasible alternative, as explained in detail in the recent Architecture and Business Model white papers from the Health Record Banking Alliance. Are we ready to try a new approach that can succeed?

What’s Really Needed to Improve Health Care with Electronic Records

September 28th, 2012

The recent New York Times article on increased healthcare costs from electronic records (“Medicare Bills Rise as Records Turn Electronic”, 9/21/12) seems at first glance to be discouraging. Aren’t electronic medical records supposed to reduce healthcare costs?

On reflection, it’s really not surprising that as physicians adopt electronic records, reimbursements, which are currently based on the quality of documentation, are increasing. The improvements in that documentation resulting from electronic records naturally increase payments. Ultimately, we need to change the health care payment system from rewarding activity to rewarding good care. But how can we realistically do that unless comprehensive patient information is available to accurately assess whether the care provided is truly appropriate?

Imagine an aircraft maintenance system where individual planes are repaired in various airports around the country, but all the repair records remain where the work is done. No mechanic would have access to the complete repair history of any plane. Crazy, right? Yes, but very much analogous to how we handle health care records. Wherever you receive care, a record – be it paper or electronic – is left behind, and no doctor has ready access to your complete history. It’s no wonder that healthcare costs are so high and that there are so many avoidable medical errors and adverse events.

Just making all medical records electronic will not solve this problem. We also need to create a mechanism to aggregate all your scattered records into a complete whole when they are needed. Of course, this must be done in a way that protects patient privacy, and ensures that medical record access is only available with your permission.

A simple, but largely unexplored, solution to this problem is the community health record bank. Such a health record bank would provide each person with a free account where copies of all their health records would be deposited when they are created. All access to the medical records in each account would be controlled by the patient to protect privacy. A nonprofit patient-governed community organization would run the bank, and it would be paid for by new and innovative uses of your health information with your permission. For example, most people would gladly pay a few dollars a year so that their loved ones would immediately be notified if their health record bank account were accessed by emergency medical personnel – meaning that emergency care was being given.

How would this help reduce healthcare costs? The anticipated efficiencies and improvements from electronic records will not primarily come from making the existing silos of paper records electronic – rather, the savings will result from having comprehensive records for each patient available when and where needed. For example, how can unnecessary duplicate tests and procedures be avoided without access to all the records of each patient?

So why hasn’t this straightforward health record bank solution been implemented? The simplest explanation is a “failure of imagination.” In our current record system, when a provider finds out that there are needed records at another provider site, those records are requested and (hopefully) transmitted – albeit typically by fax. Much of the current work towards electronic “health information exchange” has been directed to automating this process. However, it is very difficult, complex, and expensive to assure complete patient records by instantly finding, retrieving and integrating them from the various locations where they exist (more on this in a future blog). And, to make matters worse, while the health record bank solution is good for everyone, none of the existing healthcare stakeholders can easily take on this responsibility because they would not readily be trusted to avoid using the information to gain an unfair competitive advantage. Finally, patients, who would benefit most, typically do not have an effective voice when these issues are considered.

So if we are to really get the benefits of electronic medical records, and take advantage of their significant potential to improve the quality and lower the cost of healthcare, we need to begin implementation of a solution that will really solve the problem — like health record banks. Comprehensive electronic health records are an essential prerequisite for controlling health care costs while ensuring quality. Of all those concerned about these issues, who will step in and provide the seed funding needed to solve this problem?

Measuring Health Information Infrastructure Progress

July 19th, 2012

As work continues across the country to develop our health information infrastructure, we need to be able to objectively evaluate our progress. In a recent column, I describe the methodology a colleague and I developed for this, which was validated and published in the Journal of Biomedical Informatics.

Harvard’s Data Privacy Lab Launching Health Record Bank

April 17th, 2012

My guest column posted today at NHINWatch describes the imminent launch of a health record bank (HRB) by the Data Privacy Lab at Harvard. Notably, this is the first time that a major academic institution has hosted an HRB. All stored data will be double encrypted (like the two keys of a safe deposit box) to ensure that only the account holders have access. That, along with the secure and neutral environment, should go a long way to engender consumer trust. As it becomes more widely understood that successful health information infrastructure depends on having each patient’s comprehensive records in one place under the patient’s control, you can anticipate that additional HRBs will be established following Harvard’s lead.