Creating a Medical Research Database

Overview:

The Research Program on Genes, Environment and Health (RPGEH) is an effort by the Division of Research at Kaiser Permanente to create a resource that health researchers can use to investigate the causes of and treatments for a wide range of diseases. To do this, the program has collected samples from over one hundred thousand Kaiser members in Northern California and created a system to combine information from the analysis of those samples with health records, survey information, and GIS data.

The organizing system for the program supports several different tasks: tracking participant interactions to recruit new participants and build the resource, managing samples and their analysis, and compiling data sets requested by researchers.

 

What resources are being organized?

The RPGEH is built on participants, their samples, and their health data. In a research system it might be tempting to focus on samples and data as the primary resources, but the need to recruit participants made them the most important resource class. There are theoretical and practical reasons for this. A blood sample or medical record on its own is of limited use to researchers; it is the combination of these things that will allow researchers to find novel links and push forward our understanding of health. Participants are the nexus that binds samples and data sets together and makes them meaningful. In a sense, participants are the primary resource of the RPGEH, and any samples or health information are just descriptions of them.

The practical reason is that without participants nothing else would be possible. A large part of the RPGEH's work has been recruiting nearly two hundred thousand participants, and recruiting a participant is no small feat. Sending someone a consent form does not mean they will sign and return it. Even if they do, they may not provide a sample or complete a health survey, and even then they could later decide to revoke permission to use their data and samples. Because participants have this level of agency, maintaining accurate and up-to-date descriptions of participants and their interactions is very important to the system.
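The participant tracking described above can be pictured as a log of interactions per participant. The following is only an illustrative sketch, not the RPGEH's actual data model: the `Event` names, the `Participant` class, and the rule that any revocation makes a participant unusable are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Event(Enum):
    # Hypothetical interaction types, taken from the steps named in the text.
    CONSENT_SENT = "consent_sent"
    CONSENT_SIGNED = "consent_signed"
    SAMPLE_PROVIDED = "sample_provided"
    SURVEY_RETURNED = "survey_returned"
    CONSENT_REVOKED = "consent_revoked"


@dataclass
class Participant:
    participant_id: str
    events: list = field(default_factory=list)  # list of (date, Event) pairs

    def record(self, when: date, event: Event) -> None:
        """Append one interaction to the participant's history."""
        self.events.append((when, event))

    def has(self, event: Event) -> bool:
        return any(e is event for _, e in self.events)

    def usable_for_research(self) -> bool:
        # Assumption: a signed consent is required and any revocation is final.
        return self.has(Event.CONSENT_SIGNED) and not self.has(Event.CONSENT_REVOKED)

    def missing_steps(self) -> list:
        """Steps recruitment staff might still follow up on."""
        wanted = [Event.CONSENT_SIGNED, Event.SAMPLE_PROVIDED, Event.SURVEY_RETURNED]
        return [e for e in wanted if not self.has(e)]
```

Keeping the full history, rather than a single status field, means a later revocation can be recorded without losing the record of what the participant had previously agreed to or provided.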

As physical resources, samples present different challenges in the system. Different sample types are processed and stored in different ways. For example, blood samples must be split into different components and will go bad if they are not processed and frozen quickly. A database stores the location of each sample along with other descriptions, such as who it belongs to and when it was processed.
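A sample database of the kind described here could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. The table name, column names, and example rows are hypothetical and chosen only to mirror the descriptions mentioned in the text (owner, processing date, and location).

```python
import sqlite3

# In-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sample (
        sample_id      TEXT PRIMARY KEY,
        participant_id TEXT NOT NULL,   -- who the sample belongs to
        sample_type    TEXT NOT NULL,   -- e.g. a blood component or saliva
        processed_on   TEXT NOT NULL,   -- ISO date the sample was processed
        location       TEXT NOT NULL    -- where the sample is physically stored
    )
    """
)

# Made-up example rows.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?, ?, ?)",
    [
        ("S001", "P001", "blood plasma", "2010-03-02", "Freezer 3 / Rack B"),
        ("S002", "P001", "saliva", "2010-03-02", "Freezer 1 / Rack A"),
        ("S003", "P002", "blood plasma", "2010-04-15", "Freezer 3 / Rack C"),
    ],
)


def locate_samples(conn, participant_id):
    """Return (sample_id, sample_type, location) for one participant's samples."""
    cursor = conn.execute(
        "SELECT sample_id, sample_type, location FROM sample "
        "WHERE participant_id = ? ORDER BY sample_id",
        (participant_id,),
    )
    return cursor.fetchall()
```

Storing the participant identifier on each sample row is what lets a physical sample be tied back to the participant it describes.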

The most important piece of health data for each participant, their health record, is not directly managed by the RPGEH. Rather, when data is needed, it is pulled from Kaiser’s electronic health record system and transformed into the needed format.

 

Why are the resources being organized?

The goal of the RPGEH is to provide a one-stop shop for health research. A researcher can make a request to study a disease, and the RPGEH can find matching participants and provide samples or sample analysis along with a data set compiled from Kaiser health records, health surveys, and other information sources. Resources are organized either to support recruitment or to allow the creation of research data sets.

 

How much are the resources organized?

There is a high level of description applied to each participant to keep track of their status in the program. Additionally, there is a high level of description about each patient available in their health records. Patients are not sorted into predefined categories based on their health information, however, because we simply do not know what a researcher might need. Rather, when a request is made, the descriptions in the health records are leveraged to find matching participants.
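The idea of matching at request time, instead of sorting participants into fixed categories in advance, can be sketched as follows. The record layout, field names, and the example criterion are hypothetical; real health records are far richer than these dictionaries.

```python
# Made-up records standing in for descriptions drawn from health records.
records = [
    {"participant_id": "P001", "birth_year": 1950,
     "diagnoses": {"type 2 diabetes", "hypertension"}},
    {"participant_id": "P002", "birth_year": 1982,
     "diagnoses": {"asthma"}},
    {"participant_id": "P003", "birth_year": 1945,
     "diagnoses": {"type 2 diabetes"}},
]


def find_matches(records, criteria):
    """Return IDs of participants whose record satisfies the given criteria.

    `criteria` is any function that takes a record and returns True or False,
    so each research request can define its own notion of a match.
    """
    return [r["participant_id"] for r in records if criteria(r)]


def diabetes_born_before_1960(record):
    # Example criterion written for one hypothetical research request.
    return "type 2 diabetes" in record["diagnoses"] and record["birth_year"] < 1960


matches = find_matches(records, diabetes_born_before_1960)
```

Because the criterion is supplied with each request, no category has to be anticipated when the records are first described.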

 

When are the resources being organized?

Most of the organization of the system’s resources is done when they first enter the system. When a participant signs up, their contact information is loaded into the system. When a sample is taken, information about it is loaded into the system. Further organization is done as needed to keep the system up to date, such as noting a new address or tracking samples that have been sent to a lab for analysis. Also, as noted above, when a researcher requests data there is another level of organization applied to find matching participants and compile the data set needed for the research.

 

Who does the organizing?

The RPGEH data set is highly mediated. Researchers who want to use RPGEH data are not allowed direct access. Rather, they must submit an application with specific requirements. If the application is approved, RPGEH data analysts pull the data and create a data set for the researcher. Since the system is not outward facing, its users can be trained to use it. Additional organizing is done automatically: there are programs that hook into Kaiser’s systems to update participant and sample information every day.

 

Other considerations?

Working with health data means the RPGEH must be very conscious of patient privacy and HIPAA regulations. Participants are given extensive information about their rights as research participants when they sign up. Data is encrypted and anonymized before it is sent to researchers.
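As a rough illustration of the anonymization step, the sketch below removes direct identifiers from a record and replaces the participant ID with a keyed hash. This is an assumed approach shown for illustration, not the RPGEH's documented procedure, and on its own it is not a complete de-identification process. The field names and the `pseudonymize` function are hypothetical, and encryption of the resulting file is not shown.

```python
import hashlib
import hmac

# Hypothetical list of fields that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(record, secret_key: bytes):
    """Return a copy of `record` without direct identifiers.

    The participant ID is replaced by a keyed hash (HMAC-SHA256), so the same
    participant always maps to the same study ID, but the mapping cannot be
    recomputed without the secret key.
    """
    digest = hmac.new(
        secret_key, record["participant_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "participant_id"
    }
    cleaned["study_id"] = digest[:16]  # shortened for readability
    return cleaned
```

Using a keyed hash instead of a plain hash means someone who knows a participant ID still cannot derive the study ID without the key, which would be held only by the data analysts.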