Patient Registry FAQs — Orphan Disease Center

Patient Registries: Frequently Asked Questions

What is data?

Data is information that is ready for processing. It is typically stored in a spreadsheet (for example, a .csv or .xlsx file), and can be viewed and analyzed using Microsoft Excel or other software, such as SAS. You can learn more at: What Is Data? - The ACS Guide to Scholarly Communication (ACS Publications)

What is a data dictionary?

A data dictionary is a document that describes the structure of the collected data. It will typically include information on what data items (or variables) are collected (for example, participant sex, participant race, participant home country), and the characteristics of each data item. Possible characteristics may include the variable type (for example, numeric or character), the length of the variable response field, and the minimum/maximum value allowed for the variable.
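As an illustration only, a data dictionary can also be written in a form that software can check. The sketch below assumes four hypothetical variables (participant_id, sex, home_country, age_years) with made-up types and limits; a real registry's dictionary would define its own.

```python
# Hypothetical data dictionary: variable names, types, and limits are
# illustrative assumptions, not taken from any real registry.
DATA_DICTIONARY = {
    "participant_id": {"type": int, "min": 1, "max": 999999},
    "sex": {"type": str, "max_length": 1, "allowed": {"F", "M", "U"}},
    "home_country": {"type": str, "max_length": 2},
    "age_years": {"type": int, "min": 0, "max": 120},
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one participant record."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Check the variable type first; the other checks depend on it.
        if not isinstance(value, spec["type"]):
            problems.append(f"{name}: expected {spec['type'].__name__}")
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{name}: below minimum {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{name}: above maximum {spec['max']}")
        if "max_length" in spec and len(value) > spec["max_length"]:
            problems.append(f"{name}: longer than {spec['max_length']} characters")
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: value not in allowed set")
    return problems
```

A record that follows the dictionary returns an empty list; a record with a three-letter country code or an age of 150 returns one message per problem.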

Why is a data dictionary important?

A data dictionary is very important for those who plan to start a registry. The data dictionary provides an exhaustive list of all data items (or variables) collected in the registry and details on how each data item is structured. Reviewing the data dictionary provides insight into possible challenges that could arise when analyzing the data, and can help inform your decision on whether a particular registry platform is the right fit for your organization. It is best to have an experienced data analyst review the data dictionary.

My registry vendor provides tables and charts. Does that mean I have access to all data?

No. Tables and charts are representations (or summaries) of the data, not the individual-level data itself. You have access to all data if you can (1) export a file (for example, a .csv or .xlsx file) with raw participant-level data, and (2) obtain a data dictionary. With these two items, you or an experienced data analyst can build tables and charts and conduct additional analyses.
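To show why raw participant-level data matters, here is a minimal sketch that rebuilds a summary table from an export. The file contents and column names (participant_id, sex, home_country) are invented for the example; in practice the rows would come from the vendor's exported .csv file.

```python
import csv
import io
from collections import Counter

# Hypothetical raw export with one row per participant (invented values).
RAW_EXPORT = """participant_id,sex,home_country
1001,F,US
1002,M,CA
1003,F,US
1004,F,GB
"""


def count_by(rows, column):
    """Count participants for each distinct value of one column."""
    return Counter(row[column] for row in rows)


# Read the participant-level rows, then summarize them two different ways.
rows = list(csv.DictReader(io.StringIO(RAW_EXPORT)))
sex_counts = count_by(rows, "sex")
country_counts = count_by(rows, "home_country")
```

The same rows can be summarized by any column, which is not possible when only a finished table or chart is available.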

What is de-identified data?

De-identified data is data from which all identifying personal information (for example, participant name, participant email, participant home address) has been removed. Each participant’s personal information is replaced with a numeric code. The numeric codes are used to identify participants in the data.
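A minimal sketch of this idea follows. The field names (name, email, home_address), the sequential codes starting at 1001, and the separate linking key are all assumptions made for illustration, not a description of how any particular registry works.

```python
# Hypothetical names of the identifying columns to remove.
IDENTIFYING_FIELDS = ("name", "email", "home_address")


def deidentify(records, first_code=1001):
    """Split records into de-identified rows and a separate linking key.

    Returns (deidentified_rows, linking_key), where linking_key maps each
    numeric code back to the removed personal information.
    """
    deidentified = []
    linking_key = {}
    for offset, record in enumerate(records):
        code = first_code + offset
        # Keep the personal information apart from the analysis data.
        linking_key[code] = {
            field: record[field] for field in IDENTIFYING_FIELDS if field in record
        }
        # Copy everything except the identifying fields, then add the code.
        clean = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        clean["participant_code"] = code
        deidentified.append(clean)
    return deidentified, linking_key
```

In this sketch the analysis file contains only the numeric code, while the linking key would need to be stored separately with restricted access.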

Patients can upload their medical records into a patient registry platform. Is that data de-identified?

It depends. If data is uploaded into a patient registry platform in the form of PDF documents, it is usually NOT de-identified. This is because medical records received from a healthcare professional typically include identifying information (such as the patient name). One way to de-identify medical records is to extract the relevant information from the uploaded PDF and store that information in a spreadsheet. Any patient foundation that asks patients to upload medical records should understand what happens with those medical records and make sure the process is shared with potential registry participants.

What is patient-reported data?

Patient-reported data is reported by the patient directly. If a patient is asked to respond to a survey, the patient’s survey responses are examples of patient-reported data.

What is NOT patient-reported data?

Any data that does not come directly from patient responses is NOT patient-reported data. For example, data obtained through a review of medical records is NOT patient-reported data.

What is data quality and why is it important?

Data quality usually refers to the accuracy, consistency, and completeness of the data. For conclusions from the data to be valid, the data must be accurate and consistent, with as little missing information as possible. More data is not always better; in fact, long surveys asking for a lot of information often result in lower overall data quality. It is important to have an experienced data analyst monitor the data and perform interim analyses as the data is collected. This will allow the research team to detect and correct any issues early in the data collection process.
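One simple check an analyst might run during interim monitoring is completeness: the share of participants with a non-missing answer for each variable. The sketch below assumes that a missing answer appears as an empty value, a blank string, or None; real exports may code missing data differently.

```python
def completeness(rows, columns):
    """Return the fraction of non-missing values for each column."""
    total = len(rows)
    result = {}
    for column in columns:
        filled = 0
        for row in rows:
            value = row.get(column)
            # Treat None and empty or whitespace-only text as missing.
            if value is not None and str(value).strip() != "":
                filled += 1
        result[column] = filled / total if total else 0.0
    return result
```

A variable with low completeness early in data collection can signal a confusing question or a survey that is too long.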

Is all healthcare data collected in the US protected by HIPAA?

No. HIPAA only applies to data collected by organizations that are considered “covered entities” under HIPAA. The term “covered entities” includes health care providers, health plans, and health care clearinghouses. For more information please refer to: Covered Entities and Business Associates | HHS.gov

Are patient foundations required to follow HIPAA?

No. Patient foundations are typically not covered entities under HIPAA, so they are not required to follow HIPAA and cannot be held accountable for not following it.

Does being HIPAA compliant mean the same thing as saying that data is protected under HIPAA?

No. Data is only considered protected under HIPAA if it is collected by one of the covered entities. Please see more here: What is Considered Protected Health Information Under HIPAA? 2023 Update (hipaajournal.com)

What is important to share with the patient community about your registry?

Transparency with patients is very important. Consider disclosing: the purpose of the registry, what information is collected, how data is used by your foundation, the purpose of any medical records that patients are asked to upload, how data can be accessed and who makes decisions on data access, how data is protected, and when results and findings will be shared with patients. If the registry is approved by an IRB, it is important to share the IRB name and contact information.

Myths Versus Facts about Patient Registries

For additional information on patient registries, please find these resources:

Do you have a question about patient registries not listed here?

Email Samantha Charleston to add it to the list!

Created in partnership with Sophia Zilber, Cure Mito Foundation and the Cure Mito Foundation Scientific Advisory Board