“We have got to get researchers access to that data in the GDC.” That was my thought at the American Society of Hematology Meeting last December, where I was astounded by talk after talk with clinical insights fostered by data from the Multiple Myeloma Research Foundation (MMRF).
Long committed to data sharing, the MMRF partnered with the National Cancer Institute’s Genomic Data Commons (GDC) and agreed to contribute molecular and clinical data from their landmark CoMMpass Study. Harmonizing this extensive data set and preparing it for sharing has been a major endeavor. Today, I’m proud to announce the release of this immense data set in the GDC.
More Rich, Longitudinal Data Than Ever Before
Over eight years, the MMRF has diligently collected, and graciously shared with the GDC, genomic and clinical data for nearly 1,000 cases of multiple myeloma (MM).
Being more than a little familiar with The Cancer Genome Atlas (TCGA) and other large-scale cancer characterization projects, I know the level of dedication and effort it must have taken the MMRF to identify, enroll, and characterize so many patients, especially for an uncommon disease. For comparison, TCGA collected 200 cases of acute myeloid leukemia, another blood cancer with slightly higher incidence.
Importantly, the MMRF has collected longitudinal data for these patients. They gathered samples and clinical information from patients every six months for eight years. When possible, samples were obtained from patients at the time of relapse or resistance to drug treatments, allowing an analysis of tumor evolution. Clinical information includes molecular tests, treatments, and quality of life measures.
In total, we have genomic and clinical data for 995 cases of MM, 241 of which have genomic data at multiple timepoints. All 995 cases have longitudinal clinical data, with nine timepoints per case and 15 lab values per timepoint on average. There are over 19,000 files of harmonized alignments, mutation calls, and gene expression levels. These highly characterized and annotated cases represent a new type of multidimensional data we hope to continue adding to the GDC.
Shared Values in Shared Knowledge
I spend a lot of time encouraging people to share data, and then even more time making sure the GDC has the infrastructure to accommodate it. This influx of wonderfully rich data has required a lot of work from both the GDC and the MMRF, including updates to the GDC’s underlying data structure, as well as harmonization and processing of over 30,000 data files, with extensive checking and re-checking of results.
The level of sustained effort all but requires a firm belief in the value of sharing data for driving research. I’m glad that the MMRF shares our view on making molecular and clinical big data available and accessible to the entire research community. Indeed, the MMRF did make the raw data available early on through the NIH Sequence Read Archive (SRA), but these data were too massive and complex for most researchers to use in that form. The GDC makes these complex data easy to search, allowing researchers to download only the portion of the data that is relevant for their research. Importantly, the GDC also provides a uniform pre-analyzed view of the data, enabling researchers to explore, for example, the association of genetic aberrations with response to treatment. I hope that this ease of use will ignite a flurry of activity in the myeloma research community.
“Ensuring that the curated data from CoMMpass is made available to researchers through the GDC is a priority for us at the MMRF,” said Steven Labkoff, MD, Chief Data Officer. “The depth and breadth of this incredibly rich data source will hopefully enable researchers to garner new therapeutic insights. It is our hope that the research community will take up this data set and help us to find cures for this disease.”
Uncovering 12 Molecular Subtypes
Twelve distinct molecular subtypes of MM have been identified so far from the data. I understood MM to be a very heterogeneous disease, but this degree of diversity is surprising. Again, to draw a comparison with TCGA, five prognostic subtypes were identified in breast cancer and gastrointestinal adenocarcinomas.
At ASH, I saw firsthand how CoMMpass data and analyses are improving our basic understanding of this enigmatic disease. Many of these studies examined in depth the molecular mechanisms driving myeloma cell growth and survival.
For example, researchers have identified specific genetic alterations associated with high-risk disease in newly diagnosed MM patients. A particularly poor prognosis was seen in “double hit” cases that have a complete loss of the TP53 gene and a gain/amplification involving the CKS1B locus on chromosome 1.
Informing How Patients Are Treated and Monitored
With a dozen subtype distinctions and multiple related pre-cancerous conditions, MM is an especially difficult and confusing condition to face. The MMRF’s goal is to establish a clear, data-driven action plan for each individual patient.
“No two patients are the same, so why should they get treated the same?” says MMRF Chief Scientific Officer Daniel Auclair, PhD. “With a better molecular understanding of each patient’s unique presentation of the disease, we envision each patient getting a precise diagnosis, tailored treatment plan, and improved outcome.”
We are starting to see examples of molecularly guided treatment decisions for patients. Much of this has been informed by newly identified prognostic predictors and markers of transition to high-risk disease or resistance to drugs.
An in-depth look into the blueprint of MM has revealed that over 75% of patients have actionable molecular alterations for which drugs already exist in the clinic. MM trials targeting such alterations are currently ongoing, including the biomarker-driven MMRC MyDRUG trial testing multiple targeted therapies.
Another example is the identification of chromosomal translocations involving the Immunoglobulin Lambda (IgL) locus, which are associated with poor survival and less benefit from treatment with immunomodulatory drugs (IMiDs). Counterintuitively, IgL translocations often co-occur with hyperdiploidy, a form of the disease normally associated with better outcomes. However, patients with both hyperdiploidy and IgL translocations experienced worse outcomes and no benefit from IMiDs, indicating the importance of adding this genomic feature to the myeloma diagnostic panel.
These exciting studies have utilized only subsets of the entire CoMMpass data set. We look forward to even more clinical insights and impact on patients enabled by access to the harmonized data set released in the GDC today.
Access MMRF CoMMpass data at the GDC
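For readers who would rather script their access than browse the portal, here is a minimal sketch of how one might ask the GDC for only the slice of CoMMpass files relevant to a given question. It is an illustration, not an official recipe. The endpoint URL, the project ID "MMRF-COMMPASS", the field names, and the "Transcriptome Profiling" category are assumptions based on the public GDC API and are not stated in this post, so please check them against the GDC documentation. The helper function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not taken from this post): the public GDC files endpoint
# and the project ID under which CoMMpass is listed in the GDC.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"
PROJECT_ID = "MMRF-COMMPASS"


def build_filters(project_id, data_category, access="open"):
    """Combine three field conditions with a logical AND."""
    def is_in(field, value):
        # GDC-style condition: the field must match one of the listed values.
        return {"op": "in", "content": {"field": field, "value": [value]}}

    return {
        "op": "and",
        "content": [
            is_in("cases.project.project_id", project_id),
            is_in("data_category", data_category),
            is_in("access", access),
        ],
    }


def build_query_url(filters, fields, size=10):
    """Encode the filters and the requested fields as a GET query string."""
    params = {
        "filters": json.dumps(filters),
        "fields": ",".join(fields),
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_file_hits(url):
    """Send the request and return the list of matching file records."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["data"]["hits"]
```

Calling `build_query_url(build_filters(PROJECT_ID, "Transcriptome Profiling"), ["file_id", "file_name"])` yields a URL that can be passed to `fetch_file_hits`. Because the filter is built separately from the network call, it can be inspected or adjusted before any data is requested.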