Cardiovascular Research And Care In The Era Of Big Data
Ten years ago, I made a decision that had a profound and lasting impact on my career. I decided to take time off, in the midst of grad school, and learn introductory computer programming. This was already several years after the initial draft sequence of the human genome had been released, and big data was starting to permeate virtually every research project. I thought that speaking the language of big data would allow me to tap into a greater breadth of possibilities.
The following years could only be described as a revolution in terms of the impact of data in medical research. Additional full genome sequences began to trickle in, and with them ambitious projects to generate population-scale genomic resources. We began to see the first near-complete mappings of protein and genetic interactions in living cells, and our understanding of epigenetic effects increased exponentially. I spent several lucky years as the ‘techie’ in wet labs, the one lab mates would turn to when they wanted to boost their own projects with big data, applying tools I was constantly struggling to understand myself.
In speaking with the next generation of graduate researchers, it’s clear that the role of ‘techie’ has been abandoned in favor of a more comprehensive education, one in which students are encouraged to hone their skills in the procurement, analysis, and visualization of ‘big data’. The meteoric change in the magnitude of data available to researchers has made this shift inevitable.
While an increasing number of datasets are publicly available to researchers and physicians, perhaps the greatest hurdle in realizing the potential of big data for research projects is the heterogeneity of these datasets. Data are collected according to different standards, with different levels of rigor, and using methodologies (genomics, medical imaging, electronic health records [EHR], etc.) that are difficult to reconcile across survey sites and data types. Standards for data pre-processing and harmonization also vary, which often makes findings difficult to reproduce. For this reason, I see data pre-processing and harmonization as the most pressing challenge for scientific advancement in the realm of big data.
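To make the harmonization problem concrete, here is a minimal sketch in Python. It assumes two hypothetical survey sites that record the same measurements under different field names, sex codings, and units; every site name, field name, and value below is invented for illustration and does not come from any real dataset or platform. The only outside fact used is the approximate conversion for total cholesterol (1 mmol/L is roughly 38.67 mg/dL).

```python
from dataclasses import dataclass

# Approximate conversion for total cholesterol: 1 mmol/L is about 38.67 mg/dL.
MGDL_PER_MMOLL = 38.67

# Map the different sex codings used by the (hypothetical) sites to one vocabulary.
SEX_CODES = {"m": "male", "male": "male", "f": "female", "female": "female"}


@dataclass
class PatientRecord:
    """Common schema that every site's records are mapped into."""
    patient_id: str
    sex: str
    cholesterol_mgdl: float


def harmonize_site_a(row: dict) -> PatientRecord:
    """Hypothetical site A: cholesterol already in mg/dL, sex coded as M/F."""
    return PatientRecord(
        patient_id=f"A-{row['id']}",
        sex=SEX_CODES[row["sex"].strip().lower()],
        cholesterol_mgdl=float(row["chol_mgdl"]),
    )


def harmonize_site_b(row: dict) -> PatientRecord:
    """Hypothetical site B: cholesterol in mmol/L, sex spelled out."""
    return PatientRecord(
        patient_id=f"B-{row['subject']}",
        sex=SEX_CODES[row["gender"].strip().lower()],
        cholesterol_mgdl=round(float(row["chol_mmol_l"]) * MGDL_PER_MMOLL, 1),
    )


# Invented example rows, one per site.
site_a = [{"id": 1, "sex": "M", "chol_mgdl": "195"}]
site_b = [{"subject": 7, "gender": "female", "chol_mmol_l": "5.0"}]

combined = [harmonize_site_a(r) for r in site_a] + [harmonize_site_b(r) for r in site_b]
for record in combined:
    print(record)
```

Even in this toy case, each site needs its own mapping function, and every mapping encodes a decision (which unit to keep, how to code categories) that must be documented for results to be reproducible. Real cohorts multiply this across hundreds of variables and data types.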
Several efforts are currently aimed at addressing the data heterogeneity problem. One notable effort, and one that I was fortunate enough to be involved in as an early beta tester, is the American Heart Association Precision Medicine Platform. First announced in 2016 as a collaboration between the AHA and Amazon Web Services (AWS), the platform has a twofold goal: first, to provide access to large patient cohort datasets, and second, just as importantly, to provide a community workspace for solving problems in data harmonization and large-scale data analysis. I’m proud to contribute resources to this site, and I hope others will do the same.
All patient data has value, but I sincerely believe that we cannot fully understand an individual’s highly personal and nuanced disease characteristics without a more comprehensive understanding of their phenotype. I see combining data across sources as crucial to the advancement of medical diagnosis for the cardiovascular community and beyond, and I’m happy to devote this phase of my career to the task of bringing that about.
Gabriel Musso, PhD
VP Life Sciences, BioSymetrics Inc
Editor’s Note: Gabriel Musso is the VP Life Sciences at BioSymetrics and is a beta tester for the American Heart Association Institute for Precision Cardiovascular Medicine Precision Medicine Platform. To learn more about how to use big data to learn, search and discover, please go to http://precision.heart.org.