Advances in data science powering capabilities for analyzing large data sets


Kenneth Ottenbacher, PhD, OTR

Ongoing advances in statistical methods and information technology provide researchers with unprecedented opportunities to access and analyze multiple databases and increasingly large data sets to answer clinical questions. The ACR/ARHP session Answering Clinically Relevant Questions Using Large Datasets will feature a panel of experts who will talk about opportunities and challenges in the evolving era of “big data.” The session takes place on Tuesday from 8:30 – 10:00 am in Room W474a.

Kenneth Ottenbacher, PhD, OTR, will begin the session with an introduction and overview of the emerging field of data science and how it is changing clinical research. Dr. Ottenbacher is the Russell Shearn Moody Distinguished Chair in Neurological Rehabilitation, Professor and Director of the Division of Rehabilitation Sciences in the School of Health Professions and Associate Director of the Sealy Center on Aging at the University of Texas Medical Branch (UTMB), Galveston, TX.

“Because of the advances in technology, it’s now possible to get access to large data sets, many of which didn’t even exist a few years ago, and even if they did exist, the average scientist or investigator didn’t have the hardware or software to be able to analyze them,” Dr. Ottenbacher said. “That’s all changed. Today, we have desktop computers that can analyze huge data sets that five years ago would have taken supercomputers to analyze.”

While the technology has created the potential to analyze data in new ways, Dr. Ottenbacher said the challenge is to train researchers, particularly clinician-scientists, in the data science skills necessary to help realize that potential.

“There’s all this data that had not been available to investigators before, particularly in clinical fields, that now, all of a sudden, has become available, but it requires a different way to think about and analyze research,” he said. “Clinicians, when they learn to do research, mainly learn to do prospective patient-oriented research with an emphasis on clinical trials, so the vast majority of today’s investigators weren’t trained in how to do this kind of analysis with large data sets.”

To address this gap, Dr. Ottenbacher said the National Institutes of Health has made education and training in data science one of its newest strategic priorities, and a growing number of universities are offering new degree programs in data science. Additionally, numerous initiatives and programs are in place or underway to provide data science support, resources, and training for current researchers, including a center recently established at Dr. Ottenbacher’s institution, UTMB, in conjunction with the University of Michigan and Cornell University to build research capacity in data science.

Soham Al Snih, MD, PhD

Also in this session, Soham Al Snih, MD, PhD, a colleague of Dr. Ottenbacher’s and part of the UTMB data science team, will describe how different sources of data, such as administrative data and population data sets, can be linked to answer clinically relevant research questions. Dr. Al Snih is an Associate Professor in the UTMB Division of Rehabilitation Sciences.

When it comes to answering clinical questions, she said, the potential lies not simply in analyzing large individual data sets but in the ability to merge, or “harmonize,” multiple data sets that look at different topic areas, different outcomes, or different patient populations to target specific clinical questions.

“It’s hard to compare what happens in one data set with another, but with the tools that we have today, you can apply a metric that can be interpreted across a large number of large data sets,” Dr. Al Snih said. “That gives you a lot of flexibility and options in terms of analyzing small subsets of population, for example, and other things that you just couldn’t do with a single data set.”

Nancy A. Baker, ScD, MPH, OT

In the final presentation of the session, Nancy A. Baker, ScD, MPH, OT, Associate Professor in the Department of Occupational Therapy, Tufts University, Medford, MA, will provide an example of how large data sets can be used to inform clinical care.

Taking advantage of the support and resources offered by the UTMB center, Dr. Baker connected with Drs. Ottenbacher and Al Snih, who provided her with guidance in merging multiple data sets to explore quality of care data related to carpal tunnel syndrome treatment.

“There is not a lot of information in the literature about the average treatment of carpal tunnel syndrome, aside from surgical intervention, so I decided that one way I could really address this issue was to go to a large data set and see what was happening in the real world in terms of what a large private insurance company’s data set could tell me,” Dr. Baker said.

Dr. Baker examined two data sets: an insurance company data set and the publicly available, government-funded National Ambulatory Medical Care Survey (NAMCS). She used each data set to answer a different question: the insurance company data to look at treatment over time, and the NAMCS data to explore differences in treatment between people on workers’ compensation and those with private insurance or Medicaid.

“More study needs to be done, but among the things I found was that there continues to be a lot of people with carpal tunnel who do not receive that much care, regardless of workers’ comp or private insurance,” she said. “But most interestingly, despite the general perception that carpal tunnel is a workers’ comp injury, the vast majority of people who have carpal tunnel receive treatment through their private insurance.”