The search for relevance in a field swamped by data

16th June 2017

ETH Zurich

Enaie Azambuja

Genomics, digital patient files and real-time health surveillance – never before have we had access to so much health data. ETH researchers explain how they extract relevant information from this sea of data and the potential benefits for personalised medicine.

Data science is currently experiencing a boom in biomedical research: more professors, more funding, more computing capacity and more research collaborations. Since its founding in June 2014, Karsten Borgwardt’s Machine Learning & Computational Biology Lab in the Department of Biosystems Science and Engineering (D-BSSE) in Basel has grown to 15 employees – and this growth is expected to continue.

The 36-year-old professor embodies a new type of data scientist, the kind that will likely become a permanent fixture in the medicine of the future. He studied computer science with a minor in biology in Munich, also obtaining a Master’s degree in the latter from Oxford University in his fourth year of studies.

Borgwardt grew up as the first human genome was being decoded, and he was fascinated by the emerging possibilities in genomics while still at university. Today, though, he knows that the initial expectations were often exaggerated, saying:

“We are still a long way from being able to infer the exact risk of occurrence of complex diseases based on a person’s genome.” One possible explanation for this is that complex diseases such as cancer and diabetes are not caused by individual changes in the genome, but rather by the interactions among millions of base pairs in human DNA. This is where computer science comes into play.

Exploring and analysing these interactions requires vast quantities of data. Today – 16 years after the sequence of the human genome was published – the necessary data sources are available. “We are currently observing an explosion of data volume in multiple dimensions,” says Borgwardt.

Thanks to technological advances in genomics, it is now possible to sequence billions of base pairs of a human genome in a matter of days – and at a cost of less than 2,000 Swiss francs.

This opens up completely new possibilities: whereas researchers previously focused on questions at the molecular level of the individual, today they are increasingly concerned with matters at the population level and ultimately also with the DNA of humankind as a whole.

At the same time, health surveillance is shifting from sporadic measurements, for instance at an annual check-up, to continuous real-time measurements. Thanks to wearables and smartphone apps, we already have the ability to chart our pulse, body temperature and exercise habits. On top of these new possibilities in health surveillance and genomics, hospital patient files are increasingly becoming available in electronic format.

All this data harbours great potential. Researchers hope that using it in a focused manner will help in creating personalised therapies and boosting their effectiveness. “This is where probability becomes an extremely important aspect,” explains Borgwardt.

Algorithms must be able to distinguish random correlations between patient data and the occurrence of a disease from statistically significant correlations. “The sheer size of the multidimensional dataspaces creates entirely new perspectives on these classical statistical problems.”
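
To make the statistical problem concrete, here is a minimal sketch in Python. It is not the method developed by Borgwardt's group; it uses the standard Bonferroni correction as a stand-in, and the function names and the simulated p-values are illustrative assumptions. It shows that when a large number of candidate patterns with no real effect are tested at the usual 5% level, many appear "significant" by chance alone, whereas a threshold corrected for the number of tests filters almost all of them out.

```python
import random


def simulate_null_p_values(num_tests: int, seed: int = 0) -> list[float]:
    """Simulate p-values for tests in which no real effect exists.

    Under the null hypothesis, p-values are uniformly distributed on [0, 1).
    This is simulated data, not patient data.
    """
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_tests)]


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / num_tests


def count_significant(p_values: list[float], threshold: float) -> int:
    """Count how many p-values fall below the given threshold."""
    return sum(1 for p in p_values if p < threshold)


if __name__ == "__main__":
    num_tests = 100_000  # hypothetical number of candidate patterns
    alpha = 0.05
    p_values = simulate_null_p_values(num_tests)

    naive = count_significant(p_values, alpha)
    corrected = count_significant(
        p_values, bonferroni_threshold(alpha, num_tests)
    )
    print(f"Chance findings at p < {alpha}: {naive}")
    print(f"Chance findings after Bonferroni correction: {corrected}")
```

The correction becomes stricter as the number of tests grows, which is one reason why very large data spaces call for more refined approaches than this simple one.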

His group receives funding from an SNSF Starting Grant to develop new algorithms that spot statistically significant patterns in enormous collections of data. These algorithms are faster, require less computing capacity and can separate relevant data from irrelevant data much more efficiently than earlier methods.

Thanks to the advances in genomics and the increasing digitisation of patient data, data science is becoming relevant to medicine. The European Bioinformatics Institute predicts that within five years, the genome of 15% of the population in industrialised countries, or 150 million people, will have been sequenced.

Gunnar Rätsch, Professor of Biomedical Informatics, does the maths: that would be around 28 exabytes (= 28 × 10⁹ gigabytes) of data. With this amount of information, new and efficient algorithms will be required in order to glean knowledge that can be used for research and patients and that contributes to more precise and personalised therapies.
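
The arithmetic behind this figure can be retraced in a few lines of Python, assuming decimal units (1 exabyte = 10^18 bytes, 1 gigabyte = 10^9 bytes). The per-genome storage figure is derived here from the numbers quoted above and is not a value stated by Rätsch; the function names are illustrative.

```python
BYTES_PER_GIGABYTE = 10**9   # decimal (SI) units, an assumption
BYTES_PER_EXABYTE = 10**18


def exabytes_to_gigabytes(exabytes: int) -> int:
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * BYTES_PER_EXABYTE // BYTES_PER_GIGABYTE


def gigabytes_per_genome(total_exabytes: int, num_genomes: int) -> float:
    """Average storage per sequenced genome implied by the total."""
    return exabytes_to_gigabytes(total_exabytes) / num_genomes


def implied_population(sequenced_people: int, share: float) -> float:
    """Population size implied by a head count and its share of the total."""
    return sequenced_people / share


if __name__ == "__main__":
    print(exabytes_to_gigabytes(28))                # total in gigabytes
    print(gigabytes_per_genome(28, 150_000_000))    # gigabytes per genome
    print(implied_population(150_000_000, 0.15))    # reference population
```

Under these assumptions, the forecast works out to roughly 187 gigabytes of stored data per sequenced genome, and the 15% share corresponds to a reference population of about one billion people.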

For example, they will have to search through billions of base pairs for interactions associated with disease. As part of a project in the “Big Data” national research programme (NRP 75), Rätsch’s group is working on the efficient storage and analysis of these large quantities of genome data. “We’re in the midst of a breakthrough!” exclaims the data scientist euphorically.

But how, specifically, can patients benefit from these kinds of smart algorithms? Rätsch offers a practical example: in close collaboration with Inselspital, the Bern University Hospital, his group is currently developing an early warning system for organ failure in the intensive care ward.

Over a period of ten years, the hospital recorded data on blood pressure, pulse, temperature, medication, glucose and lactate levels, and electrocardiograms (EKGs) for nearly 54,000 patients.

Rätsch is now developing algorithms to analyse these 500 gigabytes of data containing some 3.5 billion individual measurements to find patterns that indicate an imminent emergency. This would enable doctors and nurses to intervene before a patient’s medical condition deteriorates.

Systems like this require machine learning, an important branch of data science. The idea is for programs to recognise patterns and rules from a given dataset, and continuously learn to do so better and better.
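
The idea of learning a rule from data can be illustrated with a deliberately simple sketch in Python. It is not the early warning system being developed with Inselspital: the readings below are invented for illustration, and the single-threshold rule (a so-called decision stump) is an assumed stand-in for far richer models. The program picks the pulse threshold that best separates labelled examples, and it can be refitted whenever more labelled data arrives.

```python
def fit_threshold(values: list[float], labels: list[int]) -> tuple[float, int]:
    """Learn the rule 'value >= threshold means alarm' from labelled data.

    Tries every observed value as a threshold and keeps the one that
    misclassifies the fewest examples. Returns (threshold, error_count).
    """
    best_threshold = None
    best_errors = len(values) + 1
    for candidate in sorted(set(values)):
        errors = sum(
            (value >= candidate) != bool(label)
            for value, label in zip(values, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold, best_errors


def predict(value: float, threshold: float) -> bool:
    """Apply the learned rule to a new measurement."""
    return value >= threshold


if __name__ == "__main__":
    # Invented pulse readings (beats per minute); 1 marks a later emergency.
    pulse = [62, 70, 75, 80, 88, 110, 118, 125, 130, 140]
    emergency = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

    threshold, errors = fit_threshold(pulse, emergency)
    print(f"Learned threshold: {threshold} bpm ({errors} training errors)")
    print("Alarm for a reading of 120 bpm:", predict(120, threshold))
```

A real system would combine many signals over time and would have to be validated on patients it has never seen; this sketch only shows the step from labelled examples to a rule.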

Machine learning plays an important role not only in evaluating patient data, but also in improving medical technology. A prime example is magnetic resonance imaging (MRI), which is one of the most important medical examination methods in use today, particularly for soft tissue.

Klaas Prüssmann, Professor at the Institute for Biomedical Engineering, a joint institution of ETH Zurich and the University of Zurich, has devoted himself to the further development of MRI technology.

In a recently published article, he describes a system that uses 30 temperature sensors and 16 magnetic sensors to run a self-diagnostic of the MRI unit. In the future, suspicious patterns could give technicians an early warning, thus reducing unit downtimes in the hospital and saving costs.

Prüssmann also feels that his field of research is “on the verge of a gold rush”. He expects data science to change not only MRI equipment, but the imaging as well. “If we succeed in translating relevant prior knowledge from MRI records – for instance that we are examining a brain and not a heart – into a manageable form, then we could greatly increase the speed and efficiency of the measurements.”

Prüssmann predicts a seismic shift in radiology, too: in the future it will be possible to compare millions of existing MRI images with one current measurement. This can help in identifying important indicators of certain diseases. In addition, applying algorithms to MRI images can reveal patterns that are not visible to the naked eye.