mitoML

Scientific Computing

Machine learning approaches to understand the development of defects in oxidative phosphorylation in mitochondrial disease

Motivation for PhD project

Motivation for my PhD in applied data science in medicine

Science is about hypotheses supported by evidence, and until recently our methods were restricted to empiricism, theory and simulation. Data science is the fourth paradigm, one that evolves and advances our scientific enquiries. While data has always been the penultimate step before any scientific conclusion, efficient utilisation of that data was either not possible or not realised. Data science is all about making maximum use of data to test hypotheses, predict trajectories, extract insights, and inform and even automate decision making.

The fourth paradigm

There are many challenges with data: size (too big or too small), variety (e.g. text, images, signals, biomarkers), velocity (e.g. measuring rate of change in particle colliders) and, importantly, veracity (i.e. accuracy of the data). Data science tackles these challenges from a firm footing in statistics and computer science, and leverages advances in technology (e.g. cloud computing).

Data science has already revolutionised our lives, from the recommendations we receive online to fraud detection to self-driving cars. Its capability can be further appreciated from recent events, such as when it was used to predict the start of a pandemic weeks before even the local authorities raised the alarm.

Medicine, like any other science, is evidence-based but, unlike in other domains, relying on intuition instead of evidence in this field can be far more detrimental. Medicine is a science of diagnosis, prognosis, treatment and prevention, and at each of these stages there is an opportunity to collect, collate, explore and analyse the data, deduce patterns in it, draw insight from it, and use it to predict outcomes.

Healthcare data, whether collected in clinical settings (e.g. electronic medical records, genetic records, clinical trials, genomic data) or elsewhere (e.g. data from wearable sensors and smartphones), is the fastest growing data across all domains. Alongside this, we now have technology that can affordably store and process data of this magnitude and beyond. To marry the two and build meaningful solutions, applications and models, a growing community of data scientists is playing a role in medical research.

Now is an exciting time for data science in medicine, and we have all the ingredients to make many impactful data-driven scientific discoveries in this domain. This is evident from the impact that private firms and applications such as Google's DeepMind, Insitro, NHS GP at Hand, IBM Watson Health and SkinVision have had on healthcare, even at this early stage of data science in medicine. To highlight a case in point, SkinVision is an application that uses machine learning (a technique in data science) to detect skin cancer with better precision than a GP. There are many more, and much more advanced, impactful ideas waiting to happen.

Having said that, the picture is not entirely rosy. There is a major, well-known challenge in data science: data-driven insights and predictions are only as good as the data itself. If the veracity of the data is questionable, or the data is biased or ill-defined (i.e. the grey areas of medicine), then the results will suffer in the same way. This is where domain experts (medical researchers) play an important role: identifying these situations and steering away from problems and towards solutions.

Finally, half of my family work in technology and the other half in medicine, and our discussions (sometimes boasting about what we do!) highlight the tremendous opportunity for applied data science in medicine to improve the human condition. This motivates me still.