March 26th, 2020

COVID-19: Doing Good with Data + AI

Category: AI4Good, Data Science, Healthcare, Machine Learning, Time Series

During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have come at great cost to life or livelihood. In other cases, the efforts have seemed more mundane but were nevertheless essential. The efforts of the more than 10,000 women code breakers of World War II are one such example. From 1941 to 1945, these women, recruited for their math, science and foreign language abilities, worked tirelessly to break down and understand constantly mutating code systems. On any given day, a single individual’s efforts likely seemed minor. But in the collective, the results were substantial. At the conclusion of the war, Major General Chamberlin noted that these efforts “saved us many thousands of lives” and “shortened the war by no less than two years.” As data scientists, we currently have the ability to contribute, in our own small way, to a contemporary battle: understanding and preventing the spread of COVID-19.

Of note, it seems clear that our most productive work on this topic will be done in coordination with healthcare facilities and researchers. Just as the work of the WWII code breakers was collaborative and coordinated, so too should our efforts be carried out together with those on the medical front line. That said, there is a growing number of opportunities for interested data scientists. These include:

Moreover, a growing number of open-source data sets are available for those willing to contribute to the effort. In our own efforts, for example, we have made use of the following data:

  • There is the popular 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE which contains confirmed, recovered and deceased cases of COVID-19 around the world. For the USA, it can provide some of this information at state and county level too. The same information could also be retrieved from the following source in different formats. Bear in mind the data sets are not perfect. They contain inaccuracies and duplicated entries, but they should provide a good basis for getting a reasonable understanding of how the virus spreads around the globe.
  • The following website has information regarding total beds and ICU units from multiple hospitals across the USA. It also estimates their current capacity. Similar information can also be retrieved from the following online spreadsheet.
  • The COVID Tracking Project has information regarding COVID-19 tests for multiple states in the USA, along with a breakdown of whether they were positive or negative.
  • For comparing the average length of hospital stay for COVID-19 against that of other diseases, the OECD website contains very useful information for multiple diseases and for many countries.
  • Hospital admission rates for the USA can be retrieved from here. For state level hospital admission rates, there is a breakdown here.
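As a rough sketch of how case counts like these could be prepared for modelling, the snippet below assumes the wide CSV layout used by the Johns Hopkins CSSE time series files: four identifier columns followed by one cumulative-count column per date. The helper names, the rule for skipping repeated region rows, and the sample rows are our own assumptions; the sample values are made up for illustration only.

```python
import csv
import io
from collections import defaultdict

# Identifier columns assumed to precede the per-date columns.
ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")

# Made-up sample in the assumed layout; note the duplicated "Region A" row.
SAMPLE = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20
Region A,Country X,0,0,1,2,4
Region B,Country X,0,0,0,1,1
Region A,Country X,0,0,1,2,4
,Country Y,0,0,5,5,6
"""


def country_totals(csv_text):
    """Sum cumulative counts per country, keeping only the first row per region."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    seen_regions = set()
    totals = defaultdict(lambda: [0] * len(date_columns))
    for row in reader:
        region = (row["Province/State"], row["Country/Region"])
        if region in seen_regions:
            continue  # simple de-duplication rule (an assumption, not the only option)
        seen_regions.add(region)
        for idx, column in enumerate(date_columns):
            totals[row["Country/Region"]][idx] += int(row[column] or 0)
    return date_columns, dict(totals)


def daily_increments(cumulative):
    """Turn cumulative counts into daily new counts, clamping corrections at zero."""
    return [max(0, cur - prev) for prev, cur in zip(cumulative, cumulative[1:])]


dates, totals = country_totals(SAMPLE)
for country, series in totals.items():
    print(country, dates[1:], daily_increments(series))
```

Clamping negative increments at zero is one way to cope with downward revisions in the source; depending on the analysis, flagging such days for review may be preferable.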

Using these (and other such data), it is possible to construct time series models that predict future cases of COVID-19 for different geographic regions, forecast hospital admissions, and assess when maximum capacity will be reached for a given region.

For example, consider the following SEIR (Susceptible-Exposed-Infected-Resistant) dashboarding application, developed with H2O Q and H2O Driverless AI, that is automatically updated as new daily data is made available.

The application first takes as input (in addition to the available data) selected hospital and demographic input for a given hospital system. Then, using the selected parameters, new cases can be forecast for a given region with daily updates:
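To make the forecasting step more concrete, here is a minimal sketch of a basic SEIR model integrated with a simple forward Euler scheme. It is not the code behind the application described above; the `SEIRParams` fields, function names, and parameter values are hypothetical and chosen for illustration, not estimates for COVID-19.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    population: float       # total population N of the region
    beta: float             # transmission rate per day (illustrative value)
    incubation_days: float  # mean time from exposure to becoming infectious
    infectious_days: float  # mean time a person remains infectious


def simulate_seir(params, initial_exposed, initial_infected, days, steps_per_day=10):
    """Integrate the SEIR equations with forward Euler; return one (S, E, I, R) tuple per day."""
    n = params.population
    sigma = 1.0 / params.incubation_days   # rate of leaving the exposed compartment
    gamma = 1.0 / params.infectious_days   # rate of leaving the infectious compartment
    dt = 1.0 / steps_per_day
    s = n - initial_exposed - initial_infected
    e, i, r = float(initial_exposed), float(initial_infected), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = params.beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


def daily_new_cases(history):
    """New infectious cases per day, computed as the daily drop in S + E."""
    return [
        (prev[0] + prev[1]) - (cur[0] + cur[1])
        for prev, cur in zip(history, history[1:])
    ]


# Illustrative run for a hypothetical region of one million people.
params = SEIRParams(population=1_000_000, beta=0.5, incubation_days=5, infectious_days=7)
history = simulate_seir(params, initial_exposed=10, initial_infected=1, days=60)
forecast = daily_new_cases(history)
print("peak daily new cases in window:", round(max(forecast)))
```

In practice the parameters would be fitted to the observed case series and refreshed as new daily data arrives, which is where the hospital and demographic inputs come in.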

   

Second, using publicly available hospital bed data for a given region, capacity assessments can be made for both overall hospital bed usage and ICU bed usage:
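One simple way to approximate such a capacity assessment is to convert forecast cases into admissions and assume each admission occupies a bed for a fixed length of stay. The sketch below does exactly that; the function names, the admission rate, the length of stay, and the bed counts are all hypothetical placeholders rather than values from the data sources above.

```python
def bed_occupancy(daily_new_cases, admission_rate, length_of_stay_days):
    """Occupied beds per day, assuming every admission stays a fixed number of days."""
    admissions = [cases * admission_rate for cases in daily_new_cases]
    occupancy = []
    for day in range(len(admissions)):
        window_start = max(0, day - length_of_stay_days + 1)
        # Patients admitted within the last `length_of_stay_days` days are still in a bed.
        occupancy.append(sum(admissions[window_start:day + 1]))
    return occupancy


def first_day_over_capacity(occupancy, available_beds):
    """Index of the first day on which demand exceeds available beds, or None."""
    for day, beds_needed in enumerate(occupancy):
        if beds_needed > available_beds:
            return day
    return None


# Illustrative numbers only: a flat forecast of 100 new cases per day.
forecast = [100] * 10
general = bed_occupancy(forecast, admission_rate=0.25, length_of_stay_days=3)
print("general beds needed per day:", general)
print("first day over 60 beds:", first_day_over_capacity(general, available_beds=60))
```

The same two functions can be reused for ICU beds by passing an ICU admission rate, an ICU length of stay, and the ICU bed count.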

Then, based on the latest data, flags and warnings can be designed and implemented.
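A flagging rule can be as simple as comparing projected utilization against thresholds. The sketch below is one possible design; the `capacity_flag` name and the 70% and 90% cut-offs are assumptions for illustration and would need to be set with the hospital system in question.

```python
def capacity_flag(occupied_beds, total_beds, warning_at=0.70, critical_at=0.90):
    """Classify bed utilization as 'ok', 'warning' or 'critical' (thresholds are hypothetical)."""
    if total_beds <= 0:
        raise ValueError("total_beds must be positive")
    utilization = occupied_beds / total_beds
    if utilization >= critical_at:
        return "critical"
    if utilization >= warning_at:
        return "warning"
    return "ok"


def flag_forecast(projected_occupancy, total_beds):
    """Attach a flag to each forecast day so a dashboard can highlight it."""
    return [
        (day, capacity_flag(beds, total_beds))
        for day, beds in enumerate(projected_occupancy)
    ]


# Illustrative projection for a hypothetical 100-bed unit.
for day, flag in flag_forecast([40, 55, 72, 88, 95], total_beds=100):
    print(f"day {day}: {flag}")
```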

Other simple but useful applications are also possible. In some areas, substantial progress has already been made. Image processing, for example, has been found to be useful in the effective diagnosis of COVID-19. Likewise, using EHR (electronic health record) data, it is possible to identify variables associated with severe complications. Currently, a number of pharmaceutical research firms are using AI for COVID-19 drug development. Further applications might include assessing the impact of the virus on economic indicators and/or understanding the impact of weather on the spread of COVID-19.

In the end, it seems fruitful to explore areas of application where data science can contribute to the efforts to understand and combat COVID-19. Our hope is that, by joining forces, data scientists and medical practitioners can make effective and significant progress in these efforts.

 

 

About the Authors

David Engler

David Engler is a Senior Data Scientist and the Director of Customer Success at H2O. He has 15 years of experience leading data science teams in healthcare research and analytics and has over 20 publications in medical analytics as a primary author. He most recently built and led the analytics team for healthcare strategy at the University of Utah hospitals and clinics. David obtained his PhD in Biostatistics from Harvard University.

Marios Michailidis

Marios Michailidis is a competitive data scientist at H2O.ai and a Kaggle Grandmaster (ex World #1 out of 500,000 members). He holds a BSc in Accounting and Finance from the University of Macedonia in Greece and an MSc in Risk Management from the University of Southampton. He obtained his PhD in machine learning at University College London (UCL) with a focus on ensemble modelling. He has worked in both the marketing and credit sectors in the UK market and has led many analytics projects with various themes, including acquisition, retention, recommenders, uplift, fraud detection, portfolio optimisation and more. He is the creator of KazAnova, a project made in Java for quick credit scoring, and of the StackNet Meta-Modelling Framework. Marios’ LinkedIn profile can be found here with more information about what he is working on now or past projects.
