March 26th, 2020
COVID-19: Doing Good with Data + AI
Category: AI4Good, Data Science, Healthcare, Machine Learning, Time Series
By: David Engler and Marios Michailidis
During times of severe societal strain, individuals have historically shown an inclination to offer aid and assistance. Often these sacrifices have been at great cost to life or livelihood. In other cases, the efforts have been seemingly more mundane but nevertheless essential. The efforts of the more than 10,000 women code breakers of World War II are one such example. From 1941 to 1945, these women, recruited because of their math, science and foreign language abilities, worked tirelessly to break down and understand constantly mutating code systems. On any given day, a single individual’s efforts likely seemed minor. But in the collective, the results were substantial. At the conclusion of the war, Major General Chamberlin noted that these efforts “saved us many thousands of lives” and “shortened the war by no less than two years.” As data scientists, we currently have the ability to contribute, each in our own small way, to a contemporary battle: understanding and preventing the spread of COVID-19.
Of note, it seems clear that our most productive work on this topic will be done in coordination with healthcare facilities and researchers. Just as the work of the WWII code breakers was collaborative and coordinated, so too should our efforts be coordinated with those on the medical front line. That said, there is a growing number of opportunities for interested data scientists. These include:
- Data analysis competitions (Kaggle, COVID-19 Hackathon)
- Groups seeking data science expertise (Crowdcast, CovidCompare, covid19-healthsystemcapacity)
- Organizations seeking computing power (Folding@Home, Rosetta@home)
Moreover, a growing number of open-source data sets are available for those willing to contribute to the effort. In our own efforts, for example, we have made use of the following data:
- The popular 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE contains confirmed, recovered and deceased cases of COVID-19 around the world. For the USA, it also provides some of this information at the state and county level. The same information could also be retrieved from the following source in different formats. Bear in mind the data sets are not perfect. They contain inaccuracies and duplicated entries, but they should provide a good basis for getting a reasonable understanding of how the virus spreads around the globe.
- The following website has information regarding total beds and ICU units from multiple hospitals across the USA. It also estimates their current capacity. Similar information can also be retrieved from the following online spreadsheet.
- The COVID Tracking Project has information regarding COVID-19 tests for multiple states in the USA, along with a breakdown of whether they were positive or negative.
- To compare the average length of hospital stay for COVID-19 against that of other diseases, the OECD website offers very useful information for multiple diseases and many countries.
- Hospital admission rates for the USA can be retrieved from here. For state level hospital admission rates, there is a breakdown here.
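Because these data sets can contain duplicated entries and inaccuracies, a light cleaning pass is usually worthwhile before modeling. The following is a minimal sketch, not the procedure used by the authors. The function name `clean_cumulative_series` and the `(region, date, cumulative_count)` row layout are hypothetical, and carrying the running maximum forward is an assumption: a drop in a cumulative count may also be a legitimate downward correction by the publisher.

```python
def clean_cumulative_series(rows):
    """Deduplicate (region, date) rows and make cumulative counts non-decreasing.

    rows: iterable of (region, iso_date, cumulative_count) tuples (hypothetical layout).
    Returns a dict mapping region -> list of (iso_date, count), sorted by date.
    """
    # Keep only the last value seen for each (region, date) pair.
    latest = {}
    for region, date, count in rows:
        latest[(region, date)] = count

    # Group by region; ISO dates (YYYY-MM-DD) sort correctly as strings.
    by_region = {}
    for (region, date), count in sorted(latest.items()):
        by_region.setdefault(region, []).append((date, count))

    # Assumption: a cumulative count should never fall, so carry the
    # running maximum forward whenever a lower value is reported.
    cleaned = {}
    for region, series in by_region.items():
        running_max = 0
        fixed = []
        for date, count in series:
            running_max = max(running_max, count)
            fixed.append((date, running_max))
        cleaned[region] = fixed
    return cleaned
```

Rows that were silently changed by this pass are worth logging and inspecting by hand, since they often point to reporting changes rather than simple typos.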
Using these (and other such) data, it is possible to construct time series models that predict future cases of COVID-19 for different geographic regions, forecast hospital admissions, and assess when maximum capacity will be reached for a given region.
For example, consider the following SEIR (Susceptible-Exposed-Infected-Resistant) dashboarding application, developed with H2O Q and H2O Driverless AI, which is automatically updated as new daily data becomes available.
The application first takes as input (in addition to the available data) selected hospital and demographic parameters for a given hospital system. Then, using the selected parameters, new cases can be forecast for a given region with daily updates.
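To make the SEIR idea concrete, here is a minimal sketch of the standard compartmental equations integrated with a fixed-step Euler scheme. It is not the implementation behind the dashboard described here, which is not shown in this post. The function name `simulate_seir` and all parameter values in the usage note are illustrative assumptions, not fitted estimates.

```python
def simulate_seir(population, exposed0, infected0, beta, sigma, gamma,
                  days, steps_per_day=10):
    """Integrate the SEIR equations with a simple Euler scheme.

        dS/dt = -beta * S * I / N
        dE/dt =  beta * S * I / N - sigma * E
        dI/dt =  sigma * E - gamma * I
        dR/dt =  gamma * I

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day).
    Returns one (S, E, I, R, new_cases) tuple per day, where new_cases is
    the number of people moving from E to I during that day.
    """
    s = float(population - exposed0 - infected0)
    e = float(exposed0)
    i = float(infected0)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = []
    for _ in range(days):
        new_cases = 0.0
        for _ in range(steps_per_day):
            # Flows between compartments over one small time step.
            new_exposed = beta * s * i / population * dt
            new_infected = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_removed
            r += new_removed
            new_cases += new_infected
        history.append((s, e, i, r, new_cases))
    return history
```

A call such as `simulate_seir(1_000_000, 50, 10, beta=0.5, sigma=0.2, gamma=0.1, days=60)` yields a 60-day trajectory. In practice the parameters would be re-estimated as each day's data arrives, which is what makes daily updates to the forecast possible.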
Second, using publicly available hospital bed data for a given region, capacity can be assessed for both overall hospital bed usage and ICU bed usage.
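A capacity assessment of this kind can be sketched as follows, under strong simplifying assumptions: a fixed fraction of new cases is admitted on the day the case appears, and every admitted patient occupies a bed for the same number of days. The function names, the hospitalization rate, and the length of stay are all hypothetical inputs that would need to come from local data.

```python
def project_bed_occupancy(daily_new_cases, hospitalization_rate,
                          length_of_stay_days):
    """Estimate occupied beds per day from a forecast of daily new cases.

    Assumes admissions = new cases * hospitalization_rate, and that each
    admitted patient stays exactly length_of_stay_days days.
    """
    admissions = [cases * hospitalization_rate for cases in daily_new_cases]
    occupancy = []
    for day in range(len(admissions)):
        # Patients admitted within the last length_of_stay_days are still in a bed.
        start = max(0, day - length_of_stay_days + 1)
        occupancy.append(sum(admissions[start:day + 1]))
    return occupancy


def first_day_over_capacity(occupancy, available_beds):
    """Return the first day index where occupancy exceeds available_beds, else None."""
    for day, beds_in_use in enumerate(occupancy):
        if beds_in_use > available_beds:
            return day
    return None
```

The same two functions can be applied separately to overall beds and to ICU beds by passing an ICU admission rate, an ICU length of stay, and the ICU bed count.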
Then, based on the latest data, flags and warnings can be designed and implemented.
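One simple way to design such flags is to compare projected bed usage with available capacity and map the ratio to a small set of levels. The sketch below is illustrative only: the function name `capacity_flag`, the level names, and the 80% and 100% thresholds are assumptions, and appropriate thresholds would be set together with the hospital system.

```python
def capacity_flag(beds_in_use, available_beds,
                  warning_ratio=0.8, critical_ratio=1.0):
    """Map bed usage to a flag level: 'ok', 'warning', or 'critical'.

    The default thresholds are illustrative assumptions, not clinical guidance.
    """
    if available_beds <= 0:
        raise ValueError("available_beds must be positive")
    ratio = beds_in_use / available_beds
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warning_ratio:
        return "warning"
    return "ok"
```

Evaluating the flag on each day of the forecast horizon, rather than only on today's figures, gives advance notice of when a region is expected to move from one level to the next.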
Other simple but useful applications are also possible. In some areas, substantial progress has already been made. Image processing, for example, has been found to be useful in the effective diagnosis of COVID-19. Likewise, using EHR (electronic health record) data, it is possible to identify variables associated with severe complications. Currently, a number of pharmaceutical research firms are using AI for COVID-19 drug development. Further applications might include assessing the impact of the virus on economic indicators and/or understanding the impact of weather on the spread of COVID-19.
In the end, it seems fruitful to explore areas of application where data science can contribute to the efforts to understand and combat COVID-19. Our hope is that, by joining forces, data scientists and medical practitioners can make effective and significant progress in these efforts.