April 22nd, 2015

Deep Learning for Public Safety

This article first appeared on KDnuggets

Contributors: Alex Tellez, Michal Malohlava, Prithvi Prabhu, Hank Roark, Amy Wang.

We’ve seen some incredible applications of Deep Learning in image recognition and machine translation, but this use case concerns public safety: specifically, how Deep Learning can be used to fight crime in the forward-thinking cities of San Francisco and Chicago. Both cities (like many others!) are open data cities, which means anybody can access city data ranging from transportation information to building maintenance records. So, if you are a data scientist, or thinking about becoming one, there are publicly available city-specific datasets you can play with. For this example, we took the historical crime data from both Chicago and San Francisco and joined it with external data, such as weather and socioeconomic factors, using Spark’s SQL context.


Figure 1: Spark + H2O Workflow

We import the data, do ad-hoc data munging (parsing the date column, for example), and join the tables using Spark, then publish the Spark RDD as an H2O Frame (Fig. 1).
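
The munging and join steps can be sketched without a cluster. The following is only an illustration, not the authors' Spark code: it swaps in Python's built-in `sqlite3` module for Spark's SQL context, and the table names, column names, timestamp format, and rows are all hypothetical.

```python
import sqlite3
from datetime import datetime

# Hypothetical toy rows; the real tables come from the cities' open data portals.
crimes = [
    ("01/15/2015 11:30:00 PM", "THEFT", 1),
    ("01/15/2015 02:05:00 AM", "NARCOTICS", 1),
    ("07/04/2015 09:00:00 PM", "THEFT", 0),
]
weather = [("2015-01-15", -5.0), ("2015-07-04", 27.5)]


def parse_date(raw):
    """Split a raw timestamp (assumed format) into (ISO date, month, hour)."""
    ts = datetime.strptime(raw, "%m/%d/%Y %I:%M:%S %p")
    return ts.date().isoformat(), ts.month, ts.hour


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crimes "
    "(day TEXT, month INTEGER, hour INTEGER, category TEXT, arrest INTEGER)"
)
conn.execute("CREATE TABLE weather (day TEXT, mean_temp_c REAL)")

# Munging step: derive date parts from the raw date column before loading.
conn.executemany(
    "INSERT INTO crimes VALUES (?, ?, ?, ?, ?)",
    [parse_date(raw) + (category, arrest) for raw, category, arrest in crimes],
)
conn.executemany("INSERT INTO weather VALUES (?, ?)", weather)

# Join step: attach the weather for the day on which each crime occurred.
joined = conn.execute(
    """
    SELECT c.day, c.month, c.hour, c.category, c.arrest, w.mean_temp_c
    FROM crimes AS c
    JOIN weather AS w ON c.day = w.day
    ORDER BY c.day, c.hour
    """
).fetchall()

for row in joined:
    print(row)
```

In Spark the same SQL shape would run against registered tables rather than an in-memory SQLite database; the point here is only the order of operations: parse the date, derive join keys, then join.
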
Figures 2 and 3 below show some visualizations of the joined table, created with Flow, the framework included in our latest H2O product, which is available for download.

Figure 2: San Francisco crime visualizations


Figure 3: Chicago crime visualizations

Interestingly, in both cities crime seems to occur most frequently during the winter, which is surprising given how cold the weather gets in Chicago!
Using H2O Flow, we were able to look at the arrest rates of every category of recorded crimes in Chicago versus the percentage of total crimes each category represents. Some crimes with the highest arrest rates also occur least frequently, and vice versa.
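
The two quantities being compared here, arrest rate per category and each category's share of all crimes, are simple to compute. A minimal sketch in plain Python, assuming records of the form `(category, arrested)`; the `summarize` helper and the sample records are hypothetical and are not taken from the Chicago data.

```python
from collections import Counter

# Hypothetical (category, arrested) records for illustration only.
records = [
    ("THEFT", False), ("THEFT", False), ("THEFT", True), ("THEFT", False),
    ("BATTERY", False), ("BATTERY", True),
    ("NARCOTICS", True), ("NARCOTICS", True),
]


def summarize(rows):
    """Return {category: (arrest_rate, share_of_all_crimes)}."""
    totals = Counter(category for category, _ in rows)
    arrests = Counter(category for category, arrested in rows if arrested)
    n = len(rows)
    return {
        category: (arrests[category] / count, count / n)
        for category, count in totals.items()
    }


summary = summarize(records)
for category, (rate, share) in sorted(summary.items()):
    print(f"{category:<10} arrest rate {rate:.0%}  share of crimes {share:.0%}")
```

In this toy sample the most frequent category has the lowest arrest rate, which is the kind of inverse pattern the comparison is meant to surface.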

Figure 4: Chicago arrest rates and total % of all crimes by category
H2O Flow allows users to construct their own custom graphs from imported data. Figure 5 shows the code used to generate the graph in Figure 4.

Figure 5: Creating the custom graph in H2O Flow

Once the data is transformed to an H2O RDD, we train a Deep Neural Network to predict, for a given crime, whether or not an arrest is likely to be made. Here are some screenshots of our H2O Deep Learning model being tuned inside Flow, along with the ROC curve and its AUC obtained by scoring the trained model against the validation dataset.
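
For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen arrest case receives a higher score than a randomly chosen non-arrest case. The sketch below computes it by direct pairwise comparison in plain Python. It is an illustration of the metric under that definition, not H2O's implementation, and the labels and scores are made up.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels (1 = arrest made) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of arrests from non-arrests. The pairwise loop is quadratic, so it suits small examples rather than full validation sets.
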


Figure 6: San Francisco validation data AUC

Figure 7: Chicago validation data AUC


Figure 8: Geo-mapped predictions

Because each reported crime comes with latitude-longitude coordinates, we scored our held-out data using the trained model and plotted the predictions on a map of Chicago, specifically the Downtown district. The color coding corresponds to the model’s predicted likelihood X of an arrest, with red being very likely (X > 0.8) and blue being unlikely (X < 0.2). Smart analytics + resource management = safer streets.
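
The color coding can be expressed as a small thresholding function. This is a sketch under stated assumptions: the text only defines the red and blue bands, so the `"neutral"` label for scores in between is an assumption, as are the function name and the sample coordinates.

```python
def arrest_color(p):
    """Map a predicted arrest probability to a map color."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p > 0.8:
        return "red"      # arrest very likely
    if p < 0.2:
        return "blue"     # arrest unlikely
    return "neutral"      # assumption: the article does not name a color for this band


# Hypothetical scored points: (latitude, longitude, predicted probability).
scored = [
    (41.8819, -87.6278, 0.93),
    (41.8790, -87.6359, 0.12),
    (41.8850, -87.6210, 0.55),
]

markers = [(lat, lon, arrest_color(p)) for lat, lon, p in scored]
for marker in markers:
    print(marker)
```

Each resulting `(lat, lon, color)` tuple can then be handed to whatever mapping tool draws the points.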
