February 19th, 2019

What is Your AI Thinking? Part 3

Category: Data Science, Driverless AI, Explainable AI, Financial Services, Machine Learning Interpretability

In the past two posts, we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning with H2O Driverless AI. To review, the past two posts discussed:

  • Exploratory data analysis (EDA)
  • Accurate and interpretable models
  • Global explanations
  • Local explanations
  • Model debugging and sensitivity analysis
  • Fairness and disparate impact analysis
  • Model documentation

Let’s start with exploratory data analysis (EDA). EDA enables you to understand your data and form reasonable expectations for the results of your machine learning project. AutoViz in Driverless AI automates many EDA processes, allowing users to find outliers, understand relationships between input variables, and identify potential data quality problems, all with just a few mouse clicks. To learn more about AutoViz, watch this recent talk by Leland Wilkinson, H2O.ai’s chief scientist, from H2O World San Francisco 2019. Below is an example of the visualization capabilities available within AutoViz.
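To make the outlier and data-quality checks above a little more concrete, here is a minimal, hypothetical sketch of two such checks on a single numeric column in plain Python: counting missing values and flagging points outside the common 1.5 * IQR fences. This is not how AutoViz is implemented; the function name `profile_column` and the sample `incomes` data are illustrative assumptions.

```python
import math
import statistics


def profile_column(values):
    """Summarize one numeric column: missing count and IQR-based outliers.

    Assumption of this sketch: missing entries are None or NaN, and at least
    two non-missing values are present (statistics.quantiles needs two).
    """
    present = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(present)
    # First and third quartiles; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Anything beyond the fences is flagged for human review, not deleted.
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "q1": q1, "q3": q3, "outliers": outliers}


incomes = [32, 35, 36, 38, 40, 41, 43, 250, None]  # hypothetical data
report = profile_column(incomes)
print(report["missing"], report["outliers"])  # prints: 1 [250]
```

A flagged value is a prompt to investigate (a data-entry error, or a genuine but rare customer), which is the kind of expectation-setting EDA is meant to support.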

Several types of accurate and interpretable models can be trained automatically by H2O Driverless AI today, including traditional linear models for maximum interpretability. For users who want to try potentially more accurate, nonlinear, and still highly interpretable models, Driverless AI provides Jerome Friedman’s RuleFit approach and monotonically constrained gradient boosting machines. Have a look at the Driverless AI documentation to see all the modeling options available today.
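A monotonic constraint guarantees that, holding everything else fixed, a prediction can only move in one direction as the constrained input increases. One simple way to check that property on any scoring function is to sweep one feature over a grid and compare consecutive predictions, as in this sketch. The toy models and the feature names `income` and `debt_ratio` are hypothetical, and this is not a Driverless AI API.

```python
def is_monotonic_in_feature(predict, row, feature, grid, increasing=True):
    """Return True if predict() never moves against the required direction
    as `feature` sweeps over `grid`, holding the other inputs in `row` fixed."""
    preds = []
    for value in sorted(grid):
        candidate = dict(row)  # copy so the caller's row is untouched
        candidate[feature] = value
        preds.append(predict(candidate))
    pairs = zip(preds, preds[1:])
    if increasing:
        return all(b >= a for a, b in pairs)
    return all(b <= a for a, b in pairs)


def toy_constrained(r):
    # Hypothetical step-function score that only rises with income,
    # similar in spirit to a sum of monotone tree splits.
    score = 0.1
    if r["income"] > 30:
        score += 0.2
    if r["income"] > 60:
        score += 0.3
    return score - 0.05 * r["debt_ratio"]


def toy_unconstrained(r):
    # Hypothetical score with a dip: higher income can lower the score.
    score = 0.1
    if r["income"] > 30:
        score += 0.4
    if r["income"] > 60:
        score -= 0.2
    return score


row = {"income": 50, "debt_ratio": 0.3}
print(is_monotonic_in_feature(toy_constrained, row, "income", range(0, 101, 10)))    # prints: True
print(is_monotonic_in_feature(toy_unconstrained, row, "income", range(0, 101, 10)))  # prints: False
```

A grid check like this only tests the points you sample and one row at a time; a constrained model enforces the property by construction, which is what makes it easier to explain.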

For global and local explanations, H2O Driverless AI offers its cutting-edge machine learning interpretability (MLI) module. MLI enables you to see the overall drivers of model behavior across an entire dataset and to understand the logic behind any individual model prediction. Driverless AI MLI can even be run on models created by other software packages! See this recent video and accompanying cheat sheet (also shown below) to get a better idea of exactly how MLI makes your models explainable.
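As a generic, model-agnostic illustration of a global explanation (not a description of how MLI works internally), permutation feature importance measures how much a model’s error grows when one input’s values are shuffled across rows. Because it only needs a `predict` function, it can be applied to models from any package. The function names, toy model, and data below are hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, features, n_repeats=5, seed=0):
    """Average increase in MSE when each feature is shuffled, breaking its
    link to the target while leaving the other columns intact."""
    rng = random.Random(seed)  # seeded for reproducible shuffles
    baseline = mse(y, [predict(r) for r in rows])
    importances = {}
    for f in features:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[f] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, f: v} for r, v in zip(rows, shuffled)]
            increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
        importances[f] = sum(increases) / n_repeats
    return importances


def toy_model(r):
    return 3 * r["x1"]  # hypothetical model that ignores x2 entirely


rows = [{"x1": i, "x2": (7 * i) % 20} for i in range(20)]
y = [3 * r["x1"] for r in rows]
print(permutation_importance(toy_model, rows, y, ["x1", "x2"]))
# x2 scores 0.0 because the model never reads it.
```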

Today, two further analyses are conducted using the Driverless AI Python API: sensitivity (or “what-if”) analysis, which tests how your model behaves in future mission-critical scenarios, and disparate impact analysis, which tests for potential discrimination in model predictions. Both are major roadmap items and should be available in the graphical user interface (GUI) soon. Jupyter notebook examples are available for both sensitivity analysis and disparate impact analysis.
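The notebooks mentioned above use the Driverless AI Python API, which is not reproduced here. As a library-independent sketch of the two underlying ideas, the code below re-scores one row under alternative values of a single input (what-if analysis) and computes an adverse impact ratio: the favorable-outcome rate of a protected group divided by that of a reference group. The helper names, toy model, and data are hypothetical. A ratio below 0.8 (the “four-fifths rule”) is a common rule of thumb for flagging potential disparate impact, not a definitive test.

```python
def what_if(predict, row, feature, values):
    """Sensitivity ("what-if") analysis: re-score one row while setting
    `feature` to each candidate value, leaving other inputs unchanged."""
    return {v: predict({**row, feature: v}) for v in values}


def adverse_impact_ratio(decisions, groups, protected, reference):
    """Favorable-outcome rate of `protected` divided by that of `reference`.

    decisions: 1 = favorable outcome (e.g., loan approved), 0 = unfavorable.
    groups: the group label for each decision, aligned by position.
    """
    def rate(group):
        outcomes = [d for d, g in zip(decisions, groups) if g == group]
        return sum(outcomes) / len(outcomes)

    return rate(protected) / rate(reference)


def toy_score(r):
    return r["income"] - 10 * r["late_payments"]  # hypothetical model


print(what_if(toy_score, {"income": 50, "late_payments": 0}, "late_payments", [0, 1, 2]))
# prints: {0: 50, 1: 40, 2: 30}

decisions = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]  # hypothetical model decisions
groups = ["A"] * 5 + ["B"] * 5
print(adverse_impact_ratio(decisions, groups, protected="B", reference="A"))  # prints: 0.5
```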

Model documentation is yet another process that Driverless AI automates. After each experiment, pertinent information about the trained model, such as data dictionaries, modeling methodologies, and model assessments, is summarized in a single document for human review. Click here to download a basic sample report. Below is an example of the first two pages included in each report.

Our own internal experience over the past few years has shown that it’s very important to combine all these techniques to create holistic, human-friendly solutions to real business problems. This series of blogs has introduced broad concepts in interpretable, fair, and trustworthy machine learning and highlighted H2O’s progress toward implementing them. To see what we have in store for future iterations of Driverless AI, check out my recent talk from H2O World San Francisco 2019.

P.S. If you are interested in interpretable machine learning for open source H2O, there are always good old linear models! We’ve also recently added monotonic constraints to the gradient boosting machine, provided this GitHub repo with lots of interpretability goodies for open source H2O, and will soon be adding Shapley values for local explanation of tree-based models. Stay tuned!
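Shapley values split one prediction into additive contributions from each input. For a tiny model they can be computed exactly by enumerating every coalition of features, with absent features set to a baseline row, as in this sketch. The brute-force version is exponential in the number of features; tree-specific algorithms such as TreeSHAP compute Shapley values for tree ensembles far more efficiently. This is not H2O’s implementation, and the toy model and rows are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction by enumerating coalitions.
    Features outside a coalition take their value from `baseline`."""
    features = list(x)
    n = len(features)

    def value(coalition):
        row = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return predict(row)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution of f
        phi[f] = total
    return phi


def model(r):
    return 2 * r["a"] + 3 * r["b"] + r["a"] * r["c"]  # hypothetical model


x = {"a": 1, "b": 2, "c": 3}
baseline = {"a": 0, "b": 0, "c": 0}
phi = shapley_values(model, x, baseline)
print({f: round(v, 6) for f, v in phi.items()})  # prints: {'a': 3.5, 'b': 6.0, 'c': 1.5}
```

The contributions sum to `model(x) - model(baseline)` (here 11), and the `a * c` interaction is split evenly between `a` and `c`, which is what makes Shapley values attractive for local explanations.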

This is the third blog in a 3-part series. You can catch the first and second parts here.
