February 26th, 2019
How to explain a model with H2O Driverless AI
Category: Data Science, Explainable AI, H2O Driverless AI, Machine Learning Interpretability
By: Vinod Iyengar, VP of Products
The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models, but how does one actually explain a model?
H2O Driverless AI provides robust interpretability of machine learning models to explain modeling results. With these built-in capabilities, everyone, including expert and junior data scientists, domain scientists, and data engineers, can develop trusted machine learning models and explain them without much complexity.
H2O.ai’s very own Patrick Hall has put together a collection of resources to help guide Driverless AI users around the Machine Learning Interpretability (MLI) capabilities built into the platform. This blog will focus on the Machine Learning Interpretability walkthrough video for Driverless AI and the MLI cheat sheet that goes along with the video.
First, adjust the overall settings in Driverless AI to create a more interpretable model.
Set interpretability to >=7, which will result in:
- Feature selection (fewer variables in your model)
- Monotonicity constraints (as a modeling input variable value increases, the model predictions will also always increase or always decrease)
- Simpler data transformations before model fitting, meaning the inputs to your final model will be more directly interpretable
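To make the monotonicity constraint concrete, here is a minimal sketch of how one might check it for any prediction function. It is not Driverless AI code: `is_monotonic_in_feature`, `monotone_model`, and `non_monotone_model` are hypothetical names made up for illustration, and the toy models stand in for a trained model.

```python
def is_monotonic_in_feature(predict, row, feature_index, grid):
    """Return True if predictions never decrease, or never increase,
    as one feature sweeps over the sorted grid while the rest of the row is held fixed."""
    preds = []
    for value in sorted(grid):
        probe = list(row)
        probe[feature_index] = value
        preds.append(predict(probe))
    diffs = [b - a for a, b in zip(preds, preds[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)


# Hypothetical stand-ins for a trained model (assumptions, not Driverless AI models).
def monotone_model(x):
    return 2.0 * x[0] + 0.5 * x[1]


def non_monotone_model(x):
    return (x[0] - 1.0) ** 2 + x[1]


row = [0.0, 3.0]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
print(is_monotonic_in_feature(monotone_model, row, 0, grid))
print(is_monotonic_in_feature(non_monotone_model, row, 0, grid))
```

A check like this only probes one row and one grid, so it can reveal a violation but cannot prove monotonicity everywhere.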
Also, just like setting a random seed in your favorite modeling package, be sure to click on the ‘Reproducible’ button to ensure repeatable and reproducible results.
During the Driverless AI training process the system will attempt to create new features from the original features in the data set you provided. The final model will typically be built using a combination of the original features and features the system creates on its own. Because of the settings specified above, the system won’t create too many new features, won’t create extremely complex new features, and will make sure the relationship between the original and new features and the model predictions is explainable. Once the model is trained, all the interactive charts described below can be used to explore details about the model. Let’s start with global Shapley feature importance and continue on from there.
Global Shapley Feature Importance
Global Shapley feature importance provides an overall view of the drivers of your model’s predictions. Global Shapley values are reported for original features and any feature the Driverless AI system creates on its own.
Global Shapley values:
- The average numerical impact of original features and features the Driverless AI system derived on its own that are used in the final Driverless AI model
- Positive features which push the model’s prediction higher on average; negative features which push predictions lower on average
- The average model prediction
Global Original Feature Importance
Global original feature importance provides an approximate overall view of how your original features affect model predictions.
Partial Dependence
Partial dependence shows the average Driverless AI model prediction and its standard deviation for different values of important original features. This helps you understand the average model behavior for the most important original features.
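As a rough illustration of the partial dependence idea described above, the sketch below sets one feature to each grid value in every row and records the mean and standard deviation of the resulting predictions. The function `partial_dependence`, the `toy_model`, and the data are assumptions for illustration only, not the Driverless AI implementation.

```python
from statistics import mean, pstdev


def partial_dependence(predict, rows, feature_index, grid):
    """For each grid value, overwrite the feature in every row and
    return (value, mean prediction, standard deviation of predictions)."""
    results = []
    for value in grid:
        preds = []
        for row in rows:
            probe = list(row)
            probe[feature_index] = value
            preds.append(predict(probe))
        results.append((value, mean(preds), pstdev(preds)))
    return results


# Hypothetical model with an interaction between the two features.
def toy_model(x):
    return x[0] * x[1] + x[1]


rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
curve = partial_dependence(toy_model, rows, 0, [0.0, 1.0, 2.0])
for value, avg, sd in curve:
    print(f"feature value {value}: mean prediction {avg:.3f}, std {sd:.3f}")
```

A standard deviation that grows along the grid, as it does in this toy example, hints that the feature's effect differs from row to row.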
Global Surrogate Decision Tree
The global surrogate decision tree provides an overall flowchart of the Driverless AI model’s decision making processes based on the original features.
The surrogate decision tree shows:
- Features that appear higher or more frequently in the tree, which are more important to the Driverless AI model than features that appear lower or less frequently
- The logic behind model decisions and their associated approximate predicted numeric values
- Features above or below one another, which can indicate an interaction
- The thickest edges (lines), which mark the most common logic through the tree, and likely through the Driverless AI model
Global Interpretable Model
The global interpretable model is a linear model of the Driverless AI model predictions.
The interpretable, global linear model of the Driverless AI predictions shows:
- The Driverless AI model predictions ranked from lowest to highest
- The Driverless AI model predictions compared to a linear model
- The nonlinearity of the Driverless AI model, quantified by the R-squared statistic of the linear model
- A basic sanity check of Driverless AI model performance by plotting actuals vs. predicted
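The R-squared idea in the list above can be sketched in a few lines: fit a straight line to a model's predictions and see how much of their variance the line explains. This is a single-feature simplification with made-up names (`fit_line`, `r_squared`, `complex_model`), not the surrogate that Driverless AI builds.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(ys, fitted):
    """Share of the variance in ys that the fitted values explain."""
    y_bar = mean(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical nonlinear model standing in for the complex model.
def complex_model(x):
    return x ** 2


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
preds = [complex_model(x) for x in xs]           # predictions of the complex model
slope, intercept = fit_line(xs, preds)           # linear surrogate of those predictions
fitted = [slope * x + intercept for x in xs]
r2 = r_squared(preds, fitted)
print(f"surrogate: {slope:.2f} * x + {intercept:.2f}, R-squared = {r2:.3f}")
```

An R-squared close to 1 means the complex model behaves almost linearly; the further it falls below 1, the more nonlinear behavior the linear surrogate is missing.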
Local Shapley Feature Importance
Local Shapley feature importance shows how each feature directly impacts each individual row’s prediction.
Local Shapley values:
- Are similar to global Shapley values but show the numeric impact of each feature for each Driverless AI model prediction
- Are accurate, consistent, and likely suitable for creating reason codes and adverse action codes
- Always add up to the model prediction when combined with the average model prediction
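For readers who want to see the additivity property in action, here is a small brute-force sketch that computes exact Shapley values for one row of a toy model. It enumerates every feature subset, so it only works for a handful of features, and it is not how Driverless AI computes its values. The names `coalition_value`, `shapley_values`, and `toy_model`, as well as the background data, are assumptions for illustration.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def coalition_value(predict, row, background, subset):
    """Average prediction when features in `subset` take the row's values
    and all other features take the values of each background row in turn."""
    preds = []
    for ref in background:
        probe = [row[i] if i in subset else ref[i] for i in range(len(row))]
        preds.append(predict(probe))
    return mean(preds)


def shapley_values(predict, row, background):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(row)
    phis = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(predict, row, background, s | {j})
                        - coalition_value(predict, row, background, s))
                phi += weight * gain
        phis.append(phi)
    return phis


# Hypothetical model and data, for illustration only.
def toy_model(x):
    return 3.0 * x[0] + x[1] * x[2]


background = [[0.0, 0.0, 0.0], [1.0, 2.0, 1.0], [2.0, 1.0, 3.0]]
row = [2.0, 3.0, 1.0]

phis = shapley_values(toy_model, row, background)
baseline = mean(toy_model(r) for r in background)  # average prediction on the background data
print("contributions:", phis)
print("baseline + contributions:", baseline + sum(phis))
print("model prediction:", toy_model(row))
```

In this sketch the contributions plus the average background prediction reproduce the row's prediction, which is the property that makes such values attractive for reason codes.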
Local Linear Explanations
Local linear explanations show the local linear trends around an individual row (they are derived using the LIME technique). They pair nicely with local Shapley values. The local Shapley values give a point estimate for how a feature impacts an individual row’s prediction, while local linear explanations tell us about the trends of each feature for the same row’s prediction.
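A heavily simplified, LIME-style sketch of a local linear trend is shown below: perturb one feature around a row, weight the perturbed points by how close they are to the row, and fit a weighted straight line. The real LIME technique and the Driverless AI implementation are more involved; `local_linear_fit`, the kernel choice, and `toy_model` are assumptions for illustration.

```python
from math import exp


def local_linear_fit(predict, row, feature_index, offsets, kernel_width=1.0):
    """Weighted least squares line through predictions at perturbed values of
    one feature; points nearer the original row receive larger weights."""
    xs, ys, ws = [], [], []
    for offset in offsets:
        probe = list(row)
        probe[feature_index] = row[feature_index] + offset
        xs.append(probe[feature_index])
        ys.append(predict(probe))
        ws.append(exp(-(offset ** 2) / (kernel_width ** 2)))
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical nonlinear model, for illustration only.
def toy_model(x):
    return x[0] ** 2 + x[1]


offsets = [-0.5, -0.25, 0.0, 0.25, 0.5]
slope_a, _ = local_linear_fit(toy_model, [1.0, 0.0], 0, offsets)
slope_b, _ = local_linear_fit(toy_model, [-2.0, 0.0], 0, offsets)
print(f"local trend near x0 = 1:  {slope_a:.2f}")
print(f"local trend near x0 = -2: {slope_b:.2f}")
```

The same feature gets an upward trend near one row and a downward trend near another, which is the kind of row-specific information a single global coefficient cannot convey.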
Local Surrogate Tree Decision Path
The local surrogate decision tree path shows how the logic of the model is applied to any given individual.
The decision path:
- Shows approximately how row values impact Driverless AI model predictions for that row by showing the row’s path through the decision tree surrogate flow chart
- Can also show local interactions that affect model predictions for this row
Original Feature Individual Conditional Expectation (ICE)
Individual conditional expectation (ICE) shows how an individual prediction changes when the value of one of its input features is changed. If ICE values differ from partial dependence, this can also help confirm interactions spotted in the surrogate decision tree.
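The comparison between ICE and partial dependence can be sketched as follows, under the assumption that partial dependence is simply the average of the ICE curves over all rows. The helper names and the `toy_model` are hypothetical and only serve to illustrate the idea.

```python
from statistics import mean


def ice_curve(predict, row, feature_index, grid):
    """Predictions for one row as a single feature sweeps over the grid."""
    curve = []
    for value in grid:
        probe = list(row)
        probe[feature_index] = value
        curve.append(predict(probe))
    return curve


def partial_dependence_curve(predict, rows, feature_index, grid):
    """Partial dependence as the point-wise average of all ICE curves."""
    curves = [ice_curve(predict, r, feature_index, grid) for r in rows]
    return [mean(column) for column in zip(*curves)]


# Hypothetical model whose only term is an interaction between the two features.
def toy_model(x):
    return x[0] * x[1]


rows = [[1.0, -1.0], [1.0, 0.0], [1.0, 4.0]]
grid = [0.0, 1.0, 2.0]

ice = ice_curve(toy_model, rows[0], 0, grid)
pd_curve = partial_dependence_curve(toy_model, rows, 0, grid)
print("ICE for first row: ", ice)
print("Partial dependence:", pd_curve)
```

Here the ICE curve for the first row slopes downward while the partial dependence curve slopes upward, and that disagreement is exactly the kind of signal that points to an interaction.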
Local Original Feature Importance
Like the local Shapley values described above, local original feature importance shows the original features that drive a prediction for an individual row. (The local Shapley values show importance for both the new and the original features, not just the original features.)
As you can see, the MLI module of H2O Driverless AI can tell you a lot about how your model is behaving. As this is a relatively new area of research and development, stay tuned for more features like these! Also, check out our Explainable AI page for more resources related to machine learning interpretability.