March 20th, 2020

Summary of a Responsible Machine Learning Workflow


A paper resulting from a collaboration between H2O.ai and BLDS, LLC was recently published in a special “Machine Learning with Python” issue of the journal Information. In “A Responsible Machine Learning Workflow with Focus on Interpretable Models, Post-hoc Explanation, and Discrimination Testing,” coauthors Navdeep Gill, Patrick Hall, Kim Montgomery, and Nicholas Schmidt compare model accuracy and fairness metrics for two types of constrained, explainable models versus their non-constrained counterparts.







Explainable Neural Networks

Explainable neural networks (XNNs) are neural networks with architecture constraints that make the trained network easier to interpret [Vaughan et al. 2018]. The diagram below shows a typical XNN architecture:

[Vaughan et al. 2018]

The XNN consists of several layers. The initial projection layer calculates linear combinations of the input features. Each linear combination is then fed to a separate subnetwork, h_k. Each subnetwork learns a nonlinear function of its projection layer output, referred to as a ridge function. Finally, the network calculates a linear combination of the subnetwork outputs. To promote sparsity, which increases the interpretability of the model, regularization terms were added to both the projection layer and the output layer.

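The calculation performed by the XNN resembles a generalized additive model,

$$f(x) = \mu + \gamma_1 h_1(\beta_1^T x) + \gamma_2 h_2(\beta_2^T x) + \dots + \gamma_k h_k(\beta_k^T x)$$

where the \beta_k are the projection layer weights, the h_k are the ridge functions learned by the subnetworks, the \gamma_k are the output layer weights, and \mu is a bias term. Unlike in a generalized additive model, the learned nonlinear functions are functions of linear combinations of features instead of single features.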
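The structure of that calculation can be sketched in a few lines of Python. This is only an illustration of the additive form, not the paper's implementation: the weights are hand-picked rather than learned, and simple closed-form functions stand in for the subnetworks. The name `xnn_forward` and all parameter values are hypothetical.

```python
import math

def xnn_forward(x, betas, ridge_fns, gammas, mu=0.0):
    """Evaluate mu + sum_k gamma_k * h_k(beta_k . x) for a single input row."""
    total = mu
    for beta, h, gamma in zip(betas, ridge_fns, gammas):
        # Projection layer: one linear combination of the input features.
        z = sum(b * xi for b, xi in zip(beta, x))
        # Subnetwork: a univariate nonlinear ridge function of that projection,
        # scaled by its output layer weight.
        total += gamma * h(z)
    return total

# Hypothetical parameters: two subnetworks over three input features.
betas = [
    [1.0, 0.0, 0.0],  # subnetwork 1 sees only the first feature
    [0.0, 2.0, 1.0],  # subnetwork 2 sees 2 * (second feature) + (third feature)
]
ridge_fns = [lambda z: z ** 2, math.tanh]  # stand-ins for trained subnetworks
gammas = [0.5, 2.0]

y = xnn_forward([2.0, 0.0, 0.0], betas, ridge_fns, gammas, mu=1.0)
print(y)
```

Because each subnetwork takes a single scalar input, its learned function can be plotted directly, which is what makes the ridge functions inspectable. Sparse projection weights (many zeros in each row of `betas`) keep those scalar inputs easy to describe.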

Monotonic GBM

Monotonic GBM (MGBM) is a standard gradient boosting algorithm for which the trees are constrained so that the splits in the decision trees obey user-defined monotonicity constraints with respect to the input features and the target.  The monotonicity constraints were defined using domain knowledge.
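One common way to enforce such a constraint is to reject any candidate split whose two leaf values would move in the wrong direction. The sketch below shows that idea for a single regression split on one feature. It is a simplified illustration, not the algorithm used in the paper or in any particular library; production implementations also propagate value bounds down the tree. The function name `best_monotone_split` is hypothetical.

```python
def best_monotone_split(xs, ys, direction=1):
    """Find the best single split on one feature among splits that respect the constraint.

    direction=+1 requires predictions to be non-decreasing in x,
    direction=-1 requires non-increasing, and 0 leaves the split unconstrained.
    Returns (threshold, left_value, right_value), or None if no split is admissible.
    """
    pairs = sorted(zip(xs, ys))
    best, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Reject splits whose leaf values would violate the monotonicity constraint.
        if direction * (right_mean - left_mean) < 0:
            continue
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if sse < best_sse:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best, best_sse = (threshold, left_mean, right_mean), sse
    return best

xs = [1, 2, 3, 4]
ys = [1.0, 2.0, 4.0, 5.0]
print(best_monotone_split(xs, ys, direction=1))   # an increasing split is found
print(best_monotone_split(xs, ys, direction=-1))  # no decreasing split exists
```

With the toy data above, the target rises with the feature, so a decreasing constraint leaves no admissible split and the function returns `None`.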


The models were trained on two datasets. The first was a simulated dataset produced by a known generating function, based on a signal-generating function proposed by Friedman:

$$f(x) = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5$$

The Friedman model was extended with two binary features and a five-level categorical feature for added complexity, two binary class-control features for discrimination testing, and a noise term drawn from a logistic distribution.
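A minimal sketch of the signal-plus-noise part of such a simulation follows. It makes several assumptions that are not stated in this summary: the standard form of Friedman's function with a factor of pi inside the sine, features drawn uniformly from [0, 1], and a unit noise scale. The extra binary, categorical, and class-control features are omitted, and the function names are hypothetical.

```python
import math
import random

def friedman_signal(x1, x2, x3, x4, x5):
    """Friedman's signal function (assumed standard form, with pi inside the sine)."""
    return (10 * math.sin(math.pi * x1 * x2)
            + 20 * (x3 - 0.5) ** 2
            + 10 * x4
            + 5 * x5)

def logistic_noise(rng, scale=1.0):
    """Draw from a zero-centered logistic distribution by inverse-CDF sampling."""
    # Clamp away from 0 and 1 so the log is always defined.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return scale * math.log(u / (1.0 - u))

def simulate(n, seed=0):
    """Return n (features, response) pairs; features are assumed uniform on [0, 1]."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.random() for _ in range(5)]
        y = friedman_signal(*x) + logistic_noise(rng)
        rows.append((x, y))
    return rows

rows = simulate(1000)
print(rows[0])
```

Because the generating function is known, the structure a model recovers (for example, the relative weights on x4 and x5) can be checked against the truth.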

The second dataset was a mortgage dataset taken from a set of consumer-anonymized loans from the Home Mortgage Disclosure Act (HMDA) database, for which the objective was to predict whether the loans were “high-priced” compared to similar loans.


Below are the ridge functions learned by the XNN for the simulated data. Judging from the output weights in Figure A, only 5 of the ridge functions contributed significantly to the final result. Figure B shows the ridge functions that were learned by each of the important subnetworks. Figure C shows the weights of the original features input from the projection layer into each subnetwork.

All of the inputs to the subnetworks, except those to subnetwork 4, were sparse in terms of the number of original features important to the subnetwork. The input to subnetwork 1 was dominated by the Friedman x3 feature, and its ridge function had a roughly quadratic shape, as would be expected from the squared x3 term. The projection layer inputs to subnetwork 4 were rather complicated, but did find the correct 2:1 ratio of the x4 and x5 terms from the generating equation. Subnetworks 7, 8, and 9 received inputs from Friedman features x1 and x2 and reflect the network’s attempt to model the nonlinear sine term in the product of x1 and x2, which is difficult to represent due to the architectural restrictions of the network.

The ridge functions learned from the mortgage data were mostly piecewise linear functions calculated on relatively complicated combinations of the input features.

MGBM Tree Shap Feature Importance

XNN Deep Shap Feature Importance

Interestingly, the Shapley feature importances for the MGBM and XNN models were quite different. For the MGBM model, the loan to value ratio was most important, followed by debt to income ratio, property value, and loan amount. The XNN model was less dominated by one feature and depended heavily on property value, loan amount, no introductory rate period flag, introductory rate period, and loan to value ratio. The XNN’s ability to draw on a broader set of features may have contributed to its better performance relative to the MGBM.
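For intuition about what these importance scores measure, the sketch below computes exact Shapley values by brute force for a toy three-feature model and averages their absolute values over rows to get a global importance. This is not Tree SHAP or Deep SHAP, which are model-specific algorithms that avoid the factorial cost shown here. The toy model, the zero baseline, and the function names are all hypothetical.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one row; 'absent' features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for j in order:
            current[j] = x[j]      # add feature j to the coalition
            new = model(current)
            phi[j] += new - prev   # marginal contribution of feature j
            prev = new
    return [p / len(orders) for p in phi]

def mean_abs_importance(model, rows, baseline):
    """Global importance: mean absolute Shapley value per feature across rows."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for j, p in enumerate(shapley_values(model, x, baseline)):
            totals[j] += abs(p)
    return [t / len(rows) for t in totals]

# Toy model: one additive term plus one interaction term.
def toy_model(v):
    return 2 * v[0] + v[1] * v[2]

phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(phi)
print(mean_abs_importance(toy_model, [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0]], [0.0, 0.0, 0.0]))
```

The interaction term is split evenly between the two features that form it, and the values for a row sum to the difference between the prediction and the baseline prediction.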

The neural network models outperformed the gradient boosting models for both the simulated data and the mortgage data, with the constrained XNN models producing similar results to the unconstrained artificial neural networks (ANNs). The results below are from the mortgage test data. These results indicate that XNNs may provide a more interpretable alternative to standard neural networks for at least some problems.

Model   Accuracy   AUC     Logloss
GBM     0.795      0.828   0.252
MGBM    0.765      0.814   0.259
ANN     0.865      0.871   0.231
XNN     0.869      0.868   0.233
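For readers less familiar with the three metrics in the table, here is a small reference sketch of how each is commonly computed for a binary classifier. The 0.5 probability cutoff for accuracy is an assumption; this summary does not state which cutoff was used. The toy labels and scores are made up for illustration.

```python
import math

def accuracy(y_true, p, threshold=0.5):
    """Share of rows whose thresholded prediction matches the label."""
    hits = sum((pi >= threshold) == bool(yi) for yi, pi in zip(y_true, p))
    return hits / len(y_true)

def log_loss(y_true, p, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predicted probabilities."""
    total = 0.0
    for yi, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1 - eps)  # avoid log(0)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y_true)

def auc(y_true, p):
    """Probability that a random positive is scored above a random negative (ties count half)."""
    pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
    neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.6]
print(accuracy(y_true, scores), auc(y_true, scores), log_loss(y_true, scores))
```

Higher is better for accuracy and AUC, while lower is better for logloss.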

The table below shows fairness statistics calculated for the two restricted models. FPR is the ratio of the false positive rate for the protected class to the false positive rate for the control class; larger values indicate less fairness toward the protected class. AIR (adverse impact ratio) is the ratio between the proportion of the protected class that receives favorable outcomes and the proportion of the control class that receives favorable outcomes. AIR values significantly below 1.0 indicate that a model is less fair toward the protected class.

Model   Protected Class   Control Class   AIR     FPR
MGBM    Black             White           0.776   2.10
MGBM    Female            Male            0.948   1.15
XNN     Black             White           0.743   2.45
XNN     Female            Male            0.955   1.21
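The two statistics can be sketched as follows. This assumes that FPR in the table denotes a ratio of false positive rates, and that a prediction of 1 means the unfavorable outcome (a loan flagged as high-priced), so 0 is the favorable outcome. The function names and the toy labels are hypothetical and are not taken from the paper's data.

```python
def favorable_rate(y_pred, favorable=0):
    """Share of a group that receives the favorable prediction."""
    return sum(1 for p in y_pred if p == favorable) / len(y_pred)

def adverse_impact_ratio(pred_protected, pred_control, favorable=0):
    """Favorable-outcome rate of the protected group divided by that of the control group."""
    return favorable_rate(pred_protected, favorable) / favorable_rate(pred_control, favorable)

def false_positive_rate(y_true, y_pred):
    """Share of true negatives (label 0) that were predicted positive (1)."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    return sum(negatives) / len(negatives)

def fpr_ratio(true_protected, pred_protected, true_control, pred_control):
    """False positive rate of the protected group divided by that of the control group."""
    return (false_positive_rate(true_protected, pred_protected)
            / false_positive_rate(true_control, pred_control))

# Toy labels and predictions for two groups of five applicants each.
true_protected, pred_protected = [0, 0, 0, 0, 1], [1, 1, 0, 0, 1]
true_control, pred_control = [0, 0, 0, 0, 1], [1, 0, 0, 0, 1]

air = adverse_impact_ratio(pred_protected, pred_control)
ratio = fpr_ratio(true_protected, pred_protected, true_control, pred_control)
print(air, ratio)
```

In this toy example the protected group receives favorable predictions less often and is wrongly flagged twice as often, so the AIR falls below 1.0 and the FPR ratio rises above 1.0.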

In the racial category, the XNN model had somewhat weaker fairness results, while in the gender category the results were quite similar. It’s not clear that either of the restricted models would affect fairness in general.


The paper presents an approach to training explainable models and testing their accuracy and fairness, and it tests that approach on both simulated data and a more realistic set of mortgage data. In the examples studied in the paper, the neural network models outperformed the gradient boosting models, and the XNNs were able to provide some interpretability advantages with little or no loss in model accuracy.

About the Author

Kim Montgomery

Kim has a Ph.D. in applied mathematics, with a background in both predictive modeling and differential equations. She has significant experience applying mathematical modeling to problems in the energy industry and in the biosciences. She is a Kaggle grandmaster and has been ranked as high as 15th in the overall Kaggle rankings.  She’s excited to be applying her skills at
