June 9th, 2020
From GLM to GBM – Part 1
Category: Data Science, Explainable AI, GBM, GLM, Machine Learning Interpretability, Responsible AI, Shapley
By: Patrick Hall and Michael Proksch
How an Economics Nobel Prize could revolutionize insurance and lending
Part 1: A New Solution to an Old Problem
Insurance and credit lending are highly regulated industries that have relied on mathematical modeling for decades. To provide explainable results, data scientists and statisticians in both industries have leaned heavily on generalized linear models (GLMs). This post argues that it's now time to transition to a more contemporary approach: gradient boosting machines (GBMs), a newer type of machine learning (ML) model. GBMs are not only more sophisticated estimators of risk; thanks to a Nobel-laureate breakthrough known as Shapley values, they are now arguably just as interpretable as traditional GLMs. More nuanced risk estimation means fewer payouts and write-offs for policy and credit issuers, and it also means a broader group of customers can participate in mainstream insurance and credit markets. The high level of interpretability provided by Shapley values enables you to explain a model prediction to customers, business partners, model validation teams, and regulators. In short, all this means fewer losses, more customers, and a clearer path to meeting regulatory requirements! Are you interested? Scroll down to learn more.
GLMs: Fulfilling Regulatory Requirements and Business Needs
In the U.S., a host of federal and state regulations govern the use of predictive models in insurance and financial services. For example, last year the Department of Financial Services in New York issued a guidance letter to life insurers addressing the use "of unconventional sources or types of external data … including within algorithms and predictive models," stating policy pricing must exclude "race, color, creed, national origin, status as a victim of domestic violence, past lawful travel, or sexual orientation in any manner, or any other protected class" to avoid discrimination. Of course, AI regulation is not just an American phenomenon; it is an international one. The governments of at least Canada, Germany, the Netherlands, Singapore, the U.K., and the U.S. have either proposed or enacted AI-specific regulations. Generally, these existing and proposed statutes aim to ensure that predictive modeling processes are accurate, monitored, non-discriminatory, stable, secure, and transparent.
It’s well known that GLMs are highly stable and transparent models. In the hands of “regression artists” at large insurance and financial institutions, they can also be highly accurate. Institutions have spent years optimizing GLM approaches and building documentation, discrimination testing, monitoring and validation processes around GLMs. Hence, it’s for a combination of reasons, both mathematical and process-oriented, that GLMs are still used so widely today. New methodologies, however, can provide a more accurate, flexible, and now, highly transparent way to create more advanced predictive models.
The Benefits of Shapley Values and GBM vs. GLM
GLMs have proven to be of substantial value in insurance and banking. Yet newer algorithms can outperform linear models in accuracy and even in explainability. In particular, GBMs can account for non-linear relationships between predictor and dependent variables, and for interactions between predictors, without a lot of manual tweaking. This is why GBMs usually show higher accuracy in out-of-sample predictions than GLMs, leading to significant business impact. (This will be demonstrated in Part II of this blog post.)
The interpretation of GBMs can be based on several techniques, such as accumulated local effects (ALE) plots, individual conditional expectation (ICE) plots, partial dependence plots, surrogate models, variable importance, and, most recently, Shapley values. Shapley values are named after Lloyd Shapley (1923–2016), a mathematician, economist, and code breaker in World War II. Shapley introduced his method in 1953 and, in 2012, shared the Nobel Memorial Prize in Economic Sciences (with Alvin Roth) for work on stable matching and market design.
From the 1960s onward, Lloyd Shapley used cooperative game theory to study different matching methods. In a cooperative game in which the payoff must be attributed to players who have made unequal contributions, the Shapley value determines the fairest distribution of that payoff. For example, the Shapley value can be used to determine what each member of a group should pay in a restaurant when everyone shares their food. The theory of Shapley values is based on four axiomatic principles (efficiency, symmetry, missingness, and additivity), which makes this method unique compared to other explanation techniques, even our beloved regression coefficients! Moreover, research has shown that Shapley values have a much stronger overlap with human explanations than other explanation techniques do.
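To make the restaurant example concrete, here is a minimal pure-Python sketch that computes exact Shapley values by averaging each diner's marginal contribution over all orderings. The diners and bill amounts are made up for illustration; real model explanations use approximations, since exact enumeration grows factorially.

```python
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal
    contribution v(S + player) - v(S) over all orderings."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += v(with_p) - v(coalition)
            coalition = with_p
    return {p: phi[p] / len(orders) for p in players}

# Hypothetical shared restaurant bill: cost of each sub-party's
# order (made-up numbers; sharing a table saves money)
costs = {
    frozenset(): 0,
    frozenset({"Ann"}): 12,
    frozenset({"Bob"}): 18,
    frozenset({"Cai"}): 24,
    frozenset({"Ann", "Bob"}): 24,
    frozenset({"Ann", "Cai"}): 30,
    frozenset({"Bob", "Cai"}): 36,
    frozenset({"Ann", "Bob", "Cai"}): 42,
}

phi = shapley_values(["Ann", "Bob", "Cai"], lambda s: costs[frozenset(s)])
# phi == {'Ann': 8.0, 'Bob': 14.0, 'Cai': 20.0}
```

The efficiency axiom shows up directly: the three shares (8 + 14 + 20) sum exactly to the 42-coin group bill, just as per-feature Shapley values sum to the distance between a model's prediction and its baseline.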
The Difference Between GBM and GLM: A Scenario Analysis
One of the first steps to trusting a new technique is seeing how it works compared to what you’re used to doing. The rest of this part of the blog will do just that for GBMs and Shapley values vs. GLMs. One of the most important differences between Shapley values and GLM incremental value calculations is that they have different baselines:
- GBM and Shapley value baseline: the average of the model predictions in a chosen dataset (often the training dataset).
- GLM baseline: the model intercept, i.e., the expected outcome in the training data when the values of all input variables are equal to 0.
So, while the baseline (or intercept) for a linear model is the expected outcome when all predictor variable values equal 0, Shapley values include all observations and variables when they evaluate the impact of just one variable! Doesn't this seem more realistic? What event in the real world occurs in a completely independent manner?
Let’s have a look at a more detailed example to see a few more basic differences. In car insurance, a policyholder who does not drive at night, as 80% of drivers do not, has a lower tendency to have an accident. In a simple modeling scenario, Shapley values show a savings for that customer compared to the average driver. GLMs, in comparison, use the non-night-driver as the baseline and add the additional risk pay-out value on top of it. Knowing that difference allows an accurate comparison and calculation of incremental values in both cases. To show more differences between GBMs and GLMs, in accuracy and in explanatory power, we introduce three scenarios in the context of our car insurance example: a linear scenario, a non-linear scenario, and an interaction scenario (see Figure 1).
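The two baselines can be sketched in a few lines of plain Python. The numbers below are hypothetical: 40 of 50 customers never drive at night and cost 50 coins each, while the 10 night drivers cost 100 coins. With a single binary feature, an ordinary least squares fit recovers the two group means, so its intercept (the GLM baseline) is 50 coins, while the Shapley baseline is the average prediction, 60 coins.

```python
# Hypothetical single-feature example: 1 = night driver, 0 = not
x = [0] * 40 + [1] * 10
y = [50.0] * 40 + [100.0] * 10   # expected pay-out in coins

shapley_base = sum(y) / len(y)   # average prediction: 60 coins

# With one binary feature, OLS fits the two group means exactly
mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
intercept, slope = mean0, mean1 - mean0   # GLM baseline: 50, effect: 50

def predict(xi):
    return intercept + slope * xi

# GLM view: a non-night driver adds 0 coins on top of the intercept
glm_effect_non_driver = slope * 0                  # 0 coins
# Shapley view: the same customer shows a saving vs. the average driver
shap_non_driver = predict(0) - shapley_base        # -10 coins
shap_night_driver = predict(1) - shapley_base      # +40 coins
```

Note that the per-customer Shapley contributions balance out around the 60-coin baseline (40 × (−10) + 10 × 40 = 0), which is exactly the "savings compared to the average driver" framing.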
Figure 1: Three scenarios to show the difference between GBM and GLM.
Suppose a dataset contains 50 observations for each scenario. In all three scenarios, the pay-out without the characteristic, say driving at night, is 50 coins (see Figure 2). In the linear scenario, 10 customers are night drivers, each adding an extra 50 coins of loss risk. Both models are 100% accurate here and attribute the correct 500 coins in total (across all 10 night drivers). The second scenario includes a non-linear relationship between predictor and dependent variables: we now also consider how often a person drives at night. Driving at night once a week adds 50 coins of loss risk (8 customers); driving twice or more per week adds 70 coins (2 customers). The attributed coins should add up to 540. Only the GBM attributes the full 540 coins, with a prediction error of 2%, while the GLM attributes just 503 coins, with a prediction error of 4%. The last scenario includes an interaction between night driving and a commute time greater than one hour: each variable alone adds 50 coins of possible loss, but together they add 200. The result should be 3,000 coins, and only the GBM shows the correct attribution, with a prediction error of 0%.
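The non-linear scenario can be reproduced with a short pure-Python sketch. A straight-line fit (standing in for the GLM) has to compromise between the 50-coin and 70-coin effects, while a flexible model that fits each group's mean exactly (an idealized stand-in for the GBM) attributes the full amount; the group sizes follow the scenario above.

```python
# Non-linear scenario: nights driven per week -> expected pay-out in coins
x = [0] * 40 + [1] * 8 + [2] * 2
y = [50.0] * 40 + [100.0] * 8 + [120.0] * 2

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Ordinary least squares in closed form (the "GLM")
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# Coins the straight-line fit attributes to night driving, above its intercept
glm_attr = slope * sum(x)

# Idealized flexible model: fits each group's mean pay-out exactly
group_mean = {0: 50.0, 1: 100.0, 2: 120.0}
gbm_attr = sum(group_mean[xi] - 50.0 for xi in x)
```

Rounding, the straight-line fit attributes about 503 coins, matching the GLM result in Figure 2, while the flexible fit recovers the full 8 × 50 + 2 × 70 = 540 coins.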
Figure 2: GBM vs. GLM results for the three example scenarios.
Although we have only shown the overall sum of the attribution of the outcome to the different predictor variables, the per-customer attribution via Shapley values is also likely to be more accurate. Per-customer attributions will be discussed further in Part II.
In this blog post, we have shown that both GLMs and GBMs can be highly accurate and able to provide valid explanations and attributions in less realistic, linear, independent scenarios. As soon as we dive into more complex scenarios, GBMs are more accurate at predicting non-linear and interacting relationships, and they also provide accurate results for each variable's incremental impact. Part II of this blog post will dive even deeper into the comparison of the different models and outcomes on a more realistic credit card default example. In the meantime, if you're looking to compare GLMs, GBMs, and Shapley values for yourself, you can check out the free and open source H2O-3 (or XGBoost, or shap), or try H2O Driverless AI.