November 14th, 2019
Novel Ways To Use Driverless AI
Category: H2O Driverless AI, Machine Learning Interpretability
By: Thomas Ott
I am biased when I write that Driverless AI is amazing, but what’s more amazing is how I see customers using it. As a Sales Engineer, my job has been to help our customers and prospects use our flagship product. In return, they give us valuable feedback and talk about how they used it.
Feedback is gold to us. Driverless AI has evolved into its current iteration because of feedback. Customers and prospects tell us what they like and what they want to see in the product. It then takes a few GitHub requests, and a few weeks later it shows up in the product (Makers Gonna Make, thanks Dev team)!
Relative Feature Importance Chart
One of the big items in Driverless AI is its feature engineering. Our ‘secret sauce’ extracts more model performance when combined with model tuning and selection. You can see the list of new features in the lower middle part of the UI. This feature importance list gets updated with every model iteration. If ensembling is on, the final list is the ensemble feature importance list. I had a prospect who already had a model in production and had identified the top 5 features in that model. They loaded the same training data set into Driverless AI, hit run, and inspected the top 5 features that it created.
Not all of the production model's top 5 features made the top 5 of the Driverless AI feature plot; some ended up lower in the top 10. Taking their place were new interactions between the original top 5 features! Driverless AI also showed that the production model could be squeezed for more performance. The prospect was quite impressed!
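The comparison that prospect made can be sketched in a few lines of Python. This is only an illustration, not anything Driverless AI produces: the feature names, the ranking, and the `*` naming convention for interaction features are all hypothetical, and `compare_top_features` is a helper invented for this example.

```python
def compare_top_features(production_top, candidate_ranked, k=10):
    """Report where each production feature lands in another model's ranking."""
    top_k = candidate_ranked[:k]
    report = {}
    for feature in production_top:
        # Rank is 1-based; None means the raw feature is not in the top k.
        rank = top_k.index(feature) + 1 if feature in top_k else None
        # Hypothetical convention: "a*b" is an interaction of raw features a and b.
        derived = [
            name for name in top_k
            if name != feature and feature in name.split("*")
        ]
        report[feature] = {"rank": rank, "derived": derived}
    return report


# Hypothetical top 5 from a production model.
production_top = ["age", "income", "tenure", "balance", "region"]

# Hypothetical importance ranking from a second model, best first.
candidate_ranked = [
    "age*income", "tenure", "income*balance", "age", "balance",
    "region*tenure", "income", "region", "n_products", "is_active",
]

report = compare_top_features(production_top, candidate_ranked)
for feature, info in report.items():
    print(feature, info)
```

A report like this makes it easy to see which original features slipped down the list and which interactions took their place.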
Some customers and prospects like to use Driverless AI as a benchmarking tool. They use it to come up with a model and feature pipeline with a specific performance metric (AUC, MAPE, etc.) and then try to beat it with their own code.
They spend time looking through the log and trace files to see how Driverless AI tuned the hyperparameters of a specific model. Then they look at the feature engineering and see what interactions it came up with. The last step is for them to use their domain expertise to code up their custom pipeline from scratch. Sometimes they beat Driverless AI, sometimes they don’t!
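To make "beating the benchmark" concrete, here is a minimal sketch of the two metrics mentioned above, written in plain Python. The labels and scores are made-up numbers for illustration only; in practice you would compare your pipeline's predictions and the benchmark's predictions on the same held-out data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is O(n^2), fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up holdout labels and two sets of scores to compare.
labels = [1, 0, 1, 0, 1]
benchmark_scores = [0.9, 0.4, 0.7, 0.6, 0.5]
custom_scores = [0.8, 0.3, 0.75, 0.7, 0.9]

benchmark_auc = auc(labels, benchmark_scores)
custom_auc = auc(labels, custom_scores)
print("benchmark AUC:", benchmark_auc)
print("custom AUC:", custom_auc)
print("custom pipeline wins:", custom_auc > benchmark_auc)
```

Higher is better for AUC and lower is better for MAPE, so the comparison flips depending on which metric you benchmark against.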
It goes without saying that our MLI (Machine Learning Interpretability) module is invaluable for model explainability. The dashboard is packed with Shapley, K-Lime, and Disparate Impact Analysis. It allows you to download a scoring pipeline so you can get the reason codes for each prediction. It’s a very complex but feature-rich module, and our customers love it.
One quiet superpower it has is the ability to take a training and prediction set from another black box model and run MLI on it! You can generate reason codes, use K-Lime or K-Sup, and look inside your model to make sure it makes sense!
The best part? The dashboard, scoring pipeline, and the ability to check other models outside of Driverless AI come with every licensed instance!
Want to learn more about Driverless AI capabilities? Check out this free MLI tutorial.
Some business managers and data scientists sit up in their chairs when I demo AutoDoc. Their eyes light up and the questions start. “Can I add my own logos?” Yes. “Does it show the final ensemble model?” Yes. “Can this report be generated for every experiment?” Yes, if you want it to.
AutoDoc is a hit because it saves time. A LOT OF TIME. More often than not, large companies need to document the models they have in production. This can be for many reasons (e.g., regulatory), and writing the docs can be a time-consuming process. Click on “Download Autoreport” after the Experiment finishes and 85% of your work is ready. The best part? We include AutoDoc in every Driverless AI license.
Last but not least, the scoring pipelines. Some prospects and customers love this feature. In the early days of Machine Learning, people wrestled with building and running models in production. While we still wrestle with this problem today, we’ve since ‘streamlined’ the process into pipelines.
A pipeline is a chain of code steps. One step might munge or transform incoming data into the same shape as the data the model was trained on. It then passes the result to another step that contains a tuned model for inferencing/scoring, and out come the predictions.
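Here is a minimal sketch of that idea in plain Python. It is not the pipeline Driverless AI generates; the class names, the rescaling step, and the fixed weights standing in for a tuned model are all assumptions made for illustration.

```python
class Standardize:
    """Transform step: rescale columns using statistics learned at training time."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.means = [sum(c) / len(c) for c in columns]
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.scales = [(max(c) - min(c)) or 1.0 for c in columns]
        return self

    def transform(self, rows):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means, self.scales)]
            for row in rows
        ]


class LinearScorer:
    """Model step: a stand-in for a tuned model, here a fixed weighted sum."""

    def __init__(self, weights, bias=0.0):
        self.weights = weights
        self.bias = bias

    def predict(self, rows):
        return [
            sum(w * v for w, v in zip(self.weights, row)) + self.bias
            for row in rows
        ]


class Pipeline:
    """Run every transform in order, then hand the result to the model."""

    def __init__(self, transforms, model):
        self.transforms = transforms
        self.model = model

    def predict(self, rows):
        for step in self.transforms:
            rows = step.transform(rows)
        return self.model.predict(rows)


# Made-up training data, used only to learn the rescaling statistics.
train_rows = [[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
pipeline = Pipeline(
    transforms=[Standardize().fit(train_rows)],
    model=LinearScorer(weights=[1.0, 2.0], bias=0.5),
)

predictions = pipeline.predict([[4.0, 30.0]])
print(predictions)
```

The key point is that the transform step reuses what it learned from the training data, so new scoring data arrives at the model in the same shape the model saw during training.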
You can, of course, write all this by hand, and data scientists often do, but it takes a great deal of time! With a click of a button, Driverless AI lets you generate a Python, C++, or Java pipeline. It takes care of all the feature transformations, model tuning, and selection. All you need to do is pass in your scoring data, and out come the predictions. If you couple it with the MLI scoring pipeline, you can also get reason codes along with your predictions.
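To give a feel for what "reason codes along with your predictions" means, here is a toy sketch. It assumes a simple linear model where each feature's contribution is its weight times its value. The MLI module relies on techniques such as Shapley and K-Lime, so treat this only as an illustration of the output shape. The function name, weights, and feature names are hypothetical.

```python
def score_with_reason_codes(weights, bias, row, top_n=2):
    """Return a prediction plus the features that moved it the most."""
    # For a linear model, each feature contributes weight * value.
    contributions = {name: weights[name] * value for name, value in row.items()}
    prediction = bias + sum(contributions.values())
    # Rank by absolute size so strong negative drivers are reported too.
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return prediction, ranked[:top_n]


# Hypothetical model weights and one row of scoring data.
weights = {"age": 0.5, "income": -2.0, "tenure": 1.0}
row = {"age": 2.0, "income": 1.5, "tenure": 0.5}

prediction, reasons = score_with_reason_codes(weights, bias=0.25, row=row)
print("prediction:", prediction)
for name, contribution in reasons:
    print("reason:", name, contribution)
```

Each reason pairs a feature with the amount it pushed the prediction up or down, which is the kind of per-prediction explanation reason codes are meant to provide.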
Over time, I’ve noticed that everyone who touches H2O Driverless AI gravitates toward one specific feature. The main reason? It saves them time. Need to try new use cases? Give them to Driverless AI to run through while you work on other things. Or are you a startup that’s looking to be nimble and build your competitive advantage? Let Driverless AI build your models and generate your scoring pipelines.
The possibilities are endless. What will you Make today?