July 28th, 2020

Exploring the Next Frontier of Automatic Machine Learning with H2O Driverless AI

At H2O.ai, our goal is to democratize AI by bridging the gap between the State-of-the-Art (SOTA) in machine learning and a user-friendly, enterprise-ready platform. Since its very first release, we have been working tirelessly to bring the SOTA from Kaggle competitions to our enterprise platform, Driverless AI. The growing list of Driverless AI features, along with our growing team of Kaggle Grandmasters and industry-expert data scientists, reflects our commitment to that goal.

Today, we are excited to announce the availability of Driverless AI 1.9, our latest release, which comes with many new features. This article is the first in the 1.9 release blog series and gives a quick overview of what is new. More posts about individual features will follow in the coming weeks, so watch this space. You should also check out this webinar by Arno Candel and Dan Darnell.

Without further ado, here is a list of the new features in 1.9:

  • Automatic image recognition (webinar)
  • State-of-the-Art Text Analytics with BERT (webinar)
  • End-to-End Model Deployment and Operation with MLOps
  • Improved GUI/UX for machine learning interpretability
  • Shapley values for original features
  • Automatic project leaderboard
  • Uncertainty quantification for regression
  • Custom visualizations
  • Advanced time-series with epidemic SEIRD model
  • Zero-inflated model
  • Multi-layer hierarchical feature engineering
  • Multi-node support for model training

That’s a lot to go through. I will try to keep it short and sweet.

Automatic Image Recognition

Yes, we heard you. Image recognition is one of the most common questions/requests from our users. After months of hard work and rigorous testing, we can now present the first version of automatic image recognition in Driverless AI.

This is the brainchild of our Kaggle Grandmaster Yauhen Babakhin and team. The idea is to mimic what Yauhen would do when he faces a new image recognition challenge. In order to automate the most time-consuming tasks, our team implemented two key features:

  • Pre-trained image transformers – transforming images into vectors.
  • Automatic image model – automatic optimization of the model training strategy, hyperparameter tuning, and image augmentation, as well as model inspection for sanity checks and debugging. The automatic image model also lets our users get more information about the best individual model in the insights tab.

More importantly, we can turn the image transformer into a production-ready MOJO pipeline.

For more information, check out this webinar.

State-of-the-Art Text Analytics with BERT

Bidirectional Encoder Representations from Transformers (BERT) achieved SOTA results on a number of natural language processing (NLP) tasks. Our in-house NLP experts Sudalai Rajkumar (SRK), Maximilian Jeblick, and Trushant Kalyanpur have been working hard on the BERT implementation for the 1.9 release. This enables our users to leverage SOTA techniques based on a variety of BERT models and transformers in our latest Driverless AI release out-of-the-box.

The base BERT model can be further extended for domain-specific problems using recipes. Users can also productionize the models using either the C++ MOJO or the Python pipeline.

For more information, check out this webinar.

End-to-End Model Deployment and Operation with MLOps

We are releasing MLOps to automate the end-to-end model life cycle with Driverless AI. The key capabilities of MLOps are:

  • Model management – easy collaboration through a projects workspace and a model store.
  • Model deployment – deploy models to different environments (cloud/on-premises).
  • Model monitoring – monitor specific metrics or parameters.

For more information, check out our product page.

Improved GUI/UX for Machine Learning Interpretability

Results from individual analyses now appear on the screen as soon as they become available, so users no longer have to wait for every analysis to finish before they can start exploring.

Shapley Values for Original Features

We also added Kernel Shapley for the original features. This should make it easier for our users to visualize and compare the contribution from each original feature.
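
To make the idea of per-feature contributions concrete, here is a minimal sketch that computes exact Shapley values for a single row by enumerating every feature coalition. Kernel Shapley approximates the same quantity; exact enumeration is only practical for a handful of features. The `toy_model`, the all-zero baseline, and the function names are assumptions made for illustration and are not the Driverless AI implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row x, relative to a baseline row."""
    n = len(x)

    def value(subset):
        # Features in the coalition keep their real value; the rest use the baseline.
        row = [x[j] if j in subset else baseline[j] for j in range(n)]
        return predict(row)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(row):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * row[0] + 1.0 * row[1] + 0.5 * row[0] * row[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions in `phi` add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved in it.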

Automatic Project Leaderboard

The new project leaderboard feature makes it easy to run multiple diverse experiments. This is useful for estimating the model complexity and accuracy trade-offs. It also keeps track of the expert settings and modeling constraints so our users can quickly generate different and diverse models within a defined search space.

Uncertainty Quantification for Regression

This is another common feature request. For regression problems, we added empirical confidence bands based on the actual behavior of the model on holdout data.
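
One simple way to build such a band is to take quantiles of the holdout residuals and add them to each new prediction. The sketch below shows that idea only; it assumes residuals are computed as actual minus predicted, and the function names and numbers are made up for illustration. The exact method used inside Driverless AI may differ.

```python
def empirical_quantile(sorted_values, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def residual_band(holdout_actual, holdout_pred, coverage=0.9):
    """Lower and upper residual offsets that cover the requested share of holdout errors."""
    residuals = sorted(a - p for a, p in zip(holdout_actual, holdout_pred))
    alpha = (1 - coverage) / 2
    return (empirical_quantile(residuals, alpha),
            empirical_quantile(residuals, 1 - alpha))


def predict_with_band(pred, band):
    """Return (lower, prediction, upper) for a single new prediction."""
    lower_offset, upper_offset = band
    return pred + lower_offset, pred, pred + upper_offset


# Made-up holdout data for illustration.
holdout_actual = [10.0, 12.0, 9.0, 15.0, 11.0]
holdout_pred = [11.0, 11.0, 10.0, 13.0, 11.0]

band = residual_band(holdout_actual, holdout_pred, coverage=0.5)
lower, center, upper = predict_with_band(20.0, band)
```

Because the band comes from observed errors rather than a distributional assumption, it can be asymmetric around the prediction.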

Custom Visualizations

We have extended the visualization interface to allow users’ inputs for custom visualizations.

Advanced Time-Series: Epidemic SEIRD Model

Over the last couple of months, some of our Kaggle Grandmasters took part in the Kaggle COVID-19 forecasting competitions and finished in the top rankings. They also published an article on backtesting COVID-19 forecasts.

Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) is one of the useful methods for epidemic forecasting use cases. We have integrated it with Driverless AI to further enhance our time-series modeling capabilities.
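
For readers new to compartmental models, here is a minimal sketch of a basic SEIRD simulation using daily Euler steps. The parameter names (`beta`, `sigma`, `gamma`, `mu`), their values, and the fixed total population are assumptions chosen for illustration; this is not the Driverless AI implementation, which fits the model to data.

```python
def simulate_seird(s, e, i, r, d, beta, sigma, gamma, mu, days):
    """Daily Euler steps of a basic SEIRD model; returns one state tuple per day."""
    n = s + e + i + r + d  # total population, held fixed in this sketch
    history = [(s, e, i, r, d)]
    for _ in range(days):
        new_exposed = beta * s * i / n  # susceptible -> exposed
        new_infected = sigma * e        # exposed -> infected
        new_recovered = gamma * i       # infected -> recovered
        new_dead = mu * i               # infected -> dead
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered - new_dead
        r += new_recovered
        d += new_dead
        history.append((s, e, i, r, d))
    return history


# Illustrative values only: 10 initial infections in a population of 10,000.
history = simulate_seird(s=9990.0, e=0.0, i=10.0, r=0.0, d=0.0,
                         beta=0.5, sigma=0.2, gamma=0.1, mu=0.01, days=100)
final_s, final_e, final_i, final_r, final_d = history[-1]
```

Each step only moves people between compartments, so the five values always sum to the starting population.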

Read more about our COVID-19 response here.

Zero-inflated Model

For regression problems with many zeros in the target (such as loan default), a zero-inflated model, which combines a binary classifier with a regressor, can be a better solution than the standard models. This is now available out-of-the-box in 1.9, thanks to Ryan Chesler.
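
The combination of classification and regression can be sketched as a two-part prediction: the probability that the target is non-zero, multiplied by the expected value when it is non-zero. In the sketch below, simple per-segment averages stand in for the classifier and the regressor. The segments, the numbers, and the function names are hypothetical, and this is not how Driverless AI builds its models.

```python
from collections import defaultdict


def fit_two_part(segments, targets):
    """Per segment, estimate P(y > 0) and the mean of y among non-zero rows."""
    counts = defaultdict(int)
    nonzero = defaultdict(list)
    for seg, y in zip(segments, targets):
        counts[seg] += 1
        if y > 0:
            nonzero[seg].append(y)
    model = {}
    for seg, n in counts.items():
        p_nonzero = len(nonzero[seg]) / n  # stand-in for the binary classifier
        values = nonzero[seg]
        mean_if_nonzero = sum(values) / len(values) if values else 0.0  # stand-in for the regressor
        model[seg] = (p_nonzero, mean_if_nonzero)
    return model


def predict_two_part(model, segment):
    """Expected target = P(y > 0) * E[y | y > 0]."""
    p_nonzero, mean_if_nonzero = model[segment]
    return p_nonzero * mean_if_nonzero


# Made-up data: most rows have a target of zero.
segments = ["a", "a", "a", "a", "b", "b"]
targets = [0.0, 0.0, 0.0, 100.0, 0.0, 60.0]

model = fit_two_part(segments, targets)
pred_a = predict_two_part(model, "a")
pred_b = predict_two_part(model, "b")
```

Splitting the problem this way lets the regression part learn only from the non-zero rows instead of being dominated by the zeros.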

Multi-layer Hierarchical Feature Engineering

We added a feature for users to configure feature engineering steps (transformers) in multiple stages (layers). This feature allows an optional pre-processing layer for specific custom data cleanup/conversions. Subsequent layers can also take each previous layer’s output as input. Our users can now create complex feature engineering pipelines in a flexible manner.
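
As a rough illustration of the layering idea, the sketch below runs a clean-up layer first and then an engineering layer that consumes the cleaned columns. The dictionary-of-functions representation, the column names, and `apply_layers` are assumptions made for this example; in Driverless AI, layers are set up through the product's own configuration and transformers.

```python
def apply_layers(row, layers):
    """Run each layer in order; every layer sees the output of the layers before it."""
    features = dict(row)
    for layer in layers:
        # All transformers in one layer read the same input snapshot.
        outputs = {name: transform(features) for name, transform in layer.items()}
        features.update(outputs)
    return features


# Layer 1 (hypothetical): clean up raw strings into numbers.
cleanup_layer = {
    "income": lambda f: float(str(f["income"]).replace(",", "")),
    "debt": lambda f: float(str(f["debt"]).replace(",", "")),
}

# Layer 2 (hypothetical): build a new feature from the cleaned columns.
engineering_layer = {
    "debt_to_income": lambda f: f["debt"] / f["income"],
}

row = {"income": "50,000", "debt": "12,500"}
features = apply_layers(row, [cleanup_layer, engineering_layer])
```

The second layer would fail on the raw strings, which is why the optional pre-processing layer runs first.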

Multi-node Support for Model Training

Driverless AI can now be configured to run in a multi-node worker mode. This allows users to scale up the training process when they need to complete multiple experiments in a short amount of time.

Note: This new multi-node feature is in a preview (alpha) stage. If you are interested in using multi-node configurations, please contact support@h2o.ai. A single experiment still runs entirely on one machine, so multi-node speeds up running many experiments rather than any one of them. For this reason, using a large number of commodity-grade machines is not useful in the context of multi-node.

How to Get Started?

If you are new to Driverless AI, we would recommend our risk-free, web-based test drive in H2O Aquarium Cloud. Each lab session lasts for two hours and you can keep trying our software for free. No license key required. We also have self-paced tutorials to guide you through the journey. Note: We are in the process of updating the materials to Driverless AI 1.9. The new tutorials should be available in the coming weeks.

For existing users with license keys, please download the latest version from our website. You can also find the links to different cloud marketplaces on the same page.

I hope you enjoyed reading this quick overview. Please give the new release a spin and share your experience with us.

Acknowledgements

I would like to thank my colleagues for all the technical details and feedback. Driverless AI is the result of continuous team effort led by Arno. To illustrate, let me just leave a screenshot of Arno’s GitHub page here.

Until next time,

Joe

About the Author

Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at H2O.ai. Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities around Europe, US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.
