July 28th, 2020

Exploring the Next Frontier of Automatic Machine Learning with H2O Driverless AI

Category: AutoML, Driverless AI

At H2O.ai, it is our goal to democratize AI by bridging the gap between the State-of-the-Art (SOTA) in machine learning and a user-friendly, enterprise-ready platform. We have been working tirelessly to bring the SOTA from Kaggle competitions to our enterprise platform Driverless AI since its very first release. The growing list of Driverless AI features and our growing team of Kaggle Grandmasters and industry expert data scientists can be seen as our effort and commitment to achieve that goal.

Today, we are excited to announce the availability of our latest Driverless AI release, 1.9, which comes with tons of new features. This article is the first in the 1.9 release blog series and provides a quick overview of the new features. More posts about individual features will follow in the coming weeks, so watch this space. You should also check out this webinar by Arno Candel and Dan Darnell.

Without further ado, here is a list of the new features in 1.9:

  • Automatic image recognition (webinar)
  • State-of-the-Art Text Analytics with BERT (webinar)
  • End-to-End Model Deployment and Operation with MLOps
  • Improved GUI/UX for machine learning interpretability
  • Shapley values for original features
  • Automatic project leaderboard
  • Uncertainty quantification for regression
  • Custom visualizations
  • Advanced time-series with epidemic SEIRD model
  • Zero-inflated model
  • Multi-layer hierarchical feature engineering
  • Multi-node support for model training

That’s a lot to go through. I will try to keep it short and sweet.

Automatic Image Recognition

Yes, we heard you. Image recognition is one of the most common requests from our users. After months of hard work and rigorous testing, we can now present the first version of automatic image recognition in Driverless AI.

This is the brainchild of our Kaggle Grandmaster Yauhen Babakhin and team. The idea is to mimic what Yauhen would do when he faces a new image recognition challenge. In order to automate the most time-consuming tasks, our team implemented two key features:

  • Pre-trained image transformers – transforming images into vectors.
  • Automatic image model – automatic optimization of the model training strategy, hyperparameter tuning, and image augmentation, as well as model inspection for sanity checks and debugging. The automatic image model also gives our users more information about the best individual model in the insights tab.

More importantly, we can turn the image transformer into a production-ready MOJO pipeline.

For more information, check out this webinar.

State-of-the-Art Text Analytics with BERT

Bidirectional Encoder Representations from Transformers (BERT) achieved SOTA results on a number of natural language processing (NLP) tasks. Our in-house NLP experts Sudalai Rajkumar (SRK), Maximilian Jeblick, and Trushant Kalyanpur have been working hard on the BERT implementation for the 1.9 release. This enables our users to leverage SOTA techniques based on a variety of BERT models and transformers in our latest Driverless AI release out-of-the-box.

The base BERT model can be further extended for domain-specific problems using recipes. Users can also productionize the models using either a C++ MOJO or a Python pipeline.

For more information, check out this webinar.

End-to-End Model Deployment and Operation with MLOps

We are releasing MLOps to automate the end-to-end model life cycle with Driverless AI. The key capabilities of MLOps are:

  • Model management – easy collaboration through a project workspace and a model store.
  • Model deployment – deploy models to different environments (cloud/on-premises).
  • Model monitoring – monitor specific metrics or parameters.

For more information, check out our product page.

Improved GUI/UX for Machine Learning Interpretability

Results from each individual analysis now appear on the screen as soon as they become available, so our users no longer have to wait for every analysis to finish before they can start exploring.

Shapley Values for Original Features

We also added Kernel Shapley for the original features. This should make it easier for our users to visualize and compare the contribution from each original feature.
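
To make the idea of per-feature contributions concrete, here is a minimal sketch that computes exact Shapley values for a tiny toy model by enumerating every feature coalition. This is not Kernel Shapley and it is not the Driverless AI implementation. Kernel Shapley approximates the same quantity by sampling, because exact enumeration grows as 2^n in the number of features. The function names, the toy model, and the all-zero baseline are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row `x` relative to a `baseline` row.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


def toy_model(row):
    # Hypothetical model on three original features with one interaction term.
    return 2.0 * row[0] - 1.0 * row[1] + 0.5 * row[0] * row[2]


x = [3.0, 1.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for the row and the prediction for the baseline, which is what makes them easy to compare across original features.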

Automatic Project Leaderboard

The new project leaderboard feature makes it easy to run multiple diverse experiments. This is useful for estimating the model complexity and accuracy trade-offs. It also keeps track of the expert settings and modeling constraints so our users can quickly generate different and diverse models within a defined search space.

Uncertainty Quantification for Regression

This is another common feature request. For regression problems, we added empirical confidence bands based on the actual behavior of the model on holdout data.
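
One simple way to build such a band is to take quantiles of the holdout residuals and shift each new prediction by them. The sketch below follows that approach. It illustrates the general idea and is an assumption on my part, not necessarily how Driverless AI computes its bands. The function name, the `coverage` parameter, and the sample numbers are all hypothetical.

```python
def empirical_band(holdout_actual, holdout_pred, new_preds, coverage=0.9):
    """Return (lower, upper) bounds for each new prediction.

    The bounds come from empirical quantiles of the holdout residuals
    (actual minus predicted), using linear interpolation between points.
    """
    residuals = sorted(a - p for a, p in zip(holdout_actual, holdout_pred))

    def quantile(q):
        pos = q * (len(residuals) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(residuals) - 1)
        frac = pos - lo
        return residuals[lo] * (1 - frac) + residuals[hi] * frac

    tail = (1 - coverage) / 2
    lower_offset = quantile(tail)
    upper_offset = quantile(1 - tail)
    return [(p + lower_offset, p + upper_offset) for p in new_preds]


# Hypothetical holdout data: residuals are [-1, 1, -1, 2, 0].
actual = [10.0, 12.0, 9.0, 15.0, 11.0]
predicted = [11.0, 11.0, 10.0, 13.0, 11.0]
bands = empirical_band(actual, predicted, [20.0, 30.0], coverage=0.5)
print(bands)
```

Because the offsets come from observed errors, the band can be asymmetric when the model tends to under-predict or over-predict.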

Custom Visualizations

We have extended the visualization interface to allow users’ inputs for custom visualizations.

Advanced Time-Series: Epidemic SEIRD Model

Over the last couple of months, some of our Kaggle Grandmasters participated in the Kaggle COVID-19 forecasting competitions and achieved top rankings. They also published an article on backtesting COVID-19 forecasts.

Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) is one of the useful methods for epidemic forecasting use cases. We have integrated it with Driverless AI to further enhance our time-series modeling capabilities.
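
For readers new to compartmental models, here is a minimal sketch of the textbook SEIRD equations integrated with simple Euler steps. It only shows how people move between the five compartments. It is not the Driverless AI recipe, and the parameter names (`beta`, `sigma`, `gamma`, `mu`) and values are assumptions chosen for illustration rather than fitted to any data.

```python
def simulate_seird(state, beta, sigma, gamma, mu, days, steps_per_day=10):
    """Simulate a basic SEIRD model and return one state tuple per day.

    state: initial (S, E, I, R, D) counts
    beta:  transmission rate, sigma: rate of E -> I,
    gamma: recovery rate,     mu:    death rate of infectious people
    """
    s, e, i, r, d = state
    n = s + e + i + r + d  # total population, kept fixed in this sketch
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            new_dead = mu * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        history.append((s, e, i, r, d))
    return history


# Hypothetical scenario: 10 infectious people in a population of 10,000.
history = simulate_seird(
    (9990.0, 0.0, 10.0, 0.0, 0.0),
    beta=0.5, sigma=0.2, gamma=0.1, mu=0.01, days=120,
)
peak_day = max(range(len(history)), key=lambda day: history[day][2])
print("Infections peak on day", peak_day)
```

In a forecasting setting, the rates would be estimated from observed case and death counts instead of being fixed by hand.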

Read more about our COVID-19 response here.

Zero-inflated Model

For regression problems where the target contains many zeros (such as loan default), a zero-inflated model (a combination of binary classification and regression) can be a better solution than standard models. This is now available in 1.9 out-of-the-box thanks to Ryan Chesler.
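
The combination of classification and regression can be sketched as a two-part model: one part estimates the probability that the target is non-zero, the other estimates its size when it is non-zero, and the prediction is the product of the two. The toy version below uses a per-segment frequency as a stand-in for the classifier and a least-squares line through the origin as a stand-in for the regressor. The data, column names, and function names are hypothetical, and this is not the Driverless AI implementation.

```python
from collections import defaultdict


def fit_two_part(rows):
    """Fit a toy two-part model on rows of (segment, size, target).

    Part 1: P(target > 0) per segment, estimated by frequency.
    Part 2: target ~ slope * size, fitted on non-zero rows only.
    """
    totals = defaultdict(int)
    nonzero_counts = defaultdict(int)
    sum_xy = 0.0
    sum_xx = 0.0
    for segment, size, target in rows:
        totals[segment] += 1
        if target != 0:
            nonzero_counts[segment] += 1
            sum_xy += size * target
            sum_xx += size * size
    p_nonzero = {seg: nonzero_counts[seg] / n for seg, n in totals.items()}
    slope = sum_xy / sum_xx if sum_xx else 0.0
    return p_nonzero, slope


def predict_two_part(model, segment, size):
    """Expected target = P(non-zero) * expected value given non-zero."""
    p_nonzero, slope = model
    return p_nonzero[segment] * slope * size


# Hypothetical data: (risk segment, loan size, loss amount).
rows = [
    ("low_risk", 100.0, 0.0), ("low_risk", 200.0, 0.0),
    ("low_risk", 100.0, 0.0), ("low_risk", 300.0, 150.0),
    ("high_risk", 100.0, 50.0), ("high_risk", 200.0, 100.0),
    ("high_risk", 100.0, 0.0), ("high_risk", 400.0, 200.0),
]
model = fit_two_part(rows)
print(predict_two_part(model, "low_risk", 200.0))
print(predict_two_part(model, "high_risk", 200.0))
```

Fitting the regression part only on non-zero rows keeps the many zeros from dragging its estimates toward zero, which is the main motivation for splitting the problem in two.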

Multi-layer Hierarchical Feature Engineering

We added a feature for users to configure feature engineering steps (transformers) in multiple stages (layers). This feature allows an optional pre-processing layer for specific custom data cleanup/conversions. Subsequent layers can also take each previous layer’s output as input. Our users can now create complex feature engineering pipelines in a flexible manner.
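
Conceptually, a layered pipeline looks like the sketch below: a first layer cleans up raw values, and each later layer can read the original columns as well as everything produced by earlier layers. This is only an illustration of the layering idea in plain Python. The `run_layers` helper, the column names, and the transformers are hypothetical and do not reflect the Driverless AI configuration syntax.

```python
def run_layers(row, layers):
    """Apply feature engineering layers in order.

    Each layer maps output names to functions of the features seen so far,
    so a later layer can build on the outputs of any earlier layer.
    """
    features = dict(row)
    for layer in layers:
        outputs = {name: transform(features) for name, transform in layer.items()}
        features.update(outputs)
    return features


layers = [
    # Layer 0: optional pre-processing that cleans up raw strings.
    {
        "price_clean": lambda f: float(f["price"].strip().lstrip("$").replace(",", "")),
        "area_clean": lambda f: float(f["area_sqft"]),
    },
    # Layer 1: builds on the cleaned columns from layer 0.
    {"price_per_sqft": lambda f: f["price_clean"] / f["area_clean"]},
    # Layer 2: builds on the output of layer 1.
    {"is_expensive": lambda f: f["price_per_sqft"] > 1.5},
]

raw_row = {"price": " $1,200 ", "area_sqft": "600"}
result = run_layers(raw_row, layers)
print(result)
```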

Multi-node Support for Model Training

Driverless AI can now be configured to run in a multi-node worker mode. This allows users to scale up the training process when they need to complete multiple experiments in a short amount of time.

Note: This new multi-node feature is in a preview (alpha) stage. If you are interested in using multi-node configurations, please contact support@h2o.ai. A single experiment runs entirely on one machine. For this reason, using a large number of commodity-grade machines is not useful in the context of multi-node.

How to Get Started?

If you are new to Driverless AI, we would recommend our risk-free, web-based test drive in H2O Aquarium Cloud. Each lab session lasts for two hours and you can keep trying our software for free. No license key required. We also have self-paced tutorials to guide you through the journey. Note: We are in the process of updating the materials to Driverless AI 1.9. The new tutorials should be available in the coming weeks.

For existing users with license keys, please download the latest version from our website. You can also find the links to different cloud marketplaces on the same page.

I hope you enjoy reading this quick overview. Please give it a spin and share your experience with us.

Acknowledgements

I would like to thank my colleagues for all the technical details and feedback. Driverless AI is the result of continuous team effort led by Arno. To illustrate, let me just leave a screenshot of Arno’s GitHub page here.

Until next time,

Joe

About the Author

Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at H2O.ai. Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities across Europe, the US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.

Before joining H2O, he was in the business intelligence team at Virgin Media where he developed data products to enable quick and smart business decisions. He also worked remotely for Domino Data Lab as a data science evangelist promoting products via blogging and giving talks at external events.

Joe has a background in water engineering. Before his data science journey, he was an EngD researcher at STREAM Industrial Doctorate Centre working on machine learning techniques for drainage design optimisation. Prior to that, he was an asset management consultant specialised in data mining and constrained optimisation for the utilities sector in the UK and abroad. He also holds an MSc in Environmental Management and a BEng in Civil Engineering.

Long before Joe immersed himself in the world of open-source R and Python, he learned his trade as an avid MATLAB user. When he was a kid, his parents taught him one of the famous old Chinese sayings - when one drinks water, one must not forget where it comes from. So when Twitter asked Joe to be creative, he simply put down @matlabulous as his handle.
