June 21st, 2017

Scalable Automatic Machine Learning: Introducing H2O’s AutoML

Category: AutoML, Ensembles, H2O Release, Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts alike. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (e.g. H2O).
Although H2O has made it easy for non-experts to experiment with machine learning, producing high-performing machine learning models still requires a fair bit of knowledge and background in data science. Deep Neural Networks in particular are notoriously difficult for a non-expert to tune properly. We have designed an easy-to-use interface that automates the process of training a large, diverse selection of candidate models and then training a stacked ensemble on the resulting models (which often leads to an even better model). H2O’s AutoML for Scalable Automatic Machine Learning makes its debut in the latest “Preview Release” of H2O, version 3.12.0.1 (aka “Vapnik”).
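To make the stacking step concrete, here is a minimal pure-Python sketch of the general technique, not H2O’s implementation. It assumes each base model has already produced cross-validated (out-of-fold) predicted probabilities for the training rows, often called level-one data, and fits a logistic-regression metalearner on those columns with plain batch gradient descent. The function names `fit_metalearner` and `predict_ensemble`, the learning rate, the epoch count and the toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_metalearner(level_one, labels, lr=1.0, epochs=5000):
    """Fit logistic-regression weights plus an intercept on level-one data.

    level_one: one row per training example, one column per base model
               (that model's out-of-fold predicted probability).
    labels:    0/1 targets, one per row.
    """
    n_models = len(level_one[0])
    weights = [0.0] * n_models
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(level_one, labels):
            # Prediction error of the current metalearner on this row.
            err = sigmoid(bias + sum(w * p for w, p in zip(weights, row))) - y
            for j, p in enumerate(row):
                grad_w[j] += err * p
            grad_b += err
        # Full-batch gradient step on the average log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_ensemble(weights, bias, base_preds):
    """Combine one example's base-model predictions into an ensemble probability."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, base_preds)))


# Toy level-one data from two hypothetical base models:
# column 0 (model A) separates the classes, column 1 (model B) is uninformative.
level_one = [[0.9, 0.4], [0.8, 0.6], [0.2, 0.5], [0.1, 0.4], [0.7, 0.5], [0.3, 0.6]]
labels = [1, 1, 0, 0, 1, 0]
w, b = fit_metalearner(level_one, labels)
print("metalearner weights:", w, "intercept:", b)
print("ensemble p(y=1) for [0.9, 0.5]:", predict_ensemble(w, b, [0.9, 0.5]))
```

Because the metalearner only sees the base models’ predictions, it learns how much to trust each one; in this toy data model A ends up with the larger weight.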
H2O’s AutoML can automate a large part of the machine learning workflow, including the automatic training and tuning of many models within a user-specified time limit. Instead of a fixed time limit, the user can also stop the AutoML process with a performance-metric-based stopping criterion. Stacked Ensembles are then automatically trained on the collection of individual models to produce a highly predictive ensemble model which, in most cases, will be the top-performing model on the AutoML Leaderboard.
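The two stopping modes described above can be illustrated with a small pure-Python driver loop. This is a simplified sketch, not H2O’s AutoML internals: each candidate is a hypothetical `(name, train_fn)` pair whose `train_fn()` "trains" a model and returns a validation score (higher is better), and the `patience`, `min_improvement` and injectable `clock` parameters are hypothetical. Only `max_runtime_secs` mirrors an argument of the real H2O AutoML functions, and the metric-based rule here (stop after `patience` models in a row fail to beat the best score) is an assumed simplification.

```python
import time


def run_automl(candidates, max_runtime_secs=None, patience=None,
               min_improvement=0.0, clock=time.monotonic):
    """Train candidates until a time budget or a metric plateau stops the run.

    Returns (name, score) pairs sorted best-first, like a leaderboard.
    """
    start = clock()
    leaderboard = []
    best = float("-inf")
    stale_rounds = 0
    for name, train_fn in candidates:
        # Time-based stop: don't start another model once the budget is spent.
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break
        score = train_fn()
        leaderboard.append((name, score))
        # Metric-based stop: count consecutive models that fail to beat the best.
        if score > best + min_improvement:
            best, stale_rounds = score, 0
        else:
            stale_rounds += 1
            if patience is not None and stale_rounds >= patience:
                break
    # Rank best-first; Python's sort is stable, so ties keep training order.
    leaderboard.sort(key=lambda entry: entry[1], reverse=True)
    return leaderboard


# Hypothetical candidates whose "training" just returns a fixed score.
scores = [0.70, 0.80, 0.80, 0.79, 0.80, 0.95]
candidates = [(f"model_{i}", lambda s=s: s) for i, s in enumerate(scores)]

leaderboard = run_automl(candidates, patience=3, min_improvement=0.001)
for name, score in leaderboard:
    print(name, score)
```

With `patience=3`, three models in a row fail to improve on 0.80, so the run stops before `model_5` is ever trained. The real AutoML process also trains Stacked Ensembles on the individual models; this sketch stops at ranking them.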

AutoML Interface

We provide a simple function that performs a process that would typically require many lines of code. This frees users to focus on other data science pipeline tasks such as data preprocessing, feature engineering and model deployment.
R:

library(h2o)
aml <- h2o.automl(x = x, y = y, training_frame = train,
                  max_runtime_secs = 3600)

Python:

import h2o
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_runtime_secs=3600)
aml.train(x=x, y=y, training_frame=train)

Flow (H2O’s Web GUI):
Run AutoML

AutoML Leaderboard

Each AutoML run returns a “Leaderboard” of models, ranked by a default performance metric. Here is an example leaderboard for a binary classification task:
[Leaderboard table: Model Id, auc]
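As an illustration of how such a ranking works, here is a minimal pure-Python sketch that computes AUC, the metric shown in the leaderboard above for binary classification, from held-out labels and predicted probabilities, then sorts models best-first. AUC is computed with the pairwise (Mann-Whitney) formulation, which is simple but costs O(P·N) comparisons; the model ids and predictions are made-up toy values, not H2O output, and `auc` and `rank_models` are hypothetical helper names.

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen positive
    is scored above a randomly chosen negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_models(predictions, labels):
    """Return (model_id, auc) pairs sorted best-first, like a leaderboard."""
    rows = [(model_id, auc(labels, scores)) for model_id, scores in predictions.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical held-out labels and each model's predicted P(y = 1).
labels = [1, 0, 1, 0, 1, 0]
predictions = {
    "GBM_example": [0.9, 0.2, 0.7, 0.8, 0.6, 0.3],
    "GLM_example": [0.6, 0.5, 0.4, 0.3, 0.8, 0.7],
    "StackedEnsemble_example": [0.95, 0.1, 0.8, 0.3, 0.7, 0.2],
}
for model_id, score in rank_models(predictions, labels):
    print(f"{model_id:<25} {score:.3f}")
```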
More information and full R and Python code examples are available on the H2O 3.12.0.1 AutoML docs page in the H2O User Guide.
