Automatic Machine Learning (AutoML)


What is Automatic Machine Learning (AutoML)?

Choosing the best machine learning models and tuning them can be time-consuming and exhausting. It often takes years of experience to know which parameters to tune. The field of AutoML aims to solve this problem. AutoML is useful for experts, by automating model selection and tuning, and for non-experts, by helping them create high-performing models in a timely manner. Many of the repetitive tasks in machine learning can be automated, including data preparation, feature engineering, feature selection, model tuning, and ensemble building. H2O has three generations of automatic machine learning:

First Generation - H2O AutoML

H2O AutoML automates the machine learning workflow. It handles data pre-processing such as imputation, standardization, and one-hot encoding of categorical features. It also provides automatic model training and hyperparameter tuning via random grid search under user-defined time, space, and resource constraints.
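
To make the three pre-processing steps concrete, here is a minimal sketch in plain Python. It is not H2O's implementation: the function names (impute_mean, standardize, one_hot) and the toy columns are assumptions chosen for illustration, and mean imputation is only one of several possible imputation strategies.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a numeric column to zero mean and unit population std dev."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def one_hot(column):
    """Return the sorted category levels and one 0/1 indicator row per value."""
    levels = sorted(set(column))
    rows = [[1 if v == level else 0 for level in levels] for v in column]
    return levels, rows


# Hypothetical toy data: one numeric column with a gap, one categorical column.
ages = [20.0, None, 40.0]
colors = ["red", "blue", "red"]

ages_filled = impute_mean(ages)
ages_scaled = standardize(ages_filled)
levels, colors_encoded = one_hot(colors)
```

After these steps every feature is numeric and on a comparable scale, which is the form most of the algorithms listed below expect.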

The current version of AutoML trains, tunes, and cross-validates the following models via random grid search: Generalized Linear Model (GLM), Distributed Random Forest, Extremely-Randomized Random Forest, Gradient Boosting Machines (GBM), eXtreme Gradient Boosting (XGBoost) and Deep Neural Networks. After training the grid search models, AutoML reuses them to build stacked ensembles. A leaderboard then reports the performance of every individual model as well as the stacked ensembles, and users can pick the model that best fits their needs.
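
The idea of a budgeted random grid search that feeds a leaderboard can be sketched as follows. This is an illustrative stand-in in plain Python, not the H2O AutoML code: random_grid_search, toy_validation_error, and the hyperparameter names and values are assumptions, the scoring function replaces real training and cross-validation, and the stacked ensemble step is omitted.

```python
import itertools
import random
import time


def random_grid_search(grid, score_fn, max_models, max_runtime_secs, seed=0):
    """Score randomly ordered grid points until a model or time budget runs out."""
    rng = random.Random(seed)
    names = sorted(grid)
    # Enumerate the full grid, then visit it in random order.
    candidates = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]
    rng.shuffle(candidates)

    deadline = time.monotonic() + max_runtime_secs
    leaderboard = []
    for params in candidates[:max_models]:
        if time.monotonic() >= deadline:
            break  # time budget exhausted
        leaderboard.append((score_fn(params), params))

    # Lower score is better here (think of it as a validation error).
    leaderboard.sort(key=lambda row: row[0])
    return leaderboard


def toy_validation_error(params):
    """Hypothetical stand-in for training a model and cross-validating it."""
    return (params["max_depth"] - 5) ** 2 + abs(params["learn_rate"] - 0.1)


grid = {"max_depth": [3, 5, 7, 9], "learn_rate": [0.01, 0.1, 0.3]}
leaderboard = random_grid_search(
    grid, toy_validation_error, max_models=12, max_runtime_secs=5.0
)
best_error, best_params = leaderboard[0]
```

Lowering max_models or max_runtime_secs makes the search cover only a random subset of the grid, which is the trade-off between search quality and cost that the user-defined constraints control.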

Like all other H2O-3 features, H2O AutoML is designed with scalability in mind. H2O Sparkling Water provides a seamless integration of Spark and H2O. Users can leverage their existing data munging pipelines in Spark, feed data into H2O, train high quality models with AutoML, and store the model predictions in Spark.

Second Generation - H2O Driverless AI

Data scientists spend much of their time on feature engineering, an iterative and time-consuming task that we knew could be automated to make them more productive. That is where our second generation, Driverless AI, comes in. It is an automatic machine learning platform delivered as enterprise software, which organizations can use to address Talent, Time, and Trust [1]. It builds on the capabilities of our open-source H2O-3 (i.e. Python/R/web interfaces [2][3], multiple data connectors [4], and low-latency scoring pipelines [5]).

We architected Driverless AI from the ground up and worked to remove the obstacles that data scientists face. We fine-tuned and then accelerated the automatic feature engineering capability as a key part of the platform. Today, Driverless AI is a leading automatic machine learning platform in the market. It automates supervised learning problems such as regression, classification, time-series forecasting, and text analytics. It accelerates the data science workflow from initial analysis to final deployment:

  • Quick exploratory data analysis with Autovis [6]
  • Automatic feature engineering with out-of-the-box data transformers [7] 
  • Automatic model training/tuning of different algorithms (e.g. XGBoost, LightGBM, TensorFlow) for regression, classification, time-series, and text analytics [8]
  • Machine learning interpretability tools for explaining modeling results in a human-readable format [9]
  • Automatic documentation for data scientists as well as business users and regulators [10]
  • Low-latency scoring pipelines for flexible deployment in production environments [11]

Third Generation - Extensible H2O Driverless AI

Our third generation is the extensible version of Driverless AI, which allows users to add their own feature engineering, model scorers, and algorithms to enhance machine learning experiments. These extensions are referred to as recipes, and the ability to extend Driverless AI with them is called Bring Your Own Recipe (BYOR). A collection of open-source recipes is also available online.
