H2O speeds up connecting to, preprocessing, cleaning, and transforming data so that data scientists and engineers can build high-quality datasets and features for machine learning.

Data Connectors

H2O AI Hybrid Cloud provides over 200 data connectors that make it easy to ingest data from popular data stores, including Hadoop HDFS, object storage services, and databases.

Automatic Data Preprocessing and Cleaning

H2O AI Hybrid Cloud automatically processes and cleans both tabular and text data. It detects and flags missing values, handles imbalanced data, and detects duplicate records. For text data, H2O automatically removes stop words, applies stemming and lemmatization, and auto-corrects typos. The platform also provides over 40 unique Python scripts, referred to as recipes, developed by data engineers and data scientists to give users quick access to classic data prep functions such as binds, splits, and calculations.
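
To make two of these checks concrete, here is a minimal sketch in plain Python of flagging missing values and detecting duplicate records. It is an illustration only, not H2O's implementation: the function names, the set of missing-value tokens, and the sample rows are all assumptions made for this example.

```python
# Hypothetical helpers for illustration; not part of any H2O API.
MISSING_TOKENS = {"", "NA", "N/A", "null"}  # assumed spellings of "missing"


def flag_missing(rows):
    """Count missing cells per column in a list of dict records."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            is_missing = value is None or (
                isinstance(value, str) and value.strip() in MISSING_TOKENS
            )
            counts[col] = counts.get(col, 0) + int(is_missing)
    return counts


def find_duplicates(rows):
    """Return the indices of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        # Sort by column name so key order in the dict does not matter.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


rows = [
    {"id": "1", "city": "Austin", "income": "52000"},
    {"id": "2", "city": "", "income": "NA"},
    {"id": "1", "city": "Austin", "income": "52000"},
    {"id": "3", "city": "Boston", "income": None},
]

missing_report = flag_missing(rows)
duplicate_rows = find_duplicates(rows)
```

With these sample rows, the report shows one missing `city` and two missing `income` values, and the third row (index 2) is flagged as a duplicate of the first.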

Data Transformations

H2O offers a selection of 100+ pre-configured data transformations, such as one-hot encoding, imputing missing data with mean or median, and date/time embeddings, so you can transform your data into model-ready formats with a single click. In addition, H2O’s platform is extensible, so data scientists and engineers can use the transformers of their choice.
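
As a rough picture of what two of these transformations do, the sketch below implements mean or median imputation and one-hot encoding using only the Python standard library. The function names and the `is_` column prefix are assumptions for this example and do not reflect how H2O names or implements its transformers.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


incomes = [10, None, 20, 60]
mean_filled = impute(incomes, strategy="mean")      # gap filled with 30
median_filled = impute(incomes, strategy="median")  # gap filled with 20

colors = ["red", "blue", "red"]
encoded = one_hot(colors)
```

Mean imputation is sensitive to extreme values (here the 60 pulls the fill value up to 30), which is one reason a median option is commonly offered alongside it.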

Automatic Data Visualization

H2O’s Automatic Data Visualization (AutoViz) generates graphs and data plots that help data scientists and engineers explore and understand their data before model building. AutoViz automatically identifies outliers, unusual correlations, clusters, and skewed or non-normal distributions that can affect the accuracy of prediction models. Unlike data visualization software that presents graphical summaries of datasets, AutoViz uses statistical and machine learning algorithms to find problems that would otherwise be buried in long lists of “insights.” In effect, AutoViz is an anomaly detector for data, and H2O describes it as the only commercial or open-source program that identifies data anomalies without user input.
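
The algorithms AutoViz uses are not described here. As a generic illustration of the kind of statistical checks involved, the sketch below flags outliers with the modified z-score (based on the median and the median absolute deviation) and measures skewness with the moment coefficient. The function names, the 3.5 threshold, and the sample data are assumptions for this example, not AutoViz internals.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be scored
    # 0.6745 rescales the MAD to be comparable to a standard deviation.
    return [v for v in values if abs(0.6745 * (v - center) / mad) > threshold]


def skewness(values):
    """Moment coefficient of skewness: third moment over variance^1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


readings = [10, 12, 11, 13, 12, 11, 95]
flagged = mad_outliers(readings)
skew = skewness(readings)
```

Here the value 95 is flagged as an outlier, and the positive skewness reflects the long right tail that this single value creates.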

Automatic Feature Engineering

H2O automates the entire feature engineering process, significantly accelerating one of the most time-consuming data science tasks. H2O detects the relevant features, finds interactions among those features, and derives new features from the data. Using the newly derived features, it recalculates which features are relevant and continues to iterate until the best features have been created and ranked by importance.
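
The loop described above (derive, re-rank, repeat) can be sketched in a few lines. This toy version is an assumption-laden illustration and not H2O's algorithm: it derives only pairwise product features, scores each feature by absolute Pearson correlation with the target, and keeps a fixed number of survivors per round. All names and parameters are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def engineer(features, target, rounds=2, keep=3):
    """Iteratively derive product features and keep the top-scoring ones."""
    pool = dict(features)
    for _ in range(rounds):
        candidates = dict(pool)
        # Derive one interaction feature per pair of surviving features.
        for (a, xa), (b, xb) in combinations(pool.items(), 2):
            candidates[f"({a}*{b})"] = [x * y for x, y in zip(xa, xb)]
        # Re-rank everything, old and new, by importance.
        order = sorted(
            candidates,
            key=lambda name: abs(pearson(candidates[name], target)),
            reverse=True,
        )
        pool = {name: candidates[name] for name in order[:keep]}
    return [(name, round(abs(pearson(col, target)), 3)) for name, col in pool.items()]


features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
target = [x * y for x, y in zip(features["a"], features["b"])]  # hidden a*b signal
ranked = engineer(features, target)
```

Because the target in this example is exactly the product of `a` and `b`, the derived feature `(a*b)` ends up ranked first. A production system would score features with a model and validation data rather than with raw correlation.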

