October 28th, 2021

Announcing the H2O AI Feature Store

Category: H2O AI Cloud, Product Updates

We’re really excited to announce the H2O AI Feature Store, the only intelligent feature store on the market. We have been working on it for many months with our co-development partner, AT&T, which enabled us to build a first-of-its-kind platform designed to be enterprise-grade from day one. It is built with best-of-breed technology and integrates seamlessly with common enterprise machine learning pipelines. The Feature Store will be available to customers as part of the H2O AI Hybrid Cloud.

What is a Feature?

Before we get to the feature store, let’s quickly review what a feature is. Most machine learning and AI applications do not use raw data directly; instead, the data is transformed into ‘features’ designed to capture as much signal from the data as possible. Features are often simple transformations (such as logarithmic or exponential), aggregations (sum of sales over a time period), or interactions with other features (debt-to-income ratio).
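As a minimal sketch of the three kinds of features mentioned above, the snippet below derives a log transformation, an aggregation, and an interaction from a few made-up records. The field names and values are hypothetical and chosen only for illustration, and the aggregation sums over all records rather than over a specific time window.

```python
import math
from collections import defaultdict

# Hypothetical raw records; the field names and values are illustrative only.
raw = [
    {"customer": "a", "income": 50_000.0, "debt": 10_000.0, "sale": 120.0},
    {"customer": "a", "income": 50_000.0, "debt": 10_000.0, "sale": 80.0},
    {"customer": "b", "income": 80_000.0, "debt": 40_000.0, "sale": 200.0},
]


def build_features(records):
    """Return one feature row per customer."""
    total_sales = defaultdict(float)
    latest = {}
    for record in records:
        total_sales[record["customer"]] += record["sale"]
        latest[record["customer"]] = record  # keep the last record seen per customer

    features = {}
    for customer, record in latest.items():
        features[customer] = {
            # Simple transformation: logarithm of a skewed raw value.
            "log_income": math.log(record["income"]),
            # Aggregation: sum of sales across the customer's records.
            "total_sales": total_sales[customer],
            # Interaction between two raw columns.
            "debt_to_income": record["debt"] / record["income"],
        }
    return features


features = build_features(raw)
```

Each customer ends up with a single row of derived values, which is the kind of output a feature store is meant to hold and share.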

Key Challenges

As we deployed our AI platform at many large enterprises and put models in production, we started hearing about two recurring issues: redundant, time-consuming work to recreate features in production, and a lack of collaboration across data science teams. In some large companies, putting together the right set of features is probably the most significant part of the project.

One large financial services customer told us that it took them nearly half a year to put together the data for a new model. Many of the challenges involved resolving data access and permissions, obtaining approvals for certain features, and getting model reviews from governance teams. They found that the same set of features was often used across models, so if they could reuse features from existing models, they could bypass the whole upfront process. The H2O AI Feature Store addresses this need: it is a repository to store, update, retrieve, and share machine learning (ML) features.

Data scientists and domain experts often spend large amounts of time exploring and transforming raw data to create predictive features. Unfortunately, these highly valuable and often costly features are typically available only to the data scientists who created them. H2O AI Feature Store makes it easy for organizations to organize, govern, share, and operationalize these features. Just as important, the features are made available for both batch and real-time requirements without having to be engineered again.

With the H2O AI Feature Store, organizations can increase their pace of innovation and deliver impactful AI outcomes faster.

How does it work?

The feature store consists of three main components:

  • Offline store of features for training and batch scoring
  • Online store for real-time scoring and streaming
  • Metadata Registry to enable search and collaboration

Data scientists and engineers can continue to build features in their existing environments or tools and bring them to the feature store through one of our many clients. We have native integrations with platforms like Databricks, Snowflake, Teradata, and more. Data scientists can also use our Scala or Python client to access the feature store directly.

Users can create new projects, register them, and then ingest data into the feature store.

Once the data is in the feature store, users can configure how frequently the features are updated, based on their use case. The feature store keeps a mirrored copy of the data for both online and offline requirements. The offline store is typically used to access data for model training (using historical data) and batch scoring, and it is built to handle massive amounts of data. The online store, on the other hand, is typically used for real-time scoring and streaming use cases, so it is built to deliver features with sub-millisecond latency. Models deployed with H2O MLOps, or anywhere else, can query the feature store in real time in the middle of a transaction and use the returned features to score and return predictions.
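To make the relationship between the three components concrete, here is a toy, in-memory sketch. It is not the H2O AI Feature Store client API: the class `ToyFeatureStore` and every method name on it are hypothetical, invented only to illustrate how a registry, an offline history, and an online "latest value" view can be kept in step on each write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyFeatureStore:
    """Illustrative only; all names here are assumptions, not a real client API."""

    # Metadata registry: feature set name -> description and tags.
    registry: Dict[str, dict] = field(default_factory=dict)
    # Offline store: full history of (timestamp, key, values) per feature set.
    offline: Dict[str, List[Tuple[int, str, dict]]] = field(default_factory=dict)
    # Online store: latest (timestamp, values) per key, for fast lookups.
    online: Dict[str, Dict[str, Tuple[int, dict]]] = field(default_factory=dict)

    def register(self, feature_set: str, description: str, tags: List[str]) -> None:
        self.registry[feature_set] = {"description": description, "tags": tags}
        self.offline[feature_set] = []
        self.online[feature_set] = {}

    def ingest(self, feature_set: str, timestamp: int, key: str, values: dict) -> None:
        # Every write lands in both stores, so the two copies stay mirrored.
        self.offline[feature_set].append((timestamp, key, values))
        current = self.online[feature_set].get(key)
        if current is None or timestamp >= current[0]:
            self.online[feature_set][key] = (timestamp, values)

    def training_rows(self, feature_set: str) -> List[Tuple[int, str, dict]]:
        """Historical rows, as used for training and batch scoring."""
        return list(self.offline[feature_set])

    def lookup(self, feature_set: str, key: str) -> dict:
        """Latest values for one key, as used for real-time scoring."""
        return self.online[feature_set][key][1]

    def search(self, tag: str) -> List[str]:
        """Find feature sets by tag through the metadata registry."""
        return [name for name, meta in self.registry.items() if tag in meta["tags"]]


store = ToyFeatureStore()
store.register("customer_credit", "Credit features per customer", ["credit", "risk"])
store.ingest("customer_credit", 1, "a", {"debt_to_income": 0.20})
store.ingest("customer_credit", 2, "a", {"debt_to_income": 0.25})
```

In this sketch, `training_rows` returns both historical rows while `lookup` returns only the most recent values for a key. A production system would back the two views with different storage engines; the dictionaries here only stand in for that split.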

Key Capabilities

We also have a bunch of capabilities that we are super excited about:

  • Automatic Feature Recommendations

Automatically improve the features in your feature store. Data scientists can select the feature sets they want to update and improve and simply request feature recommendations. H2O will automatically recommend new features and feature updates that could improve AI model performance. Data scientists can review the proposed features and accept or discard them, retaining complete control. Users can set up feature recommendations to run automatically or on demand.

  • Automatic Feature Drift

Automatically check both individual features and feature sets for drift over time and alert users when drift is detected. Alerts can be used to trigger retraining or refitting to keep models accurate.

  • Automatic Bias Identification

Automatically detect bias in your features. Data scientists can select the set of features they’d like to analyze, and the H2O AI Feature Store will analyze them and report whether bias was detected. This capability helps data scientists monitor features on an ongoing basis and continually remove bias. With Automatic Bias Identification, data scientists retain complete control to review and act on features that may create bias.

  • Feature Rank

Automatically score features to indicate their popularity or value across different use cases. The score will be tied to the variable importance of models, showing which features are most valuable across use cases.

  • Detailed Cataloging

Add over 40 metadata attributes, such as Description, Data Sources, and Data Sensitivity Categories. Metadata tags can also be added to further improve feature discoverability and exploration. The complete list of attributes is available in the H2O AI Feature Store documentation.
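As an illustration of what a drift check like the one described under Automatic Feature Drift could compute, the sketch below uses the Population Stability Index (PSI), a widely used drift statistic. This is an assumption made for the sake of example: the post does not say which method the H2O AI Feature Store uses, and the function name `psi`, the bin count, and the floor value are all illustrative choices.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current one."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Values outside the reference range go into the first or last bin.
            counts[min(max(index, 0), bins - 1)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values: a training-time sample and a shifted recent sample.
training_sample = [i / 100 for i in range(100)]
recent_sample = [value + 0.5 for value in training_sample]

stable_score = psi(training_sample, training_sample)
drifted_score = psi(training_sample, recent_sample)
```

Identical samples give a PSI of zero, and the shifted sample gives a much larger value. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but the right alerting threshold depends on the feature and the use case.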

Get started

To learn more about the feature store, check out the page and sign up for early access.

About the Author

Vinod Iyengar

Vinod Iyengar is the Vice President of Product at H2O.ai. He leads a team charged with product management and product development across the H2O.ai platform.

Vinod has worked for H2O.ai since 2015. In his time with the company, he has worked as the VP of marketing & technical alliances, and VP of customer success & product. Vinod received his bachelor’s degree in engineering from the University of Mumbai and his master’s degree in quantitative analysis from the University of Cincinnati College of Business.
