October 28th, 2021
Announcing the H2O AI Feature Store
Category: H2O AI Cloud, Product Updates
By: Vinod Iyengar
We’re really excited to announce the H2O AI Feature Store, the only intelligent feature store in the market. We’ve been working on this for many months with our co-development partner, AT&T. That collaboration enabled us to build a first-of-its-kind platform that is designed to be enterprise-grade from day one. It is built with best-of-breed technology that integrates seamlessly with all the common enterprise machine learning pipelines. The Feature Store will be available to customers as part of the H2O AI Hybrid Cloud.
What is a Feature?
Before we get to the feature store, let’s quickly review what a feature is. For most machine learning and AI applications, raw data is typically not used directly but is transformed into ‘features’ that are designed to capture as much signal from the data as possible. Features can be simple transformations (such as logarithmic or exponential), aggregations (such as the sum of sales over a time period), or interactions with other features (such as a debt-to-income ratio).
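To make the three kinds of features concrete, here is a minimal sketch in plain Python. The records, field names, and the build_features helper are all hypothetical and are used only for illustration; this is not how the H2O AI Feature Store computes features.

```python
import math
from collections import defaultdict

# Hypothetical raw records; the field names are illustrative assumptions.
raw_rows = [
    {"customer_id": "c1", "income": 50000.0, "debt": 10000.0, "sale": 120.0},
    {"customer_id": "c1", "income": 50000.0, "debt": 10000.0, "sale": 80.0},
    {"customer_id": "c2", "income": 80000.0, "debt": 40000.0, "sale": 300.0},
]


def build_features(rows):
    """Return one feature dict per customer from raw transaction rows."""
    sales_total = defaultdict(float)
    latest_row = {}
    for row in rows:
        sales_total[row["customer_id"]] += row["sale"]  # running sum per customer
        latest_row[row["customer_id"]] = row            # keep the last row seen

    features = {}
    for customer_id, row in latest_row.items():
        features[customer_id] = {
            # Simple transformation: logarithm of a raw value.
            "log_income": math.log(row["income"]),
            # Aggregation: sum of sales over all rows for this customer.
            "total_sales": sales_total[customer_id],
            # Interaction between two raw columns: debt-to-income ratio.
            "debt_to_income": row["debt"] / row["income"],
        }
    return features


features = build_features(raw_rows)
```

In this sketch, each customer ends up with one row of model-ready values derived from several raw rows, which is the kind of output a feature store is meant to hold and share.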
As we deployed our AI platform at many large enterprises and put models into production, we started hearing about redundant, time-consuming work to recreate features in production and a lack of collaboration across data science teams. In some large companies, putting together the right set of features is probably the most significant part of the project.
One large financial services customer told us that it took them nearly half a year to put together the data for a new model. Many of the challenges involved resolving data access, permissions, approvals for certain features, and model review by governance teams. They found that the same set of features was often being used by multiple models, so if they could reuse features from existing models, they could bypass the whole upfront process.
H2O AI Feature Store is a repository to store, update, retrieve, and share machine learning (ML) features.
Data scientists and domain experts often spend large amounts of time exploring and transforming raw data to create predictive features. Unfortunately, these highly valuable and often costly features are typically available only to the data scientists who created them. H2O AI Feature Store makes it easy for organizations to organize, govern, share, and operationalize these valuable features. Just as importantly, these features are made available for both batch and real-time requirements without having to be engineered again.
With the H2O AI Feature Store, organizations can increase their pace of innovation and deliver impactful AI outcomes faster.
How does it work?
The feature store consists of three main components:
- Offline store of features for training and batch scoring
- Online store for real-time scoring and streaming
- Metadata Registry to enable search and collaboration
Data scientists and engineers can continue to build features in their existing environments or tools and bring them to the feature store through one of our many clients. We have native integrations with platforms like Databricks, Snowflake, Teradata, and more. Data scientists can also directly use our Scala or Python client to access the feature store.
Users can create new projects, register them, and then ingest data into the feature store.
Once the data is in the feature store, users can configure how frequently the features are updated based on their use case and needs. The feature store keeps a mirrored copy of the data for both online and offline requirements. Typically, the offline store is used to access data for model training (using historical data) and batch scoring, and it is built to handle massive amounts of data. The online store, on the other hand, is typically used for real-time scoring and streaming use cases, so it is built to deliver features with sub-millisecond latency. Models deployed using H2O MLOps, or anywhere else for that matter, can query the feature store in real time in the middle of a transaction and use the returned features to score and provide predictions back.
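The split between the offline and online stores can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not the H2O AI Feature Store client API: the ToyFeatureStore class and its method names are hypothetical, and a real deployment would use purpose-built storage rather than Python dictionaries.

```python
class ToyFeatureStore:
    """In-memory illustration only; all names here are hypothetical."""

    def __init__(self):
        # Offline store: full history per entity, used for training and batch scoring.
        self.offline = {}
        # Online store: only the most recent feature values per entity,
        # used for low-latency lookups during real-time scoring.
        self.online = {}

    def ingest(self, key, timestamp, features):
        """Write one record to both stores so they stay mirrored."""
        self.offline.setdefault(key, []).append((timestamp, dict(features)))
        current = self.online.get(key)
        # Late-arriving, older records must not overwrite fresher online values.
        if current is None or timestamp >= current[0]:
            self.online[key] = (timestamp, dict(features))

    def get_training_rows(self, key):
        """Return the full history for an entity, ordered by timestamp."""
        return sorted(self.offline.get(key, []), key=lambda item: item[0])

    def get_online_features(self, key):
        """Return the latest feature values for an entity, or None if unknown."""
        entry = self.online.get(key)
        return None if entry is None else entry[1]


store = ToyFeatureStore()
store.ingest("c1", 1, {"total_sales": 120.0})
store.ingest("c1", 2, {"total_sales": 200.0})
latest = store.get_online_features("c1")   # what a deployed model would look up
history = store.get_training_rows("c1")    # what a training job would read
```

The design point the sketch tries to show is that one ingest call feeds both views, so a model is trained and served on the same feature definitions without re-engineering them.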
We also have a bunch of capabilities that we are super excited about:
- Automatic Feature Recommendations
Automatically improve the features in your feature store. Data scientists can select the feature sets that they are looking to update and improve and simply request feature recommendations. H2O will automatically recommend new features and feature updates that could improve AI model performance. Data scientists can review the proposed updated features and accept or discard them, retaining complete control. Users can set up feature recommendations to run automatically or on demand.
- Automatic Feature Drift Detection
Automatically check both individual features and feature sets for drift over time and alert users. Alerts can be used to trigger retraining or refitting to keep models accurate.
- Automatic Bias Identification
Automatically detect bias in your features. Data scientists can simply select the set of features they’d like to analyze for bias, and the H2O AI Feature Store will analyze them and report whether bias was detected. This capability helps data scientists monitor features on an ongoing basis to continually remove bias. With our Automatic Bias Identification capability, data scientists have complete control to review and take action on features that may create bias.
- Feature Rank
Automatically score features to indicate popularity or value across different use cases. The score will be tied to the variable importance of models to show which features are most valuable across use cases.
- Detailed Cataloging
Add over 40 metadata attributes, such as Description, Data Sources, and Data Sensitivity Categories. Additionally, metadata tags can be added to further improve feature discoverability and exploration. The complete list of attributes is located in our H2O AI Feature Store documentation.
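As an illustration of what a drift check on a single feature can look like, the sketch below computes the Population Stability Index (PSI) between a baseline sample and a newer sample. This is an assumption made for illustration: the post does not say which drift metric the H2O AI Feature Store uses, and the function name, bin count, and sample data are hypothetical.

```python
import math


def population_stability_index(baseline, current, bins=10):
    """Compare two samples of one numeric feature; larger values mean more drift."""
    lo = min(baseline)
    hi = max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if the range is zero.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), bins - 1)  # clamp out-of-range values
            counts[index] += 1
        # A small floor keeps empty bins from producing log(0) or division by zero.
        return [max(count / len(values), 1e-6) for count in counts]

    base_shares = bucket_shares(baseline)
    curr_shares = bucket_shares(current)
    return sum(
        (c - b) * math.log(c / b) for b, c in zip(base_shares, curr_shares)
    )


# Hypothetical data: the newer sample is shifted upward by 0.5.
baseline_sample = [i / 100 for i in range(100)]
shifted_sample = [value + 0.5 for value in baseline_sample]

no_drift = population_stability_index(baseline_sample, baseline_sample)
drift = population_stability_index(baseline_sample, shifted_sample)
```

A common rule of thumb treats a PSI above roughly 0.2 as a shift worth investigating, though that threshold is a convention rather than anything stated in this post. An alert of the kind described above could be raised when the score crosses a threshold chosen for the use case.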
To learn more about the feature store, check out the H2O AI Feature Store page and sign up for early access.