December 23rd, 2016

What's new in the latest H2O release, 3.10.2.1 (Tutte)?

Category: Community, H2O Release

Today we released H2O version 3.10.2.1 (Tutte). It’s available on our Downloads page, and release notes can be found here.
Photo Credit: https://en.wikipedia.org/wiki/W._T._Tutte
Top enhancements in this release:
GLM MOJO Support: GLM now supports our smaller, faster, more efficient MOJO (Model ObJect, Optimized) format for model publication and deployment (PUBDEV-3664, PUBDEV-3695).
ISAX: We actually introduced ISAX (Indexable Symbolic Aggregate ApproXimation) support a couple of releases back, but this version features more improvements and is worth a look. ISAX allows you to represent complex time series patterns using a symbolic notation, reducing the dimensionality of your data and allowing you to run our ML algos or use the index for searching or data analysis. For more information, check out the blog entry here: Indexing 1 billion time series with H2O and ISAX. (PUBDEV-3367, PUBDEV-3377, PUBDEV-3376)
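To make the idea of a symbolic representation concrete, here is a minimal sketch of plain SAX (the fixed-cardinality scheme that ISAX builds on) using only the Python standard library. It is not H2O's implementation or API; the function name `sax_word`, the four-letter alphabet, and the requirement that the series length be divisible by the number of segments are assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Turn a numeric series into a SAX word (illustrative sketch only)."""
    if len(series) % n_segments != 0:
        # Simplifying assumption: equal-width segments with no remainder.
        raise ValueError("series length must be divisible by n_segments")

    # 1. Z-normalize so that standard-normal breakpoints apply.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0 for _ in series]
    else:
        normalized = [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: one mean per segment.
    seg_len = len(normalized) // n_segments
    paa = [
        mean(normalized[i * seg_len:(i + 1) * seg_len])
        for i in range(n_segments)
    ]

    # 3. Breakpoints that cut N(0, 1) into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Each segment mean becomes the symbol of the region it falls in.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Eight points reduced to a four-symbol word.
print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))  # prints "abcd"
```

The eight-point ramp collapses to a four-character word, which is the dimensionality reduction the item above describes. ISAX additionally lets each symbol carry its own cardinality so the words can be indexed hierarchically; that part is not shown here.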
GLM: Improved feature and parameter descriptions for GLM. Next focus will be on improving documentation for the K-Means algorithm (PUBDEV-3695, PUBDEV-3753, PUBDEV-3791).
Quasibinomial support in GLM: The quasibinomial family is similar to the binomial family except that, where binomial models only support 0/1 for the values of the target, the quasibinomial family allows for two arbitrary values. This feature was requested by advanced users of H2O for applications such as implementing their own advanced estimators. (PUBDEV-3482, PUBDEV-3791)
GBM/DRF high cardinality accuracy improvements: Fixed a bug in the handling of large categorical features (cardinality > 32) that had been present since the first release of H2O-3. Certain categorical tree split decisions for such features were incorrect, essentially sending observations down the wrong path at the affected split points in the decision tree. The error was systematic and consistent between in-H2O and POJO/MOJO, and led to lower training accuracy (and often to lower validation accuracy). The handling of unseen categorical levels (in training and testing) was also inconsistent: unseen levels would go left or right for no principled reason. They now consistently follow the path of missing values. Generally, models involving high-cardinality categorical features should have improved accuracy now. This change might require re-tuning of model parameters for best results, in particular the nbins_cats parameter, which controls the number of separable categorical levels at a given split and has a large impact on the amount of memorization of per-level behavior that is possible: higher values generally (over)fit more.
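The new rule for unseen categorical levels can be pictured with a small sketch. This is not H2O's internal code: the `CategoricalSplit` class, its field names, and the example levels are all hypothetical, and the sketch only illustrates the behavior described above, in which a level never seen during training takes the same branch as a missing value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoricalSplit:
    """Hypothetical tree split on one categorical feature (illustration only)."""
    left_levels: frozenset   # training levels routed to the left child
    known_levels: frozenset  # every level seen during training
    na_goes_left: bool       # direction chosen for missing values

    def goes_left(self, value):
        # Missing values and unseen levels share a single, fixed path.
        if value is None or value not in self.known_levels:
            return self.na_goes_left
        return value in self.left_levels


split = CategoricalSplit(
    left_levels=frozenset({"NY", "CA"}),
    known_levels=frozenset({"NY", "CA", "TX", "WA"}),
    na_goes_left=False,
)

for value in ["NY", "TX", None, "ZZ"]:
    print(value, "left" if split.goes_left(value) else "right")
```

Here `"ZZ"` was never seen in training, so it is routed exactly like `None`. Before the fix, the post says such levels could end up on either side with no consistent rule.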
Direct Download: http://h2o-release.s3.amazonaws.com/h2o/rel-tutte/1/index.html
For details on each PUBDEV-* item, see the release notes linked at the top of this article.
According to VP of Engineering Bill Gallmeister, this release consists of significant work done by his engineering team. For more information on these features and all the other improvements in H2O version 3.10.2.1, review our documentation.
Happy Holidays from the entire H2O team!
@avkashchauhan (Avkash Chauhan)