H2O / H2O-3


H2O is an open-source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform. It lets you build machine learning models on anything from data on your laptop to big data on distributed systems like Hadoop and Spark. It also provides deployment-ready artifacts (POJO, MOJO) of those models for use in an enterprise environment.

We started the company in open source with H2O. Our intent was to create a thriving community for AI and ML. We built an interface for building models in R and Python in an interactive notebook environment, and optimized H2O for big data / Hadoop environments as well as for distributed in-memory computing frameworks like Spark; we called that integration Sparkling Water. More recently, H2O added support for Kubernetes.

The first H2O version 3 release dates back to 2015 [1]. More than 100 contributors have made many improvements over the years [2], but the major version number has remained the same. Hence, the open-source H2O machine learning platform is also commonly known as H2O-3.


H2O's core code is written in Java, optimized for distributed memory. Inside H2O, a distributed key/value store is used to access and reference data, models, objects, etc. across all nodes and machines. The algorithms are implemented on top of H2O's distributed Map/Reduce framework and use the Java Fork/Join framework for multi-threading. Data is read in parallel, distributed across the cluster, and stored in memory in a compressed columnar format. H2O's data parser has built-in intelligence to guess the schema of the incoming dataset and supports data ingestion from multiple sources in various formats.
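
The Map/Reduce pattern over column chunks described above can be illustrated with a small sketch. This is not H2O's code or API: H2O implements the pattern in Java on top of Fork/Join, while the example below uses only the Python standard library, with a thread pool standing in for cluster nodes. All function names (split_into_chunks, map_chunk, reduce_pair, distributed_mean) are hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(column, chunk_size):
    """Split one column into contiguous chunks, as if spread across nodes."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_chunk(chunk):
    """Map step: summarise a single chunk as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def reduce_pair(left, right):
    """Reduce step: merge two partial results.

    The merge is associative, so partial results can be combined in any
    grouping, which is what makes the computation easy to parallelise.
    """
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(column, chunk_size=4, workers=4):
    """Compute the mean of a column by mapping over chunks, then reducing."""
    chunks = split_into_chunks(column, chunk_size)
    # Hypothetical stand-in for cluster nodes: each worker thread maps one chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials, (0, 0))
    return total / count if count else float("nan")


if __name__ == "__main__":
    print(distributed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Each worker only ever sees its own chunk, and only the small (sum, count) pairs are combined at the end. That is the reason this style of computation suits data that is already partitioned across machines.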

The speed, quality, ease of use, and model deployment for the various cutting-edge supervised and unsupervised algorithms, such as Deep Learning, Tree Ensembles, and GLRM, make H2O a highly sought-after API for scalable data science. In addition to these algorithms, H2O also packages XGBoost, a powerful open-source implementation of gradient boosting from the Distributed Machine Learning Community (DMLC) [3].



