February 25th, 2015

Strata San Jose 2015


I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great to see how many people are involved in the H2O community, and I also really enjoyed the free margaritas at the booth crawl.
The H2O team met some really great people with lots of different use cases for our product, and we hope to see all of you again at our First-Fridays Hackathons or other meetups.

Strata 2015 Presentation

The H2O.ai team presented on Thursday right after lunch. We had two presenters on stage: Cliff and Michal presented two new super cool features of H2O, the Python API and Sparkling Water. The presentation was legendary! Not only was the room packed (all seats were occupied and people were standing along the walls), but we also received a lot of interesting questions and feedback regarding H2O, Python and Sparkling Water.
The presentation included an introduction to H2O and its features, but a major part of the talk was devoted to a live product demo (a real live demo running on Cliff’s and Michal’s laptops using the latest H2O release!). For the demo, we became CitiBike New York data scientists, predicting the number of bikes at individual bike-sharing stations at any given time based on historical data and weather data.
The demo used two publicly available datasets: CitiBike NY historical data from the years 2013 and 2014 (available here) and New York weather data publicly available from the National Climatic Data Center. Both are also available in H2O’s S3 storage, which you can fetch by cloning the H2O repository and syncing the big data:

git clone https://github.com/h2oai/h2o-dev.git
cd h2o-dev
./gradlew syncBigdataLaptop
cd bigdata/laptop/citibike-nyc/

During the demo we walked through a real-life machine learning workflow involving the following steps:
– data loading
– data munging including feature generation and refinement
– filtering data
– joining data from both sources (i.e., joining weather and bikes tables)
– splitting data into three sets for model training, on-the-fly validation, and testing
– and finally, training models (in this case GBM and GLM) and evaluating their performance based on the R-squared score
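The non-modeling steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code from the demo: it does not use the H2O Python API. The record layout (`station`, `day`, `temp`, `bikes`), the helper names, and the 60/20/20 split are all hypothetical. Model training (GBM and GLM) is left out because it depends on H2O itself.

```python
import random
from collections import Counter


def count_trips(trips):
    """Feature generation: collapse raw trips into (station, day) -> trip count."""
    return Counter((t["station"], t["day"]) for t in trips)


def join_weather(counts, weather):
    """Inner join of per-station daily counts with daily weather on the day key."""
    rows = []
    for (station, day), n in sorted(counts.items()):
        if day in weather:  # days without a weather record are dropped
            rows.append({"station": station, "day": day,
                         "temp": weather[day], "bikes": n})
    return rows


def three_way_split(rows, fractions=(0.6, 0.2), seed=42):
    """Shuffle a copy of the rows, then cut it into train / validation / test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot; undefined when all actual values are equal."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant actual values")
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: two stations, five days, made-up trip counts.
trips = [{"station": s, "day": d}
         for s, extra in (("A", 0), ("B", 1))
         for d in range(1, 6)
         for _ in range(d + extra)]
weather = {d: 15.0 + d for d in range(1, 6)}  # day -> temperature (made up)

rows = join_weather(count_trips(trips), weather)
train, valid, test = three_way_split(rows)
print(len(train), len(valid), len(test))
```

In a real run, the training set would be used to fit the models, the validation set to monitor them while training, and `r_squared` would be computed on predictions for the held-out test set.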
Our overall goal for the talk was to demonstrate this data science workflow using the Python API and then perform the same workflow from Sparkling Water, combining the Scala, Spark, and H2O APIs.
Cliff showed the workflow using the Python API directly from an IPython notebook. The notebook source is available in H2O’s GitHub here and the raw Python code is here.
Michal (that’s me) demonstrated Sparkling Water using the Sparkling Shell (a regular Spark shell with the additional Sparkling Water library) and went step by step through the workflow described by a script available in H2O’s GitHub. I also showed our new H2O UI and used it to explore the data.
The entire presentation was recorded by Strata and will be available soon at Strata Proceedings. However, the presentation deck is already available here:
