February 25th, 2015

Strata San Jose 2015


I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl.
The H2O team met some really great people with lots of different use cases for our product and we hope to see all of you again at our First-Fridays Hackathons or other meetups.

Strata 2015 Presentation

The H2O.ai team presented on Thursday right after lunch. We had two presenters on stage: Cliff and Michal presented two new super cool features of H2O, the Python API and Sparkling Water. The presentation was legendary! Not only was the room packed (all seats were occupied and people were standing along the walls), but we also received a lot of interesting questions and feedback regarding H2O, Python, and Sparkling Water.
The presentation included an introduction to H2O and its features, but a major part of the talk was devoted to a live product demo (a real demo running on Cliff and Michal’s laptops using the latest H2O release!). For the demo, we became CitiBike New York data scientists, predicting the number of bikes at individual bike-sharing stations at any given time based on historical data and weather data.
The demo used two publicly available datasets: CitiBike NY historical data from 2013 and 2014 (available here) and New York weather data publicly available from the National Climatic Data Center. Both are also available in H2O’s S3 storage, which you can fetch by cloning the H2O repository and syncing the big data files:

git clone https://github.com/h2oai/h2o-dev.git
cd h2o-dev
./gradlew syncBigdataLaptop
cd bigdata/laptop/citibike-nyc/

During the demo we walked through a real-life machine learning workflow involving the following steps:
– data loading
– data munging including feature generation and refinement
– filtering data
– joining data from both sources (i.e., joining weather and bikes tables)
– splitting data into three splits for model training, on-the-fly validation, and testing
– and finally, training models (in this case GBM and GLM) and evaluating their performance based on the R-squared score
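To make the join, split, and scoring steps a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the talk and it does not use the H2O API: the function names, the `day` join key, the 60/20/20 split ratios, and the toy rows are all assumptions made up for illustration. Model training is left out entirely; a simple "predict the training mean" baseline stands in for GBM/GLM so the R-squared computation has something to score.

```python
import random


def join_on_day(bikes, weather):
    """Inner-join bike rows with weather rows on a shared 'day' key (assumed key name)."""
    weather_by_day = {w["day"]: w for w in weather}
    joined = []
    for b in bikes:
        w = weather_by_day.get(b["day"])
        if w is not None:  # drop bike rows that have no matching weather row
            joined.append({**w, **b})
    return joined


def three_way_split(rows, fractions=(0.6, 0.2), seed=42):
    """Shuffle rows and split them into train / validation / test parts.

    The 60/20/20 ratios are an assumption, not the ratios used in the demo.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_valid = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot (assumes 'actual' is not constant)."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Toy rows, invented for illustration only (not the CitiBike or NCDC data).
bikes = [{"day": d, "station": "A", "bikes": 10 + d} for d in range(12)]
weather = [{"day": d, "temp_c": 5.0 + d} for d in range(10)]  # days 10 and 11 have no weather

joined = join_on_day(bikes, weather)
train, valid, test = three_way_split(joined)

# Stand-in for a real model: always predict the mean bike count seen in training.
baseline = sum(r["bikes"] for r in train) / len(train)
test_actual = [r["bikes"] for r in test]
test_r2 = r_squared(test_actual, [baseline] * len(test_actual))
print(len(joined), len(train), len(valid), len(test), test_r2)
```

A mean baseline like this typically scores at or below zero on held-out rows, which is why R-squared is a handy yardstick: a real model should score well above it.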
Our overall goal for the talk was to demonstrate this data science workflow using the Python API and then perform the same workflow from Sparkling Water, combining the Scala, Spark, and H2O APIs.
Cliff showed the workflow using the Python API directly from an IPython notebook. The notebook source is available in H2O’s GitHub here and the raw Python code is here.
Michal (that’s me) demonstrated Sparkling Water by using Sparkling Shell (a regular Spark shell with the additional Sparkling Water library) and went step-by-step through the workflow described by a script available in H2O’s GitHub. I also showed our new H2O UI and used it to explore the data.
The entire presentation was recorded by Strata and will be available soon at Strata Proceedings. However, the presentation deck is already available here:

Additional Resources
