July 16th, 2015

useR! Aalborg 2015 conference

matt_dowle

The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16-node EC2 cluster running with 8 cores per node, making a total of 128 CPUs. The demo consisted of loading large files in parallel and then running our distributed machine learning algos in parallel.
At an R conference, most people wanted to script H2O from R, which is of course built in (as is Python), but we also conveyed the benefits that our user interface, Flow, can provide in this space (even for programmers) by automating and accelerating common tasks. We enjoyed discussing future directions with the attendees and bouncing ideas off them. There is nothing like seeing people’s first reaction to the product, live and in person! As an open source platform, H2O thrives on suggestions and contributions from our community.
All components of H2O are developed in the open on GitHub.

H2O contributed 3 talks:

Matt Dowle on Scalable Radix Sorting

Matt Dowle presented the details and benchmarks of the fast and stable radix sort implementation in data.table:::forderv. On 500 million random numerics (4 GB), base R takes approximately 22 minutes, whereas forder takes 2 minutes. He discussed the pros and cons of most-significant-digit (forwards) and least-significant-digit (backwards) radix sorting, as well as its application to all types: integer with large range (>1e5), numeric and character. We hope to find a sponsor from the R core team to help us include this method in base R, where it could benefit the community automatically. The work builds on articles by Terdiman, 2000 and Herf, 2001 and is joint work with Arun Srinivasan.
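To make the least-significant-digit (backwards) idea concrete, here is a minimal sketch in Python. It is not the data.table implementation, which is written in C and handles many more cases. The function name `radix_order` and the `bits_per_pass` parameter are made up for illustration, and the sketch assumes non-negative integer keys. Like forder, it returns an ordering permutation rather than the sorted values, and each counting pass is stable, so ties keep their original relative order.

```python
def radix_order(keys, bits_per_pass=8):
    """Return the stable ordering permutation of non-negative integer keys.

    Hypothetical helper for illustration only: an LSD radix sort that
    processes `bits_per_pass` bits of each key per counting pass.
    """
    if any(k < 0 for k in keys):
        raise ValueError("this sketch only handles non-negative integers")
    order = list(range(len(keys)))
    if not keys:
        return order

    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = max(keys)
    shift = 0
    # One pass per "digit", starting from the least significant one.
    while shift == 0 or (max_key >> shift) > 0:
        # Count how many keys fall into each bucket for this digit.
        counts = [0] * radix
        for i in order:
            counts[(keys[i] >> shift) & mask] += 1
        # Turn counts into starting offsets (exclusive prefix sums).
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Scatter indices into place, preserving the current order
        # within each bucket (this is what makes the sort stable).
        out = [0] * len(order)
        for i in order:
            d = (keys[i] >> shift) & mask
            out[counts[d]] = i
            counts[d] += 1
        order = out
        shift += bits_per_pass
    return order


keys = [170, 45, 75, 90, 802, 24, 2, 66, 45]
order = radix_order(keys)
print([keys[i] for i in order])
```

A most-significant-digit (forwards) variant would instead partition on the top digit first and recurse into each bucket, which lets it stop early on buckets that are already small.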
Slides: Fast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg

Photo of Matt Dowle courtesy of Flickr user Rhaen

Erin LeDell on h2oEnsemble

Erin presented an overview of scalable ensemble learning in R using the h2oEnsemble R package. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity or training time. This R interface provides easy access to scalable ensemble learning using H2O. The H2O Ensemble software implements the Super Learner, or stacking, ensemble algorithm, using distributed base learning algorithms from the open source machine learning platform, H2O. The following base learner algorithms are currently supported in h2oEnsemble: Generalized linear models with elastic net regularization, Gradient Boosting (GBM) with regression and classification trees, Random Forest and Deep Learning (multi-layer feed-forward neural networks). Erin provided code examples and some simple benchmarks.
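The stacking idea can be sketched in a few lines of plain Python. This is not the h2oEnsemble API: the helpers `fit_mean`, `fit_line` and `fit_stack` are hypothetical, the two base learners are toy one-dimensional regressors standing in for GLM, GBM and the others, and the metalearner is assumed to be an ordinary least-squares fit with no intercept. What the sketch does share with the Super Learner is the structure: cross-validated predictions from each base learner form the "level-one" data, and a metalearner is trained on that data to combine the base learners.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Toy base learner: always predicts the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy base learner: simple linear regression of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


LEARNERS = (fit_mean, fit_line)


def fit_stack(xs, ys, folds=3):
    """Return (weights, predict) for a two-learner stacked ensemble."""
    n = len(xs)
    # Level-one data: z[i][j] is learner j's prediction for row i,
    # made by a model that never saw row i during training.
    z = [[0.0, 0.0] for _ in range(n)]
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        held = [i for i in range(n) if i % folds == f]
        for j, fit in enumerate(LEARNERS):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in held:
                z[i][j] = model(xs[i])

    # Metalearner: least squares of y on the two columns of z,
    # solved via the 2x2 normal equations and Cramer's rule.
    a = sum(r[0] * r[0] for r in z)
    b = sum(r[0] * r[1] for r in z)
    d = sum(r[1] * r[1] for r in z)
    p = sum(r[0] * y for r, y in zip(z, ys))
    q = sum(r[1] * y for r, y in zip(z, ys))
    det = a * d - b * b
    weights = ((p * d - b * q) / det, (a * q - b * p) / det)

    # Refit every base learner on all the data for final predictions.
    models = [fit(xs, ys) for fit in LEARNERS]

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, models))

    return weights, predict


xs = list(range(12))
ys = [2 * x + 1 for x in xs]
weights, predict = fit_stack(xs, ys)
print(weights, predict(20))
```

On this exactly linear toy data the metalearner should put essentially all of its weight on the linear learner, which is the behavior stacking is meant to deliver: the ensemble leans on whichever base learners generalize best on held-out rows.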
Slides: h2oEnsemble with Erin LeDell at useR! Aalborg

Photo of Erin LeDell courtesy of Flickr user Rhaen

Amy Wang on H2O Architecture

Amy presented H2O at the useR! sponsor talk and went over the architecture of our product. Her live demo showed the speed and scale of H2O through an R interface. On top of reading in data and aggregating columnar data at lightning-fast speed, H2O also comes with a suite of sophisticated models with all the parameters exposed to the front end for ease of use. This attracted discussion at our booth even as the conference came to a close and we began packing up our banners. Many academics expressed interest in using H2O to teach students machine learning algorithms, while people in industry discussed partnerships and use cases. The emphasis of the talk was to encourage R users to try H2O and to build a community of users with interesting questions, ideas, and feedback who can ultimately help provide a better open source H2O experience for everyone.
Slides: H2O Overview with Amy Wang at useR! Aalborg

Photo of Amy Wang courtesy of Matt Dowle

Matt also stopped by Copenhagen to give a talk at the R Summit. You can find his R Summit slides on our SlideShare.

Want to try one of the demos we ran at the useR! booth?

Check out our GitHub page for instructions, scripts, and datasets.
Click here for R demos
Special thanks to the useR! organizing committee and all the people who stopped by our booth!
