Thursday night (August 29) at 7, resident math hacker Spencer A. is leading a hands-on workshop on using H2O to analyze real-world data. For those of you who are new to the math side of H2O, we have notes below to help you prepare.
H2O is a distributed math platform featuring a set of analytical tools that can be accessed through an HTML-based UI or through R. It's built to handle very large data sets by splitting the data into subsets and analyzing those subsets in parallel. If you would like to read more about H2O's performance, I highly recommend Spence's blog post on the topic: http://0xdata.com/blog/2013/08/rf_on_mnist/
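To make the "subsets in parallel" idea concrete, here is a minimal Python sketch of the general split-and-combine pattern. It is not H2O's implementation or API: the function names `partial_stats` and `parallel_mean` are hypothetical, and threads on a single machine stand in for the nodes of a cluster. Each subset is summarized independently, and the small summaries are then combined into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(chunk):
    """Summarize one subset of the data as (row count, sum)."""
    return len(chunk), sum(chunk)


def parallel_mean(values, n_chunks=4):
    """Compute the mean of a non-empty list by combining per-chunk summaries.

    Hypothetical helper for illustration only. Threads are used for
    simplicity; the point is the split/summarize/combine structure,
    not raw speed.
    """
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    # Each chunk is summarized independently of the others.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_stats, chunks))

    # Only the small per-chunk summaries need to be brought together.
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))
```

The same structure applies to any statistic that can be built from per-subset summaries, which is why only the summaries, and not the raw rows, have to move between workers.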
You can find instructions and resources for running H2O from your laptop here: http://docs.0xdata.com/quickstart/quickstart_jar.html, and you can download the .jar file here: http://0xdata.com/h2O/. There are tutorials on the documentation page as well, if you are inclined to take a deeper look at some of the algorithms we will be working with.
Specifically, Spence will be walking through the famous MNIST data set and Allstate data similar to that presented in a recent Kaggle competition. If you would like to download those data sets before you get here, you can find all of the information you need at: http://www.meetup.com/H2Omeetup/events/124513412/
For the MNIST data, we will walk through the data to develop a better intuitive understanding of the Random Forest (RF) algorithm. Generalized linear modeling (GLM) and its parallelization will be discussed through a hands-on analysis of the Allstate data. By the end of the workshop you will be prepared to analyze and interpret data of your own.