October 22nd, 2013

GBM on Ecology – Recreating a model made for R


In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata
One of my favorite articles on GBM is a great case study from ecology, Elith, Leathwick & Hastie (2008). You can find the original article here: http://onlinelibrary.wiley.com/store/10.1111/j.1365-2656.2008.01390.x/asset/j.1365-2656.2008.01390.x.pdf;jsessionid=5B5FE919D24D8C3EA12FCB74BF352C62.f04t04?v=1&t=hn3iw9wm&s=29c201e8d1d94504ec9e07dcb12bfb2cb539fe7e
The authors kindly made their data and process in R publicly available, so you can get the data and try the model for yourself.
Here is the final model presented in the paper, carried out in H2O. Note that the data were originally split into training and testing sets (called model and eval data, respectively, in their available download).
The model was originally specified on 14 variables and 1000 observations. The dependent variable is found in column 2, named “Angaus”, and about 80% of the values in that column are 0. In the original paper the family was specified as Bernoulli, with a tree complexity of 5 and a learning rate of 0.01.
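To make the Bernoulli family and the learning rate a little more concrete, here is a minimal sketch in plain Python. It is not the H2O implementation and not the R code from the paper. It assumes synthetic data with roughly 20% presences, made-up feature values, single-split trees (stumps) rather than a tree complexity of 5, and a larger learning rate with fewer trees so it runs quickly. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def deviance(y, scores):
    """Mean Bernoulli deviance for 0/1 labels y and log-odds scores."""
    return 2.0 * sum(math.log1p(math.exp(s)) - yi * s
                     for yi, s in zip(y, scores)) / len(y)


def fit_stump(X, resid, p):
    """Pick the least-squares split on the residuals; leaves take a Newton step."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            # Minimising squared error is equivalent to maximising this term.
            gain = (sum(resid[i] for i in left) ** 2 / len(left)
                    + sum(resid[i] for i in right) ** 2 / len(right))
            if best is None or gain > best[0]:
                best = (gain, j, t, left, right)
    _, j, t, left, right = best

    def leaf(idx):
        # Newton step for Bernoulli deviance: sum(y - p) / sum(p * (1 - p)).
        return sum(resid[i] for i in idx) / sum(p[i] * (1 - p[i]) for i in idx)

    return j, t, leaf(left), leaf(right)


def fit_gbm(X, y, n_trees, learning_rate):
    prev = sum(y) / len(y)
    f0 = math.log(prev / (1.0 - prev))  # start from the log-odds of presence
    scores = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        p = [sigmoid(s) for s in scores]
        resid = [yi - pi for yi, pi in zip(y, p)]  # gradient of the log-likelihood
        j, t, left_val, right_val = fit_stump(X, resid, p)
        stumps.append((j, t, left_val, right_val))
        for i, row in enumerate(X):
            # The learning rate shrinks each tree's contribution.
            scores[i] += learning_rate * (left_val if row[j] <= t else right_val)
    model = {"f0": f0, "stumps": stumps, "learning_rate": learning_rate}
    return model, scores


def predict_proba(model, row):
    s = model["f0"]
    for j, t, left_val, right_val in model["stumps"]:
        s += model["learning_rate"] * (left_val if row[j] <= t else right_val)
    return sigmoid(s)


# Synthetic stand-in for the eel data (assumption): two integer predictors,
# with presence more likely at high values of the first one.
random.seed(0)
X = [[random.randint(0, 20), random.randint(0, 20)] for _ in range(300)]
y = [1 if random.random() < sigmoid(0.4 * (row[0] - 16)) else 0 for row in X]
prevalence = sum(y) / len(y)

# Demo settings only; the post itself used 650 trees and a learning rate of 0.01.
model, train_scores = fit_gbm(X, y, n_trees=100, learning_rate=0.1)
initial_dev = deviance(y, [model["f0"]] * len(y))
final_dev = deviance(y, train_scores)
print("prevalence:", round(prevalence, 3))
print("deviance before and after boosting:", initial_dev, final_dev)
```

A smaller learning rate means each tree moves the fit less, so more trees are needed to reach the same deviance. That is the trade-off behind pairing a rate like 0.01 with several hundred trees.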
We recreated the original model in H2O. The specification and the output are shown below. Note that the X variable field asks you to specify the variables to leave out, and that both the training and testing data sets are set on the model specification page (so the model is automatically applied to the testing data if you specify it, which is a feature I’m pretty fond of). Also notice that the model is specified as a classification because the dependent variable is binary.

[Screenshots: the Request GBM form and the ntrees form data]
And here are the results. (I only requested 650 trees, which is in keeping with the model given in the paper, but it’s pretty trivial to request over 1000. I did it earlier with a 20 GB heap and it took about as long as making a cup of coffee.)
[Screenshot: GBM model view]
