October 28th, 2014

Running Your First Droplet on H2O

A number of us were at Strata in New York City this October, and one of the major benefits of these events is getting lots of in-person time with people who use your product.

Michal and Amy spent some time with a developer who was trying to build on top of the h2o-dev repo, and we realized that we didn’t yet have a really basic example of using an h2o-dev artifact as a dependency in a brand-new project.

So Michal put one together for everyone to share, and I’ll walk you through a quick introduction in this post.

1. Cloning the examples repository

The h2o-droplets repository on GitHub contains some very simple starter projects for different languages. Let’s get started by cloning the repository and changing to that directory.

$ git clone https://github.com/0xdata/h2o-droplets.git

Cloning into 'h2o-droplets'...
remote: Counting objects: 53, done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 53 (delta 10), reused 39 (delta 0)
Unpacking objects: 100% (53/53), done.
Checking connectivity... done.

$ cd h2o-droplets

2. A quick look at the repo contents

As of this writing, the repo contains a Java example and a Scala example. Each of these is an independent starter project.

$ ls -al

total 8
drwxr-xr-x   6 tomk  staff   204 Oct 28 08:40 .
drwxr-xr-x  35 tomk  staff  1190 Oct 28 08:40 ..
drwxr-xr-x  13 tomk  staff   442 Oct 28 08:40 .git
-rw-r--r--   1 tomk  staff   322 Oct 28 08:40 README.md
drwxr-xr-x  11 tomk  staff   374 Oct 28 08:40 h2o-java-droplet
drwxr-xr-x  11 tomk  staff   374 Oct 28 08:40 h2o-scala-droplet

Let’s take a closer look at the Java example:

$ find h2o-java-droplet -type f

h2o-java-droplet/.gitignore
h2o-java-droplet/build.gradle
h2o-java-droplet/gradle/wrapper/gradle-wrapper.jar
h2o-java-droplet/gradle/wrapper/gradle-wrapper.properties
h2o-java-droplet/gradle.properties
h2o-java-droplet/gradlew
h2o-java-droplet/gradlew.bat
h2o-java-droplet/README.md
h2o-java-droplet/settings.gradle
h2o-java-droplet/src/main/java/water/droplets/H2OJavaDroplet.java
h2o-java-droplet/src/test/java/water/droplets/H2OJavaDropletTest.java

As you can see, the Java example contains a build.gradle file, a Java source file, and a Java test file.

Look at the build.gradle file and you will see the following sections, which link the Java droplet sample project to a version of h2o-dev published on Maven Central:

repositories {
    mavenCentral()
}
ext {
  h2oVersion = '0.1.8'
}
dependencies {
    // Define dependency on core of H2O
    compile "ai.h2o:h2o-core:${h2oVersion}"
    // Define dependency on H2O algorithm
    compile "ai.h2o:h2o-algos:${h2oVersion}"
    // Demands web support
    compile "ai.h2o:h2o-web:${h2oVersion}"
    // H2O uses JUnit for testing
    testCompile 'junit:junit:4.11'
}

This is all very standard Gradle stuff. In particular, note that this example depends on three different H2O artifacts, all of which are built in the h2o-dev repository.

  • h2o-core contains base platform capabilities like H2O’s in-memory distributed key/value store and MapReduce frameworks (the “water” package).
  • h2o-algos contains math algorithms like GLM and Random Forest (the “hex” package).
  • h2o-web contains the browser web UI (lots of JavaScript).
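
The dependency strings in build.gradle use Gradle’s group:name:version shorthand, with ${h2oVersion} filled in from the ext block, so all three artifacts stay on the same version. As a minimal illustration only (plain Java, not part of the droplet; the class and method names here are made up for this sketch), the following builds and splits those strings:

```java
import java.util.Arrays;
import java.util.List;

public class DependencyNotation {
    /** Splits a "group:name:version" string into its three parts. */
    public static String[] parse(String notation) {
        String[] parts = notation.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected group:name:version but got: " + notation);
        }
        return parts;
    }

    /** Builds the shorthand string for one H2O artifact at the given version. */
    public static String h2oArtifact(String name, String version) {
        return "ai.h2o:" + name + ":" + version;
    }

    public static void main(String[] args) {
        // Same value as ext.h2oVersion in build.gradle.
        String h2oVersion = "0.1.8";
        List<String> names = Arrays.asList("h2o-core", "h2o-algos", "h2o-web");
        for (String name : names) {
            String[] p = parse(h2oArtifact(name, h2oVersion));
            System.out.println("group=" + p[0] + " name=" + p[1] + " version=" + p[2]);
        }
    }
}
```

Changing the single version value changes all three coordinates at once, which is the reason the build file keeps it in one place.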

3. Preparing the example for use in your IDE

Let’s walk through an example using IntelliJ IDEA. The first step is to use Gradle to build your IntelliJ project file.

$ cd h2o-java-droplet
$ ./gradlew idea

:ideaModule
Download http://repo1.maven.org/maven2/ai/h2o/h2o-core/0.1.8/h2o-core-0.1.8.pom
Download http://repo1.maven.org/maven2/ai/h2o/h2o-algos/0.1.8/h2o-algos-0.1.8.pom
Download http://repo1.maven.org/maven2/ai/h2o/h2o-web/0.1.8/h2o-web-0.1.8.pom
[... many more one-time downloads not shown ...]
:ideaProject
:ideaWorkspace
:idea
BUILD SUCCESSFUL
Total time: 51.429 secs
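
The download lines in this output follow the usual Maven repository layout: dots in the group become path separators, followed by the artifact name, the version, and a file named name-version.pom. A small sketch of that mapping (plain Java; the class and method names are invented for illustration, and the base URL is simply copied from the output above):

```java
public class PomUrl {
    // Base URL exactly as it appears in the Gradle output.
    static final String BASE = "http://repo1.maven.org/maven2";

    /** Maps group, name, and version to the location of the artifact's POM file. */
    public static String pomUrl(String group, String name, String version) {
        String groupPath = group.replace('.', '/');
        return BASE + "/" + groupPath + "/" + name + "/" + version
            + "/" + name + "-" + version + ".pom";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"h2o-core", "h2o-algos", "h2o-web"}) {
            System.out.println(pomUrl("ai.h2o", name, "0.1.8"));
        }
    }
}
```

Running it prints the same three .pom URLs that Gradle reported downloading.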

You will see three new files created with IDEA extensions. The .ipr file is the project file.

$ ls -al

total 168
drwxr-xr-x  15 tomk  staff    510 Oct 28 10:03 .
drwxr-xr-x   6 tomk  staff    204 Oct 28 08:40 ..
-rw-r--r--   1 tomk  staff    273 Oct 28 08:40 .gitignore
drwxr-xr-x   3 tomk  staff    102 Oct 28 10:03 .gradle
-rw-r--r--   1 tomk  staff   1292 Oct 28 08:40 README.md
-rw-r--r--   1 tomk  staff   1409 Oct 28 08:40 build.gradle
drwxr-xr-x   3 tomk  staff    102 Oct 28 08:40 gradle
-rw-r--r--   1 tomk  staff     23 Oct 28 08:40 gradle.properties
-rwxr-xr-x   1 tomk  staff   5080 Oct 28 08:40 gradlew
-rw-r--r--   1 tomk  staff   2404 Oct 28 08:40 gradlew.bat
-rw-r--r--   1 tomk  staff  33316 Oct 28 10:03 h2o-java-droplet.iml
-rw-r--r--   1 tomk  staff   3716 Oct 28 10:03 h2o-java-droplet.ipr
-rw-r--r--   1 tomk  staff   9299 Oct 28 10:03 h2o-java-droplet.iws
-rw-r--r--   1 tomk  staff     39 Oct 28 08:40 settings.gradle
drwxr-xr-x   4 tomk  staff    136 Oct 28 08:40 src

4. Opening the project

Since we have already created the project file, start up IDEA and choose Open Project.

Choose the h2o-java-droplet.ipr project file that we just created with Gradle.

5. Running the test inside the project

Rebuild the project.

Run the test by right-clicking on the test name.

Watch the test pass!

6. Summary

This small example demonstrated how to create a new project with H2O as a dependency. Thanks to Michal for putting this example together! If you are working on a good example you’d like to share with the community, please send us a note or make a pull request to the h2o-droplets repository.

Tell us about this or other topics that interest you by writing to h2ostream@googlegroups.com.
