Get started with Sparkling Water for desktop

  1. Download Spark (if not already installed) from the Spark Downloads Page

    Choose Spark release: 1.6.2

    Choose a package type: Pre-built for Hadoop 2.4 and later
  2. Point SPARK_HOME to the existing installation of Spark and export the MASTER variable.

    export SPARK_HOME="/path/to/spark/installation"

    # To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1 GB of memory.

    export MASTER="local-cluster[3,2,1024]"
  3. From your terminal, run:

    cd ~/Downloads
    unzip sparkling-water-1.6.8.zip
    cd sparkling-water-1.6.8
    bin/sparkling-shell --conf "spark.executor.memory=1g"
  4. Create an H2O cloud inside the Spark cluster:

    import org.apache.spark.h2o._
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._
  5. Follow this demo, which imports airlines and weather data and runs predictions on delays.
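
The three numbers in the local-cluster master string above are the worker count, cores per worker, and memory per worker in MB. A minimal sketch that unpacks the spec (the helper name is ours, not part of Spark or Sparkling Water):

```python
import re

def parse_local_cluster(master):
    """Unpack a Spark local-cluster master string into its three fields."""
    m = re.match(r"local-cluster\[(\d+),(\d+),(\d+)\]", master)
    if not m:
        raise ValueError("not a local-cluster master string: %s" % master)
    workers, cores, mem_mb = (int(g) for g in m.groups())
    return {"workers": workers,
            "cores_per_worker": cores,
            "memory_mb_per_worker": mem_mb}

print(parse_local_cluster("local-cluster[3,2,1024]"))
```

So `local-cluster[3,2,1024]` asks Spark for 3 workers, each with 2 cores and 1024 MB of memory.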

Launch Sparkling Water on Hadoop using YARN

  1. Download Spark (if not already installed) from the Spark Downloads Page

    Choose Spark release: 1.6.2

    Choose a package type: Pre-built for Hadoop 2.4 and later
  2. Point SPARK_HOME to an existing installation of Spark:

    export SPARK_HOME='/path/to/spark/installation'
  3. Set the HADOOP_CONF_DIR and MASTER environment variables.

    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export MASTER="yarn-client"
  4. Download Sparkling Water and use sparkling-shell to launch it on YARN:

    wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/8/sparkling-water-1.6.8.zip
    unzip sparkling-water-1.6.8.zip
    cd sparkling-water-1.6.8
    bin/sparkling-shell --num-executors 3 --executor-memory 4g --driver-memory 4g --master yarn-client
  5. Create an H2O cloud inside the Spark cluster:

    import org.apache.spark.h2o._
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._
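
As a sanity check on the flags above: YARN grants each executor its heap plus an off-heap overhead (assuming the Spark 1.6 default of 10% of executor memory, with a 384 MB floor). A rough sketch of the total memory the request implies, under those assumed defaults:

```python
def yarn_container_mb(heap_mb, overhead_min_mb=384, overhead_frac=0.10):
    """Approximate YARN container size: heap plus default memory overhead."""
    return heap_mb + max(overhead_min_mb, int(heap_mb * overhead_frac))

num_executors, executor_mb, driver_mb = 3, 4096, 4096
total = num_executors * yarn_container_mb(executor_mb) + yarn_container_mb(driver_mb)
print(total)  # total MB the cluster must have free for this invocation
```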

Launch H2O on a Standalone Spark Cluster

  1. Download Spark (if not already installed) from the Spark Downloads Page

    Choose Spark release: 1.6.2

    Choose a package type: Pre-built for Hadoop 2.4 and later
  2. Point SPARK_HOME to an existing installation of Spark:

    export SPARK_HOME='/path/to/spark/installation'
  3. From your terminal, run:

    cd ~/Downloads
    unzip sparkling-water-1.6.8.zip
    cd sparkling-water-1.6.8
    bin/launch-spark-cloud.sh
    export MASTER="spark://localhost:7077"
    bin/sparkling-shell
  4. Create an H2O cloud inside the Spark cluster:

    import org.apache.spark.h2o._
    val h2oContext = new H2OContext(sc).start()
    import h2oContext._
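
After bin/launch-spark-cloud.sh, the standalone master should be accepting connections on port 7077 (the port in the MASTER URL above). A quick reachability check before launching the shell (the helper is ours, not part of Spark):

```python
import socket

def master_reachable(host="localhost", port=7077, timeout=2.0):
    """Return True if the standalone Spark master accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```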

Gradle-style specification for Maven artifacts

repositories {
  mavenCentral()
}

dependencies {
  compile "ai.h2o:sparkling-water-core_2.10:1.6.8"
}
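
For sbt-based builds, the equivalent dependency line would be (assuming the same Maven Central coordinates shown above):

```scala
libraryDependencies += "ai.h2o" % "sparkling-water-core_2.10" % "1.6.8"
```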

Get started with PySparkling

  1. Download Spark (if not already installed) from the Spark Downloads Page

    Choose Spark release: 1.6.2

    Choose a package type: Pre-built for Hadoop 2.4 and later
  2. Point SPARK_HOME to the existing installation of Spark and export the MASTER variable.

    export SPARK_HOME="/path/to/spark/installation"

    # To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1 GB of memory.

    export MASTER="local-cluster[3,2,1024]"
  3. From your terminal, run:

    cd ~/Downloads
    unzip sparkling-water-1.6.8.zip
    cd sparkling-water-1.6.8

    # To start an interactive Python terminal:
    bin/pysparkling

    # To start a notebook:
    IPYTHON_OPTS="notebook" bin/pysparkling
  4. Create an H2O cloud inside the Spark cluster:

    from pysparkling import *
    hc = H2OContext(sc).start()
    import h2o
  5. Follow this demo, which imports Chicago crime, census and weather data and predicts the probability of arrest.

  6. To launch on YARN:

    wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.6/8/sparkling-water-1.6.8.zip
    unzip sparkling-water-1.6.8.zip

    export SPARK_HOME="/path/to/spark/installation"
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export SPARKLING_HOME="/path/to/SparklingWater/installation"
    $SPARKLING_HOME/bin/pysparkling --num-executors 3 --executor-memory 20g --executor-cores 10 --driver-memory 20g --master yarn-client

    # Create an H2O cloud inside the Spark cluster and import the H2O Python package:
    from pysparkling import *
    hc = H2OContext(sc).start()
    import h2o
  7. To launch as a Spark Package application:

    $SPARK_HOME/bin/spark-submit \
    --packages ai.h2o:sparkling-water-core_2.10:1.6.8 \
    --py-files $SPARKLING_HOME/py/dist/pySparkling-1.6.8-py2.7.egg $SPARKLING_HOME/py/examples/scripts/H2OContextDemo.py
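
The multi-line spark-submit invocation above is a single command. A small sketch that assembles the same argument list programmatically (the helper name is ours; the version and relative paths are the ones used throughout this guide):

```python
import os

def spark_package_cmd(spark_home, sparkling_home, version="1.6.8"):
    """Build the argv for launching PySparkling as a Spark Package application."""
    egg = os.path.join(sparkling_home, "py/dist/pySparkling-%s-py2.7.egg" % version)
    script = os.path.join(sparkling_home, "py/examples/scripts/H2OContextDemo.py")
    return [
        os.path.join(spark_home, "bin/spark-submit"),
        "--packages", "ai.h2o:sparkling-water-core_2.10:%s" % version,
        "--py-files", egg,
        script,
    ]

print(" ".join(spark_package_cmd("/opt/spark", "/opt/sparkling-water")))
```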