March 24th, 2016

Connecting to Spark & Sparkling Water from R & RStudio


Sparkling Water offers best-of-breed machine learning for Spark users by bringing all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from RStudio or any other IDE of your choice. This post walks you through connecting to Sparkling Water from plain R or RStudio.
It works the same way as with regular H2O: you just need to call h2o.init() from R with the right parameters, i.e. the IP and port of the running H2O instance.
For example, we start the Sparkling shell (bin/sparkling-shell) and create an H2OContext.
The H2OContext is now running and H2O’s REST API is exposed on 172.162.223:54321.
So we can open RStudio and call h2o.init() with that address (make sure you have the right version of the h2o R package installed).
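As a minimal sketch of that call from R: the address below is a placeholder chosen for illustration, not the cluster from this post, so substitute the IP and port that your own H2OContext reports. Setting startH2O = FALSE is an optional choice made here so that R only attaches to the existing cluster.

```r
library(h2o)

# Assumption: placeholder values for illustration only.
# Replace them with the IP and port printed when H2OContext started.
h2o_ip   <- "127.0.0.1"
h2o_port <- 54321

# startH2O = FALSE tells h2o.init() to connect to an already running
# cluster and fail if none is reachable, instead of launching a new
# local H2O instance from R.
h2o.init(ip = h2o_ip, port = h2o_port, startH2O = FALSE)

# Print basic information about the cluster we are now attached to.
h2o.clusterInfo()
```

If the connection is refused or a version mismatch is reported, check that the h2o R package version corresponds to the H2O version bundled with your Sparkling Water release.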
Now let’s create a Spark DataFrame, publish it as an H2O frame, and access it from R.
This is how you do that in sparkling-shell:
val df = sc.parallelize(1 to 100).toDF // creates Spark DataFrame
val hf = h2oContext.asH2OFrame(df) // publishes DataFrame as H2O's Frame

In the shell output you can see that the published frame is named frame_rdd_6.
Alternatively, you can name the frame yourself during the conversion from Spark to H2O by passing the name as a second argument:
val hf = h2oContext.asH2OFrame(df, "simple.frame") // publishes DataFrame under an explicit name
Now let us go to RStudio and list all the available frames with the h2o.ls() function.
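A short sketch of the listing step in R, assuming the session is already connected through h2o.init(). The frame name checked below is the one mentioned in this post; your cluster may show a different generated name, or simple.frame if you named it explicitly.

```r
library(h2o)

# Assumes h2o.init() has already connected this R session to the
# H2O cluster started by Sparkling Water.
frames <- h2o.ls()

# h2o.ls() returns a data frame of object keys held by the cluster.
print(frames)

# Check whether the frame published from Spark is visible from R.
# "frame_rdd_6" is the name from this post; adjust it to your own output.
"frame_rdd_6" %in% frames$key
```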
We can also fetch the frame or invoke an R function on it.
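For instance, the following sketch gets a handle to the published frame and runs a few standard R functions on it. It assumes an active connection and that the frame is still named frame_rdd_6 as above; the variable names are only illustrative.

```r
library(h2o)

# Assumes h2o.init() has already connected to the Sparkling Water cluster
# and that the frame published from Spark is named "frame_rdd_6".
hf <- h2o.getFrame("frame_rdd_6")

# These calls are evaluated against the frame held in the H2O cluster.
dim(hf)       # number of rows and columns
summary(hf)   # per-column summary statistics

# Pull the data into a local R data.frame.
# Only do this for frames small enough to fit in R's memory.
local_df <- as.data.frame(hf)
head(local_df)
```

Note that h2o.getFrame() only returns a reference to the data in the cluster; nothing is copied into R until as.data.frame() is called.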
Keep hacking!
