Contents

Section Title Page
1 What is H2O? 6
2 Sparkling Water Introduction 8
2.1 Typical Use Cases 8
2.1.1 Model Building 8
2.1.2 Data Munging 9
2.1.3 Stream Processing 9
2.2 Features 11
2.3 Supported Data Sources 11
2.4 Supported Data Formats 11
2.5 Supported Spark Execution Environments 12
2.6 Sparkling Water Clients 12
2.7 Sparkling Water Requirements 13
3 Design 14
3.1 Data Sharing between H2O and Spark 15
3.2 H2OContext 15
4 Starting Sparkling Water 17
4.1 Setting Up The Environment 17
4.2 Starting Interactive Shell with Sparkling Water 17
4.3 Starting Sparkling Water with Internal Backend 18
4.4 External Backend 19
4.4.1 Automatic Mode of External Backend 19
4.4.2 Manual Mode of External Backend on Hadoop 21
4.4.3 Manual Mode of External Backend without Hadoop (standalone) 22
4.5 Memory Management 24
5 Data Manipulation 26
5.1 Creating H2O Frames 26
5.1.1 Convert from RDD, DataFrame or Dataset 26
5.1.2 Creating H2OFrame from an Existing Key 27
5.1.3 Create H2O Frame Directly 27
5.2 Converting H2O Frames to Spark entities 28
5.2.1 Convert to RDD 28
5.2.2 Convert to DataFrame 28
5.3 Mapping between H2OFrame And Data Frame Types 29
5.4 Mapping between H2OFrame and RDD[T] Types 30
5.5 Using Spark Data Sources with H2OFrame 30
5.5.1 Reading from H2OFrame 30
5.5.2 Saving to H2OFrame 31
5.5.3 Specifying Saving Mode 32
6 Calling H2O Algorithms 33
7 Productionizing MOJOs from H2O-3 37
7.1 Loading the H2O-3 MOJOs 37
7.2 Exporting the loaded MOJO model using Sparkling Water 41
7.3 Importing the previously exported MOJO model from Sparkling Water 41
7.4 Accessing additional prediction details 41
7.5 Customizing the MOJO Settings 41
7.6 Methods available on MOJO Model 42
7.6.1 Obtaining Domain Values 42
7.6.2 Obtaining Model Category 42
7.6.3 Obtaining Feature Types 42
7.6.4 Obtaining Feature Importances 43
7.6.5 Obtaining Scoring History 43
7.6.6 Obtaining Training Params 43
7.6.7 Obtaining Metrics 43
7.6.8 Obtaining Leaf Node Assignments 44
7.6.9 Obtaining Stage Probabilities 44
8 Productionizing MOJOs from Driverless AI 44
8.1 Requirements 45
8.2 Loading and Scoring the MOJO 45
8.3 Predictions Format 48
8.4 Customizing the MOJO Settings 48
8.5 Troubleshooting 49
9 Deployment 50
9.1 Referencing Sparkling Water 50
9.1.1 Using Assembly Jar 50
9.1.2 Using PySparkling Zip 51
9.1.3 Using the Spark Package 51
9.2 Target Deployment Environments 52
9.2.1 Local cluster 52
9.2.2 On a Standalone Cluster 52
9.2.3 On a YARN Cluster 53
9.3 DataBricks Cloud 53
9.3.1 Creating a Cluster 54
9.3.2 Running Sparkling Water 54
9.3.3 Running PySparkling 55
9.3.4 Running RSparkling 56
10 Running Sparkling Water in Kubernetes 57
10.1 Internal Backend 57
10.1.1 Scala 58
10.1.2 Python 60
10.1.3 R 62
10.2 Manual Mode of External Backend 63
10.2.1 Scala 63
10.2.2 Python 66
10.2.3 R 68
10.3 Automatic Mode of External Backend 70
10.3.1 Scala 70
10.3.2 Python 72
10.3.3 R 75
11 Sparkling Water Configuration Properties 77
11.1 Configuration Properties Independent of Selected Backend 77
11.2 Internal Backend Configuration Properties 83
11.3 External Backend Configuration Properties 85
12 Building a Standalone Application 88
13 A Use Case Example 90
13.1 Predicting Arrival Delay in Minutes – Regression 90
14 FAQ 93
15 References 98

 
