Machine Learning with Sparkling Water: H2O + Spark

January 2023: Fifth Edition

Contents

| Section | Title | Page |
| --- | --- | --- |
| 1 | What is H2O? | 6 |
| 2 | Sparkling Water Introduction | 8 |
| 2.1 | Typical Use Cases | 8 |
| 2.1.1 | Model Building | 8 |
| 2.1.2 | Data Munging | 9 |
| 2.1.3 | Stream Processing | 9 |
| 2.2 | Features | 11 |
| 2.3 | Supported Data Sources | 11 |
| 2.4 | Supported Data Formats | 11 |
| 2.5 | Supported Spark Execution Environments | 12 |
| 2.6 | Sparkling Water Clients | 12 |
| 2.7 | Sparkling Water Requirements | 13 |
| 3 | Design | 14 |
| 3.1 | Data Sharing between H2O and Spark | 15 |
| 3.2 | H2OContext | 15 |
| 4 | Starting Sparkling Water | 17 |
| 4.1 | Setting Up The Environment | 17 |
| 4.2 | Starting Interactive Shell with Sparkling Water | 17 |
| 4.3 | Starting Sparkling Water with Internal Backend | 18 |
| 4.4 | External Backend | 19 |
| 4.4.1 | Automatic Mode of External Backend | 19 |
| 4.4.2 | Manual Mode of External Backend on Hadoop | 21 |
| 4.4.3 | Manual Mode of External Backend without Hadoop (standalone) | 22 |
| 4.5 | Memory Management | 24 |
| 5 | Data Manipulation | 26 |
| 5.1 | Creating H2O Frames | 26 |
| 5.1.1 | Convert from RDD, DataFrame or Dataset | 26 |
| 5.1.2 | Creating H2OFrame from an Existing Key | 27 |
| 5.1.3 | Create H2O Frame Directly | 27 |
| 5.2 | Converting H2O Frames to Spark entities | 28 |
| 5.2.1 | Convert to RDD | 28 |
| 5.2.2 | Convert to DataFrame | 28 |
| 5.3 | Mapping between H2OFrame And Data Frame Types | 29 |
| 5.4 | Mapping between H2OFrame and RDD[T] Types | 30 |
| 5.5 | Using Spark Data Sources with H2OFrame | 30 |
| 5.5.1 | Reading from H2OFrame | 30 |
| 5.5.2 | Saving to H2OFrame | 31 |
| 5.5.3 | Specifying Saving Mode | 32 |
| 6 | Calling H2O Algorithms | 33 |
| 7 | Productionizing MOJOs from H2O-3 | 37 |
| 7.1 | Loading the H2O-3 MOJOs | 37 |
| 7.2 | Exporting the loaded MOJO model using Sparkling Water | 41 |
| 7.3 | Importing the previously exported MOJO model from Sparkling Water | 41 |
| 7.4 | Accessing additional prediction details | 41 |
| 7.5 | Customizing the MOJO Settings | 41 |
| 7.6 | Methods available on MOJO Model | 42 |
| 7.6.1 | Obtaining Domain Values | 42 |
| 7.6.2 | Obtaining Model Category | 42 |
| 7.6.3 | Obtaining Feature Types | 42 |
| 7.6.4 | Obtaining Feature Importances | 43 |
| 7.6.5 | Obtaining Scoring History | 43 |
| 7.6.6 | Obtaining Training Params | 43 |
| 7.6.7 | Obtaining Metrics | 43 |
| 7.6.8 | Obtaining Leaf Node Assignments | 44 |
| 7.6.9 | Obtaining Stage Probabilities | 44 |
| 8 | Productionizing MOJOs from Driverless AI | 44 |
| 8.1 | Requirements | 45 |
| 8.2 | Loading and Score the MOJO | 45 |
| 8.3 | Predictions Format | 48 |
| 8.4 | Customizing the MOJO Settings | 48 |
| 8.5 | Troubleshooting | 49 |
| 9 | Deployment | 50 |
| 9.1 | Referencing Sparkling Water | 50 |
| 9.1.1 | Using Assembly Jar | 50 |
| 9.1.2 | Using PySparkling Zip | 51 |
| 9.1.3 | Using the Spark Package | 51 |
| 9.2 | Target Deployment Environments | 52 |
| 9.2.1 | Local cluster | 52 |
| 9.2.2 | On a Standalone Cluster | 52 |
| 9.2.3 | On a YARN Cluster | 53 |
| 9.3 | DataBricks Cloud | 53 |
| 9.3.1 | Creating a Cluster | 54 |
| 9.3.2 | Running Sparkling Water | 54 |
| 9.3.3 | Running PySparkling | 55 |
| 9.3.4 | Running RSparkling | 56 |
| 10 | Running Sparkling Water in Kubernetes | 57 |
| 10.1 | Internal Backend | 57 |
| 10.1.1 | Scala | 58 |
| 10.1.2 | Python | 60 |
| 10.1.3 | R | 62 |
| 10.2 | Manual Mode of External Backend | 63 |
| 10.2.1 | Scala | 63 |
| 10.2.2 | Python | 66 |
| 10.2.3 | R | 68 |
| 10.3 | Automatic Mode of External Backend | 70 |
| 10.3.1 | Scala | 70 |
| 10.3.2 | Python | 72 |
| 10.3.3 | R | 75 |
| 11 | Sparkling Water Configuration Properties | 77 |
| 11.1 | Configuration Properties Independent of Selected Backend | 77 |
| 11.2 | Internal Backend Configuration Properties | 83 |
| 11.3 | External Backend Configuration Properties | 85 |
| 12 | Building a Standalone Application | 88 |
| 13 | A Use Case Example | 90 |
| 13.1 | Predicting Arrival Delay in Minutes – Regression | 90 |
| 14 | FAQ | 93 |
| 15 | References | 98 |