Contents

Section Title Page
1 Introduction 6
2 What is H2O? 6
3 Installation 7
3.1 Installation in R 7
3.2 Installation in Python 8
3.3 Pointing to a Different H2O Cluster 9
3.4 Example Code 9
3.5 Citation 10
4 Generalized Linear Models 10
4.1 Model Components 10
4.2 GLM in H2O 11
4.3 Model Fitting 13
4.4 Model Validation 13
4.5 Regularization 14
4.5.1 Lasso and Ridge Regression 14
4.5.2 Elastic Net Penalty 15
4.6 GLM Model Families 15
4.6.1 Linear Regression (Gaussian Family) 15
4.6.2 Logistic Regression (Binomial Family) 17
4.6.3 Fractional Logit Model (Fraction Binomial) 19
4.6.4 Logistic Ordinal Regression (Ordinal Family) 20
4.6.5 Multi-class Classification (Multinomial Family) 23
4.6.6 Poisson Models 24
4.6.7 Gamma Models 26
4.6.8 Tweedie Models 27
4.6.9 Negative Binomial Models 30
4.7 Hierarchical GLM 32
4.7.1 Gaussian Family and Random Family in HGLM 33
4.7.2 H2O Implementation 34
4.7.3 Fixed and Random Coefficients Estimation 35
4.7.4 Estimation of Fixed Effect Dispersion Parameter/Variance 35
4.7.5 Estimation of Random Effect Dispersion Parameter/Variance 35
4.7.6 Fitting Algorithm Overview 35
4.7.7 Linear Mixed Model with Correlated Random Effect 36
4.7.8 HGLM Model Metrics 37
4.7.9 Mapping of Fitting Algorithm to the H2O-3 Implementation 38
5 Building GLM Models in H2O 38
5.1 Classification and Regression 38
5.2 Training and Validation Frames 39
5.3 Predictor and Response Variables 39
5.3.1 Categorical Variables 39
5.4 Family and Link 40
5.5 Regularization Parameters 40
5.5.1 Alpha and Lambda 40
5.5.2 Lambda Search 40
5.6 Solver Selection 43
5.6.1 Solver Details 43
5.6.2 Stopping Criteria 44
5.7 Advanced Features 46
5.7.1 Standardizing Data 46
5.7.2 Auto-remove Collinear Columns 46
5.7.3 P-Values 47
5.7.4 K-fold Cross-Validation 47
5.7.5 Grid Search Over Alpha 49
5.7.6 Grid Search Over Lambda 50
5.7.7 Offsets 52
5.7.8 Row Weights 52
5.7.9 Coefficient Constraints 52
5.7.10 Proximal Operators 53
6 GLM Model Output 53
6.1 Coefficients and Normalized Coefficients 56
6.2 Model Statistics 57
6.3 Confusion Matrix 59
6.4 Scoring History 59
7 Making Predictions 60
7.1 Batch In-H2O Predictions 60
7.2 Low-latency Predictions using POJOs 63
8 Best Practices 64
8.1 Verifying Model Results 65
9 Implementation Details 66
9.1 Categorical Variables 67
9.1.1 Largest Categorical Speed Optimization 67
9.2 Performance Characteristics 67
9.2.1 IRLSM Solver 67
9.2.2 L-BFGS Solver 68
9.3 FAQ 69
10 Appendix: Parameters 69
11 Acknowledgments 73
12 References 73
13 Authors 74

 
