April 10th, 2019

H2O-3, Sparkling Water and Enterprise Steam Updates

Category: Community, Data Science, H2O Release, Technical

We are excited to announce new releases of H2O Core, Sparkling Water and Enterprise Steam.

Below are some of the new features we have added:

H2O-3

Yates (3.24.0.1) – 3/31/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html

Bug

  • [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local machine.
  • [PUBDEV-6189] – Fixed an issue in as_date that occurred when the column included NAs.
  • [PUBDEV-6208] – AutoML no longer fails if one of the Stacked Ensemble models is deleted.
  • [PUBDEV-6230] – Removed ellipses after the H2O server link when launching the Python client.
  • [PUBDEV-6231] – In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
  • [PUBDEV-6266] – In predictions, fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model.
  • [PUBDEV-6284] – The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
  • [PUBDEV-6346] – In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
  • [PUBDEV-6350] – Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
  • [PUBDEV-6357] – Fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model using Python.
  • [PUBDEV-6360] – Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
  • [PUBDEV-6367] – Rows that start with a # symbol are no longer dropped during the import process.
  • [PUBDEV-6368] – Fixed an SVM import failure.
  • [PUBDEV-6376] – Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
  • [PUBDEV-6379] – Fixed handling of BAD state in CategoricalWrapperVec.

New Feature

  • [PUBDEV-4680] – Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. With Blending mode, you do not use cross-validation predictions to train the metalearner. Instead, you score the base models on a holdout set and train the metalearner on those predicted values.
  • [PUBDEV-5801] – Model output now includes column names and types.
  • [PUBDEV-5809] – AutoML now includes a max_runtime_secs_per_model option.
  • [PUBDEV-5925] – In GLM, added support for negative binomial family.
  • [PUBDEV-6056] – For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
  • [PUBDEV-6136] – Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
  • [PUBDEV-6180] – Added the blending_frame parameter to Stacked Ensembles in Flow.
  • [PUBDEV-6196] – Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
  • [PUBDEV-6339] – In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
  • [PUBDEV-6344] – Added ability to import from Hive using metadata from Metastore.
  • [PUBDEV-6358] – Users can now choose the database where import_sql_select creates a temporary table.
  • [PUBDEV-6365] – Added support for monotonicity constraints for binomial GBMs.
  • [PUBDEV-6374] – Users can now define custom HTTP headers using an `-add_http_header` option.
  • [PUBDEV-6386] – XGBoost MOJO now uses Java predictor by default.
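
To make the Blending mode idea from PUBDEV-4680 concrete, here is a minimal plain-Python sketch. It does not use the H2O API: the two base models, the one-weight metalearner and the toy data are all hypothetical stand-ins, chosen only to show the order of operations (train the base models, score them on a holdout set, fit the metalearner on those holdout predictions).

```python
from statistics import mean

def fit_mean(xs, ys):
    """Base model 1: always predicts the mean of the training response."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base model 2: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_blend_weight(p1, p2, ys):
    """Metalearner: the w that minimizes sum((w*p1 + (1-w)*p2 - y)**2)."""
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    num = sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys))
    return num / den if den else 0.5

# Toy data (hypothetical): the response is exactly 2 * x.
train_x, train_y = [0, 1, 2, 3], [0, 2, 4, 6]
blend_x, blend_y = [4, 5], [8, 10]  # holdout rows, playing the role of a blending frame

# 1. Train the base models on the training data only.
base_mean = fit_mean(train_x, train_y)
base_line = fit_line(train_x, train_y)

# 2. Score the base models on the holdout set.
p_mean = [base_mean(x) for x in blend_x]
p_line = [base_line(x) for x in blend_x]

# 3. Train the metalearner on those holdout predictions (no cross-validation).
w = fit_blend_weight(p_mean, p_line, blend_y)

def ensemble(x):
    """Blend the two base models with the learned weight."""
    return w * base_mean(x) + (1 - w) * base_line(x)
```

On this toy data the metalearner puts essentially all of its weight on the line model, because that model fits the holdout rows exactly. In H2O itself, the holdout set is the frame passed through the `blending_frame` parameter.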

Task

  • [PUBDEV-4982] – Fixed an issue that caused Python tests to sometimes fail when run inside a Docker container.
  • [PUBDEV-5876] – Simplified and improved the GLM COD implementation.

Improvement

  • [PUBDEV-5491] – SQLite support is available via any JDBC driver in streaming mode.
  • [PUBDEV-5993] – Updated Retrofit and okHttp dependencies.
  • [PUBDEV-6129] – Target Encoding is now available in the Python client.
  • [PUBDEV-6176] – Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
  • [PUBDEV-6188] – Secret key ID and secret key are available for s3:// AWS protocol.
    • This can be done in the R client using: h2o.setS3Credentials(accessKeyId, accesSecretKey)
    • And in the Python client using: `from h2o.persist import set_s3_credentials` followed by `set_s3_credentials(access_key_id, secret_access_key)`
  • [PUBDEV-6217] – Users can now specify AWS credentials at runtime.
  • [PUBDEV-6254] – The new blending_frame parameter is now available in AutoML.
  • [PUBDEV-6334] – Fixed an error in the Javadoc for the Frame.java sort function.
  • [PUBDEV-6363] – Fixed Hive delegation token generation.
  • [PUBDEV-6388] – Reordered the algorithms that AutoML trains and prioritized the hardcoded XGBoost models.

Docs

  • [PUBDEV-4977] – Removed FAQ indicating that Java 9 was not yet supported.
  • [PUBDEV-6136] – Added a “Generic Models” chapter to the Algorithms section.
  • [PUBDEV-6179] – Added the blending_frame parameter to Stacked Ensembles documentation.
  • [PUBDEV-6280] – Added information about the Negative Binomial family to the GLM booklet and the user guide.
  • [PUBDEV-6289] – Improved the R and Python client documentation for the `sum` function.
  • [PUBDEV-6331] – Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
  • [PUBDEV-6362] – In the User Guide and R and Python documentation, replaced references to “H2O Cloud” with “H2O Cluster”.
  • [PUBDEV-6375] – Added information about predict_contributions to the Performance and Prediction chapter.
  • [PUBDEV-6381] – In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
  • Improved the GBM Reproducibility FAQ.
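
For readers new to feature contributions (PUBDEV-6056, PUBDEV-6375), the sketch below shows the property that makes them useful: the per-feature contributions plus a bias term add up to the model's prediction for that row. It is a brute-force Shapley computation against a single baseline row, written in plain Python. It is not the algorithm H2O uses and it does not call the H2O API; the toy tree, the row and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_contributions(model, row, baseline):
    """Exact Shapley values of `row` relative to a single `baseline` row.

    Returns (contributions, bias), where bias is the model output on the baseline.
    """
    n = len(row)

    def value(subset):
        # Features in `subset` take the row's value; the rest take the baseline's.
        mixed = [row[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        contributions.append(phi)
    return contributions, value(set())

def toy_tree(x):
    """Hypothetical two-split regression tree standing in for a trained model."""
    if x[0] < 5:
        return 10.0 if x[1] < 2 else 20.0
    return 40.0

row, baseline = [3, 4], [7, 1]
contribs, bias = shapley_contributions(toy_tree, row, baseline)

# Contributions plus the bias term reconstruct the prediction for the row.
reconstructed = sum(contribs) + bias
```

Here the first feature pulls the prediction down from the baseline output and the second pushes it back up slightly, and `reconstructed` matches `toy_tree(row)`. Brute-force enumeration grows exponentially with the number of features, so it is only practical for illustrations like this one.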

Xu (3.22.1.6) – 3/13/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/6/index.html

Bug

  • [PUBDEV-6335] – In GBM, added a check to ensure that monotonicity constraints can only be used when distribution="gaussian".
  • [PUBDEV-6342] – Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.

Improvement

  • [PUBDEV-6343] – Added internal validation of monotonicity of GBM trees.
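
As a rough illustration of what validating the monotonicity of a tree can involve, the sketch below checks a sufficient condition for an increasing constraint: at every split on the constrained feature, no leaf in the left subtree may exceed any leaf in the right subtree. This is an assumption-laden toy in plain Python, not H2O's internal check; the `Node` class and both example trees are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    value: float = 0.0             # leaf prediction (used only when feature is None)
    feature: Optional[int] = None  # split feature index; None marks a leaf
    threshold: float = 0.0         # rows with x[feature] < threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def check_increasing(node: Node, constrained: int) -> Tuple[bool, float, float]:
    """Return (is_valid, min_leaf, max_leaf) for the subtree rooted at `node`."""
    if node.feature is None:
        return True, node.value, node.value
    ok_l, min_l, max_l = check_increasing(node.left, constrained)
    ok_r, min_r, max_r = check_increasing(node.right, constrained)
    ok = ok_l and ok_r
    # At a split on the constrained feature, everything on the left
    # must be <= everything on the right.
    if node.feature == constrained and max_l > min_r:
        ok = False
    return ok, min(min_l, min_r), max(max_l, max_r)

def right_branch() -> Node:
    """Shared right subtree: a further split on feature 0."""
    return Node(feature=0, threshold=8, left=Node(3.0), right=Node(4.0))

# Valid: the left subtree's largest leaf (3.0) does not exceed the right's smallest (3.0).
good = Node(feature=0, threshold=5,
            left=Node(feature=1, threshold=2, left=Node(1.0), right=Node(3.0)),
            right=right_branch())

# Invalid: a leaf deep in the left subtree (3.5) exceeds a leaf on the right (3.0),
# even though every individual split looks ordered on its own.
bad = Node(feature=0, threshold=5,
           left=Node(feature=1, threshold=2, left=Node(1.0), right=Node(3.5)),
           right=right_branch())

good_ok = check_increasing(good, constrained=0)[0]
bad_ok = check_increasing(bad, constrained=0)[0]
```

The `bad` tree shows why bounds have to reach the subtrees: the violation comes from a leaf two levels below the split on the constrained feature. For a decreasing constraint, the comparison would be flipped.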

Docs

Sparkling Water:

v2.4.9 – 04/03/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/9/index.html

Bug

  • SW-1162 – Exception when there is a column with BOOLEAN type in dataset during H2OMOJOModel transformation
  • SW-1177 – In PySparkling script, setting --driver-class-path influences the environment
  • SW-1178 – Upgrade to H2O 3.24.0.1
  • SW-1180 – Use specific metrics in grid search, in the same way as H2O Grid
  • SW-1181 – Document off heap memory configuration for Spark in Standalone mode/IBM conductor
  • SW-1182 – Fix random project name generation in H2OAutoML Spark Wrapper

New Feature

  • SW-1167 – Expose search_criteria for H2OGridSearch
  • SW-1174 – Expose H2OGridSearch models
  • SW-1183 – Add include Algos to H2O AutoML pipeline stage & ability to ignore XGBoost

Improvement

  • SW-1164 – Add Sparkling Water to Jupyter spark/pyspark kernels in EMR terraform template
  • SW-1171 – Upgrade build to Gradle 5.2.1
  • SW-1175 – Integrate with H2O native hive support

v2.3.26 – 03/15/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/26/index.html

Bug

  • SW-1163 – Expose missing variables in shared TF EMR SW template

Improvement

  • SW-1145 – Start jupyter notebook with Scala & Python Spark in AWS EMR Terraform template
  • SW-1165 – Upgrade to H2O 3.22.1.6

v2.3.25 – 03/07/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/25/index.html

Bug

  • SW-1150 – hc.stop() shows ‘exit’ not defined error
  • SW-1152 – Fix RSparkling in case the jars are being fetched from maven
  • SW-1156 – H2OXgboost pipeline stage does not define updateH2OParams method
  • SW-1159 – Unique project name in automl to avoid sharing one leaderboard
  • SW-1161 – Fix grid search pipeline step on pyspark side

Improvement

  • SW-1052 – Document Terraform scripts for AWS
  • SW-1089 – Document using Google Cloud Storage In Sparkling Water
  • SW-1135 – Speed up conversion between sparse spark vectors and h2o frames by using sparse new chunk
  • SW-1141 – Improve terraform templates for AWS EMR and make them part of the release process
  • SW-1147 – Integrate with Spark 2.3.3
  • SW-1149 – Allow login via ssh to created cluster using terraform
  • SW-1153 – Add H2OGridSearch pipeline stage to PySpark
  • SW-1155 – Test GBM Grid Search Scala pipeline step
  • SW-1158 – Generalize H2OGridSearch Pipeline step to support other available algos
  • SW-1160 – Upgrade H2O to 3.22.1.5

Enterprise Steam:

Version 1.4.7 – 04/03/2019

  • Fix Sparkling Water proxy issue with uppercase usernames
  • Improve uploading h2o-3 engines
  • Set SPARK_YARN_MODE correctly based on the Hadoop distribution

Version 1.4.6 – 04/01/2019

  • Added ability to choose H2O-3 Leader Node when starting a cluster
  • Added ability to control the number of clusters a user can spin per cluster profile
  • Added option to select default Sparkling Water backend
  • Added automatic redirection back to login with an expired session cookie
  • Added an ability to auto-assign Steam profiles according to SAML profiles
  • Docs: Add “Before you begin installation” section
  • Docs: Documented steam.yaml configuration options
  • Docs: Updated documentation
  • Fix an issue when Steam was hitting API endpoints of dead clusters
  • Fix an issue when hadoop-unjar files were not deleted from temp directory
  • Fix issue with uppercase usernames and Sparkling Water on Hadoop

Version 1.4.5 – 03/22/2019

  • Added Configurable Steam Web UI timeout (STEAM_WEB_UI_TIMEOUT_MIN)

Version 1.4.4 – 02/20/2019

  • Make log file permissions configurable (STEAM_LOG_PERMISSIONS)
  • H2O: Communicate with cluster using leader node only
  • SW: Added support for Hive tables
  • SW: Disable Spark dynamic allocation for internal backend
  • SW: Bundle and distribute all pysparkling dependencies
  • LDAP group configuration is no longer mandatory
  • Bug fixes for Jupyterhub
  • Bug fixes for Sparkling Water params
  • Bug fixes for CDH5

Please see links below for additional details on H2O & Sparkling Water.

Release Notes:

https://github.com/h2oai/h2o-3/blob/master/Changes.md

http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/CHANGELOG.html

https://s3.amazonaws.com/steam-release/enterprise-steam/STEAM-1.4.3.82/docs/user-docs/_build/html/ReleaseNotes.html

H2O & Sparkling Water Documentation:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html

http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/index.html

If you have any questions, please reach out to support@h2o.ai

Thanks,
Venkatesh Yadav

About the Author

Venkatesh Yadav

Venkatesh is a software engineering leader at heart, with a focus on building great teams that deliver amazing products and customer happiness. He serves H2O as VP of Customer Success. He joined the company from Adobe Systems, where he held a number of positions in software engineering and leadership, most recently as Sr. Manager, Software Engineering and Product Management, with a primary focus on Master Data Management and Data Science. Venkatesh played an instrumental engineering and product management leadership role as an “Entrepreneur in Residence” in key strategic programs and initiatives such as Adobe@Adobe, Adobe.io and Adobe.Data. He has experience managing and working with teams across the globe, in the US, Canada, Switzerland, Romania and India, with a focus on value creation. Prior to Adobe Systems, Venkatesh served in various engineering roles at technology companies including Philips, HP and IBM. He holds a Bachelor of Commerce degree from Mumbai University, India, and has successfully completed the Product Management program at UC Berkeley and the General Business Administration and Management program at McGill University. Connect with Venkatesh (@venkateshai)
