November 12th, 2019

Accelerate Machine Learning workflows with H2O.ai Driverless AI on Red Hat OpenShift, Enterprise Kubernetes Platform

Category: Driverless AI, Kubernetes

Organizations globally are operationalizing containers and Kubernetes to accelerate Machine Learning lifecycles, as these technologies give data scientists and software developers the much-needed agility, flexibility, portability, and scalability to train, test, and deploy ML models in production.

Red Hat OpenShift is the industry’s most comprehensive Kubernetes hybrid cloud platform. It provides the above benefits by leveraging Kubernetes Operators, integrating DevOps capabilities, and integrating with GPU hardware accelerators. Red Hat OpenShift enables better collaboration between data scientists and software developers, accelerating the rollout of intelligent applications across the hybrid cloud.

Kubernetes Operators codify operational knowledge and workflows to automate the install and lifecycle management of containerized applications with Kubernetes. For further details on Red Hat OpenShift Kubernetes Platform for accelerating AI/ML workflows, please visit the AI/ML on OpenShift webpage.

Similarly, H2O.ai presents its enterprise product, Driverless AI: an automated machine learning platform designed to give data science practitioners the ability to quickly train, evaluate, and productionalize new machine learning models.

Driverless AI comes with a powerful suite of built-in feature engineering techniques, designed by several of the world’s top Kaggle Grandmasters and expert data scientists, for enhancing and enriching a user’s original data, as well as model tuning and stacking recipes for generating the best possible models. In a matter of hours, Driverless AI will test hundreds of new derived features and a multitude of different models, saving practitioners hours, days, or months of effort. Finally, the platform can evaluate the resulting model to generate explanations that help practitioners understand what decisions the model made under the hood.

On top of its automatic feature engineering and hyperparameter tuning, Driverless AI applies intelligent feature selection on the original data plus the newly derived, engineered features, to eliminate any unnecessary computations. The resulting model artifact, MOJO, is easily deployable and executable in Java, Python, or R. Its language flexibility, lightweight design, and low latency scoring ability (sub-millisecond) make it easily deployable in Kubernetes environments such as OpenShift that allow for automatic scaling on demand.

This is a great moment to announce that the H2O.ai Driverless AI Kubernetes Operator is now Certified on Red Hat OpenShift. This integration is made possible by engineering collaboration across Red Hat and H2O.ai to help simplify and accelerate turning on instances of Driverless AI, as needed by the data scientists.

Now, let’s walk through the key steps to roll out new instances of Driverless AI on OpenShift.

This is made easier by the implementation of Kubernetes Operators. Red Hat and the community provide a curated list of Operators called OperatorHub.io, and OperatorHub embedded for Red Hat OpenShift enables both community and Red Hat OpenShift Certified Operators. OpenShift Certified Operators have been validated beyond the basic tests performed for the community Operators. H2O.ai has worked in tandem with the Red Hat team to develop the H2O.ai Driverless AI Operator, a Red Hat OpenShift Certified Operator to assist in the deployment of new Driverless AI instances within an OpenShift cluster (Image 1 and 2).

Image 1: H2O.ai Driverless AI Operator can be found in the AI/Machine Learning category of OpenShift OperatorHub

Image 2: The Driverless AI Operator can be installed via the OpenShift UI or, alternatively, with the OpenShift command-line tool `oc`. Make sure the cluster has proper pull credentials.

Once the H2O Driverless AI Operator has been successfully installed, users with access to the Developer Catalog will be able to launch a Driverless AI instance using the running operator (Image 3 and 4). To launch an instance, users only need to fill in the required input fields: CPU, Memory, Persistent Disk Size, and the Driverless AI container image to use. The Red Hat Container Catalog has a published, certified Driverless AI container image at: registry.connect.redhat.com/h2oai/driverlessai-rhelubi7:latest, which is recommended. This container image runs the LTS (long-term support) version of Driverless AI, v1.6.4.

Image 3: Click the `create new` link to be redirected to the deployment specification yaml (Image 4)

Image 4: Request the necessary resources (CPU, Memory, Persistent Disk Size), and state which image of Driverless AI you wish to use, by changing the values in the input fields.
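
As a rough illustration of the fields described above, the following Python sketch assembles an instance specification and prints it as JSON (which `oc apply -f` also accepts). The `apiVersion`, `kind`, instance name, resource values, and every key under `spec` are hypothetical placeholders, not the Operator's actual schema; copy the real names from the yaml template shown in Image 4. Only the container image path comes from this article.

```python
import json


def driverless_ai_instance(name, cpu, memory, storage, image):
    """Build a manifest for one Driverless AI instance.

    NOTE: apiVersion, kind, and the spec key names are hypothetical
    placeholders. Use the names from the Operator's own yaml template.
    """
    return {
        "apiVersion": "example.h2o.ai/v1alpha1",  # hypothetical
        "kind": "DriverlessAI",                   # hypothetical
        "metadata": {"name": name},
        "spec": {
            "cpu": cpu,          # CPU cores requested (hypothetical key)
            "memory": memory,    # memory requested (hypothetical key)
            "storage": storage,  # persistent disk size (hypothetical key)
            "image": image,      # Driverless AI container image
        },
    }


manifest = driverless_ai_instance(
    name="dai-demo",   # hypothetical instance name
    cpu=8,             # illustrative sizing only
    memory="64Gi",
    storage="100Gi",
    image="registry.connect.redhat.com/h2oai/driverlessai-rhelubi7:latest",
)

print(json.dumps(manifest, indent=2))
```

Keeping the specification in a small function like this makes it easy to stamp out several instances with different sizes while leaving the image reference in one place.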

Upon submission of the yaml template, the Driverless AI Operator will assume the task of resolving all the requisite processes to successfully launch an instance of Driverless AI. This includes a Deployment, which encompasses the global specifications of how Driverless AI will run within the cluster; a Service, which defines how users will interact with the running application; and a Pod, which houses the running container image (Image 5).

Image 5: Shows all the running processes in the cluster associated with successful deployment of Driverless AI via Driverless AI Operator

By default, the Driverless AI Operator will generate an Ingress service for accessing the running application, but alternatively you can generate a Route, which will provide users with a URL to access the running Driverless AI (Image 6).

Image 6: Screenshot of the Route configuration used to deploy an unencrypted route to the Driverless AI application. Encryption can be configured by ticking the `Secure Route` checkbox.
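
For readers who prefer a manifest to the UI form, here is a minimal Python sketch of an equivalent Route, printed as JSON. It uses the standard OpenShift `route.openshift.io/v1` Route layout. The route name, the Service name, and the target port are assumptions made for illustration; check the Service the Operator actually created. Using edge TLS termination when `secure=True` is also just one possible choice, not necessarily what the `Secure Route` checkbox selects.

```python
import json


def route_manifest(name, service, target_port, secure=False):
    """Build an OpenShift Route that points at an existing Service."""
    route = {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name},
        "spec": {
            "to": {"kind": "Service", "name": service},
            "port": {"targetPort": target_port},
        },
    }
    if secure:
        # Edge termination is an assumption; other termination types exist.
        route["spec"]["tls"] = {"termination": "edge"}
    return route


# "dai-demo" and port 12345 are assumed values; look up the real
# Service name and port in the cluster before applying anything.
plain = route_manifest("dai-demo", service="dai-demo", target_port=12345)
encrypted = route_manifest(
    "dai-demo", service="dai-demo", target_port=12345, secure=True
)

print(json.dumps(encrypted, indent=2))
```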

Once the Driverless AI Deployment reports that the application has successfully started up and is healthy, users can go to the exposed URL to access the Driverless AI UI (Image 7).

Image 7: Login screen of Driverless AI

The above walkthrough provides the quickest way to start up a Driverless AI instance. For additional configuration, users can instruct Driverless AI to start up with user configuration overrides by providing a configmap name in the deployment yaml. Obtain a copy of the Driverless AI config.toml and create a configmap from it in the OpenShift cluster. Insert the name of the configmap into the `config` field of the deployment yaml, and Driverless AI will start up with the user configurations found in the config.toml file. The same can be done for users who already have a Driverless AI license key: create a Secret inside the OpenShift cluster and insert the name of the resulting secret into the `licensekey` field of the deployment yaml. Driverless AI will pick up the license key and not request it at runtime (Image 8).

Image 8: Deployment yaml containing additional fields for `licensekey` and `config`
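
The two supporting objects can be sketched in Python as ordinary Kubernetes manifests, printed as JSON. The ConfigMap and Secret layouts are standard Kubernetes. The object names, the data keys (`config.toml`, `license.sig`), and the sample contents are assumptions for illustration; the keys the Operator expects may differ, so check its documentation before relying on them.

```python
import base64
import json


def config_map(name, config_toml_text):
    """Wrap the contents of a config.toml file in a ConfigMap."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # The key name "config.toml" is an assumption.
        "data": {"config.toml": config_toml_text},
    }


def license_secret(name, license_text):
    """Wrap a license key in an Opaque Secret (values are base64-encoded)."""
    encoded = base64.b64encode(license_text.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # The key name "license.sig" is an assumption.
        "data": {"license.sig": encoded},
    }


# Placeholder contents; in practice read these from the real files.
config_text = "# user overrides copied from config.toml go here\n"
license_text = "PLACEHOLDER-LICENSE-KEY"

cm = config_map("dai-config", config_text)           # hypothetical name
secret = license_secret("dai-license", license_text)  # hypothetical name

print(json.dumps(cm, indent=2))
print(json.dumps(secret, indent=2))
```

The names chosen here (`dai-config` and `dai-license` in this sketch) are what would then be entered in the `config` and `licensekey` fields of the deployment yaml.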

Driverless AI user configurations can be put to use for enabling certain data connectors, whitelisting/blacklisting certain machine learning algorithms, or simply enabling user authentication. 

To wrap things up, organizations can now accelerate Machine Learning workflows with H2O.ai Driverless AI and Red Hat OpenShift Kubernetes Platform. 

For more information regarding Driverless AI, feel free to reach out to H2O.ai via our community Slack channel, or visit the website. For more information on AI/ML on Red Hat OpenShift, please visit the website.

About the Author

Nicholas Png

Nicholas Png is a Partnerships Software Engineer at H2O.ai. Prior to working at H2O, he worked as a Quality Assurance Software Engineer, developing software automation testing. Nicholas holds a degree in Mechanical Engineering, and has experience working with customers across multiple industries, identifying common problems, and designing robust, automated solutions.
