October 14th, 2020

Combining the power of KNIME and H2O.ai in a single integrated workflow

Category: AutoML, Community, Driverless AI, Partners, Technical Posts, Tutorials

KNIME and H2O.ai, the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by integrating our award-winning H2O Driverless AI and KNIME Analytics Platform, to make it even easier, faster, and cheaper to deliver expert data science as a force multiplier for every enterprise.

KNIME and H2O.ai started collaborating in 2017 by integrating H2O-3 and Sparkling Water with a collection of KNIME nodes. By the way, if you want to learn more about this integration, check out the resources at the end of this blog.

Today, we are excited to announce that we have expanded our partnership and collaboration. You can now seamlessly use H2O Driverless AI in KNIME via a new KNIME Driverless AI extension available from the KNIME Hub. This new integration empowers data scientists and data analysts to work on machine learning projects faster and more efficiently, using automation and state-of-the-art computing power to accomplish in minutes or hours tasks that can take humans months.

  • Develop an integrated data science workflow in KNIME Analytics Platform, from data discovery and data preparation to production-ready predictive models
  • Deliver the power of automatic machine learning to business analysts, enabling more citizen data scientists with H2O Driverless AI
  • Reduce model deployment times, leveraging H2O Driverless AI and KNIME Server to reliably manage the production deployment process

KNIME users can leverage Driverless AI in a workflow to provide automatic feature engineering, model validation, model tuning, model selection, machine learning interpretability, time-series, NLP, computer vision, and automatic pipeline generation for model scoring. H2O Driverless AI provides companies with a data science platform that addresses the needs of various use cases for every enterprise in every industry.

We have been working with a few early adopters to get their feedback. The response has been overwhelmingly positive, with real excitement about the integration and the productivity gains. Vision Banco has been a long-term user of H2O.ai and KNIME. Its data science team is looking forward to the simplification and even more rapid development of data science projects. Below is a quote from Alejandro Lopez, the Data Science Leader at Vision Banco, on how he thinks it will help them:

“We have been using KNIME and H2O Driverless AI for years, and we are very excited about this new integration and the automation and simplification that it will bring to our data science workflow.” (Alejandro Lopez, Data Science Leader at Vision Banco)

This blog provides more details about the integration: how to get started, how various personas can leverage it, a sample workflow, and pointers to further resources.

If you are new to KNIME, you can learn more from the KNIME product page.

If you are new to H2O Driverless AI, explore the product page or tutorials.

The KNIME H2O Driverless AI Extension

To use H2O Driverless AI within KNIME Analytics Platform, all you need to do is install the H2O Driverless AI extension, and you’re ready to go. If you do not know how to install a KNIME extension, check this video.

The integration of H2O Driverless AI in KNIME offers an extensive number of nodes that encapsulate the functionality of the H2O Driverless AI automatic machine learning (AutoML) platform, making it easy to use its AutoML capabilities from a KNIME workflow without touching any code. Each of the H2O Driverless AI nodes looks and feels just like a normal KNIME node, but the workflow reaches out to the high-performance libraries of H2O during execution.

Use Cases By Persona

This new integration between H2O Driverless AI and KNIME helps various personas in the data science life cycle. Below is a short overview of the key personas and how the integration improves their workflow and productivity.

Data Engineers

For Data Engineers, this solution enables seamless data preprocessing connected to Driverless AI using the popular, easy-to-use, and free KNIME Analytics Platform. You can also use KNIME Server to provide additional deployment capabilities, automation, collaboration, cloud execution, and IT administration. With the new KNIME to H2O.ai connectors, customers can blend data from hundreds of data sources, including Salesforce, Sharepoint, Oracle, SAP, SAP Hana, Snowflake, Spark, DataBricks, Hadoop, Tibco, Tableau, PowerBI, AWS, Azure, and GCP.

Data Scientists

For data scientists and model operation teams, this solution provides additional flexibility by enabling a mix and match of automated and custom machine learning approaches.  Data scientists can now collaborate with business stakeholders, gaining valuable input to achieve the optimal result. Upon initial model creation, they can ensure that it is streamlined using Integrated Deployment from KNIME and the Driverless AI AutoML and MOJO deployment artifacts. The addition of Driverless AI natively within a KNIME workflow now provides data scientists an integrated visual drag and drop ability to create such a pipeline. Data Scientists can now leverage the industry-leading AutoML in Driverless AI to quickly train high quality and explainable models that are production-ready in less time.

Deployment Teams

For Deployment Teams, there is now additional flexibility in how and where the H2O Driverless AI trained models are automatically deployed as workflows, from visualization to being deployed as RESTful services, to web applications, to BI dashboards, to 3rd party tools, and all with a no-code approach.  Teams will now be able to automatically and continuously deploy and update models including automated data access, preparation, and pre-processing of workflows, ensuring that there is no loss in translation between the creation and deployment of the model and that ideal compute resources are utilized for ongoing deployment.

Data Science Team Leaders

For Leaders of Data Science teams, this solution enables you to make the best use of your people, time, and technology resources in order to meet the needs of both the team and the enterprise. It provides an environment which empowers your data science team to use best in class AutoML with other best in class approaches and to collaborate on complex projects with the granular permissions and logging needed for team and project management. Productionize data science applications and services in a way that is transparent, secure, and able to be audited and governed as needed.  The deployment and management functionalities make it easy to productionize data science applications and services and deliver usable, reliable, and reproducible insights for the business.

Line of Business Leaders

For Line of Business Leaders, this solution provides insight into the entire process and data lineage, so that you can understand how and why decisions are made, from data access to deployment, and bring your domain expertise to bear in the process. This allows you to mitigate risks and ensure the best results are delivered quickly and at scale to drive the desired business outcome.

4 Steps to Getting Started

The 4 Steps to get started with the KNIME Analytics Platform and H2O Driverless AI integration are:

  1. Get the tools
  2. Get the Driverless AI KNIME Extension
  3. Configure KNIME to connect to H2O Driverless AI
  4. Start building your workflow

Below we will provide a quick overview of each step.

1. Get the tools

Download and install KNIME Analytics Platform

Download H2O Driverless AI, get a trial license, and install it

If you are interested in trying the Driverless AI integration with KNIME Server, please fill out this form.

2. Get Driverless AI KNIME Extension

Download and install the Driverless AI KNIME Extension via the KNIME Analytics Platform.

Or get it from the KNIME Hub.

3. Configure KNIME to connect to H2O Driverless AI

You are almost ready to start. Now you just need to enter the Driverless AI license key and configure KNIME to connect to H2O Driverless AI. Follow these instructions.

4. Start Building your workflow

Once you have successfully installed the Driverless AI Extension, restart KNIME Analytics Platform and you should see the following nodes in the node repository under KNIME Labs:

Get an overview of how to start building your workflow below, and follow the KNIME H2O Driverless AI Integration User Guide.

Combining the power of KNIME and H2O in a single workflow example

In this section, we will walk through an example of the major steps of an end-to-end data science workflow using KNIME Analytics Platform and Driverless AI.

Step 1: Import the Driverless AI license

In order to use the H2O Driverless AI nodes, you will need to import an H2O Driverless AI license file into your KNIME preferences. You will typically find the Driverless AI license key under the following path: /opt/h2oai/dai/home/.driverlessai/license.sig. Copy this file to where your KNIME Analytics Platform is installed. Import this file into KNIME by navigating to File -> Preferences -> KNIME -> H2O Driverless AI, as shown below:

Uploading Driverless AI license to KNIME
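If you prefer to script the copy step, the following is a minimal sketch in Python rather than an official tool. The helper name stage_license and the target directory are assumptions made for illustration. The default path is the typical location quoted above and may differ on your Driverless AI host. The import into the KNIME preferences still happens through the dialog described in this step.

```python
import shutil
from pathlib import Path

# Typical location quoted in this post; it may differ on your installation.
DEFAULT_LICENSE = Path("/opt/h2oai/dai/home/.driverlessai/license.sig")


def stage_license(source: Path, target_dir: Path) -> Path:
    """Copy the license file into target_dir and return the new path.

    `stage_license` and `target_dir` are hypothetical names for this sketch;
    they are not part of KNIME or Driverless AI.
    """
    if not source.is_file():
        raise FileNotFoundError(f"No license file at {source}")
    if source.stat().st_size == 0:
        raise ValueError(f"License file {source} is empty")
    # Create the target folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves file metadata such as the modification time.
    return Path(shutil.copy2(source, target_dir / source.name))
```

For example, stage_license(DEFAULT_LICENSE, Path.home() / "knime-license") would place a copy in a folder that you can then select in the KNIME preferences dialog.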

Step 2: Importing Data

KNIME supports a wide array of data types. From flat files to dynamic Spark connections, KNIME makes it simple to read disparate data types and make them work together for use in machine learning algorithms. In the example below, joining a CSV file, two database tables, and a KNIME table is a simple drag and drop process.
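For readers who think in code, here is a rough sketch of what such a join does conceptually. It is not how KNIME implements its nodes. The table names, the customer_id key, and the sample values are all hypothetical, and an in-memory SQLite database stands in for a real database connection.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a flat file on disk.
CSV_TEXT = "customer_id,age\n1,34\n2,51\n3,27\n"


def load_and_join(csv_text: str) -> list:
    """Load CSV rows plus two database tables and inner-join them on customer_id."""
    conn = sqlite3.connect(":memory:")
    # Two hypothetical database tables plus a landing table for the CSV rows.
    conn.executescript(
        """
        CREATE TABLE customers_csv (customer_id INTEGER, age INTEGER);
        CREATE TABLE accounts (customer_id INTEGER, balance REAL);
        CREATE TABLE labels (customer_id INTEGER, churned INTEGER);
        INSERT INTO accounts VALUES (1, 120.5), (2, 80.0), (3, 15.25);
        INSERT INTO labels VALUES (1, 0), (2, 1);
        """
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["customer_id"]), int(r["age"])) for r in reader]
    conn.executemany("INSERT INTO customers_csv VALUES (?, ?)", rows)
    # Inner joins keep only customers present in all three sources.
    joined = conn.execute(
        """
        SELECT c.customer_id, c.age, a.balance, l.churned
        FROM customers_csv AS c
        JOIN accounts AS a ON a.customer_id = c.customer_id
        JOIN labels AS l ON l.customer_id = c.customer_id
        ORDER BY c.customer_id
        """
    ).fetchall()
    conn.close()
    return joined
```

In this toy data, customer 3 has no row in the labels table, so the inner join drops it. A left join would keep it with a missing label instead, which is the kind of choice you make when configuring a join.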

Step 3: Data Preparation

KNIME provides a rich set of data source connectors and data preparation nodes with a no-code drag and drop canvas to simplify data access and preparation. This empowers data analysts, data engineers, and data scientists to quickly build data preparation flows to prepare, wrangle, clean, join, and filter the data and get it ready for machine learning. Once the data is prepared, it can be connected to Driverless AI to build the machine learning models within the same drag and drop canvas.
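As an illustration of the kind of cleaning and filtering described here, the sketch below applies three common preparation rules in plain Python: trim whitespace, drop rows that have no target value, and remove duplicates. The function name prepare and the column names are hypothetical, and the rules are examples rather than a description of any specific KNIME node.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def prepare(rows: List[Row], target: str) -> List[Row]:
    """Clean and filter rows before they are handed to model training."""
    prepared = []
    seen = set()
    for row in rows:
        # Clean: strip whitespace and treat empty strings as missing values.
        clean = {
            k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in row.items()
        }
        # Filter: a row without a target value cannot be used for supervised training.
        if clean.get(target) is None:
            continue
        # Deduplicate on the full cleaned row (sorted by column name only).
        key = tuple(sorted(clean.items(), key=lambda kv: kv[0]))
        if key in seen:
            continue
        seen.add(key)
        prepared.append(clean)
    return prepared
```

Cleaning before deduplicating matters here: two rows that differ only in stray whitespace count as the same record once they are trimmed.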

Step 4: Building Models with Driverless AI

In order to send KNIME data tables to Driverless AI, connect your workflow to the “Send to Driverless AI” node. Right-click the node and select “Configure” from the context menu.

Example workflow to push data from KNIME Analytics Platform to H2O Driverless AI

Before you push the data to Driverless AI you need to configure the connection.

After you send the data to Driverless AI, you can right-click on the “Send to Driverless AI” node and select “Interactive View: H2O Driverless AI Experiment View” to bring up the Driverless AI UI. Use this interface to build an experiment, view the AutoReport, and generate Machine Learning Interpretability (MLI) metrics and graphs.

Below is what the Driverless AI UI looks like within KNIME:

Step 5: Deploy Model and Score New Data

KNIME can build machine learning production workflows that consume the trained models. H2O.ai provides production-ready, low-latency models and pipelines in the MOJO deployment artifact. A MOJO (Model Object, Optimized) is a standalone, low-latency model object designed to be easily embeddable in production environments. Add an H2O Driverless AI MOJO Predictor node to score data within a KNIME workflow via a drag and drop interface.

Conclusion

The expanded integration between H2O.ai and KNIME brings together all-encompassing, intuitive, automated machine learning from H2O.ai with the guided analytics from KNIME. Customers of H2O.ai and KNIME can now:

  • Develop an integrated data science workflow in KNIME Analytics Platform and KNIME Server, from data discovery and data preparation to production-ready predictive models
  • Deliver the power of automatic machine learning to business analysts, enabling more citizen data scientists with H2O Driverless AI
  • Reduce model deployment times, leveraging H2O Driverless AI and KNIME Server to reliably manage the workflow, the model creation process, and production deployment

Additional Resources

Blogs

KNIME H2O.ai Extensions

Community

  • H2O Machine Learning with KNIME Analytics Platform – Christian Dietz – H2O AI World London (Slides)(Video)
  • Meetup: Leveraging H2O Machine Learning with KNIME Analytics Platform – Paolo Tamagnini, Marten Pfannenschmidt
  • H2O in KNIME: Integrating High Performance Machine Learning – Jo-Fai Chow (H2O.ai), Marten Pfannenschmidt (KNIME), Christian Dietz (KNIME)

Docs

Partner Pages

About the Authors

Rafael Coss

Rafael Coss is a Community and Partner Maker at H2O.ai. Prior to joining H2O.ai, he was technical marketing and community Director and a developer advocate at Hortonworks. He was also the DataWorks Summit Program Co-Chair for the past 3 years. Prior to Hortonworks he was a Senior Solution Architect and Manager of IBM’s WW Big Data Enablement team. At IBM he was responsible for the technical product enablement for BigInsights and Streams. Previously, he held several other positions in IBM, where he worked on tools, XML db, federated db and Object-Relational db.

Stefan Pacinda

Stefan Pacinda is a solution architect at H2O.ai. Located in Prague, Czech Republic, he is responsible for making sure H2O.ai prospects and customers adopt Machine Learning solutions and implement them within their IT infrastructure, either on premises or in the cloud. Prior to joining H2O.ai, he worked at Hewlett Packard, HPE, and Microfocus on the engineering team building Service Virtualization.
