December 10th, 2018

For Today’s BI Analyst – Accelerating your AI/ML efforts with Driverless AI

Category: Data Science, Driverless AI

Whether you are a novice data scientist or a veteran in AI and machine learning, modern tools can guide you in creating some of the best models from your data, and they also make it easier to move those models to production.

Also, don’t forget the experienced BI analysts in your organization who want to try data science, only to be overwhelmed by the jargon and complexity that come with machine learning. They ask: How do I know what I’m doing is correct and that I didn’t miss something important in the process? How can I get help?

Most data scientists and analysts take weeks or months to build complex models, and some back away altogether. The obstacles are poor initial accuracy, the iterative process of creating useful features, the choice of algorithms, and the need to understand the nuances of each algorithm, language, and framework. Getting modeling right takes time. For example:

  • Should I use a tree-based model or a neural network?
  • Keras or PyTorch? Python or R?
  • How about the classical GLM approach?
  • What about the settings for each algorithm?
  • After the “final complex ensemble” model pipeline is built, how do I find out which features are important to decision making? Can I share it with the business with confidence?
  • How do I configure my GPUs correctly?

Those are the questions that advanced Auto AI/ML tools like H2O.ai’s Driverless AI address. While navigating the maze of tuning parameters and checking distributions, outliers, and missing data, it also tries different types of algorithms to learn about the data and decide how to build winning models. Driverless AI creates new features from the original features, yet keeps the explanations of the models grounded in the original features for interpretation.

How about global explanations vs. local explanations for every single row that’s scored on the test data set? Driverless AI provides both with its MLI (Machine Learning Interpretability) feature, steering away from the black-box trap that data scientists often walk into when aiming for higher accuracy.
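To make the global vs. local distinction concrete, here is a minimal sketch using a hand-written linear scoring model. It is only an illustration of the two kinds of explanation, not how Driverless AI's MLI computes them; the weights, feature names, and rows are all hypothetical.

```python
from statistics import mean

# Hypothetical linear scoring model: score = bias + sum(weight * value).
WEIGHTS = {"tenure_months": -0.03, "support_calls": 0.40, "monthly_spend": 0.01}

# Hypothetical rows to be scored.
rows = [
    {"tenure_months": 24, "support_calls": 1, "monthly_spend": 50},
    {"tenure_months": 2, "support_calls": 5, "monthly_spend": 80},
    {"tenure_months": 12, "support_calls": 0, "monthly_spend": 20},
]

# Baseline: the average value of each feature across the data set.
baseline = {f: mean(r[f] for r in rows) for f in WEIGHTS}


def local_explanation(row):
    """Per-row view: how far each feature pushes this row's score
    away from the score of an 'average' row."""
    return {f: WEIGHTS[f] * (row[f] - baseline[f]) for f in WEIGHTS}


def global_explanation(all_rows):
    """Data-set view: mean absolute contribution of each feature."""
    per_row = [local_explanation(r) for r in all_rows]
    return {f: mean(abs(p[f]) for p in per_row) for f in WEIGHTS}


global_view = global_explanation(rows)
local_view = local_explanation(rows[1])
print("global:", global_view)
print("local (second row):", local_view)
```

The global view ranks features for the whole data set, while the local view answers "why did this particular row get this score?", which is the question a business user usually asks about a single prediction.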

You don’t have to trade higher model accuracy for harder-to-explain models …

Ease of Deployment & Scoring Performance

Driverless AI creates scoring artifacts that you can put into production after the model is built. The generated MOJO Java code can be used for real-time scoring as well as batch scoring, for example as Java UDFs in Hive or a relational database. Put GitHub between Driverless AI and your production DevOps deployment, and you have a versioned model pipeline.

Shortening the Model Deployment cycle

For AI models to be production quality, time to deployment is crucial to keeping the models fresh and performing well. If it takes a few weeks or months to build models, data drift or concept drift can set in, leading to bad predictions. Driverless AI models are usually built quickly, in minutes to a few hours, thanks to algorithms optimized to run on GPUs. You can fully exploit GPU parallelism and even run multiple projects at the same time.
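One simple way to notice data drift between training time and scoring time is the population stability index (PSI) of a feature. The sketch below is a plain-Python illustration under assumed inputs (two lists of numeric values for one feature); it is not part of Driverless AI, and the function name and bin count are choices made for this example.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample (expected) and a recent
    sample (actual) of a single numeric feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width buckets over the training range; fall back to 1.0
    # if every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range go into the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty buckets.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


train = [i / 100 for i in range(1000)]   # values seen when the model was built
recent_same = list(train)                # no drift
recent_shifted = [v + 5 for v in train]  # the distribution has moved

psi_same = population_stability_index(train, recent_same)
psi_shifted = population_stability_index(train, recent_shifted)
print(psi_same, psi_shifted)
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift worth a retrain, though the right threshold depends on the feature and the business context.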

Jupyter Notebook and Driverless AI

While coding is not necessary to build AI models, there is a Python interface for interacting with Driverless AI that lets you:

  • Package and ship data frames to a Driverless AI instance
  • Set Accuracy and the other settings that control how long to iterate on feature engineering, feature evolution, pipeline building, etc.
  • Download the Java and Python scoring code when the model is done
  • Download the experiment summary and MLI artifacts

With Python, you can incorporate AI modeling and scoring into your existing automation or REST framework. A data scientist can, for instance, use pandas to pull data from a data source, clean it up, create features, and apply some basic transformations in addition to what Driverless AI does automatically.
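As a rough sketch of the pandas step described above, the snippet below cleans a small, made-up customer table before it would be handed to a modeling tool. The column names and data are hypothetical, and the hand-off to Driverless AI itself is deliberately left out because it depends on the client version you have installed.

```python
import pandas as pd


def prepare_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Basic cleanup and a simple derived feature on a hypothetical table."""
    df = raw.copy()
    # Normalise column names, e.g. "Signup Date" -> "signup_date".
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop exact duplicate rows.
    df = df.drop_duplicates()
    # Parse dates; unparseable values become NaT instead of raising.
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    # Derived feature: calendar month of signup.
    df["signup_month"] = df["signup_date"].dt.month
    # Fill missing spend with the median of the known values.
    df["monthly_spend"] = df["monthly_spend"].fillna(df["monthly_spend"].median())
    return df.reset_index(drop=True)


# Made-up input standing in for data pulled from a real source.
raw = pd.DataFrame({
    "Customer ID": [1, 2, 2, 3],
    "Signup Date": ["2018-01-15", "2018-03-02", "2018-03-02", "not a date"],
    "Monthly Spend": [20.0, 30.0, 30.0, None],
    "Churned": [0, 1, 1, 0],
})

clean = prepare_frame(raw)
print(clean)
# From here the frame could be written out (e.g. clean.to_csv(...)) and
# uploaded through whichever Driverless AI interface you use.
```

Keeping this preparation in a single function makes it easy to reuse the same steps at scoring time, so training and production data go through identical cleanup.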

Going from BI to AI with Driverless …

While BI tools are great for tracking KPIs, viewing trends, etc., every analyst and aspiring data scientist is only a few steps away from taking similar data and doing AI modeling and prediction with very basic data science knowledge. As an analyst, if you are looking at tens or hundreds of BI charts and reports to understand how variables connect to certain outcomes, how about having an AI tool figure it out for you, helping you drive towards better and more sure-footed decisions?

About me: I am a Solution Architect/Data Scientist with H2O.ai, helping customers get to the finish line quickly with their AI initiatives.

Some Useful links:

H2O’s Driverless AI website
Download a 21-day trial from here
Documentation

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blog posts on “all things data science” on LinkedIn, Forbes and Medium over the years for the business audience and speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks, and was a co-founding member of two startups in Silicon Valley.
