July 25th, 2019

A Driverless Approach to Make Forecasting Easy — Part 1


Suppose you work in the supply chain department, or in a role responsible for estimating future product sales, patient admissions, retail store staffing, energy use, ticket sales, and so on, based on historical data. A common problem is forecasting numbers one week, 4 weeks, 6 months, or 1–5 years into the future: in other words, short-term and long-term forecasts. Needless to say, well-informed forecasting lets you create optimized budgets, avoid excess inventory and wasteful expenditure, and in general plan for success and profitability.

For data scientists and business analysts who have done time-series forecasting, the following reasons why it takes an inordinate amount of time to reach an acceptable and repeatable solution may sound familiar.

  • Getting the data into a good format is a common challenge. But that’s just the tip of the iceberg.
  • Understanding seasonality, short-term and long-term trends, and holidays, and factoring them into the model.
  • Dealing with categorical data.
  • Creating lag features — Which lag features should I build and use?
  • Which algorithms and feature transformers should I choose? How do I pick the best one for the business problem? How do I tune the models? How do I slide bleeding-edge algorithms into the prediction process without much effort?
  • How do I explain my future point predictions to the business? Say my model predicts that sales will be $1.5M 10 days from today, with the rest of the days averaging $0.5M. Why is it deciding that way, and what did it infer from historical data?
  • How do I constantly retrain my model and make it production deployable?

The list above is common, though not exhaustive. In this blog series, I will try to address some of these issues. The topics not covered here will be covered in Part 2 of the blog post.
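To make the lag-feature question from the list concrete, here is a minimal sketch in plain Python. The helper name `lag_feature` and the sales numbers are made up for illustration. This is not how Driverless AI builds its lags internally; it only shows the basic idea of giving each row the target value from a fixed number of steps earlier.

```python
def lag_feature(values, lag):
    """Shift a series back by `lag` steps; rows without enough history get None."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    values = list(values)
    return [None] * min(lag, len(values)) + values[:-lag]


# Made-up daily sales, oldest first (illustrative values, not real data).
sales = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]

# One row per day: the target plus its value 1 and 2 days earlier.
rows = [
    {"sales": s, "lag_1": l1, "lag_2": l2}
    for s, l1, l2 in zip(sales, lag_feature(sales, 1), lag_feature(sales, 2))
]

for row in rows:
    print(row)
```

Which lags are useful depends on the forecast horizon: a lag shorter than the horizon would not be known at prediction time, which is one reason choosing lags by hand is tedious.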

I used Driverless AI 1.7.0, H2O.ai’s latest version at the time of writing, to forecast Google’s stock price for the month of July 2019. It’s a simple univariate forecast and a good first example.

Disclaimer: I do not recommend that you trade stock or derivatives based on the example here. It’s for illustration only. Future values are never guaranteed or deemed to be reliable by any model, so …

Driverless AI is an Automatic Machine Learning/Automatic Feature Engineering tool that can do time series forecasting in addition to regression, classification, etc. It has built-in time series recipes from Kaggle Grandmasters, and through the custom recipe feature, aka BYOR (Bring Your Own Recipe), you can bring in additional algorithms and feature engineering code, such as Auto ARIMA and FB Prophet, and run the Automatic ML to predict a target value.

Here’s a link to the open-source BYOR GitHub repository (we will use some of this in Part 2 of the blog post): https://github.com/h2oai/driverlessai-recipes

For my experiment, I downloaded 5 years of daily Google stock price data from the Yahoo! Finance portal.

We split the downloaded CSV, using a spreadsheet (XLS), into training and test data sets.

We will build the Driverless AI model on a training data set with daily closing stock price from:

07/23/14 to 06/28/19

Columns Dropped: Open, High, Low, Adj Close, Volume

Target Column: Close

Time Column: Date

The test data set has values from:

07/01/19 to 07/23/19
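The same split can also be scripted instead of done by hand in a spreadsheet. The sketch below is a minimal illustration using only the Python standard library. It assumes the CSV has the columns named above (Date, Open, High, Low, Close, Adj Close, Volume) and that dates are written as YYYY-MM-DD; the function name `split_rows` and the sample rows with made-up prices are placeholders, not the data used in the experiment.

```python
import csv
import io
from datetime import date

KEEP = ("Date", "Close")        # time column and target column from the post
TRAIN_END = date(2019, 6, 28)   # last training day named in the post
TEST_END = date(2019, 7, 23)    # last test day named in the post


def split_rows(csv_text):
    """Keep only Date and Close, then split the rows into train and test by date."""
    train, test = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = date.fromisoformat(row["Date"])  # assumption: YYYY-MM-DD dates
        slim = {name: row[name] for name in KEEP}
        if day <= TRAIN_END:
            train.append(slim)
        elif day <= TEST_END:
            test.append(slim)
        # rows after TEST_END are ignored
    return train, test


# Placeholder rows with made-up prices, only to show the mechanics.
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2019-06-27,1.0,2.0,0.5,1.5,1.5,100
2019-06-28,1.5,2.5,1.0,2.0,2.0,110
2019-07-01,2.0,3.0,1.5,2.5,2.5,120
2019-07-24,2.5,3.5,2.0,3.0,3.0,130
"""

train, test = split_rows(SAMPLE)
print(len(train), "training rows;", len(test), "test rows")
```

Splitting by date rather than at random matters for time series: the test set must lie strictly after the training period so the model is never scored on days it has already seen.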

We chose a forecast horizon of 16 days (you can obviously play with this number) and SMAPE as the scorer. The Driverless AI recipe picked LightGBM and XGBoost, along with candidate feature engineering that could be used in the Automatic ML model/feature selection. Click “Launch Experiment” (the horizontal yellow bar) and, after the experiment has finished, you will see a screen similar to the one below. The features for the final model are Exponential Moving Averages and some Derived Date Features.
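For readers unfamiliar with these two feature families, here is a minimal, generic sketch of what an exponential moving average and a few derived date features look like. It is not Driverless AI's transformer code, which the post does not show: the function names `ema` and `date_features`, the smoothing factor, and the closing prices are all assumptions made for illustration.

```python
from datetime import date


def ema(values, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    The first smoothed value is seeded with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed


def date_features(day):
    """Derive simple calendar features from a date (Monday is 0)."""
    return {
        "day_of_week": day.weekday(),
        "day_of_month": day.day,
        "month": day.month,
    }


# Made-up closing prices, oldest first (illustrative values, not real data).
closes = [100.0, 110.0, 105.0]
print(ema(closes, alpha=0.5))
print(date_features(date(2019, 7, 1)))
```

In a real forecasting setup, a moving average used as a feature has to be computed only from values that would be known at prediction time, that is, from values lagged by at least the forecast horizon.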

We can then click “SCORE ON ANOTHER DATASET” and pick the test data set to score it and chart the actual vs. predicted values.

Here’s how my model’s predictions compare with the actual values on the test data set.

[Close], in blue, is the original value.

[Close.predicted], in green, is what the model guessed for the test set, learning only from the training set and not looking at the test set 🙂

It’s stunning to see that, with a difference of approximately $25, the model caught the upward trend 3 weeks out!
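If you export the scored test set, the gap between the [Close] and [Close.predicted] columns can be quantified yourself. The sketch below shows one common definition of SMAPE, the scorer chosen for the experiment, together with a plain mean absolute difference in dollars. SMAPE has several variants, so treat this formula as an assumption rather than Driverless AI's exact implementation; the function names and the numbers are made up for illustration.

```python
def smape(actual, predicted):
    """SMAPE in percent: 100 * mean(2 * |a - p| / (|a| + |p|))."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = abs(a) + abs(p)
        # Treat a 0/0 term as zero error instead of dividing by zero.
        total += 0.0 if denom == 0 else 2.0 * abs(a - p) / denom
    return 100.0 * total / len(actual)


def mean_abs_diff(actual, predicted):
    """Average absolute gap between actual and predicted, in the target's units."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values standing in for the Close and Close.predicted columns.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]

print("SMAPE (%):", round(smape(actual, predicted), 2))
print("Mean absolute difference:", mean_abs_diff(actual, predicted))
```

SMAPE is relative, so it is comparable across series with different price levels, while the mean absolute difference stays in dollars and is easier to explain to a business audience.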

So, how exactly does the time series recipe in Driverless AI work? See link here.

The stock price example can be extended to any time series data to create future estimates!

What about Custom Recipes?

When Algorithms and Features compete, your forecasting projects win!

In this blog, we did not talk about adding other traditional and popular time series algorithms and transformers like FB Prophet, AutoArima, MACD, etc. Adding those would make Driverless AI try even more algorithms, tune more models, and do evolutionary feature engineering on the new features, which can lead to more accuracy in your final model. I plan to cover the Custom Recipe settings for forecasting problems in A Driverless Approach to Make Forecasting Easy, Part 2.

Want to play with Driverless AI on your time-series data? Here’s the 21-day trial link

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. Over the years he has published 50+ blogs on “all things data science” on the LinkedIn, Forbes and Medium publishing platforms for the business audience, and he speaks at vendor data science conferences. He also holds multiple patents around Desktop Virtualization and Ad networks, and was a co-founding member of two startups in Silicon Valley.
