July 25th, 2019

A Driverless Approach to Make Forecasting Easy — Part 1

Category: Driverless AI

Suppose you work in a supply chain department, or in any role responsible for estimating future product sales, patient admissions, retail store staffing, energy use, ticket sales, and so on, based on historical data. A common problem is forecasting numbers 1 week, 4 weeks, 6 months, or 1 to 5 years into the future: in other words, short-term and long-term forecasts. Well-informed forecasting lets you create optimized budgets, avoid excess inventory and wasteful expenditure, and in general plan for success and profitability.

For data scientists and business analysts who have done time-series forecasting, some of the following reasons why it takes an inordinate amount of time to reach an acceptable and repeatable solution may sound familiar:

  • Getting the data into a good format is a common challenge. But that’s just the tip of the iceberg.
  • Understanding seasonality, short-term and long-term trends, and holidays, and factoring them into the model.
  • Dealing with categorical data.
  • Creating lag features: which lag features should I build and use?
  • Which algorithms and feature transformers should I choose? How do I pick the best one for the business problem? How do I tune the models? How do I slot bleeding-edge algorithms into the prediction process without much effort?
  • How do I explain my future point predictions to the business? Suppose my model predicts sales of $1.5M 10 days from today, with the rest of the days averaging $0.5M. Why does it decide that, and what did it infer from the historical data?
  • How do I constantly retrain my model and make it production-deployable?

The list above is common, though not exhaustive. In this blog series, I will try to address some of these issues. The rest of the topics will be covered in Part 2 of the blog post.

I used Driverless AI 1.7.0, the latest version from H2O.ai at the time of writing, to forecast Google’s stock price for the month of July 2019. It’s a simple univariate forecast, which makes it a good first example.

Disclaimer: I do not recommend that you trade stock or derivatives based on the example here. It’s for illustration only. Future values are never guaranteed or deemed to be reliable by any model, so …

Driverless AI is an automatic machine learning and automatic feature engineering tool that can do time series forecasting in addition to regression, classification, and so on. It not only has built-in time series recipes from Kaggle Grandmasters, but also lets you bring your own algorithms and feature engineering code to enhance the model-building process, a feature known as BYOR (Bring Your Own Recipe). Through the custom recipe feature, you can bring in additional algorithms and feature engineering code, such as Auto ARIMA or FB Prophet, and run the automatic ML to predict a target value.

Here’s a link to the open-source BYOR GitHub repository (we will use some of this in Part 2 of the blog post): https://github.com/h2oai/driverlessai-recipes

For my experiment, I downloaded 5 years of daily Google stock price data from the Yahoo! Finance portal.

We split the downloaded CSV into training and test data sets in a spreadsheet (XLS).

We will build the Driverless AI model on a training data set with daily closing stock price from:

07/23/14 to 06/28/19

Columns Dropped: Open, High, Low, Adj Close, Volume

Target Column: Close

Time Column: Date

The test data set has values from:

07/01/19 to 07/23/19
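The same split can also be scripted instead of being done by hand in a spreadsheet. The following is a minimal sketch, not part of the original workflow: it assumes the CSV uses the column names listed above with dates in YYYY-MM-DD form, and the sample rows are made-up placeholders rather than real prices. The function name `split_by_date` is hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Made-up sample rows in the assumed column layout; these are NOT real prices.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2019-06-27,10.0,11.0,9.5,10.5,10.5,1000
2019-06-28,10.5,11.5,10.0,11.0,11.0,1200
2019-07-01,11.0,12.0,10.5,11.5,11.5,900
2019-07-02,11.5,12.5,11.0,12.0,12.0,1100
"""


def split_by_date(csv_text, first_test_day):
    """Keep only Date and Close, then split the rows at first_test_day.

    Rows dated before first_test_day go to the training set; rows on or
    after it go to the test set. The split is by time, never shuffled.
    """
    train, test = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["Date"], "%Y-%m-%d").date()
        record = {"Date": day, "Close": float(row["Close"])}
        if day >= first_test_day:
            test.append(record)
        else:
            train.append(record)
    return train, test


train, test = split_by_date(SAMPLE_CSV, first_test_day=date(2019, 7, 1))
print(len(train), "training rows;", len(test), "test rows")
```

Splitting on a date, rather than sampling rows at random, keeps every test row strictly later than every training row, which is what a forecasting evaluation needs.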

We chose a forecast horizon of 16 days (you can of course experiment with this number) and chose SMAPE as the scorer. Driverless AI picked LightGBM and XGBoost, along with candidate feature engineering steps, for use in the automatic model and feature selection. After clicking “Launch Experiment” (the horizontal yellow bar) and waiting for the experiment to finish, you will see a screen similar to the one below. The features in the final model are Exponential Moving Averages and some derived date features.
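To give a feel for the two kinds of features mentioned here, the following is a small illustrative sketch of an exponential moving average and a few date-derived features. It is not Driverless AI’s implementation: the function names, the smoothing factor `alpha`, and the toy values are all assumptions made for this example.

```python
from datetime import date


def exponential_moving_average(values, alpha):
    """Return the EMA of values, seeded with the first value.

    alpha is the smoothing factor in (0, 1]; larger alpha weights
    recent observations more heavily.
    """
    ema = []
    for v in values:
        if not ema:
            ema.append(v)  # seed with the first observation
        else:
            ema.append(alpha * v + (1 - alpha) * ema[-1])
    return ema


def date_features(day):
    """Derive simple calendar features from a date (Monday is weekday 0)."""
    return {
        "year": day.year,
        "month": day.month,
        "day": day.day,
        "weekday": day.weekday(),
    }


closes = [10.0, 12.0, 11.0, 13.0]  # toy values, not real prices
ema = exponential_moving_average(closes, alpha=0.5)
print(ema)  # [10.0, 11.0, 11.0, 12.0]
print(date_features(date(2019, 7, 1)))
```

When features like these feed a forecasting model, they have to be built only from values that are already known at prediction time, for example values at least one forecast horizon in the past.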

We can then click “SCORE ON ANOTHER DATASET” and pick the test data set to score it and chart the actual versus predicted values.

Here’s how my model’s predictions compare with the actual values on the test data set.

[Close], blue is the original value.

[Close.predicted], green is what the model guessed for the test set, by only learning from the training set and not looking at the test set 🙂

It’s stunning to see that, with a difference of approximately $25, it caught the upward trend 3 weeks out!
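A comparison like this can also be summarized numerically once the actual and predicted values are exported. The sketch below computes SMAPE, the scorer chosen for the experiment, together with the mean absolute difference. Several SMAPE definitions are in use; the form here is one common variant and is an assumption, so it may not match the Driverless AI scorer exactly. The input numbers are toy values, not the results of the experiment.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent.

    Uses the variant: 100 * mean(|p - a| / ((|a| + |p|) / 2)).
    A pair where both values are zero contributes zero error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += 0.0 if denom == 0 else abs(p - a) / denom
    return 100.0 * total / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0]      # toy values, not real prices
predicted = [110.0, 180.0]   # toy values, not model output
print(round(smape(actual, predicted), 3))
print(mean_absolute_error(actual, predicted))
```

SMAPE is scale-free, so it is convenient for comparing series with different price levels, while the mean absolute difference stays in the original units (dollars, in the stock example).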

So, how exactly does the time series recipe in Driverless AI work? See link here.

The stock price example can be extended to any time series data to create future estimates!

What about Custom Recipes?

When algorithms and features compete, your forecast projects win!

In this blog, we did not talk about adding other traditional and popular time series algorithms and transformers such as FB Prophet, Auto ARIMA, and MACD. Adding those would make Driverless AI try even more algorithms, tune more models, and do evolutionary feature engineering on the new features, which could lead to more accuracy in your final model. I plan to cover the custom recipe settings for forecasting problems in A Driverless Approach to Make Forecasting Easy, Part 2.

Want to play with Driverless AI on your time-series data? Here’s the 21-day trial link

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blogs on “all things data science” on the LinkedIn, Forbes, and Medium publishing platforms over the years for the business audience, and speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks, and was a co-founding member of two startups in Silicon Valley.
