July 25th, 2019

A Driverless Approach to Make Forecasting Easy — Part 1

Category: Driverless AI

Suppose you work in a supply chain department, or in any role responsible for estimating future product sales, patient admissions, retail store staffing, energy use, ticket sales, etc., from historical data. A common problem is forecasting one week, four weeks, six months, or one to five years into the future: in short, both short-term and long-term forecasts. Well-informed forecasting lets you create optimized budgets, avoid excess inventory and wasteful expenditure, and plan for success and profitability.

Data scientists and business analysts who have done time-series forecasting will recognize the reasons it often takes an inordinate amount of time to reach an acceptable, repeatable solution:

  • Getting the data into a good format is always a challenge. But that’s just the tip of the iceberg.
  • Understanding seasonality, short-term and long-term trends, and holidays, and factoring them into the model.
  • Dealing with categorical data.
  • Creating lag features: which lag features should I build and use?
  • Which algorithms and feature transformers should I choose? How do I pick the best one for the business problem? How do I tune the models? How do I slide bleeding-edge algorithms into the prediction process without much effort?
  • How do I explain my future point predictions to the business? If my model predicts that sales will be $1.5M 10 days from today and average $0.5M on the remaining days, why did it decide that way, and what did it infer from the historical data?
  • How do I continually retrain my model and make it production-deployable?
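To make the lag-feature question concrete, here is a minimal sketch in plain Python of what a lag feature is: for each time step, the model sees the value from k steps earlier. The function name `make_lag_features`, the chosen lags (1 and 7 days), and the sample numbers are hypothetical, for illustration only; this is not how Driverless AI selects lags.

```python
from typing import Dict, List, Optional, Sequence


def make_lag_features(
    values: Sequence[float], lags: Sequence[int]
) -> List[Dict[str, Optional[float]]]:
    """Return one feature row per time step with the value `lag` steps earlier.

    Rows where a lag reaches before the start of the series get None,
    because no history exists there.
    """
    rows = []
    for t in range(len(values)):
        row: Dict[str, Optional[float]] = {"target": values[t]}
        for lag in lags:
            row[f"lag_{lag}"] = values[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


# Hypothetical daily sales; lag_1 = yesterday, lag_7 = same weekday last week.
daily_sales = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0]
features = make_lag_features(daily_sales, lags=[1, 7])
print(features[-1])  # {'target': 18.0, 'lag_1': 16.0, 'lag_7': 10.0}
```

When forecasting h days ahead, any lag shorter than h points at values that are not yet known at prediction time, which is one reason choosing lags is harder than it looks.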

This list is common, though not exhaustive. In this blog series, I will address some of these issues; the rest will be covered in Part 2.

I used Driverless AI 1.7.0, the latest version of H2O.ai’s product at the time of writing, to forecast Google’s stock price for July 2019. It’s a simple univariate forecast and a good place to start.

Disclaimer: I do not recommend that you trade stocks or derivatives based on the example here. It’s for illustration only. No model can guarantee future values or make them reliable, so …

Driverless AI is an Automatic Machine Learning/Automatic Feature Engineering tool that can do time-series forecasting in addition to regression, classification, etc. Not only does it have built-in time-series recipes from Kaggle Grandmasters, it also lets you bring your own algorithms and feature engineering code to enhance the model-building process, aka BYOR (Bring Your Own Recipe). Through the custom recipe feature, you can add algorithms and feature engineering code such as Auto ARIMA, FB Prophet, etc., and let the Automatic ML process use them to predict a target value.

Here’s a link to the open-source BYOR GitHub repository (we will use some of this in Part 2 of the blog post): https://github.com/h2oai/driverlessai-recipes

For my experiment, I downloaded five years of Google’s daily stock price data from the Yahoo! Finance portal.

Using a spreadsheet, we split the downloaded CSV into training and test data sets.

We will build the Driverless AI model on a training data set with daily closing stock prices from:

07/23/14 to 06/28/19

Columns Dropped: Open, High, Low, Adj Close, Volume

Target Column: Close

Time Column: Date

The test data set has values from:

07/01/19 to 07/23/19
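The split above was done in a spreadsheet. For readers who prefer code, the sketch below does the same with Python’s standard `csv` module. It assumes the Yahoo! Finance export layout (Date as YYYY-MM-DD, then Open, High, Low, Close, Adj Close, Volume); the file layout, cutoff constant, and sample rows are assumptions, and the sample values are placeholders, not real prices.

```python
import csv
import io
from datetime import date
from typing import List, Tuple

KEEP = ("Date", "Close")  # Open, High, Low, Adj Close, Volume are dropped
CUTOFF = date(2019, 7, 1)  # first day of the test period


def split_by_date(csv_text: str, cutoff: date) -> Tuple[List[dict], List[dict]]:
    """Split rows into train (Date < cutoff) and test (Date >= cutoff).

    Splitting on time rather than at random keeps every training row
    strictly earlier than every test row, so the model never sees the future.
    """
    train, test = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        slim = {col: row[col] for col in KEEP}
        day = date.fromisoformat(slim["Date"])
        (train if day < cutoff else test).append(slim)
    return train, test


# Placeholder rows in the assumed Yahoo! Finance layout (not real prices).
sample = """Date,Open,High,Low,Close,Adj Close,Volume
2019-06-27,1.0,1.0,1.0,100.0,100.0,1
2019-06-28,1.0,1.0,1.0,101.0,101.0,1
2019-07-01,1.0,1.0,1.0,102.0,102.0,1
2019-07-02,1.0,1.0,1.0,103.0,103.0,1
"""
train_rows, test_rows = split_by_date(sample, CUTOFF)
print(len(train_rows), len(test_rows))  # 2 2
```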

We set the forecast horizon to 16 days (you can obviously play with this number) and chose SMAPE as the scorer. The Driverless AI recipe picked LightGBM and XGBoost, along with candidate feature engineering, for the Automatic ML model and feature selection. After you click “Launch Experiment” (the horizontal yellow bar) and the experiment finishes, you will see the experiment summary screen. In this experiment, the features in the final model are exponential moving averages and some derived date features.
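Driverless AI generated these features automatically, and its exact transformers are not reproduced here. As a rough illustration of what the two feature families mean, the sketch below computes a standard exponential moving average with smoothing factor alpha = 2 / (span + 1), plus a few calendar features from a date. The span, the seeding choice, and the particular date fields are assumptions, not the settings Driverless AI used.

```python
from datetime import date
from typing import Dict, List, Sequence


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1).

    Each output blends the newest value with the previous average, so
    recent prices weigh more than older ones. Seeded with the first value.
    """
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def date_features(d: date) -> Dict[str, int]:
    """A few derived calendar features a model can use for seasonality."""
    return {
        "year": d.year,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "week_of_year": d.isocalendar()[1],  # ISO week number
    }


closes = [10.0, 11.0, 12.0, 11.0]
print(ema(closes, span=3))  # alpha = 0.5 -> [10.0, 10.5, 11.25, 11.125]
print(date_features(date(2019, 7, 1)))
```

A smaller span reacts faster to recent moves; a larger span smooths more, so a model often benefits from several spans side by side.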

We can then click “SCORE ON ANOTHER DATASET” and pick the test data set to score it and chart the actual vs. predicted values.
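SMAPE, the scorer chosen above, is easy to compute by hand once you have actual and predicted values for the test period. The sketch below uses one common definition, 100/n · Σ |F − A| / ((|A| + |F|) / 2), which ranges from 0 (perfect) to 200. Other variants exist (for example, without the division by 2 in the denominator), so Driverless AI’s implementation may differ in detail. The numbers are placeholders, not the results of this experiment.

```python
from typing import Sequence


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 = perfect).

    Uses 100/n * sum(|F - A| / ((|A| + |F|) / 2)); a term where both
    values are 0 is treated as a perfect prediction.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = 0.0
    for a, f in zip(actual, predicted):
        denom = (abs(a) + abs(f)) / 2
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100.0 * total / len(actual)


# Placeholder closing prices and predictions (not this experiment's output).
actual = [100.0, 110.0]
predicted = [90.0, 110.0]
print(round(smape(actual, predicted), 4))  # 5.2632
```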

Here’s how my model’s predictions compared with the actual values on the test data set.

[Close] (blue) is the actual value.

[Close.predicted] (green) is what the model predicted for the test set, having learned only from the training set without ever looking at the test set 🙂

It’s stunning to see that, with a difference of approximately $25, it caught the upward trend three weeks out!

So, how exactly does the time-series recipe in Driverless AI work? See link here.

The stock price example can be extended to any time series data to create future estimates!

What about Custom Recipes?

When algorithms and features compete, your forecast projects win!

In this blog, we did not cover adding other traditional and popular time-series algorithms and transformers like FB Prophet, Auto ARIMA, MACD, etc. Adding them would make Driverless AI try even more algorithms, tune more models, and run evolutionary feature engineering on the new features, which can lead to a more accurate final model. I plan to cover the Custom Recipe settings for forecasting problems in A Driverless Approach to Make Forecasting Easy - Part 2.
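MACD, one of the transformers mentioned above, is itself a combination of exponential moving averages. The sketch below uses the conventional 12/26/9 periods (an assumption; a custom recipe could choose other spans) and the alpha = 2 / (span + 1) smoothing. It is a plain illustration of the indicator, not the code of any existing Driverless AI recipe.

```python
from typing import List, Sequence, Tuple


def ema(values: Sequence[float], span: int) -> List[float]:
    """Exponential moving average, alpha = 2 / (span + 1), seeded with values[0]."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(
    values: Sequence[float], fast: int = 12, slow: int = 26, signal: int = 9
) -> Tuple[List[float], List[float], List[float]]:
    """Return (MACD line, signal line, histogram).

    MACD line = EMA(fast) - EMA(slow); signal line = EMA(signal) of the
    MACD line; histogram = MACD line - signal line. A positive MACD line
    means the short-term average sits above the long-term average.
    """
    fast_ema, slow_ema = ema(values, fast), ema(values, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# A steadily rising placeholder series: the fast EMA stays above the slow one.
prices = [100.0 + i for i in range(40)]
line, sig, hist = macd(prices)
print(line[-1] > 0)  # True
```

Each of the three outputs can serve as a feature column, which is how an indicator like this would compete alongside the built-in features.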

Want to play with Driverless AI on your time-series data? Here’s the 21-day trial link

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blogs on “all things data science” on LinkedIn, Forbes, and Medium over the years for a business audience, and he speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks, and was a co-founding member of two startups in Silicon Valley.
