Driverless AI Time Series Tips

By James Medel posted 06-04-2020 09:00


H2O Driverless AI handles time-series forecasting problems out of the box.

To start a time-series experiment, provide a regular columnar dataset containing your features. Then pick a target column and a “time column”: a designated column containing a time stamp for every record (row), such as “April 10 2019 09:13:41” or “2019/04/10”. If you have a test set and want a prediction for every record in it, make sure it includes the future time stamps and the features as well.
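As a rough illustration of the expected layout, the sketch below builds a tiny training and test set as plain Python rows and checks that every test time stamp lies after the last training time stamp. The column names (`date`, `store`, `promo`, `sales`) and the helpers `parse_timestamp` and `is_strictly_future` are hypothetical and are not part of Driverless AI; the two parsing formats simply mirror the example time stamps above.

```python
from datetime import datetime

# Formats mirroring the two example time stamps in the text; extend as needed.
# Note: "%B" parses month names using the current locale (English assumed here).
FORMATS = ("%B %d %Y %H:%M:%S", "%Y/%m/%d")


def parse_timestamp(text):
    """Try each known format in turn and return a datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time stamp: {text!r}")


def is_strictly_future(train_rows, test_rows, time_col="date"):
    """True if every test time stamp is later than the last training time stamp."""
    last_train = max(parse_timestamp(r[time_col]) for r in train_rows)
    return all(parse_timestamp(r[time_col]) > last_train for r in test_rows)


# Training rows carry features plus the target ("sales").
train = [
    {"date": "2019/04/08", "store": "A", "promo": 0, "sales": 120.0},
    {"date": "2019/04/09", "store": "A", "promo": 1, "sales": 135.0},
    {"date": "April 10 2019 09:13:41", "store": "A", "promo": 0, "sales": 128.0},
]

# Test rows carry future time stamps and the same features, but no target.
test = [
    {"date": "2019/04/11", "store": "A", "promo": 1},
    {"date": "2019/04/12", "store": "A", "promo": 0},
]

print(is_strictly_future(train, test))
```

A check like this is only a sanity test on your own data before upload; Driverless AI performs its own detection of the time period and forecast horizon.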

In most cases, that’s it. You can launch the experiment and let Driverless AI do the rest. It will even auto-detect multiple time series in the same dataset for different groups, such as weekly sales per store and department, by finding the columns that identify stores and departments to group by. Driverless AI will also auto-detect the time period (including potential gaps, for example during weekends), the forecast horizon, and a possible time gap between the training and testing periods (to account for deployment delay), and it even keeps track of holiday calendars. It also automatically creates multiple causal time-based validation splits (sliding time windows) for proper validation, and incorporates many other related grandmaster recipes, such as automatic lag feature generation for target and non-target columns, interactions between lags, first and second derivatives, and exponential smoothing.
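To make the idea of causal validation splits and a few of the derived features more concrete, here is a minimal sketch in plain Python. It is not Driverless AI's internal implementation: the function names, the choice to train on all rows before each validation window, and the parameters (`horizon`, `gap`, `alpha`) are assumptions chosen for illustration.

```python
def causal_splits(n_rows, n_splits, horizon, gap=0):
    """Return (train_indices, valid_indices) pairs for time-ordered rows.

    Each validation window spans `horizon` rows and slides forward in time.
    Training rows always end `gap` rows before the validation window starts,
    so no split ever trains on data from its own future.
    """
    splits = []
    for k in range(n_splits, 0, -1):
        valid_start = n_rows - k * horizon
        train_end = valid_start - gap
        if train_end <= 0:
            continue  # not enough history for this split
        train_idx = list(range(train_end))
        valid_idx = list(range(valid_start, valid_start + horizon))
        splits.append((train_idx, valid_idx))
    return splits


def differences(values):
    """First differences; apply twice for a discrete second derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = []
    for x in values:
        prev = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * prev)
    return smoothed


for train_idx, valid_idx in causal_splits(n_rows=10, n_splits=3, horizon=2, gap=1):
    print("train up to row", train_idx[-1], "validate on rows", valid_idx)
```

With 10 rows, three splits, a horizon of 2 and a gap of 1, the validation windows are rows 4-5, 6-7 and 8-9, and each training set stops one row short of its validation window. The gap plays the role of the deployment delay mentioned above.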

  • If the automatic lag-based time-series recipe isn’t performing well for your dataset, try turning off lag-based feature creation by disabling “Time-series lag-based recipe” in the expert settings. Driverless AI then performs regular feature engineering but still uses time-based causal validation splits. This can lead to better results, especially for small datasets and short forecast periods.

  • If the target column is present in the test set and is partially filled (has some non-missing values), Driverless AI will automatically augment the model with those known target values to make better predictions. By supplying outcomes as they become known, you can extend the usable lifetime of the model into the future without retraining. Contact us if you’re interested in learning more about test-time augmentation.

  • For now, training and test datasets should have the same input features available, so think about which predictors (input features) will be available in production and drop the rest (or create your own lag features that are available to both the train and test sets).

  • For datasets that are non-stationary in time, create a test set from the last temporal portion of data, and create time-based features. This allows the model to be optimized for your production scenario.

  • We are working on further improving many aspects of our time-series recipe. For example, we will add support for automatically generating lags for features that are available only in the training set and not in the test set, such as environmental or economic factors. We’ll also improve the performance of back-testing using rolling windows.

  • In 1.7.x, you will have the option to bring your own recipes (BYOR) for features, models and scorers, and that includes time-series recipes! We are very excited about that. Please contact us if you are interested in learning more about BYOR.
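Two of the tips above, holding out the last temporal portion of the data as a test set and creating your own lag features that are available to both train and test, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helpers `add_group_lag` and `temporal_holdout`, the column names (`date`, `store`, `temp`) and the sample values are all hypothetical, and the example uses a single group for simplicity.

```python
def add_group_lag(rows, time_key, group_key, value_key, lag):
    """Return copies of `rows`, sorted by time, with a `<value_key>_lag<lag>` column.

    The lag value is the one observed `lag` records earlier in the same group,
    or None when the group has too little history. `lag` must be at least 1.
    """
    history = {}  # group -> values seen so far, in time order
    out = []
    for row in sorted(rows, key=lambda r: r[time_key]):
        past = history.setdefault(row[group_key], [])
        lagged = dict(row)
        lagged[f"{value_key}_lag{lag}"] = past[-lag] if len(past) >= lag else None
        past.append(row[value_key])
        out.append(lagged)
    return out


def temporal_holdout(rows, time_key, n_test):
    """Split time-sorted rows so the last `n_test` records (n_test >= 1) form the test set."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical daily data for one store; ISO dates sort correctly as strings.
rows = [
    {"date": f"2019-04-{day:02d}", "store": "A", "temp": 9 + day}
    for day in range(1, 7)
]

# Compute the lag on the full history first, then split. With a single group
# and lag >= n_test, every test row's lag value comes from the training period.
with_lags = add_group_lag(rows, "date", "store", "temp", lag=2)
train, test = temporal_holdout(with_lags, "date", n_test=2)

for row in test:
    print(row)
```

Choosing a lag at least as long as the forecast horizon is what makes the feature usable at prediction time: a shorter lag would require values from the test period itself, which are not known in production.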