Driverless AI Tips for Running an Experiment

By James Medel posted 06-04-2020 09:02

H2O Driverless AI is an automatic machine learning platform designed to create highly accurate modeling pipelines from tabular training data. The predictive performance of a pipeline depends on both the training data and the pipeline's parameters (the details of feature engineering and modeling). During an experiment, Driverless AI automatically tunes these parameters by scoring candidate pipelines on held-out (“validation”) data. This validation data is either provided by the user (an option intended for experts) or created automatically by Driverless AI using a random, time-based, or fold-based split. Once a final pipeline has been created, it should be scored on yet another held-out dataset (“test data”) to estimate its generalization performance. Understanding the origin of the training, validation, and test datasets (“the validation scheme”) is critical for success with machine learning, and we welcome your feedback and suggestions to help us create the right validation schemes for your use cases.
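To make the idea of a validation scheme concrete, here is a minimal sketch in plain Python of two of the split styles mentioned above: a random split and a time-based split. It does not use or reproduce the Driverless AI API. The function names (`random_split`, `time_based_split`), the 60/20/20 proportions, and the toy `rows` data are all assumptions made for illustration only.

```python
import random


def random_split(rows, valid_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle rows and cut them into train, validation, and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


def time_based_split(rows, time_key, valid_frac=0.2, test_frac=0.2):
    """Sort rows by time: oldest rows train, newer rows validate, newest rows test."""
    ordered = sorted(rows, key=time_key)
    n = len(ordered)
    n_test = int(n * test_frac)
    n_valid = int(n * valid_frac)
    train = ordered[:n - n_valid - n_test]
    valid = ordered[n - n_valid - n_test:n - n_test]
    test = ordered[n - n_test:]
    return train, valid, test


# Hypothetical toy data: 100 rows with a timestamp-like field "t".
rows = [{"t": i, "x": i * 2} for i in range(100)]

train, valid, test = random_split(rows)
ts_train, ts_valid, ts_test = time_based_split(rows, time_key=lambda r: r["t"])
```

In both cases the validation part is what candidate pipelines would be scored on during tuning, while the test part is set aside and used only once, on the final pipeline. The time-based variant keeps every validation and test row later in time than every training row, which is the usual reason to prefer it when the data has a temporal order.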