Beta Testing

Driverless AI Scorer Tips

By James Medel posted 06-04-2020 09:00


A core capability of H2O Driverless AI is the creation of automatic machine learning modeling pipelines for supervised problems. In addition to the data and the target column to be predicted, the user can pick a scorer. A scorer is a function that takes actual and predicted values for a dataset and returns a number. Looking at this single number is the most common way to estimate the generalization performance of a predictive model on unseen data by comparing the model’s predictions on the dataset with its actual values. There are more detailed ways to estimate the performance of a machine learning model such as residual plots (available on the Diagnostics page in Driverless AI), but we will focus on scorers here.

For a given scorer, Driverless AI optimizes the pipeline to end up with the best possible score for this scorer. The default scorer for regression problems is RMSE (root mean squared error), where 0 is the best possible value. For example, for a dataset containing 4 rows, if actual target values are [1, 1, 10, 0], but predictions are [2, 3, 4, -1], then the RMSE is sqrt((1+4+36+1)/4) and the largest misprediction dominates the overall score (quadratically). Driverless AI will focus on improving the predictions for the third data point, which can be very difficult when hard-to-predict outliers are present in the data. If outliers are not that important to get right, a metric like the MAE (mean absolute error) can lead to better results. For this case, the MAE is (1+2+6+1)/4 and the optimization process will consider all errors equally (linearly). Another scorer that is robust to outliers is RMSLE (root mean square logarithmic error), which is like RMSE but after taking the logarithm of actual and predicted values - however, it is restricted to positive values. For price predictions, scorers such as MAPE (mean absolute percentage error) or MER (median absolute percentage error) are useful, but have problems with zero or small positive values. SMAPE (symmetric mean absolute percentage error) is designed to improve upon that.

For classification problems, the default scorer is either the AUC (area under the receiver operating characteristic curve) or LOGLOSS (logarithmic loss) for imbalanced problems. LOGLOSS focuses on getting the probabilities right (strongly penalizes wrong probabilities), while AUC is designed for ranking problems. Gini is similar to the AUC, but measures the quality of ranking (inequality) for regression problems. For general imbalanced classification problems, AUCPR and MCC are good choices, while F05, F1 and F2 are designed to balance recall against precision.

We highly suggest experimenting with different scorers and to study their impact on the resulting models. Using the Diagnostics page in Driverless AI, all applicable scores can be computed for any given model, no matter which scorer was used during training.

1 view