Running an Experiment

  1. After Driverless AI is installed and started, open a Chrome browser and navigate to <server>:12345.
Note: Driverless AI is only supported on Google Chrome.
  2. The first time you log in to Driverless AI, you will be prompted to read and accept the Evaluation Agreement. You must accept the terms before continuing. Review the agreement, then click I agree to these terms to continue.
  3. Log in by entering unique credentials. For example:
Username: h2oai
Password: h2oai

Note that these credentials do not restrict access to Driverless AI; they are used to tie experiments to users. If you log in with different credentials, you will not see any previously run experiments.

  4. As with accepting the Evaluation Agreement, the first time you log in, you will be prompted to enter your License Key. Click the Enter License button, then paste the License Key into the License Key entry field. Click Save to continue. This license key will be saved in the host machine’s /license folder that was created during installation.
Note: Contact sales@h2o.ai for information on how to purchase a Driverless AI license.
  5. The Home page appears, showing all experiments that have previously been run. Start a new experiment and/or add datasets by clicking the New Experiment button.
New experiment
  6. Click the Select or import a dataset button, then click the Browse button at the bottom of the screen. In the Search for files field, enter the location of the dataset. Note that Driverless AI autofills the browse line as you type the file location. When you locate the file, select it, then click Import at the bottom of the screen.
Search for file

Note: To import additional datasets, click the Show Experiments link in the top-right corner of the UI, then click New Experiment again to browse and add another dataset.

  7. Optionally specify whether to drop any columns (for example, an ID column).
  8. Optionally specify a test dataset. Keep in mind that the test dataset must have the same number of columns as the training dataset.
  9. Specify the target (response) column.
Select target column
  10. When the target column is selected, Driverless AI automatically provides the target column type and the number of rows. If this is a classification problem, then the UI shows unique and frequency statistics for numerical columns. If this is a regression problem, then the UI shows the dataset mean and standard deviation values. At this point, you can configure the following experiment settings. Refer to the Experiment Settings section that follows for more information about these settings.
  • Accuracy value (defaults to 5)
  • Time setting (defaults to 5)
  • Interpretability of the model (defaults to 5)
  • Scorer to use for this experiment (no scorer is selected by default)

Additional settings:

  • If this is a classification problem, then click the Classification button.
  • Click the Reproducible button to build this model with a fixed random seed so that runs are repeatable.
  • Specify whether to enable GPUs. (Note that this option is ignored on CPU-only systems.)
Experiment settings
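Taken together, the settings above can be sketched as a simple configuration object. This is illustrative only; the class and field names below are hypothetical and do not correspond to the actual Driverless AI client API. The defaults mirror those stated above (Accuracy, Time, and Interpretability default to 5; no scorer is selected by default):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentSettings:
    """Hypothetical container mirroring the UI options described above.

    These names are for illustration; they are not the Driverless AI API.
    """
    accuracy: int = 5             # 1-10, defaults to 5
    time: int = 5                 # 1-10, defaults to 5
    interpretability: int = 5     # 1-10, defaults to 5
    scorer: Optional[str] = None  # no scorer is selected by default
    is_classification: bool = False
    reproducible: bool = False    # build with a fixed seed
    enable_gpus: bool = True      # ignored on CPU-only systems

# Example: a reproducible classification experiment scored on AUC
settings = ExperimentSettings(is_classification=True,
                              reproducible=True,
                              scorer="AUC")
```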
  11. Click Launch Experiment. This starts the Driverless AI feature engineering process.

As the experiment runs, a running status displays in the upper-middle portion of the UI. In addition to the status, the UI displays details about the dataset, the iteration score (internal validation) for each cross-validation fold along with any specified scorer value, the variable importance values, and CPU/Memory and GPU usage information.

You can stop an experiment that is currently running. Click the Finish button to end the experiment at its current point and build a scoring package.

Experiment

Experiment Settings

This section describes the settings that are available when running an experiment.

Test Data

Test data is used to create test predictions only. This dataset is not used for model scoring.

Dropped Columns

Dropped columns are columns that you do not want to be used as predictors in the experiment.

Accuracy

The following table describes how the Accuracy value affects a Driverless AI experiment.
Accuracy  Max Rows  Ensemble Level  Target Transformation  Tune Parameters  Num Individuals  CV Folds  Only First CV Model  Strategy
1         100K      0               False                  False            Default          3         True                 None
2         500K      0               False                  False            Default          3         True                 None
3         1M        1               False                  False            Default          3         True                 None
4         2.5M      1               False                  False            Default          3         True                 None
5         5M        1               True                   False            Default          3         True                 None
6         10M       2               True                   True             Default          3         True                 FS
7         20M       2               True                   True             4                4         False                FS
8         20M       2               True                   True             4                4         False                FS
9         20M       2               True                   True             4                4         False                FS
10        None      2               True                   True             8                4         False                FS
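The table can also be expressed as a lookup, which is handy for sanity-checking what a given Accuracy value implies. This is a sketch: the dictionary and its layout are for illustration, not Driverless AI identifiers, and "Default" individuals and the unlimited row cap at Accuracy 10 are encoded as None:

```python
# Accuracy -> (max_rows, ensemble_level, target_transform, tune_params,
#              num_individuals, cv_folds, only_first_cv_model, strategy)
# None encodes "Default" individuals and the unlimited row cap.
ACCURACY_SETTINGS = {
    1:  (100_000,    0, False, False, None, 3, True,  None),
    2:  (500_000,    0, False, False, None, 3, True,  None),
    3:  (1_000_000,  1, False, False, None, 3, True,  None),
    4:  (2_500_000,  1, False, False, None, 3, True,  None),
    5:  (5_000_000,  1, True,  False, None, 3, True,  None),
    6:  (10_000_000, 2, True,  True,  None, 3, True,  "FS"),
    7:  (20_000_000, 2, True,  True,  4,    4, False, "FS"),
    8:  (20_000_000, 2, True,  True,  4,    4, False, "FS"),
    9:  (20_000_000, 2, True,  True,  4,    4, False, "FS"),
    10: (None,       2, True,  True,  8,    4, False, "FS"),
}

# At the default Accuracy of 5, up to 5M rows are used for training.
max_rows, *_ = ACCURACY_SETTINGS[5]
```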

The list below includes more information about the parameters that are used when calculating accuracy.

  • Max Rows: The maximum number of rows to use in model training
      • For classification, stratified random sampling is performed
      • For regression, random sampling is performed
  • Ensemble Level: The level of ensembling done
      • 0: single final model
      • 1: 3 3-fold final models ensembled together
      • 2: 5 5-fold final models ensembled together
  • Target Transformation: Try target transformations and choose the transformation that has the best score
      • Possible transformations: identity, log, square, square root, inverse, Anscombe, logit, sigmoid
  • Tune Parameters: Tune the parameters of the XGBoost model
      • Only max_depth is tuned, over the range 3 to 10.
      • The max depth is chosen by penalized_score, which combines the model’s accuracy and complexity.
  • Num Individuals: The number of individuals in the population for the genetic algorithm
      • Each individual is a gene. The more genes, the more combinations of features are tried.
      • The default is determined automatically. Typical values are 4 or 8.
  • CV Folds: The number of cross-validation folds done for each model
      • If the problem is a classification problem, then stratified folds are created.
  • Only First CV Model: Equivalent to splitting data into a training and testing set
      • Example: Setting CV Folds to 3 and Only First CV Model = True means the data is split into 66% training and 33% testing.
  • Strategy: Feature selection strategy
      • None: No feature selection
      • FS: Feature selection permutations
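The "Only First CV Model" example above is simple fold arithmetic: with k cross-validation folds, building only the first model uses (k-1)/k of the data for training and 1/k for testing. A quick check (the helper name is illustrative):

```python
def first_fold_split(cv_folds: int) -> tuple:
    """Train/test fractions when only the first of `cv_folds` CV models is built."""
    train = (cv_folds - 1) / cv_folds
    test = 1 / cv_folds
    return train, test

# CV Folds = 3 with Only First CV Model = True: ~66% train, ~33% test
train_frac, test_frac = first_fold_split(3)
```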

Time

The following table describes how the Time value maps to the number of training epochs.

Time  Epochs
1     10
2     20
3     30
4     40
5     50
6     100
7     150
8     200
9     300
10    500
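As a sketch, the Time-to-epochs mapping is a direct lookup (the dictionary name is illustrative, not a Driverless AI identifier):

```python
# Time setting -> number of epochs, taken from the table above
TIME_TO_EPOCHS = {1: 10, 2: 20, 3: 30, 4: 40, 5: 50,
                  6: 100, 7: 150, 8: 200, 9: 300, 10: 500}

# The default Time setting of 5 corresponds to 50 epochs.
epochs = TIME_TO_EPOCHS[5]
```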

Interpretability

The following table describes how the Interpretability value determines the feature selection strategy.

Interpretability  Strategy
<= 5              None
> 5               FS
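The rule above reduces to a single threshold check, sketched here with an illustrative function name:

```python
def feature_selection_strategy(interpretability: int):
    """Feature selection strategy implied by the Interpretability setting.

    Per the table above: "FS" (feature selection permutations) above 5,
    otherwise no feature selection.
    """
    return "FS" if interpretability > 5 else None
```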