Appendix B: Using the Driverless AI Python Client

This section describes how to run Driverless AI using the Python client.

Notes:

  • This is an early and experimental release of the Driverless AI Python client.
  • Python 3.6 is the only supported version.
  • You must install the h2oai_client wheel into your local Python environment. Contact sales@h2o.ai for information on how to obtain the h2oai_client wheel.
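
Before importing the client, it can help to confirm the two prerequisites listed above. The following is a minimal sketch and not part of the Driverless AI client: the helper name check_client_environment is hypothetical, and it uses only the Python standard library to check the interpreter version and whether a package named h2oai_client can be imported.

```python
import importlib.util
import sys


def check_client_environment(required=(3, 6), package="h2oai_client"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(required):
        problems.append(
            "Python %d.%d is required, found %d.%d"
            % (required[0], required[1], found[0], found[1])
        )
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(package) is None:
        problems.append("Package %r is not installed" % package)
    return problems


if __name__ == "__main__":
    for problem in check_client_environment():
        print(problem)
```

An empty result suggests the environment matches the notes above; it does not verify that the installed wheel matches your Driverless AI server version.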

Running an Experiment

  1. Import the required modules and log in.
import h2oai_client
import numpy as np
import pandas as pd
import requests
import math
from h2oai_client import Client, ModelParameters

address = 'http://ip_where_driverless_is_running:12345'
username = 'username'
password = 'password'
# Use the same credentials that you use when signing in through the GUI
h2oai = Client(address=address, username=username, password=password)
  2. Upload training and testing datasets from the Driverless AI /data folder.
train_path = '/data/CreditCard/CreditCard-train.csv'
test_path = '/data/CreditCard/CreditCard-test.csv'

train = h2oai.create_dataset_sync(train_path)
test = h2oai.create_dataset_sync(test_path)
  3. Set the target (response) column and any columns to ignore.
# set the parameters you want to pass to the UI
target = "default payment next month"
drop_cols = ['ID']
  4. Specify the experiment settings. Refer to the Experiment Settings section for more information about these settings.
# Preset parameters to pass to the model
is_classification = True
enable_gpus = True
seed = True
scorer_str = 'auc'

# Preset accuracy knobs
accuracy_value = 5
time_value = 5
interpretability = 1
  5. Launch the experiment to run feature engineering and final model training. In addition to the settings defined previously, specify the imported training dataset. Adding a test dataset is optional.
experiment = h2oai.start_experiment_sync(ModelParameters(
    dataset_key=train.key,
    testset_key=test.key,
    target_col=target,
    is_classification=is_classification,
    cols_to_drop=drop_cols,
    enable_gpus=enable_gpus,
    seed=seed,
    accuracy=accuracy_value,
    time=time_value,
    interpretability=interpretability,
    scorer=scorer_str
))
  6. View the results for an iteration. The Web UI shows a graph of the iteration scores; with the Python client, you can retrieve the score of each iteration from the experiment object. The example below retrieves the score for the last iteration:
score = experiment.iteration_data[-1].scores # gets the ScoresTable
score = score.score[-1]
print(score)

0.7875823819933607
  7. View the final model score for the train and test datasets. When feature engineering is complete, an ensemble model may be built, depending on the accuracy setting. The experiment object also contains the scores of this ensemble model on the train and test data.
print("Final Model Score on Train Data: " + str(round(experiment.train_score, 3)))
print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))

Final Model Score on Train Data: 0.782
Final Model Score on Test Data: 0.803
  8. Download the test predictions.
h2oai.download(src_path=experiment.test_predictions_path, dest_dir=".")
'./test_preds.csv'

test_preds = pd.read_csv("./test_preds.csv")
test_preds.head()

default payment next month.1
0       0.514850
1       0.136738
2       0.062433
3       0.481917
4       0.126809
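
The predictions file holds one probability per row. As a minimal sketch of turning those probabilities into class labels, the example below applies a cutoff to the five values printed above. The 0.5 cutoff and the helper name to_labels are assumptions for illustration only; Driverless AI does not prescribe them, and a suitable cutoff depends on your use case. The sketch also assumes the file has a single probability column, and it inlines the CSV text so that it runs without the client.

```python
import csv
import io

# First rows of test_preds.csv as printed above, inlined for illustration.
SAMPLE_CSV = """default payment next month.1
0.514850
0.136738
0.062433
0.481917
0.126809
"""


def to_labels(csv_text, threshold=0.5):
    """Map each probability to 1 (at or above the threshold) or 0."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [1 if float(row[0]) >= threshold else 0 for row in reader if row]


print(to_labels(SAMPLE_CSV))  # [1, 0, 0, 0, 0]
```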

Access an Experiment Object that was Run through the Web UI

You can also use the Python API to examine an experiment that was started through the Web UI. To retrieve the experiment object, pass the experiment key shown in the Web UI to get_model_job:

experiment = h2oai.get_model_job("56507f").entity
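
Once retrieved, the experiment object can be inspected in the same way as one created from the client. The sketch below collects the attributes used elsewhere in this appendix (key, train_score, test_score, test_predictions_path) into a dictionary. The helper summarize_experiment is hypothetical, and a SimpleNamespace with placeholder values stands in for a real experiment object so that the example runs without a server.

```python
from types import SimpleNamespace

# Attribute names taken from the examples in this appendix.
FIELDS = ("key", "train_score", "test_score", "test_predictions_path")


def summarize_experiment(experiment):
    """Collect selected attributes; absent attributes are reported as None."""
    return {name: getattr(experiment, name, None) for name in FIELDS}


# Stand-in object with placeholder values, NOT a real experiment.
fake_experiment = SimpleNamespace(
    key="56507f",
    train_score=0.782,
    test_score=0.803,
    test_predictions_path="test_preds.csv",
)

for name, value in summarize_experiment(fake_experiment).items():
    print(name, "=", value)
```

With a live connection, you would pass the object returned by get_model_job(...).entity instead of the stand-in.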

Score on New Data

You can use the Python API to score on new data. This is equivalent to the SCORE ON ANOTHER DATASET button in the Web UI. The example below scores on the test data and then downloads the predictions.

Pass in any dataset that has the same columns as the original training set. If you passed a test set when building the model, the test predictions already exist, and their path is available as experiment.test_predictions_path.

prediction = h2oai.make_prediction_sync(experiment.key, test_path)
pred_path = h2oai.download(prediction.predictions_csv_path, '.')
pred_table = pd.read_csv(pred_path)
pred_table.head()

default payment next month.1
0       0.514850
1       0.136738
2       0.062433
3       0.481917
4       0.126809
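
Because the dataset to score must have the same columns as the training set, comparing headers before calling make_prediction_sync can catch mismatches early. This is a minimal sketch using only the standard library. The helper missing_columns and the inlined CSV text are hypothetical (LIMIT_BAL and AGE are placeholder column names), and allowing the target column to be absent from the data being scored is an assumption of this sketch, not something this appendix states.

```python
import csv
import io


def read_header(csv_text):
    """Return the column names from the first row of CSV text."""
    return next(csv.reader(io.StringIO(csv_text)))


def missing_columns(train_csv, new_csv, ignore=()):
    """List training columns absent from the new data, skipping `ignore`."""
    new_cols = set(read_header(new_csv))
    return [
        col for col in read_header(train_csv)
        if col not in new_cols and col not in ignore
    ]


# Placeholder data for illustration only.
train_csv = "ID,LIMIT_BAL,AGE,default payment next month\n1,20000,24,1\n"
new_csv = "ID,LIMIT_BAL\n7,50000\n"

print(missing_columns(train_csv, new_csv,
                      ignore=("default payment next month",)))  # ['AGE']
```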