Appendix B: Using the Driverless AI Python Client
This section describes how to run Driverless AI using the Python client.
- This is an early and experimental release of the Driverless AI Python client.
- Python 3.6 is the only supported version.
- You must install the h2oai_client wheel into your local Python environment. Contact firstname.lastname@example.org for information on how to retrieve the h2oai_client wheel.
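Because Python 3.6 is the only supported version, it can help to check the interpreter before importing the client. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and is not part of h2oai_client.

```python
import sys

# Hypothetical helper (not part of h2oai_client): checks for the
# Python 3.6 requirement stated in the notes above.
SUPPORTED_VERSION = (3, 6)


def is_supported_python(version_info=None):
    """Return True when the major.minor version is exactly 3.6."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == SUPPORTED_VERSION


if not is_supported_python():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print("Warning: this client is documented for Python 3.6; found " + found)
```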
Running an Experiment
- Import the required modules and log in.
    import h2oai_client
    import numpy as np
    import pandas as pd
    import requests
    import math
    from h2oai_client import Client, ModelParameters

    address = 'http://ip_where_driverless_is_running:12345'
    username = 'username'
    password = 'password'
    h2oai = Client(address = address, username = username, password = password)
    # Be sure to use the same credentials that you use when signing in through the GUI
- Upload training and testing datasets from the Driverless AI /data folder.
    train_path = '/data/CreditCard/CreditCard-train.csv'
    test_path = '/data/CreditCard/CreditCard-test.csv'

    train = h2oai.create_dataset_sync(train_path)
    test = h2oai.create_dataset_sync(test_path)
- Set the target (response) column and any ignored column or columns.
    # Set the parameters you want to pass to the UI
    target = "default payment next month"
    drop_cols = ['ID']
- Specify the experiment settings. Refer to the Experiment Settings section for more information about these settings.
    # Pre-set parameters to pass to the model
    is_classification = True
    enable_gpus = True
    seed = True
    scorer_str = 'auc'

    # Pre-set accuracy knobs
    accuracy_value = 5
    time_value = 5
    interpretability = 1
- Launch the experiment to run feature engineering and final model training. In addition to the settings previously defined, be sure to also specify the imported training dataset. Adding a test dataset is optional.
    experiment = h2oai.start_experiment_sync(ModelParameters(
        dataset_key=train.key,
        testset_key=test.key,
        target_col=target,
        is_classification=is_classification,
        cols_to_drop=drop_cols,
        enable_gpus=enable_gpus,
        seed=seed,
        accuracy=accuracy_value,
        time=time_value,
        interpretability=interpretability,
        scorer=scorer_str
    ))
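The settings from the previous steps can be gathered and checked in one place before the experiment is launched. The sketch below is one possible way to do that. The helper `build_experiment_settings` is hypothetical, and the 1 to 10 range it enforces for the accuracy, time, and interpretability knobs is an assumption that this page does not state; only the keyword names are taken from the call shown above.

```python
# Hypothetical helper: collects the settings into a dict whose keys mirror
# the ModelParameters keyword arguments used in this section.
# Assumption: each knob is an integer from 1 to 10.
KNOB_RANGE = range(1, 11)


def build_experiment_settings(dataset_key, testset_key, target_col,
                              cols_to_drop, accuracy, time, interpretability,
                              is_classification=True, enable_gpus=True,
                              seed=True, scorer='auc'):
    """Validate the knob values and return the settings as a dict."""
    knobs = (('accuracy', accuracy), ('time', time),
             ('interpretability', interpretability))
    for name, value in knobs:
        if value not in KNOB_RANGE:
            raise ValueError('%s must be from 1 to 10, got %r' % (name, value))
    return {
        'dataset_key': dataset_key,
        'testset_key': testset_key,
        'target_col': target_col,
        'is_classification': is_classification,
        'cols_to_drop': cols_to_drop,
        'enable_gpus': enable_gpus,
        'seed': seed,
        'accuracy': accuracy,
        'time': time,
        'interpretability': interpretability,
        'scorer': scorer,
    }


# Placeholder keys; in a real session these would be train.key and test.key.
settings = build_experiment_settings(
    'train_key', 'test_key', 'default payment next month', ['ID'],
    accuracy=5, time=5, interpretability=1)
print(sorted(settings))
```

Since the dict keys match the keyword arguments used above, the result could then be passed as `ModelParameters(**settings)`.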
- View the results for an iteration. Note that the Web UI shows a graph of the iteration scores. You can retrieve the scores of each iteration from the experiment object using the Python client. The example below retrieves the score for the last iteration:
    score = experiment.iteration_data[-1].scores  # gets the ScoresTable
    score = score.score[-1]
    print(score)

    0.7875823819933607
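The same attribute path can be walked for every iteration to approximate the score graph shown in the Web UI. The following sketch assumes that each entry of `iteration_data` exposes `scores.score` as a non-empty list, as the example above suggests. The helper name is hypothetical, and the stand-in object and its score values are made up for illustration so that the snippet runs without a Driverless AI server.

```python
from types import SimpleNamespace


def last_score_per_iteration(experiment):
    """Collect the most recent score recorded for each iteration."""
    return [iteration.scores.score[-1]
            for iteration in experiment.iteration_data]


# Stand-in object with made-up scores, shaped like the attributes used
# above (iteration_data -> scores -> score). Not real client output.
fake_experiment = SimpleNamespace(iteration_data=[
    SimpleNamespace(scores=SimpleNamespace(score=[0.70, 0.71])),
    SimpleNamespace(scores=SimpleNamespace(score=[0.74])),
    SimpleNamespace(scores=SimpleNamespace(score=[0.75, 0.78])),
])

scores = last_score_per_iteration(fake_experiment)
print(scores)       # [0.71, 0.74, 0.78]
print(max(scores))  # 0.78
```

With a real experiment object, `last_score_per_iteration(experiment)` would take the place of the stand-in call.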
- View the final model score for the train and test datasets. When feature engineering is complete, an ensemble model can be built depending on the accuracy setting. The experiment object also contains the score on the train and test data for this ensemble model.
    print("Final Model Score on Train Data: " + str(round(experiment.train_score, 3)))
    print("Final Model Score on Test Data: " + str(round(experiment.test_score, 3)))

    Final Model Score on Train Data: 0.782
    Final Model Score on Test Data: 0.803
- Download the test predictions.
    h2oai.download(src_path = experiment.test_predictions_path, dest_dir = ".")

    './test_preds.csv'

    test_preds = pd.read_csv("./test_preds.csv")
    test_preds.head()

       default payment next month.1
    0                      0.514850
    1                      0.136738
    2                      0.062433
    3                      0.481917
    4                      0.126809
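The downloaded file holds one predicted probability per row, so a common next step is to turn those probabilities into class labels. The sketch below uses only the standard library so that it runs without extra dependencies. It assumes the CSV has a single column with a header row, as in the output above; the 0.5 threshold and both helper names are illustrative assumptions, not part of the client.

```python
import csv
import io


def load_probabilities(csv_file):
    """Read the single prediction column from an open CSV file."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    return [float(row[0]) for row in reader if row]


def to_labels(probabilities, threshold=0.5):
    """Label a row 1 when its probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# In-memory stand-in for test_preds.csv, using the five rows shown above.
sample = io.StringIO(
    "default payment next month.1\n"
    "0.514850\n0.136738\n0.062433\n0.481917\n0.126809\n"
)

probabilities = load_probabilities(sample)
labels = to_labels(probabilities)
print(labels)  # [1, 0, 0, 0, 0]
```

For the real file, replace the in-memory sample with `open("./test_preds.csv", newline="")`. A threshold other than 0.5 may suit an imbalanced target better.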
Access an Experiment Object that was Run through the Web UI
It is also possible to use the Python API to examine an experiment that was started through the Web UI using the experiment key.
You can get a pointer to the experiment by referencing the experiment key in the Web UI.
    experiment = h2oai.get_model_job("56507f").entity
Score on New Data
You can use the Python API to score on new data. This is equivalent to the SCORE ON ANOTHER DATASET button in the Web UI. The example below scores on the test data and then downloads the predictions.
Pass in any dataset that has the same columns as the original training set. If you passed a test set when building the Driverless AI model, its predictions already exist, and their path is available as experiment.test_predictions_path.
    prediction = h2oai.make_prediction_sync(experiment.key, test_path)
    pred_path = h2oai.download(prediction.predictions_csv_path, '.')
    pred_table = pd.read_csv(pred_path)
    pred_table.head()

       default payment next month.1
    0                      0.514850
    1                      0.136738
    2                      0.062433
    3                      0.481917
    4                      0.126809