November 19th, 2020

Automate your Model Documentation using H2O AutoDoc

Category: Data Science, Driverless AI

Create model documentation for supervised learning models in H2O-3 and Scikit-learn in minutes.

The Federal Reserve’s 2011 guidelines state that without adequate documentation, model risk assessment and management would be ineffective. Many regulatory and corporate governance bodies impose a similar requirement today, so model documentation is more of a necessity than a choice. Yet it remains one of the most time-consuming jobs for a data scientist. Compared with building and validating machine learning models, describing in detail how a model works is tedious and takes considerable time and effort. Consistency, clarity, and collaboration are further challenges.

What if there was a way to automate the entire documentation process? This is precisely the problem that H2O AutoDoc addresses: it creates comprehensive, high-quality model documentation in minutes. H2O AutoDoc frees users from the time-consuming task of documenting and summarizing their workflow while building machine learning models. It also increases the consistency of model documentation by applying a standard template across all models, which is essential for model governance, reproducibility, and regulatory compliance. In a way, it is using AI to explain AI.

Challenges in creating robust documentation

Figure: Challenges associated with manually documenting models

Creating good documentation is not easy, and many teams struggle with it. The process is tedious and costly for the business because data scientists could be using that time to build additional models and create more value. In addition, inconsistent or inaccurate model documentation can cause problems for model validation, governance, and regulatory compliance.

A better idea: Automate the documentation process itself with H2O AutoDoc.

H2O AutoDoc

Automated Model Documentation (H2O AutoDoc) is a new time-saving ML documentation product from H2O.ai. H2O AutoDoc can automatically generate model documentation for supervised learning models created in H2O-3 and Scikit-learn. Automated documentation is already used in production in H2O Driverless AI, and this capability is now available as a new standalone commercial module.

Key Features of H2O AutoDoc

  • Distributed automatic document generation in Microsoft Word (.docx) and Markdown (.md) formats
  • Out-of-the-box documentation template included
  • Customizable templates to fit unique business needs, internal best practices, and compliance requirements
  • Support for a variety of supervised models generated in H2O-3 and Scikit-Learn
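
To make the template idea concrete, here is a minimal sketch in plain Python of how one fixed template can produce consistent Markdown for different models. It does not use or reproduce H2O AutoDoc's template engine. The template text, the `render_model_doc` helper, and the example field values are all hypothetical and serve only as an illustration.

```python
from string import Template

# Hypothetical Markdown template; AutoDoc ships its own templates.
MODEL_DOC_TEMPLATE = Template(
    "# Model report: $name\n"
    "\n"
    "- Algorithm: $algorithm\n"
    "- Target column: $target\n"
    "- Number of features: $n_features\n"
)


def render_model_doc(metadata: dict) -> str:
    """Fill the shared template with one model's metadata.

    Raises KeyError if a required field is missing, so an incomplete
    report fails early instead of being written out with gaps.
    """
    return MODEL_DOC_TEMPLATE.substitute(metadata)


if __name__ == "__main__":
    # Made-up example values, not output from a real model.
    example = {
        "name": "my_gbm_model",
        "algorithm": "GBM",
        "target": "default_payment",
        "n_features": 23,
    }
    print(render_model_doc(example))
```

Because every report is rendered from the same template, two models documented this way always have the same sections in the same order, which is the consistency benefit described above.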

Advantages of using H2O AutoDoc

H2O AutoDoc provides various advantages over the traditional method of manual documentation:

  • H2O AutoDoc ensures compliance and provides a consistent, accurate, and thorough approach to model documentation.
  • The generated documents can be shared with production teams and other data scientists, improving collaboration across teams.
  • It saves time and money by creating model documents automatically instead of having valuable staff write and edit them.

H2O AutoDoc in Action

1. H2O AutoDoc for models created in H2O-3

H2O-3 is a fully open-source, distributed in-memory machine learning platform with linear scalability. Its speed, quality, ease of use, and model deployment options across a range of cutting-edge algorithms make H2O a highly sought-after API for big data science. H2O also has industry-leading AutoML functionality that can be used to automate the machine learning workflow.

The documentation can be generated in an editable Word (.docx) or Markdown (.md) format as follows:

import h2o
from h2o_autodoc import Config
from h2o_autodoc import render_autodoc

# get the H2O-3 model object (requires a connection to a running H2O-3 cluster)
model = h2o.get_model("my_gbm_model")

# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_H2O3.docx")
render_autodoc(h2o, config, model)
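
When documenting several models, it can help to derive each report's output path from the model key and the desired format. The following is a minimal sketch using only the standard library. `autodoc_output_path` is a hypothetical helper, not part of `h2o_autodoc`, and the `AutoDoc_<model key>` naming scheme is an assumption. Its result would be passed as the `output_path` argument of `Config`.

```python
from pathlib import Path

# Formats named in this article: Word (.docx) and Markdown (.md).
SUPPORTED_SUFFIXES = {"docx": ".docx", "md": ".md"}


def autodoc_output_path(model_key: str, output_dir: str, fmt: str = "docx") -> str:
    """Build an output path such as <output_dir>/AutoDoc_<model_key>.docx."""
    if fmt not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"unsupported format {fmt!r}; expected one of {sorted(SUPPORTED_SUFFIXES)}"
        )
    return str(Path(output_dir) / f"AutoDoc_{model_key}{SUPPORTED_SUFFIXES[fmt]}")


if __name__ == "__main__":
    # Hypothetical model keys used only for demonstration.
    for key in ["my_gbm_model", "my_glm_model"]:
        print(autodoc_output_path(key, "full/path", fmt="md"))
```

Rejecting unknown formats up front keeps a typo in the format name from surfacing only after the report has been rendered.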

2. H2O AutoDoc for models created in Scikit-learn

Scikit-learn is an open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms. Creating automatic documentation for Scikit-learn models is very similar to the H2O-3 workflow:

from sklearn.linear_model import LogisticRegression
from h2o_autodoc import Config
from h2o_autodoc.scikit.autodoc import render_autodoc

# build a logistic regression model (X_train, y_train: your training data)
model = LogisticRegression()
model.fit(X_train, y_train)

# configure and render an AutoDoc
config = Config(output_path="full/path/AutoDoc_ScikitLearn.docx")
render_autodoc(config, model, X_train, y_train)

3. Steam: H2O AutoDoc

import h2o
import h2osteam
from h2osteam import AutoDocConfig
from h2osteam.clients import H2oClient

# login to steam
h2osteam.login(url="https://steam.h2o.ai:9555", username="user01", password="token-here", verify_ssl=True)
cluster = H2oClient.get_cluster("test-cluster")

# get H2O-3 objects using their keys
# (assumes the h2o client is already connected to this cluster)
model = h2o.get_model("gbm_model")
train = h2o.get_frame("CreditCard_TRAIN")

# use default configuration settings
config = AutoDocConfig()

# specify the path to the output file
output_file_path = "autodoc_report.docx"

# download an H2O AutoDoc
cluster.download_autodoc(model, config, train, output_file_path)
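
The Steam example hard-codes the URL, username, and token. A minimal sketch of reading them from environment variables instead follows. The variable names `STEAM_URL`, `STEAM_USER`, and `STEAM_TOKEN` and the `steam_login_kwargs` helper are hypothetical; only the keyword names (`url`, `username`, `password`, `verify_ssl`) come from the `h2osteam.login` call in the example.

```python
import os


def steam_login_kwargs(env=None) -> dict:
    """Collect Steam login settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    required = ("STEAM_URL", "STEAM_USER", "STEAM_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USER"],
        "password": env["STEAM_TOKEN"],
        "verify_ssl": True,
    }


if __name__ == "__main__":
    # Demonstration with made-up values instead of the real environment.
    demo_env = {
        "STEAM_URL": "https://steam.h2o.ai:9555",
        "STEAM_USER": "user01",
        "STEAM_TOKEN": "token-here",
    }
    kwargs = steam_login_kwargs(demo_env)
    print(sorted(kwargs))
    # With h2osteam installed: h2osteam.login(**kwargs)
```

Keeping the token out of the script makes it easier to share notebooks and commit code without leaking credentials.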

Documentation Features

Figure: overview of the documentation features (* for supported algorithms)

Try H2O AutoDoc

Do you want to get your hands dirty and experience what H2O AutoDoc brings to your machine learning project? We have made it easy for you:

  • Register for the trial license, then try H2O AutoDoc in your own environment.
  • Our team will reach out, provide a 30-day trial license, and help you get up and running.
  • Experiment with it on your H2O-3 and Scikit-learn models.

Conclusion

H2O AutoDoc automatically generates comprehensive model documentation in minutes using out-of-the-box or custom templates. It saves data science teams weeks of tedious work and increases their productivity by letting them focus on model building. It also increases the consistency of model documentation by applying a standard template across all models and teams, which is essential for model governance, reproducibility, and regulatory compliance.

About the Author

Parul Pandey

Parul is a Data Science Evangelist at H2O.ai. She combines data science, evangelism, and community in her work. She is also a Kaggle Grandmaster in the Notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019.
