Scoring Pipeline


What is a Scoring Pipeline?

In machine learning, a pipeline automates a sequence of steps in a workflow. These steps may include data preparation, model training, validation, packaging, deployment, and monitoring. A scoring pipeline is usually part of the deployment routine: it uses trained models to make predictions on new data.
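To make the idea concrete, here is a minimal sketch in plain Python; it uses no H2O APIs. The functions `prepare`, `train`, and `score` and the toy data are hypothetical. A preparation step drops incomplete rows, a training step fits a one-feature least-squares line, and the scoring step, which is the part a scoring pipeline automates, applies the fitted model to new inputs.

```python
from statistics import mean

# Hypothetical training data: (feature, label) pairs; one row is incomplete.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (2.5, None), (3.0, 6.2), (4.0, 7.8)]


def prepare(rows):
    """Data preparation step: drop rows that contain missing values."""
    return [r for r in rows if None not in r]


def train(rows):
    """Model training step: fit y = a * x + b by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    b = y_bar - a * x_bar
    return a, b


def score(model, new_xs):
    """Scoring step: apply the trained model to new, unseen inputs."""
    a, b = model
    return [a * x + b for x in new_xs]


model = train(prepare(TRAIN))
predictions = score(model, [5.0, 6.0])
```

Once training has produced `model`, only `score` needs to run on each new batch of data, which is the step a deployment routine would automate.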

Model deployment is a key component in enterprise machine learning platforms. H2O’s platforms offer a variety of deployment options to suit different production environments.

Deployment Options in H2O-3

  • Plain Old Java Object (POJO) is our first-generation, low-latency, production-ready model deployment module, which translates H2O-3 models into plain Java code [1]. Users can deploy POJOs within their Java environments and build their own applications around them.
  • Having worked with customers and listened to their issues with large POJOs, we improved and optimized the deployment module and created the Model Object, Optimized (MOJO). POJOs are still supported, but we recommend that users deploy models as MOJOs [2].
  • We also give Python and R users the option to save H2O-3 models as binary models and use them in native Python or R environments [3].

Deployment Options in Driverless AI

As with H2O-3, we designed the deployment options in Driverless AI with speed, flexibility, and ease of use in mind. Compared with H2O-3, the scoring pipeline in Driverless AI has one additional and crucial step: data transformation.

To streamline the path from machine learning experiments to production-ready deployment, we package both data transformation (based on the automatic feature engineering performed during experiments) and model scoring into a single scoring pipeline. This saves users time, because they do not need to replicate the often complex data transformation process in their own workflow before using our models.
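The benefit of packaging transformation and scoring together can be sketched as follows. This is an illustration under stated assumptions, not the Driverless AI scoring pipeline API: `ScoringPipeline`, the `city` and `income` columns, and the linear weights are all hypothetical. The fitted transformations (a one-hot encoding whose categories were learned at training time, and a log transform) are stored alongside the model, so a caller passes a raw record and never re-implements the feature engineering.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ScoringPipeline:
    """Hypothetical bundle of fitted feature transformations and a trained model."""
    transforms: List[Callable[[Row], Row]]
    model: Callable[[Row], float]

    def predict(self, raw_row: Row) -> float:
        features = dict(raw_row)
        # Replay the feature engineering fitted during the experiment, in order.
        for transform in self.transforms:
            features = transform(features)
        # Score the engineered features with the trained model.
        return self.model(features)


def make_city_encoder(known_cities: List[str]) -> Callable[[Row], Row]:
    """One-hot encode 'city' using only the categories seen at training time."""
    def encode(row: Row) -> Row:
        out = {k: v for k, v in row.items() if k != "city"}
        for city in known_cities:
            out[f"city_{city}"] = 1.0 if row.get("city") == city else 0.0
        return out
    return encode


def log_income(row: Row) -> Row:
    """Replace raw 'income' with log(1 + income)."""
    out = dict(row)
    out["log_income"] = math.log1p(float(out.pop("income")))
    return out


# Hypothetical weights standing in for a trained linear model.
WEIGHTS = {"log_income": 0.5, "city_Oslo": 1.0, "city_Lima": -1.0}


def linear_model(features: Row) -> float:
    return sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())


pipeline = ScoringPipeline(
    transforms=[make_city_encoder(["Oslo", "Lima"]), log_income],
    model=linear_model,
)

# The caller sends a raw record; no feature engineering happens on the client side.
score = pipeline.predict({"city": "Oslo", "income": math.e - 1})
```

Because the encoder only knows the categories seen during training, an unseen city simply produces all-zero indicator features instead of breaking the model's input schema.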

Driverless AI provides several scoring pipelines for experiments and/or interpreted models [4].

  • A standalone Python Scoring Pipeline for experiments and interpreted models. The pipeline is packaged as a Python wheel (.whl) file. Although this allows for a single-process scoring engine, the scoring service is generally implemented as a client/server architecture and supports TCP and HTTP interfaces.
  • A low-latency, standalone MOJO Scoring Pipeline for experiments. This scoring pipeline converts experiments to MOJOs, which can be scored in real time. It is available as either a Java runtime or a C++ runtime; the C++ runtime also comes with Python and R wrappers.
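A client/server scoring service of the kind described for the Python Scoring Pipeline can be sketched with the Python standard library alone. This does not reproduce the Driverless AI scoring service or its protocol: the `/score` endpoint, the JSON payload shape, and the stand-in `model` function are hypothetical. The server holds the model and answers HTTP requests; clients only send rows and read back predictions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def model(row):
    """Stand-in model (hypothetical): a fixed linear scorer."""
    return 2.0 * row["x"] + 1.0


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        # Read the JSON list of rows sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        rows = json.loads(self.rfile.read(length))
        body = json.dumps({"predictions": [model(r) for r in rows]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for the demo.


def start_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def score_remote(port, rows):
    """Client side: POST rows as JSON and return the predictions list."""
    req = Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps(rows).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]


if __name__ == "__main__":
    server = start_server()
    print(score_remote(server.server_address[1], [{"x": 1.0}, {"x": 2.5}]))
    server.shutdown()
    server.server_close()
```

Binding to port 0 keeps the example self-contained; a production service would listen on a fixed, configured port and use a hardened server rather than `http.server`.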


See also


  • Scoring Pipeline Deployment Introduction (link coming soon…)
