October 12th, 2020
Empowering Snowflake Users with AI using SQL
Category: Community, Machine Learning, Partners, Technical, Tutorials
By: Vinod Iyengar and Yves Laurent
At H2O.ai we work with many enterprise customers, from Fortune 500 giants to small startups. What we hear from all of them as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and to share that data across their organization to make better business decisions. The cloud offers many benefits for building a data platform, but the danger of vendor lock-in always lurks around the corner. That’s why many customers are looking to Snowflake as their data platform, so they can use their cloud provider of choice for their data strategy. The same is true when customers select an automatic machine learning technology. Being free to choose the cloud infrastructure on which to run data science workloads lets customers use best-of-breed, cloud-neutral solutions that give them a competitive edge.
Making AI Accessible to Snowflake SQL Users
The challenge for many companies is how to extract more value from the data they capture and store in the Snowflake Data Cloud. Data science and machine learning are a great way to turn that data into predictive insights that support better business decisions. Today, companies depend heavily on data scientists to extract those insights. Implementing the entire process tends to be difficult and tedious, and it requires people with a number of different skills. The data scientist is not the only key player in that process; data engineers and analysts, who are very familiar with SQL for querying data, are involved as well. Making AI and ML available to these users in their familiar SQL environment opens up a range of new possibilities to accelerate the adoption of AI. This is why H2O.ai worked closely with Snowflake to bring the power of Driverless AI to the fingertips of Snowflake users.
Removing Barriers to Deploying Models in Production
Organizations depend on data ops people and data engineers to extract business value from the models that data scientists build. The whole idea behind the integration of H2O Driverless AI with Snowflake is to streamline that end-to-end machine learning process, from developing machine learning models all the way to putting those models into production and scoring new data as it is captured about customers.
The question is how much of the model development process we can automate within the ML platform. Driverless AI is all about automating the data science and machine learning tasks that speed up the creation of highly accurate models. Once a model is built, it needs to go into production, where it actually generates business value. The path from model development to model deployment introduces complicated tasks that bring in other roles in addition to data scientists. Data engineers or data ops people are responsible for taking those models and ensuring they can be operationalized in a production environment.
Using Driverless AI from Within Snowflake
Let’s first talk about the common process of model building and deployment with data in a Snowflake environment. The data scientist uses the Driverless AI GUI to train a model with data imported through the Snowflake connector. That model is then deployed in a scoring engine for production use. To make predictions on new data, you have to export that data into a .csv file (or another file format) and push it into the scoring engine. The predictions made in the scoring engine then have to be written back into the Snowflake environment. Although this might seem simple and straightforward, it is a tedious and cumbersome process to set up and manage. In addition, this batch process does not lend itself to real-time scoring on fresh data for AI-enabled applications that need in-the-moment predictions.
When Snowflake introduced external functions earlier this year, H2O.ai got an opportunity to make this whole process much more efficient. By using external functions we can make Driverless AI available as a remote service to users from within Snowflake. Driverless AI can be invoked from within Snowflake to train or retrain a model, automatically deploy it as a REST server, and make it available to score new data. All of this is done with familiar SQL statements and commands, issued from within Snowflake. With external functions, there is no longer any need to export data from Snowflake in order to score it. By calling the function in SQL from the Snowflake user interface, it is now possible to update tables with predictions directly in Snowflake.
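To make the remote-service side of an external function more concrete, here is a minimal Python sketch of how such a service might answer a call from Snowflake. Snowflake sends rows in batches as a JSON body of the form {"data": [[row_number, arg1, arg2, ...], ...]} and expects a reply of the form {"data": [[row_number, result], ...]}. Everything else here is an assumption for illustration: score_row is a hypothetical placeholder, not the Driverless AI scoring API, and handle_external_function_request is a made-up name for whatever handler sits behind the REST endpoint.

```python
import json


def score_row(features):
    """Hypothetical stand-in for a deployed model.

    It simply averages the numeric features; a real deployment would call
    the scoring pipeline of the trained model here instead.
    """
    nums = [float(x) for x in features]
    return sum(nums) / len(nums) if nums else 0.0


def handle_external_function_request(body: str) -> str:
    """Convert a Snowflake external-function request body into a response body."""
    payload = json.loads(body)
    results = []
    for row in payload["data"]:
        # The first element is the row number Snowflake uses to match results
        # back to input rows; the remaining elements are the function arguments.
        row_number, features = row[0], row[1:]
        results.append([row_number, score_row(features)])
    return json.dumps({"data": results})


# Simulate one batch of two rows, as Snowflake would send it.
request_body = json.dumps({"data": [[0, 1.0, 3.0], [1, 10.0, 20.0]]})
response = json.loads(handle_external_function_request(request_body))
print(response)
```

Because each result carries the row number of its input, Snowflake can line predictions up with the rows that produced them, which is what allows a single SQL statement to write predictions straight into a table.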
The integration of H2O Driverless AI with Snowflake using external functions makes automatic machine learning available at the fingertips of every Snowflake user, including data engineers and data analysts. They no longer need to learn a new technology platform to use the full power of ML to extract meaningful insights from their data. This results in a more efficient, flexible and cost-effective machine learning process that will accelerate the adoption of AI.
To learn more, visit our Snowflake page at: https://www.h2o.ai/partner/snowflake/