H2O.ai Blog

Unsupervised Learning Metrics
by Adam Murphy | February 28, 2022 | Machine Learning, Technical

“That which is measured improves” – Karl Pearson, Mathematician. Almost everyone has heard of accuracy, precision, and recall – the most common metrics for supervised learning. But not as many people know the metrics for unsupervised learning. So, in this article, we will take you through the most common methods and how to implement th...

A Quick Introduction to PyTorch: Using Deep Learning for Stock Price Prediction

Torch is a scalable and efficient deep learning framework. It offers flexibility and speed to build large scale applications. It also includes a wide range of libraries for developing speech, image, and video-based applications. The basic building block of Torch is called a tensor. All the operations defined in Torch use a tensor. Ok, l...

How to Create Your Spotify EDA App with H2O Wave

In this article, I will show you how to build a Spotify Exploratory Data Analysis (EDA) app using H2O Wave from scratch. H2O Wave is an open-source Python development framework for interactive AI apps. You do not need to know Flask, HTML, CSS, etc. H2O Wave has ready-to-use user-interface components and charts, including dashboard templa...

An Introduction to Unsupervised Machine Learning
by Adam Murphy | January 31, 2022 | Machine Learning, Technical

There are three major branches of machine learning (ML): supervised, unsupervised, and reinforcement. Supervised learning makes up the bulk of the models businesses use, and reinforcement learning is behind front-page-news AI such as AlphaGo. We believe unsupervised learning is the unsung hero of the three, and in this article, we brea...

Install H2O Wave on AWS Lightsail or EC2

Note: this blog post was first published on Thomas’ personal blog Neural Market Trends. I recently had to set up H2O’s Wave Server on AWS Lightsail and build a simple Wave App as a Proof of Concept. If you’ve never heard of H2O Wave then you have been missing out on a cool new app development framework. We use it at H2O to build AI-ba...

Shapley Values - A Gentle Introduction
by Adam Murphy | January 11, 2022 | Data Science, Shapley, Technical

“If you can’t explain it to a six-year-old, you don’t understand it yourself.” – Albert Einstein. One fear caused by machine learning (ML) models is that they are black boxes that cannot be explained. Some are so complex that no one, not even domain experts, can understand why they make certain decisions. This is of particular concern when s...

Time Series Forecasting Best Practices
by Jo-Fai Chow | October 15, 2021 | H2O AI Cloud, Technical, Time Series

Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best practices. The talk was well-received so we decided to turn it into a blog post. Below are some of the highlights from his talk. You can also follow the two software demos and try it yourself using our H2O AI Cloud. (Note: The video links with ...

Improving NLP Model Performance with Context-Aware Feature Extraction
by Jo-Fai Chow | October 08, 2021 | H2O AI Cloud, NLP, Technical

I would like to share with you a simple yet very effective trick to improve feature engineering for text analytics. After reading this article, you will be able to follow the exact steps and try it yourself using our H2O AI Cloud. First of all, let’s have a look at the off-the-shelf natural language processing (NLP) recipes in H2O Driver...

H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden from subject matter experts, in datasets that might not be apparent before, and use those 3rd party features to increase the accuracy of the model. A traditional way of doing this was to try and scrape and scour distributed, stagnant data sources on th...

H2O on Kubernetes using Helm
by H2O.ai Team | October 16, 2020 | H2O-3, Kubernetes, Technical

Deploying real-world applications to Kubernetes using bare YAML files is a rather complex task, and H2O is no exception, as demonstrated in one of the previous blog posts. Greatly simplified, a cluster of H2O open source machine learning nodes is brought up in the following manner: A headless service to make initial node discovery and ...

Combining the power of KNIME and H2O.ai in a single integrated workflow
by Rafael Coss, Stefan Pacinda | October 14, 2020 | AutoML, Community, H2O Driverless AI, Partners, Technical, Tutorials

KNIME and H2O.ai, the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by ...

Empowering Snowflake Users with AI using SQL
by Vinod Iyengar, Yves Laurent | October 12, 2020 | Community, Machine Learning, Partners, Technical, Tutorials

At H2O.ai we work with many enterprise customers, all the way from Fortune 500 giants to small startups. What we heard from all these customers as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and the ability to share that data across their organization to mak...

Key Takeaways from the 2020 Gartner Magic Quadrant for Data Science and Machine Learning

We are named a Visionary in the Gartner Magic Quadrant for Data Science and Machine Learning Platforms (Feb 2020). We have been positioned furthest to the right for completeness of vision among all the vendors evaluated in the quadrant. So let’s walk you through the key strengths of our machine learning platforms. Automatic Machine Learn...

Blink: Data to AI/ML Production Pipeline Code in Just a Few Clicks
by Karthik Guruswamy | February 11, 2020 | H2O Driverless AI, Machine Learning, Python, Technical

You have the data and now want to build a really, really good AI/ML model and deliver it to production. There are three options available today: Write the code yourself in a Jupyter notebook/RStudio etc., for training/validation and dev-ops model handoff. You decided to do the feature engineering also. Build your own features like above,...

Parallel Grid Search in H2O

H2O-3 is, at its core, a platform for distributed, in-memory computing. On top of the distributed computation platform, the machine learning algorithms are implemented. At H2O.ai, we design every operation, be it data transformation, training of machine learning models, or even parsing, to utilize the distributed computation model. In ord...

Scalable AutoML in H2O
by Sanyam Bhutani | November 27, 2019 | AutoML, H2O World, Machine Learning, Technical

Note: I’m grateful to Dr. Erin LeDell for the suggestions and corrections on the writeup. All of the images used here are from the talks’ slides. Erin LeDell’s talk was aimed at AutoML: Automated Machine Learning, broadly speaking, followed by an overview of H2O’s Open Source Project and the library. H2O AutoML provides an easy-to-use ...

Climbing the AI and ML Maturity Model Curve
by Karthik Guruswamy | November 19, 2019 | Data Science, Machine Learning, Technical

AI/ML Maturity Model Curve/Steps: AI/ML Maturity models are published and updated periodically by a lot of vendors. The end goal is almost always about effecting transformation and automating processes in a short period and making AI the DNA/core of the business. One of the biggest challenges for businesses today is to clearly define what succ...

Importing, Inspecting, and Scoring With MOJO Models Inside H2O
by H2O.ai Team | November 08, 2019 | H2O-3, Technical

Machine-learning models created with H2O may be exported in two basic ways: binary format, and Model Object, Optimized (MOJO). An H2O model can be saved in a binary format, which is tied to the very specific version of H2O it has been created with. There are multiple reasons for such a restriction. One of the important reasons is that...

A Deep Dive into H2O’s AutoML
by Parul Pandey | October 16, 2019 | AutoML, H2O-3, Technical

The demand for machine learning systems has soared over the past few years. This is largely due to the success of Machine Learning techniques in a wide range of applications. AutoML is fundamentally changing the face of ML-based solutions today by enabling people from diverse backgrounds to use machine learning models to address complex ...

Make your own AI — Add Your Game to Auto-ML Models
by Karthik Guruswamy | October 15, 2019 | AutoML, H2O Driverless AI, Machine Learning, Technical

When Features and Algorithms compete, your Business Use Case(s) wins! H2O Driverless AI is an Automatic Feature Engineering/Machine Learning platform to build AI/ML models on tabular data. Driverless AI can build supervised learning models for Time Series forecasts, Regression, Classification, etc. It supports a myriad of built-i...

Predicting Failures from Sensor Data using AI/ML — Part 2
by Karthik Guruswamy | September 27, 2019 | H2O Driverless AI, Recipes, Technical

This is Part 2 of the blog post series and continuation of the original post, Predicting Failures from Sensor Data using AI/ML — Part 1. Missing Values & Data Imbalance: One of the things to note is that the hard-disk data set has a lot of missing values across its columns. Check out the Missing Data Heat Map on the training data set — ...

H2O Driverless AI: The Workbench for Data Science

This blog was written by Rohan Gupta and originally published here. 1. Introduction: In today’s world, being a Data Scientist is not limited to those with technical knowledge. While it is recommended and sometimes important to know a little bit of code, you can get by with just intuitive knowledge. Especially if you’re on H2O’s Driverle...

Custom recipes for Driverless AI: Prophet and pmdarima cases
by Marios Michailidis | September 24, 2019 | H2O Driverless AI, Recipes, Technical

Last updated: 09/23/19. H2O Driverless AI provides a great new feature called “custom recipes”. These recipes are essentially custom snippets of code which can incorporate any machine learning algorithm, any scorer/metric and any feature transformer. A user can create custom recipes using Python utilizing any external library or his/her o...

Predicting Failures from Sensor Data using AI/ML — Part 1
by Karthik Guruswamy | August 26, 2019 | H2O Driverless AI, Machine Learning, Technical

Last updated: 08/26/19. Whether it’s healthcare, manufacturing, or anything else we depend on personally or in business, prevention of a problem is always known to be better than cure! Classic prevention techniques involve time-based checks to see how things are progressing, positively or negatively. Time-based chec...

New Innovations in Driverless AI

What’s new in Driverless AI: We’re super excited to announce the latest release of H2O Driverless AI. This is a major release with a ton of new features and functionality. Let’s quickly dig into all of that: Make Your Own AI with Recipes for Every Use Case: In the last year, Driverless AI introduced time-series and NLP recipes to meet the...

Detecting Sarcasm is difficult, but AI may have an answer
by Parul Pandey | August 05, 2019 | H2O Driverless AI, NLP, Recipes, Technical, Tutorials

Recently, while shopping for a laptop bag, I stumbled upon a pretty amusing customer review: “This is the best laptop bag ever. It is so good that within two months of use, it is worthy of being used as a grocery bag.” The innate sarcasm in the review is evident as the user isn’t happy with the quality of the bag. However, as the sentence...

Getting started with H2O using Flow
by Parul Pandey | July 16, 2019 | Flow, H2O-3, Technical

This blog was originally published on towardsdatascience: https://towardsdatascience.com/getting-started-with-h2o-using-flow-b560b5d969b8 A look into H2O’s open-source UI for combining code execution, text, plots, and rich media in a single document. Data collection is easy. Decision making is hard. Today, we have access to a humongous...

An Overview of Python’s Datatable package

This blog originally appeared on Towardsdatascience.com. “There were 5 Exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days” – Eric Schmidt. If you are an R user, chances are that you have already been using the data.ta...

H2O-3, Sparkling Water and Enterprise Steam Updates
by Venkatesh Yadav | April 10, 2019 | Community, Data Science, H2O Release, Technical

We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam. Below are some of the new features we have added: H2O-3 Yates (3.24.0.1) – 3/31/2019. Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html Bug [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local mach...

Building AI/ML models on Lending Club Data, with H2O.ai — Part 1
by Karthik Guruswamy, Vinod Iyengar | March 28, 2019 | Beginners, Community, Data Journalism, Data Science, Technical, Tutorials

Lending Club publishes its basic loan databases to the public and a full version to its customers — anonymized of course. You can find the download page from this link (screenshot below): The publicly downloadable loan data has various attributes — roughly 150+ columns that have categorical, numeric, text and date fields. It also has a ‘...

AI/ML Model Scoring - What Good Looks Like in Production
by Karthik Guruswamy | March 10, 2019 | H2O Driverless AI, Machine Learning, Technical

One of the main reasons why we build AI/Machine Learning models is for them to be used in production to support expert decision making. Whether your business is deciding what creatives your customers should be getting on emails or determining a product recommendation for a web page, AI/Models provide relevance/context to customers to drive ...

How This AI Tool Breathes New Life Into Data Science

Ask any data scientist in your workplace. Any Data Science Supervised Learning ML/AI project will go through many steps and iterations before it can be put in production. Starting with the question of “Are we solving for a regression or classification problem?” Data Collection & Curation. Are there Outliers? What is the Distribu...

H2O’s AutoML in Spark
by Jakub Hava | July 23, 2018 | AutoML, Sparkling Water, Technical, Tutorials

This blog post demonstrates how H2O’s powerful automatic machine learning can be used together with Spark in Sparkling Water. We show the benefits of Spark & H2O integration, use Spark for data munging tasks and H2O for the modelling phase, where all these steps are wrapped inside a Spark Pipeline. The integration between Spark and...

H2O-3 on FfDL: Bringing deep learning and machine learning closer together
by Vinod Iyengar | June 25, 2018 | Community, Deep Learning, H2O-3, Technical

This post originally appeared in the IBM Developer blog here. This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar. Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data ...

Scalable Automatic Machine Learning: Introducing H2O's AutoML
by H2O.ai Team | June 21, 2017 | AutoML, Ensembles, H2O Release, Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck. In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts...

H2O announces GPU Open Analytics Initiative with MapD & Continuum
by H2O.ai Team | May 08, 2017 | Community, GPU, Technical

H2O.ai, Continuum Analytics, and MapD Technologies have announced the formation of the GPU Open Analytics Initiative (GOAI) to create common data frameworks enabling developers and statistical researchers to accelerate data science on GPUs. GOAI will foster the development of a data science ecosystem on GPUs by allowing resident applicat...

Use H2O.ai on Azure HDInsight
by H2O.ai Team | April 18, 2017 | Cloud, Sparkling Water, Technical, Tutorials

This is a repost from this article on MSDN. We’re hosting an upcoming webinar to show you how to use H2O on HDInsight and to answer your questions. Sign up for our upcoming webinar on combining H2O and Azure HDInsight. We recently announced that H2O and Microsoft Azure HDInsight have integrated to provide Data Scientists with a Lead...

Sparkling Water on the Spark-Notebook
by H2O.ai Team | April 10, 2017 | Guest Posts, Sparkling Water, Technical

This is a guest post from our friends at Kensu. In the space of Data Science development in enterprises, two outstanding scalable technologies are Spark and H2O. Spark is a generic distributed computing framework and H2O is a very performant scalable platform for AI. Their complementarity is best exploited with the use of Sparkling Wat...

Stacked Ensembles and Word2Vec now available in H2O!

Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles. R: ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = my_models) Python: ensemble = H2OStackedEnsembleEstimator(base_models=my_models) ensemble.train(x=x, y=y, training...

Start Off 2017 with Our Stanford Advisors
by H2O.ai Team | January 09, 2017 | Community, Technical

We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani and Prof. Trevor Hastie) at H2O.ai on Jan 6, 2017. “Professors Boyd, Tibshirani & Hastie in the house! @h2oai #elementsofstatisticallearning #MachineLearning pic.twitter.com/FnlCNrY7Hy” — H2O.ai (@h2oai) January 6, 2017. Our CEO, Sri Ambati, ma...

Indexing 1 Billion Time Series with H2O and ISax
by H2O.ai Team | November 11, 2016 | Solutions, Technical, Tutorials

At H2O, we have recently debuted a new feature called ISax that works on time series data in an H2O Dataframe. ISax stands for Indexable Symbolic Aggregate ApproXimation, which means it can represent complex time series patterns using a symbolic notation and thereby reduce the dimensionality of your data. From there you can run H2O’s ML...

Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
by H2O.ai Team | June 16, 2016 | R-Bloggers, Technical, Tutorials

“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome. tl;dr: H2O now has random hyperparameter search with time- and metric-based early stopping. Bergstra and Bengio[1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search over the s...

Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
by H2O.ai Team | June 15, 2016 | Sparkling Water, Technical, Tutorials

This short post presents the “ham or spam” demo, which has already been posted earlier by Michal Malohlava, using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, unifying Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on the fitted pipe...

Red herring bites
by H2O.ai Team | May 06, 2016 | Data Munging, R-Bloggers, Technical

At the Bay Area R User Group in February I presented progress in big-join in H2O which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them or not; and ii) show how it scales with...

Fast csv writing for R
by H2O.ai Team | April 24, 2016 | Data Munging, R, R-Bloggers, Technical

R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table package in R solved csv import convenience and speed in 2...

