November 30th, 2021

H2O.ai Tools for a Beginner

Category: Beginners, Community, H2O, H2O Driverless AI, Wave

Note: this is a community blog post by Shamil Dilshan Prematunga. It was first published on Medium.

Hey, this is not a deep technical blog post. I’d like to share the experience I had with H2O tools while I was studying machine learning. I currently work as a Research Engineer in the telecommunications field. Day by day, my experience has taught me that “data” will play a major role in the future. That got me thinking about how a beginner can step into this field. If a field has nothing special to offer, it is boring to spend time on it.

Kaggle is the best platform for stepping into this area as a beginner. Its courses cover all the basic components. But just following a course can get boring for a beginner. Something new can motivate a new user to dig deeper into a particular area. That is how I found H2O.ai and its tools. For a newcomer, they are kind of addictive.

What I Have Experienced

On the path to democratizing AI, H2O.ai has its own tools for building explainable AI solutions. I work in a Conda environment, and it is very easy to set up the requirements there. The documentation is very clear, so any beginner can follow it and get set up in their own environment. Here you can find the relevant documentation (Link) to set up H2O in your working environment.

H2O Flow

After the installation, we can import H2O-3 into our project and use it to import data, split data, do feature engineering, build and train models, evaluate performance, visualize results, and so on. Rather than performing each step in a notebook, H2O also provides a UI in which we can easily study those concepts even without deep programming knowledge. That UI is called H2O Flow.
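As a rough sketch of what those steps can look like in a notebook (this is not code from the original post), the function below starts a local H2O instance, imports a CSV file, splits it, trains a model, and evaluates it on the held-out rows. The file path, the target column name, the 80/20 split, and the choice of gradient boosting are all assumptions made for illustration.

```python
def feature_columns(columns, target):
    """Return every column name except the target column."""
    return [name for name in columns if name != target]


def run_h2o_workflow(csv_path, target):
    """Train a model on a CSV file with H2O-3 and return its test performance.

    csv_path and target are placeholders for your own dataset.
    """
    # Imported here so h2o (and the Java runtime it needs) is only
    # required when the workflow is actually run.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    # Start a local H2O instance, or connect to one that is already running.
    # H2O Flow is served by the same instance
    # (http://localhost:54321 by default).
    h2o.init()

    # Import the data and hold out 20% of the rows for testing (assumed split).
    frame = h2o.import_file(csv_path)
    train, test = frame.split_frame(ratios=[0.8], seed=42)

    # Gradient boosting is an arbitrary choice for this sketch.
    model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    model.train(
        x=feature_columns(frame.columns, target),
        y=target,
        training_frame=train,
    )

    # Metrics computed on the held-out rows.
    return model.model_performance(test_data=test)
```

If the target is a class label stored as numbers, it would need to be converted with `asfactor()` before training so that H2O treats the problem as classification rather than regression.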

Once h2o is initialized as above, it runs an H2O instance on the local machine and connects to it. It then provides a URL that we can open in our browser to continue in the UI, which looks as below.

Because the documentation for setting up H2O is clear and H2O Flow provides an interface, I feel that a beginner, even without deep programming knowledge, can very easily try out what they learned from courses like those on Kaggle. That will push them to learn and explore more.

H2O Wave

After you gain some knowledge, the next step is to apply what you learned to real-life scenarios. Most of the time, mapping concepts to real-world situations is hard, since people with different domain knowledge are involved in the end-to-end process. H2O Wave gives people who work with ML and AI models a way to express their ideas clearly. What I like most about Wave is that we can build AI applications in Python without advanced knowledge of app development. It handles both frontend and backend development, and the applications are easy to run.

Installing the H2O Wave SDK is very easy. Each step you need to follow is documented (Link). After starting the Wave server, you can view your applications at http://localhost:10101/. With the Wave server running, we can run an AI application built with H2O Wave. The simple command we have to run is “wave run <Python file name>”. This displays your application and its content in your web browser. The cool thing here is that we can modify the application live: we change the code and see how it affects the client in real time.

Finally, for a beginner, H2O Wave helps to express ideas by taking an AI solution all the way from the initial concept to a working application. Below you can find some simple demonstrations I made myself.

  • Calculate the accuracy of two models (KNN & SVM) for a submitted dataset. (Demo Video)
  • Data Visualization Application (Demo Video)
  • Automated Machine Learning Application (Demo Video)
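To give an idea of what the first demo computes, here is a minimal sketch that compares the test accuracy of a KNN and an SVM classifier. It assumes scikit-learn and uses the built-in Iris data as a stand-in for a submitted dataset. The actual demo code is not shown in this post, so the split ratio and the model settings are assumptions too.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_knn_svm(X, y, test_size=0.3, seed=42):
    """Fit KNN and SVM on the same split and return their test accuracies."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Model settings are illustrative defaults, not tuned values.
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Iris stands in for the dataset a user would submit.
    X, y = load_iris(return_X_y=True)
    for name, accuracy in compare_knn_svm(X, y).items():
        print(f"{name}: {accuracy:.3f}")
```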

Driverless AI

Data scientists have to spend a lot of time optimizing their solutions to reach good accuracy. Rather than doing this manually, it is better to have an automated process. H2O Driverless AI addresses this and helps to get to a well-performing model faster.

In the H2O AI engine, we can create our own Driverless AI instances. Here we can upload our dataset, and it will be displayed as below.

From the given action list, we can play around with the dataset we uploaded. The visualizations and details give us a clear understanding of the dataset we are using. For a beginner, rather than memorizing the patterns in the dataset, it is easier to use this kind of tool to get a quick understanding of the dataset behind their application.

The fantastic action in the action list is “Predict”. Here we can do multiple things with our dataset. We can select the kind of learning we want (supervised or unsupervised), the target, the validation and test datasets, and the adjustable parameter values. After setting these up, we can launch our experiment and get a better understanding of the prediction.

At the end of the experiment, it gives a summary as below. From there we can download a complete report about the models it trained, their accuracies, feature importance, and much more. Isn’t it cool?

The main pipeline visualization of the experiment is as below.

If someone did all of these steps one by one, can you imagine how long it would take? And could a beginner do these things without a clear idea of where the steps lead?

So, I feel that H2O tools can play a major role in industry. But what I like most is that they can be a good teacher for a beginner, thanks to their simplicity, explainability, and user-friendly nature.

If you are a beginner, TRY this. Surely you will get addicted to it…

About the Author

Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at H2O.ai. Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities around Europe, US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.
