December 14th, 2020

Grandmaster Series: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋

Category: Kaggle, Machine Learning

In conversation with Gábor Fodor: A Data Scientist at H2O.ai and a Kaggle Competitions Grandmaster.

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.

In this interview, I share my interaction with Gábor Fodor, better known as Beluga in the Kaggle world. He is a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Gábor, who hails from Hungary, holds master's degrees in Mathematics and Computer Engineering and has around ten years of experience in the Data Science domain. He joined Kaggle nine years ago and has made quite a mark there since. His best global rank is 4th for competitions and 7th for notebooks.

Here is also a link to Gábor's recent interview at CTDS.show, where he discusses his 10th-place solo gold in the Cornell Birdcall competition on Kaggle.

Here is an excerpt from my conversation with Gábor:

Q: You have a background in Mathematics. How did the transition from academia to industry happen?

Gábor: Doing a Master’s in mathematics with a stochastics major certainly provided a strong background (discrete math, probability theory, statistics, stochastic processes, etc.), although the courses mainly focused on theory. Fortunately, I was free to take some additional courses, and as a result, I got to learn about programming & data mining as well.

During my final year, I had a chance to intern as a Data Mining trainee in the telco industry. It was quite interesting to retrain and improve the old, drifted churn models. However, the most valuable part was that I had direct access to their data warehouse, and I could learn and practice SQL with real-world data and business problems. After the internship, I stayed at the company and became a full-time data analyst. Since then, I have had a chance to work in different industries on varied types of business problems.

Q: How did you get interested in Machine Learning?

Kaggle Competition Tutorial being presented by Gábor during Kaggle Days China, 2020

Gábor: I immensely enjoyed my data mining courses. My first data mining competition was in 2009, and it was quite fun. Then I found Kaggle and got addicted forever. At that time, I already had a full-time job and had just started a new master's in computer science, so finding time for new Kaggle challenges was not always easy. But the learning opportunity was enormous, and I could not resist trying to solve those unique data-driven problems.

Q: How hard is it to become a Kaggle Grandmaster? What initially attracted you to Kaggle, and when did the first win come your way?

Gábor’s Kaggle Profile

Gábor: Reaching Grandmaster status in competitions is undoubtedly demanding. One needs five gold medals in different competitions, and at least one has to be a solo gold. It takes a lot of effort and hard work to earn gold in any competition.

My first competition win came in 2013. It was a small research competition with 81 teams. The task was to recognize bird species in audio recordings. We only had a few hundred audio files for training at that time, and we did not have all the comfortable deep learning tools. I was able to win the competition with template matching on the spectrograms and using random forests only.

Competitions have become a bit more difficult since the good old days as the Kaggle community has grown. Nowadays, it is hard to find competitions with fewer than a thousand teams.

Q: As a Data Scientist at H2O.ai, what are your roles, and in which specific areas do you work?

Gábor along with some of the GrandMasters at H2O.ai

Gábor: I just joined H2O.ai in August, and I like the flexibility to work on different projects here. Besides helping customers using H2O Driverless AI during POCs, I also create H2O Wave apps and test new Driverless AI features.

Q: What are some of the best things you have learned via Kaggle that you apply in your professional work at H2O.ai?

Gábor: I hear way too often that in Kaggle competitions, participants fight over the 4th decimal on the leaderboard, and the differences are not significant. Well, there are much bigger victories (e.g., in the recently finished Lyft Motion Prediction competition, where Philipp and his team won with an 8% improvement over the second-place team). Even if the race is much closer, you have to leave no stone unturned and squeeze every possible gain from your features and models. In my experience, that also teaches you how to get a robust baseline model fast.

The other criticism that I hear is that the competitions reward overfitting and data leaks. While I agree that data leaks can be a significant issue, and I did have to exploit them to win competitions, overfitting is not rewarded at Kaggle. Quite the opposite! During the competition, you don't receive feedback about the final test set. I have seen (and experienced) quite brutal shake-ups where only the best validation strategies and most stable models survived. Data leaks are quite common in the real world too. When you see a too-good-to-be-true AUC result, you should immediately start thinking about the cause. Having seen all the possible data leaks in previous Kaggle challenges helps you debug a machine learning pipeline more quickly.
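Editor's note: the "too good to be true" AUC check can be illustrated with a small sketch. This is not Gábor's code. It is a minimal, standard-library-only example that assumes a binary target, numeric features stored as dictionaries, and an arbitrary threshold of 0.95. The function names, the toy data, and the threshold are all hypothetical. It scores each feature on its own and flags any that separates the classes almost perfectly, which is often a sign of a leak (for example, a row identifier that encodes the label order).

```python
def auc(labels, scores):
    """ROC AUC computed from rank sums (Mann-Whitney U), with tied scores sharing an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def suspicious_features(rows, labels, threshold=0.95):
    """Return {feature_name: auc} for features whose single-feature AUC meets the threshold."""
    flagged = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        score = auc(labels, values)
        score = max(score, 1 - score)  # a feature can leak in either direction
        if score >= threshold:
            flagged[name] = score
    return flagged


# Toy data (hypothetical): row_id happens to be sorted by the target, a classic leak.
rows = [
    {"tenure_months": 3, "row_id": 0},
    {"tenure_months": 40, "row_id": 1},
    {"tenure_months": 5, "row_id": 2},
    {"tenure_months": 2, "row_id": 3},
    {"tenure_months": 36, "row_id": 4},
    {"tenure_months": 1, "row_id": 5},
]
labels = [0, 0, 0, 1, 1, 1]

print(suspicious_features(rows, labels))
```

A flagged feature is not proof of a leak, only a prompt to ask how that column was generated and whether it would be available at prediction time.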

Q: If you were to team up with grandmasters at H2O.ai, who would they be and why?

Gábor: Good question 😃. I recently created the membership network of the Kaggle team at H2O.ai. While most of us are part of one largely connected network, I have not teamed up directly with anyone before. I can't pick a single person, as we have so many talented Kagglers, but I will probably team up with some of them in 2021.

Q: The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?

Gábor presenting during Kaggle Days Paris in 2019.

Gábor: I think it is impossible to keep up with everything. Besides the fun, I like Kaggle competitions because they show what tools work the best for specific problems. You can learn a lot just by reading the competition winning solutions. But trying to apply those tips and tricks in the next competition will teach you a lot more.

There is quite a lot for me to catch up on regarding Natural Language Processing and Reinforcement Learning. Fortunately, the team at H2O.ai has experts in every field.

On the other hand, it also means that the tools are getting better. In the recent Cornell Birdcall Competition, I could train models with a few hundred lines of PyTorch code. Or look at Driverless AI; with a few clicks, you can solve all sorts of supervised machine learning problems.

Q: A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey?

Gábor: Don’t be afraid to start and prepare for the long run. The community is enormous and willing to share. If you already learned the basics and want to get your hands dirty, I can only recommend participating in Kaggle competitions.

Personally, I have a lot of takeaways from this interaction. Firstly, Data Science is an area where one needs to be self-motivated and eager to learn at every stage. Secondly, there is always so much to learn from every machine learning competition, whether you perform well or not. The important thing is to identify your weak points and work on them while leveraging your strengths. Finally, the community around you is always ready to help, and the flourishing Kaggle community is a testimony to that fact.

Originally published here.

About the Author

Parul Pandey

Parul is a Data Science Evangelist here at H2O.ai. She combines Data Science, evangelism, and community in her work. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn's Top Voices in the Software Development category in 2019.
