May 3rd, 2021

What it takes to become a World No 1 on Kaggle

Category: Data Science, Kaggle, Machine Learning, Makers

In conversation with Guanshuo Xu: A Data Scientist, Kaggle Competitions Grandmaster, and a Ph.D. in Electrical Engineering.

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. The intention behind these interviews is to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.

In this article, I shall be sharing my interaction with Guanshuo Xu. He is a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Guanshuo obtained his Ph.D. in Electrical & Electronics Engineering at the New Jersey Institute of Technology, focusing on machine learning-based image forensics and steganalysis.

Guanshuo is a man of many accomplishments. His methods for real-world image tampering detection and localization won second place in the First IEEE Image Forensics Challenge. His architectural design of deep neural networks outperformed traditional feature-based methods for the first time in image steganalysis. More recently, Guanshuo also achieved the world number one rank in the Competitions tier on Kaggle, with wins in the ALASKA2 Image Steganalysis and RSNA STR Pulmonary Embolism Detection competitions.

Guanshuo also discusses his achievements on Kaggle in an interview at CTDS.show.


 

In this interview, we learn more about his academic background, his passion for Kaggle, and his journey to the number one title. Here is an excerpt from my conversation with Guanshuo:

You have a Ph.D. in Electrical Engineering. Did it somehow influence your decision to take up Machine Learning as a career?

Guanshuo: Yes, my doctoral research used machine learning techniques to solve problems like image tampering detection and hidden data detection. For example, my last Ph.D. research project was to use deep neural nets on image steganalysis. So my education and research are directly related to machine learning. Hence, machine learning was a natural choice of career for me.

How did you start with Kaggle, and what kept you motivated throughout your Grandmaster journey?

Guanshuo: From the time I discovered Kaggle, I have been addicted to it. What keeps me competing is the combined satisfaction of winning competitions and prize money, learning new techniques, widening and deepening my understanding of machine learning, and building surprisingly effective models.

How does it feel to be World No 1 in Competitions? Does that bring in an extra amount of pressure while competing?

The top 5 Kagglers in the Competitions category at the time of writing | Source: Kaggle’s website

Guanshuo: Honestly speaking, there is a lot more pressure to maintain the number one rank than to achieve it. This is because it requires “smoother” performance. Sometimes I have to participate in more competitions simultaneously than I used to.

How do you typically approach a Kaggle problem? 

A glimpse of Guanshuo’s competitions profile | Source: https://www.kaggle.com/wowfattie/competitions

Guanshuo: My approach varies based on the type of problem and the goal of the competition. Nowadays, I often spend days or even weeks understanding the data and the problem and thinking through a solution. That includes, for instance, guessing the distribution of the private test data, choosing a proper validation scheme, and planning detailed modeling steps. Once I have a decent picture of the overall approach, I start coding and modeling. This process helps me gain more understanding and make corrections or adjustments, if necessary, to the overall approach.
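For readers unfamiliar with the term, a validation scheme decides which samples are held out to estimate how a model will score on unseen data. The sketch below is a generic k-fold split written with only the Python standard library. It is not Guanshuo’s actual method, and the function name `k_fold_indices` and its parameters are hypothetical, chosen purely for illustration.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, n_folds: int = 5, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return n_folds (train_indices, validation_indices) pairs.

    Hypothetical helper for illustration: every sample lands in exactly
    one validation fold, and the shuffle is reproducible via `seed`.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility

    # Deal the shuffled indices into n_folds roughly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]

    splits = []
    for i, val_idx in enumerate(folds):
        # Training set is everything outside the current validation fold.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, val_idx))
    return splits


if __name__ == "__main__":
    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, n_folds=5)):
        print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

A plain random split like this is only a starting point. When the private test data is expected to differ from the training data (for example, different patients or image sources), the folds would usually be built to mirror that difference, which is the kind of judgment the answer above describes.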

Could you give us a sneak peek into your toolkit, such as your favorite programming language, IDE, and algorithms?

Guanshuo: As far as my toolkit is concerned, I mostly use gedit, Python, and PyTorch for deep learning.

The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?

Guanshuo: I get to know about most of the new stuff and technologies through Kaggle, my colleagues, or simply by googling. As far as new developments in machine learning are concerned, it depends on the actual needs. I tend to filter out anything not instantly helpful and maybe keep an eye on the potentially exciting stuff. Then I get back to it as and when needed.

A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey?

A virtual panel where Guanshuo, along with fellow H2O.ai Kaggle Grandmasters, shared his insights on Kaggle

Guanshuo: It basically depends on each person’s background and interests. However, finding a suitable platform to learn and develop skills can make things much easier in general. Taking part in Kaggle competitions can also prove to be a helpful resource.


 

To achieve a World No 1 rank is no mean feat, and Guanshuo’s relentless attitude and hard work deserve all the credit. A peek into his various winning solutions on Kaggle showcases his structured approach, which is an essential habit to cultivate for problem-solving.

About the Author

Parul Pandey

Parul is a Data Science Evangelist here at H2O.ai. She combines Data Science, evangelism, and community in her work. She is also a Kaggle Grandmaster in the Notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019.
