February 15th, 2021

Learning from others is imperative to success on Kaggle, says this Turkish Grandmaster

In conversation with Fatih Öztürk: A Data Scientist and a Kaggle Competition Grandmaster.

In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.

In this interview, I share my conversation with Fatih Öztürk. He is a Kaggle Competitions Grandmaster and a Data Scientist at H2O.ai. Fatih obtained a Bachelor’s degree in industrial engineering with honors from Boğaziçi University, Istanbul. He worked as a Data Scientist at UrbanStat before joining H2O.ai. Fatih joined Kaggle almost four years ago and has won seven gold medals, including a solo one. He also holds Master status in the discussion tier.

In this interview, we shall learn more about his academic background, his passion for Kaggle, and his work as a Data Scientist. Here is an excerpt from my conversation with Fatih.


 

You have a background in Industrial Engineering. What prompted you to choose Data Science as a career?

Fatih: My primary focus in Industrial Engineering was on Operations Research (OR), Supply Chains, and Statistics. Apart from these main courses, we also had the option to choose specific electives based on our interests. In my last semester, I took “Data Mining” as one of my elective courses. One of the reasons for this choice was its popularity. While studying data mining, I came across concepts like random forests, classification, and prediction for the first time. I found it pretty interesting and analogous to playing a competitive game. I realized that my passion lay in the field of data analysis, and I instantly knew what field I had to pursue after graduation.

How did your tryst with Kaggle begin, and what kept you motivated throughout your grandmaster’s journey?

Fatih’s Kaggle profile

Fatih: My first job was as a Junior Data Scientist at a tech startup. I was the only data scientist there, and we worked only for insurance-related companies. A few months after I joined the company, my boss found out about the Porto Seguro competition on Kaggle, and he asked me if I could look at it since it was an insurance use case. I was pleased with what I found in that competition because I saw that people were sharing a lot. During that competition, I realized two main things:

  • My learning rate was much higher when I was around kernels and discussions. 
  • My competitive side was triggered, and I learned that I liked competing a lot.

Competing and learning on Kaggle go hand in hand. That combination is my primary motivation for participating in any competition. Being a Master or a Grandmaster is just a natural result of this process.

Can you tell us a little about your favorite Kaggle competition?

Fatih: I liked the Home Credit Default Risk competition. The datasets were not fully anonymized, and hence there was a lot of room for feature engineering. Trying to understand the domain of the competition and then being able to generate useful features was fun. Moreover, our team had a good validation strategy that turned out to be very successful for the private leaderboard in the end. We went from 29th place on the public leaderboard to 10th on the private one. 

How do you typically approach a Kaggle problem? 

Fatih: For any competition, my first goal is always to have a reliable validation scheme. Having a well-correlated relationship between the cross-validation (CV) score and the leaderboard (LB) score is everything. So how do you achieve this? It mostly depends on the right exploratory data analysis (EDA). Figuring out how the test set differs from the train set (if it does) and then mimicking this in your validation scheme is a good starting point. Besides doing EDA with plots and numbers, I also check adversarial validation scores in this regard.

After having a good validation strategy, I focus on finding useful things that are not shared on the public forum because having different tricks is crucial to land a good rank at the end.
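The adversarial validation check mentioned above can be illustrated with a minimal sketch. This is not Fatih's own code: the function name `adversarial_auc`, the choice of scikit-learn's `GradientBoostingClassifier`, and the synthetic data are all assumptions made for illustration. The idea is to label train rows 0 and test rows 1, train a classifier to tell them apart, and read the cross-validated AUC: a score near 0.5 suggests the two sets look alike, while a score well above 0.5 suggests the test set differs from the train set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(train_X, test_X, seed=0):
    """Cross-validated AUC of a classifier that tries to tell train rows from test rows.

    Hypothetical helper for illustration; any reasonable classifier could be used.
    """
    X = np.vstack([train_X, test_X])
    # Label 0 = row came from train, 1 = row came from test.
    y = np.concatenate([
        np.zeros(len(train_X), dtype=int),
        np.ones(len(test_X), dtype=int),
    ])
    clf = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    # Out-of-fold probabilities, so the score is not inflated by overfitting.
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)


# Synthetic data (an assumption): one test set matches train, one has a shifted feature.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 5))
same_dist_test = rng.normal(0.0, 1.0, size=(500, 5))
shifted_test = rng.normal(0.0, 1.0, size=(500, 5))
shifted_test[:, 0] += 2.0  # simulate drift in a single feature

auc_same = adversarial_auc(train, same_dist_test)
auc_shifted = adversarial_auc(train, shifted_test)
print(f"same distribution: AUC = {auc_same:.3f}")
print(f"shifted feature:   AUC = {auc_shifted:.3f}")
```

When the score is high, inspecting which features the classifier relies on is one common way to find what differs between the sets, which can then inform how the validation split is built.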

Could you give us a sneak peek into your toolkit, such as your favorite programming language, IDE, algorithms, etc.?

Fatih: I use Python and, most of the time, work with JupyterLab. I also have a Google Colab Pro account to get access to GPUs since I don’t have a local one. I find it a good investment since we have limited GPU hours per week on Kaggle notebooks.

My favorite modeling algorithm is LightGBM. I still think that it is a very efficient and production-friendly algorithm, given how easy it is to tune and how fast it can reach sufficiently good scores.

You regularly speak at meetup events. How is the data science landscape in and around Turkey?

Fatih as one of the speakers at the Istanbul Tech Week event

Fatih: I find people’s interest in data science quite noteworthy in Turkey, and it’s increasing every day. More and more students are choosing Computer Science as their major over other engineering majors. The main reason for this popularity is the overall adoption of data science in every industry.

The number of Turkish people that I encounter in Kaggle competitions is also growing quite fast. This is heartwarming since this was not the case a few years ago. A similar situation is reflected in the meetup community as well. There has also been a rapid rise in both the number of events and the number of students involved. Recently, a lot of Turkish companies have started hosting in-class competitions on Kaggle.

As a Data Scientist at H2O.ai, what are your roles, and in which specific areas do you work?

Fatih, along with fellow Kaggle Grandmasters at H2O.ai

Fatih: I’m involved in POCs and other customer-related projects to help customers benefit more from Driverless AI. Besides that, I develop new apps with the Wave framework and test Driverless AI with new datasets.

ExploRNA Wave app created by Fatih. You can read more about the app here.

The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?

Fatih: I think social networks are the key to this. It’s almost impossible to remain up to date just by yourself. However, if you are in the right Slack channels and have a meaningful LinkedIn feed, it’s easier to follow the news. Apart from this, joining Kaggle competitions and regularly following the threads in competition forums is another useful resource.

How do you plan to spend your time on Kaggle in 2021? Any special milestones you want to achieve?

Fatih: I want to join Computer Vision competitions in 2021. I’d be delighted to be placed in the top 50 as a solo competitor in one of these competitions. A gold medal as a team would also be fantastic, of course. 😃

A word of advice for the Data Science aspirants who have just started or wish to start their Data Science journey?

Fatih: I would suggest not worrying too much about questions like where to start, which courses to take, or which tools to learn. Instead of dealing with all these questions initially, it is advisable to jump directly into a data science project or a competition and learn from others’ code. This is how I improved: by getting my hands dirty early on. Analyzing other people’s code and asking questions such as “What does this code snippet do here?”, “Why did the author code it like this?”, and “How does it help in this project or competition?” allowed me to hone my skills. The next task is to answer these questions, either by searching the internet or by making use of the discussion forums.


 

Fatih’s Kaggle achievements reflect his passion for problem-solving and his capacity for hard work. His transition from industrial engineering into data science, followed by earning the title of Kaggle Grandmaster in a span of two years, is commendable.


About the Author

Parul Pandey

Parul is a Data Science Evangelist here at H2O.ai. She combines Data Science, evangelism, and community in her work. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019.
