January 15th, 2021
Meet the Data Scientist who just cannot stop winning on Kaggle
By: Parul Pandey
In conversation with Philipp Singer: A Data Scientist, Kaggle Double Grandmaster, and a Ph.D. in Computer Science.
In this series of interviews, I present the stories of established Data Scientists and Kaggle Grandmasters at H2O.ai, who share their journey, inspirations, and accomplishments. These interviews are intended to motivate and encourage others who want to understand what it takes to be a Kaggle Grandmaster.
In this interview, I share my interaction with Philipp Singer, better known as Psi in the Kaggle world. He is a Kaggle Double Grandmaster and a Senior Data Scientist at H2O.ai. Philipp obtained his Ph.D. in computer science with honors at the Technical University of Graz, where he also finished his Master’s studies in Software Development and Business Management.
Philipp has several accomplishments, including multiple wins and top placements on Kaggle and several scientific honors, such as the best paper award at the renowned World Wide Web Conference. He is currently ranked 3rd globally in the Kaggle competitions tier, which is both impressive and inspiring.
One of Philipp’s most notable achievements has been winning the NFL’s second annual Big Data Bowl competition by teaming up with a fellow H2O.ai Data Scientist, Dmitry Gordeev. More than 2,000 data scientists from all over the world competed on Kaggle to predict rushing play outcomes. Philipp Singer and Dmitry Gordeev captured the top prize of $50,000 with their approach.
In this interview, we will learn more about his academic background, his passion for Kaggle, and his work as a Data Scientist. Here is an excerpt from my conversation with Philipp:
- You have a Ph.D. in Computer Science. Why did you opt for Data Science as a career rather than sticking to academia’s research side?
Philipp: I obtained a Ph.D. in Computer Science at the Technical University of Graz in Austria and worked as a postdoctoral researcher in Germany. I touched on many different data science topics during my scientific career and published many papers and articles in renowned conferences and journals. As the next step in that career, I would have had to pursue a professorship, which sounded intriguing. However, even though I love teaching, I also wanted to delve into more applied work, meaning that I wanted my work to have more impact than what is mostly possible in research. This prompted me to take up data science as a career. That said, I thoroughly enjoyed my Ph.D. and learned a lot during that time, but now I am also delighted to be at the forefront of data science and machine learning and have a real maker role at H2O.ai.
- How did your tryst with Kaggle begin, and what kept you motivated throughout your grandmaster’s journey?
Philipp: I signed up on Kaggle around eight years ago, close to my first steps as a Ph.D. student, because I had heard about the platform and wanted to check it out. But I did not do more than a sample submission and then stopped touching Kaggle for six years. Around two years ago, Dmitry (dott1718 on Kaggle, and a work colleague both then and now) and I decided to try out a competition together on Kaggle as a side project at work. We went into it with zero expectations but ended up winning the competition, which got me hooked and began my Kaggle journey. My approach on Kaggle has always been to tackle new types of problems to stay motivated, and there are still new interesting problems to solve on a regular basis. I also enjoy meeting and working with talented people on Kaggle and seeing how the community thrives.
- Lately, you have been killing the Kaggle leaderboard with some spectacular results, the latest being NFL 1st and Future — Impact Detection, where you finished 2nd. What is your approach towards solving such problems and faring well?
Philipp: People often ask me how they can win Kaggle competitions, and I do not think there is a general secret sauce that can be applied. A lot of success on Kaggle is based on experience and the willingness to touch and learn about things that, at first glance, you do not know much about. Over time, I have assembled a particular generic toolbox that incorporates building blocks from each competition that I have tackled. For example, I understand how to set up proper cross-validation, what libraries to use for my models, how to fit models properly, track their performance, and similar things. So I already have more time to focus on new and crucial aspects of recent competitions. I always try to improve my workflow after each competition to become more efficient and competitive.
“A lot of success on Kaggle is based on experience and the willingness to touch and learn about things that at first glance, you do not know much about.”
- How do you decide which competitions to participate in?
Philipp: I mostly try to tackle new types of problems or competitions that sound interesting in terms of the data or the problem to solve. Sometimes I also try my luck with more standard competitions to stay informed about the weekly changing state of the art.
- How do you typically approach a Kaggle problem? Any favorite ML resources (MOOCs, blogs, etc.) that you would like to share with the community?
Philipp: I try to resort to my repertoire of methods, tools, and experience I have already accumulated and then try to research the specific problem at hand. This means I will study previous solutions to similar problems on Kaggle and read relevant papers. The best way to learn about a problem is to go hands-on and learn along the way.
- As a Data Scientist at H2O.ai, what are your roles, and in which specific areas do you work?
Philipp: At H2O.ai, my role is very multi-faceted. I am regularly involved in customer-facing projects where my goal is to support projects with my data science expertise. Furthermore, as Kaggle Grandmasters, we always try to utilize our experience and knowledge about the state-of-the-art to continuously improve our products and develop new bleeding-edge prototypes and solutions. For example, this could mean that we make suggestions for new features in Driverless AI or develop AI applications in Wave demonstrating new techniques or full pipeline data science solutions.
- What are some of the best things you have learned via Kaggle that you apply in your professional work at H2O.ai?
Philipp: One important thing you learn on Kaggle is how to produce robust models that can generalize well and are not subject to strong overfitting. This is crucial on Kaggle as you need to perform well on unseen private data. This means that you learn a lot about robust cross-validation and pay attention to other facets of the data, such as feature distribution shifts and other essential aspects. I can utilize this knowledge well for my work at H2O.ai as this is also an integral part of our products. We want to enable customers to do robust machine learning supported by our expertise and knowledge in the area.
- The Data Science domain is rapidly evolving. How do you manage to keep up with all the latest developments?
Philipp: I mostly use Kaggle to keep up with the latest developments; it is an excellent filter of new techniques that either work on practical and applied problems or do not work. Usually, the robust methods survive, and the marginal techniques that only work occasionally get filtered out. At the same time, I try to keep up-to-date by following well-known researchers and practitioners on Twitter and other platforms.
- Are there any specific areas or problems where you would want to apply your expertise in ML?
Philipp: I have nothing specific in mind; I usually like to be surprised by interesting problems popping up either at work or on Kaggle. It is quite essential to delve into problems that do not seem that interesting to you at first glance. You can then bring an unbiased view to the problem and probably also apply the experience you gained from other problems to the data at hand.
- A word of advice for the Data Science and Kaggle aspirants who have just started or wish to start their Data Science journey?
Philipp: Get your hands dirty, don’t be afraid to fail, and always be eager to learn new things.
Philipp’s Kaggle journey has been quite remarkable. I’m sure his journey, dedication, and achievements will be a source of inspiration for others trying to make a career in this field.
Read other interviews in this series:
- Rohan Rao: A Data Scientist’s journey from Sudoku to Kaggle
- Shivam Bansal: The Data Scientist who rules the ‘Data Science for Good’ competitions on Kaggle.
- Meet Yauhen: The first and the only Kaggle Grandmaster from Belarus.
- Sudalai Rajkumar: How a passion for numbers turned this Mechanical Engineer into a Kaggle Grandmaster
- Gabor Fodor: The inspiring journey of the ‘Beluga’ of Kaggle World 🐋