The Data Scientist who rules the "Data Science for Good" competitions on Kaggle

Published: October 17, 2019


Written by: Parul Pandey

In conversation with Shivam Bansal: a data scientist, a Kaggle Kernels Grandmaster, and three-time winner of Kaggle’s Data Science for Good competition.

Communication is both an art and a useful tool in data science. Insights must be communicated clearly so that others can act on the findings. One of the most effective ways to communicate data is through storytelling. But to be compelling storytellers, we need to simplify rather than complicate, so that the real essence of the analysis is not lost. In short, don’t simply show data; tell a story with it.

In this edition of the Kaggle Grandmasters’ interview, I bring to light the amazing and inspiring journey of a master storyteller: Shivam Bansal, a Kaggle Kernels Grandmaster and a Senior Data Scientist at H2O.ai. He is currently based in Singapore and is involved in H2O.ai’s APAC activities. Shivam is a Computer Science graduate from India. He went on to obtain his Master’s in Business Analytics from the National University of Singapore in 2019, where he also won the Outstanding Capstone Project Award.

Shivam has had a very interesting and focused Kaggle journey. In his own words:

“I joined Kaggle in January, and by the end of the year, I became a Kernels Grandmaster, reached an overall rank of 2nd, won 10 kernel awards (including three weekly kernel awards and four swag prizes), and also won three Data Science for Good kernel competitions.”

Here is an excerpt from my conversation with Shivam:

What initially attracted you to Kaggle, and when did the first win come your way?

Shivam: I first heard about Kaggle in 2014, while working at my first company. I was developing a text analytics framework, including sentiment analysis, and a similar competition happened to be running on Kaggle at the same time. I did not join Kaggle then, but I used it as a reference for good discussions and knowledge. I joined Kaggle four years later, in early 2018, and took part in a competition on toxic comment classification. In that competition, I learned a lot from the many kernels shared by others, and I realized that Kaggle kernels are one of the most valuable tools for anyone trying to learn and practice data science. I decided to share one of my own kernels in the competition, and luckily it was selected as the winner of one of the kernel awards.

In April 2018, I participated in the Data Science for Good competition organized by DonorsChoose, Kaggle, and Google. I built a hybrid recommendation engine in a kernel, and it was selected as the winner of the competition. This was my first major win on Kaggle.

Why did you decide to enter the Data Science for Good competition?

Shivam’s Winning Submission: https://www.kaggle.com/shivamb/1-bulletin-structuring-engine-cola

Shivam: I enjoy participating in Data Science for Good challenges. These competitions present an exceptional, unique, and challenging set of problem statements. Unlike traditional ML-specific competitions, where a leaderboard does the evaluation, these competitions are open-ended. They demand solutions that are complete in almost every aspect of data science: data mining, cleaning, feature engineering, analysis, modeling, visualization, insights, and, most importantly, storytelling. I like that these competitions are very close to real-life data science projects.

As can be seen from your kernels, you typically break a problem down into smaller parts. How has this approach helped you solve problems?

Shivam: Any analytics or data science problem is inherently unstructured. There are no clear goals or tasks defined up front, so it can be difficult to model and approach. I always try to bring structure to the problem statement by breaking it down into smaller tasks. I then connect those tasks and prepare a rough pipeline containing the possible solutions and code for each one.

My first goal is always to create an end-to-end version of my kernel with all components and tasks in place. At this point, the results may be neither effective nor state of the art, but I then improve the kernel iteratively by adding visualizations, insights, and explanations, refactoring the code, and so on. This approach brings clarity to both the problem statement and the solution, and it also helps build analytical, structured, and critical thinking skills.

Apart from the thorough analysis that is evident in your kernels, you are also skilled at visualizing the results. How important is data visualization as a skill for data scientists?

Shivam: Visualizations are a crucial part of the entire data science workflow. The ability to clearly show the information and insights present in a dataset is a highly appreciated and sought-after skill in the industry. Whether at the EDA stage or the post-modeling stage, visualizations play an essential role. The end users of your work may not always understand the technical aspects of the data or the results, but when these are presented through plots, interactive charts, animations, and other graphics, they become much easier to grasp. Visuals matter in many areas: data storytelling, presenting insights to business users, and even explaining black-box machine learning models.

You are the latest addition to the pool of Grandmasters at H2O.ai. As a Data Scientist at H2O.ai, what are your roles and in which specific areas do you work?

Shivam with some of the fellow Kaggle Grandmasters at H2O.ai

Shivam: My goal is to contribute to the new products that H2O.ai is currently building and to help improve Driverless AI with new features and recipes. Some of the areas where I am looking to make an impact are unsupervised machine learning, natural language processing, auto-insights, visual analytics, natural language generation for auto-narratives, and AI for BI. I also work actively with the sales and pre-sales teams in the APAC region to bring more customers on board, educate prospective customers, and assist with their specific projects and queries. Additionally, I work with customer success teams to help different organizations bring AI into their projects.

Can you share some useful resources for Data Science beginners apart from Kaggle?

Shivam: Apart from Kaggle, I have learned a lot from blogs and websites like Analytics Vidhya, Towards Data Science, and KDnuggets. These blogs are good sources for learning about new concepts. For more comprehensive knowledge of data science, I recommend courses from experts such as deeplearning.ai and fast.ai.

Are there any specific areas or problems where you would want to apply your expertise in ML?

Shivam: Solving microfinance problems for the underbanked population in developing countries. Many individuals and small enterprises are denied loans because they lack access to banks and therefore have no credit history. I would like to use concepts such as network science, graph theory, and unstructured data to develop models that help this segment of the population.

Any words of advice for data science aspirants who have just started, or wish to start, their data science journey?

Shivam: Data science is all about ideas and experiments. It is about trying those ideas and experiments and iterating again and again until you succeed. It’s about developing a mindset of being willing to try different experiments and to fail. It’s also about taking the first step and continuing to improve.

Shivam won the **Outstanding Capstone Project Award** from the National University of Singapore for creating a platform for **Alternative Credit Scoring for SMEs** using unstructured data and deep learning.

Another valuable piece of advice is to always think from an end-to-end perspective. This means keeping the business perspective in mind while developing a data science solution. This kind of thinking helps you come up with creative and relevant solutions to the business problem at hand.

Lastly, when you start learning data science, you will find plenty of useful resources on the internet: pick one, start it, stick with it, and complete it. With so many resources out there, it is easy to get distracted, and I have seen many people fail to finish what they start. The best way to get the most out of any course is to complete it.

Shivam’s kernels on Kaggle are always a great amalgamation of thorough research, crisp documentation, and high-quality visualization. The effort he puts into his work is tremendous and clearly shows. Ben Shneiderman once said that the purpose of visualization is insight, not pictures, and Shivam’s kernels are a testament to this.

By the way, Shivam is one of the speakers at the Meet the Kaggle Grandmasters panel during H2O World New York!

Parul Pandey

Parul focuses on the intersection of H2O.ai, data science and community. She works as a Principal Data Scientist and is also a Kaggle Grandmaster in the Notebooks category.
