Kaggle is a fun platform that hosts a variety of data science and machine learning competitions, covering topics such as sports, energy, or autonomous driving.
In this post we will give an introduction to Kaggle and tackle the introductory “Titanic” challenge. We will explain how to approach and solve such a challenge, and we will demonstrate it with a top 7% solution for “Titanic”.
You can find the full code on GitHub and use it to follow along as you read this article, as well as to reproduce my exact score. In it, we follow what I consider best practices for Python and use helpful tools such as mypy and poetry. That being said, let's dive into it.
Kaggle offers a wide range of data science/machine learning competitions; see the introduction for examples. It's a great way to test and improve your data science/ML knowledge and to learn how to solve problems in practice. Plus, you can even win monetary prizes! However, Kaggle is populated by some of the best data scientists and ML people out there, and prizes are only given for the top few solutions (out of several hundreds or thousands), so winning is extremely difficult and rare, and it shouldn't be your main motivation when starting out.
Every competition (or most?) comes with a story, a purpose, and a dataset. You are then tasked with understanding the data and solving the given problem. If you want, you can submit your solutions to the platform and get ranked on a public leaderboard; that is, your solution is scored against a publicly available test set. However, to prevent cheating or tuning against this set by simply spamming submissions, once the competition period (usually a few weeks or months) has ended, all competitors/teams are ranked against a private test set, which decides the final winners.
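To make the public/private distinction concrete, here is a minimal sketch of the idea. It is not Kaggle's actual implementation: the 30% public share, the function names, and the toy labels are all assumptions chosen for illustration. It only shows that one submission is scored on two disjoint parts of the hidden test labels, so the two scores can differ.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split_leaderboards(n_rows, public_fraction=0.3, seed=0):
    """Split test-row indices into disjoint public and private parts.

    public_fraction is a hypothetical value; real competitions choose their own.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(n_rows * public_fraction)
    return indices[:cut], indices[cut:]


def score(y_true, y_pred, indices):
    """Accuracy restricted to the given row indices."""
    return accuracy([y_true[i] for i in indices], [y_pred[i] for i in indices])


# Toy hidden labels and a submission that is right roughly 80% of the time.
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(1000)]
y_pred = [t if rng.random() < 0.8 else 1 - t for t in y_true]

public_idx, private_idx = split_leaderboards(len(y_true))
public_score = score(y_true, y_pred, public_idx)
private_score = score(y_true, y_pred, private_idx)

print(f"public leaderboard score:  {public_score:.3f}")
print(f"private leaderboard score: {private_score:.3f}")
```

Because only the public score is visible during the competition, a solution tuned heavily against it can drop in rank once the private score is revealed.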
Next, we will show how to understand the data, create a model, and submit it to Kaggle, using the introductory Titanic competition.