What Is Kaggle? The New Platform Simply Explained.

Kaggle is a platform where you can show your data analysis and machine learning skills and compare yourself against others. Prize money of over $10,000 is often offered as a reward.

What is Kaggle?

Kaggle is a competitive platform for data science and machine learning.

Its regular competitions are mostly about optimizing machine learning-based predictions, for example, time series forecasting or classification.

Organizations provide real data and prize money, which in some cases reaches millions of euros. As a result, participants measure their skills against one another and “hunt” for the top placements.

In general, a competition works like this: a company or other organization posts data and a description of the problem (e.g., “forecast of sales in month X”). The participants or participating teams then develop and upload their solutions, mostly as a file of ID-prediction pairs.
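As a rough illustration of such an ID-prediction file, the following Python sketch writes predictions as CSV. The column names `Id` and `Prediction` and the sample values are assumptions made for this example; each competition specifies its own format.

```python
import csv
import io


def write_submission(predictions, out, id_column="Id", target_column="Prediction"):
    """Write (id, prediction) pairs as CSV rows below a header line."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_column, target_column])
    for row_id, value in predictions:
        writer.writerow([row_id, value])


# Hypothetical predictions, e.g. forecast sales for three store IDs.
example_predictions = [(1, 1520.5), (2, 980.0), (3, 2310.25)]

# An in-memory buffer stands in for a real file such as open("submission.csv", "w").
buffer = io.StringIO()
write_submission(example_predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

Only the resulting file is uploaded, so the same output could just as well be produced with R, Java, or any other tool.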

These solutions are then evaluated automatically, and the results form the leaderboard. The lower the error, i.e., the better the prediction, the higher the ranking. How the error is calculated depends on the competition, but it is mostly a simple mean squared error or a similar measure.
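The ranking idea can be sketched in a few lines of Python, assuming mean squared error as the metric. The team names and numbers are made up for illustration, and the real evaluation on Kaggle depends on the metric each competition defines.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def build_leaderboard(y_true, submissions):
    """Return (team, error) pairs sorted so the lowest error ranks first."""
    scores = {team: mean_squared_error(y_true, preds) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical ground truth and two hypothetical team submissions.
y_true = [10.0, 20.0, 30.0]
submissions = {
    "team_a": [12.0, 18.0, 33.0],
    "team_b": [10.0, 21.0, 29.0],
}

leaderboard = build_leaderboard(y_true, submissions)
for rank, (team, score) in enumerate(leaderboard, start=1):
    print(rank, team, round(score, 3))
```

Here `team_b` ranks first because its predictions are closer to the true values on average.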

The story of Kaggle

Kaggle was founded in 2010 in Melbourne, Australia, was acquired by Google in 2017, and reached over a million members in the same year. From the beginning, it was known as a “competition platform” and dedicated itself to the challenge of framing machine learning as an optimization problem.

By now, you can find not only hundreds of competitions on Kaggle but also a collection of publicly accessible data sets and courses. As a result, it plays an increasingly central role in the careers of many data scientists, because it offers a first practical experience that goes beyond prepared standard data sets (Titanic, Iris, and so on).

Who is the target audience?

While Kaggle was initially intended more for experienced data scientists and machine learning engineers, it now covers pretty much the entire spectrum of experience in data science and AI.

The challenging competitions for experienced data scientists remain the central component of Kaggle, but there are more and more interesting aspects for beginners due to the comprehensive range on offer.

Newcomers can quickly gain insight into how others think, analyze, and implement their ideas, especially through the public notebooks in which participants share their code. There are also relatively old but very accessible competitions that are well suited for building knowledge.

What makes it so special?

Kaggle was the first public platform that dealt with the topic of “machine learning as a competition.” High prize money is one draw, but participants often consider a very high placement to be a reward in itself. Particularly noteworthy is the possibility of publishing notebooks, i.e., scripts.

In most competitions, there is a publicly available notebook that provides a basic analysis (exploratory data analysis with, if necessary, initial modeling). Refinements can be built on top of this. Of course, you can also work entirely on your own without having to publish any scripts.

Frequently asked questions

WHICH PROGRAMMING LANGUAGES DO YOU WORK IN?

Whether you use Python, R, or Java, the programming language does not influence the competitions. Since the evaluated solution is not the script but only the predictions as a .csv file, you can generate this output with any tool you like.

However, if you want to work directly in the Kaggle Notebook Environment, you have to rely on Python or R. In return, you have the advantage of working directly on the resources provided by Kaggle.

HOW DO YOU BECOME A KAGGLE GRANDMASTER?

Grandmaster is the final stage of the Kaggle Progression System. To become a Kaggle Grandmaster, you must continuously excel in one of the four categories of competitions, datasets, notebooks, and discussion.

For example, to become a Notebook Grandmaster, you need 15 gold medals. One gold medal stands for 50 upvotes, and votes from new members and on old posts are excluded. Consequently, to become a Notebook Grandmaster, you must publish an exceptionally good basic analysis in 15 different competitions. Most people, however, equate Kaggle Grandmaster with the category “Competitions,” as this is where the analyses themselves are evaluated. Here, a top 10 placement in a number of competitions is usually required, often against several thousand participants.

Overall, the highest level in the Kaggle Progression System is 4x Kaggle Grandmaster, something that very few people have achieved so far. As of January 20th, 2021, exactly three of over 150,000 active participants had done so: Chris Deotte, Vopani, and Abhishek Thakur.

THE KAGGLE TITANIC DATA SET

The Titanic data set is often used, not only on Kaggle but across data science, when you want to try out classification in practice. Kaggle guides its new users directly through the analysis of this data set as a kind of tutorial on how Kaggle works as a platform and how to submit solutions.
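To give a feel for what a first Titanic classification can look like, here is a minimal Python sketch of a rule-based baseline that predicts survival from the passenger's sex alone. The column names `PassengerId`, `Sex`, and `Survived` are assumed to match the competition files, the three passenger rows are made up for illustration, and the rule is a deliberately naive starting point rather than a recommended model.

```python
def predict_survival(passenger):
    """Naive baseline: predict 1 (survived) for women and 0 for everyone else."""
    return 1 if passenger["Sex"] == "female" else 0


# Made-up passenger rows; the real data set contains many more columns.
passengers = [
    {"PassengerId": 1001, "Sex": "male"},
    {"PassengerId": 1002, "Sex": "female"},
    {"PassengerId": 1003, "Sex": "male"},
]

# One ID-prediction pair per passenger, ready to be written to a CSV file.
submission_rows = [
    {"PassengerId": p["PassengerId"], "Survived": predict_survival(p)}
    for p in passengers
]
for row in submission_rows:
    print(row["PassengerId"], row["Survived"])
```

A real model would replace `predict_survival` with a classifier trained on the labeled training data.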

IS IT FREE?

Yes, Kaggle membership is free. However, you have to be registered to download data sets or to take part in the competitions.

WHAT CAN YOU WIN AT KAGGLE?

Usually, Kaggle competitions have cash prizes in the lower five-digit range, but higher prizes are also possible. There are also competitions with no prize money at all or with other prizes, such as memberships with companies or the like.

WHO OWNS THE KAGGLE PLATFORM?

The platform was founded and managed by Anthony Goldbloom and Ben Hamner. Google has since bought the platform and is therefore the owner.

Who should join?

We recommend trying Kaggle at least once. Only those who have a lot of time and experience will deliver good results, so prioritization is important, as usual. In general, however, if you have barely any practical experience in machine learning, it can be a good starting point for getting to grips with data science problems.

The Tech Spree
