Image by Editor
On May 12, 2023, Kaggle opened a competition in which the Kaggle community could take part in creating a report summarizing the rapid advances in AI over the past two years. The Kaggle community is a diverse group with a wide range of experience across AI.
Participants were asked to write an essay on a particular topic, such as generative AI or AI ethics, based on the changes and developments of the last 2 years.
The report is made up of the following sections:
- Generative AI
- Text data
- Image and video data
- Tabular and time series data
- Kaggle Competitions
- AI ethics
So let’s delve into what we’ve learned…
Generative AI has been a popular topic of conversation recently. This opening section delves into the rapid progress and applications of generative AI over the past 2 years. We have seen advances in text generation, image creation, and music composition using tools and techniques such as GANs and LLMs.
This has only been possible thanks to larger data sets and improved hardware, which allow the algorithms to be trained more effectively. Although generative AI is still in its infancy, the past year alone has shown how it is revolutionizing different industries. There are still ethical concerns that need to be taken into account, such as privacy, misinformation, and the misuse of these AI systems.
Read more about the different essays:
- Generative AI
- Understand, Generate and Transform the World
- A look at the field of generative AI
With the hype around generative AI, interest in natural language processing (NLP) has surged thanks to the rise of large language models (LLMs). Naturally, the next section of the Kaggle AI report focuses on NLP techniques and their use in tasks such as summarization and translation.
Looking back, early approaches to text-based tasks relied on term- and frequency-based feature engineering combined with non-neural machine learning methods. We now work with larger data sets on which models learn word representations that they use to interpret text.
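To make the older, frequency-based style of feature engineering concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It is an illustration only, not code from the report: the function name `tf_idf`, the whitespace tokenizer, and the sample sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Hypothetical helper for illustration: tokenization is a simple
    lowercase whitespace split, and no smoothing is applied.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    "the model translates text",
    "the model summarizes text",
    "gradient boosting wins",
]
vectors = tf_idf(docs)
```

Terms that appear in many documents (such as "the") receive a low weight, while terms that are specific to one document (such as "translates") receive a higher one. Vectors like these would then typically be fed to a non-neural classifier.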
Using Internet data as a training corpus has allowed these models to learn more and perform better in areas such as transfer learning. Within Kaggle competitions, there has been a trend toward fine-tuning publicly available models that have been shown to perform at or above human level.
The following main essays focus on the emergence and recent techniques of LLMs:
- LLM on Contemporary Large Language Models
- Large language models: reasoning ability
- Minigiants: “small” language models
Just as text data is used in tasks like content generation, image and video generation has also become very popular. Computer vision has been around for a long time, but in recent years it has exploded. We can now tackle tasks like object detection and more.
This section delves into model architectures as well as common computer vision practices such as augmentation. Computer vision is used across many industries, for example in healthcare for medical imaging, but it still faces challenges such as deepfakes, ethical and philosophical considerations, the limitations of multimodal models, and more.
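As a rough illustration of what augmentation means in practice, the sketch below applies a random crop and an optional horizontal flip to a tiny image stored as a nested list. The function names (`horizontal_flip`, `random_crop`, `augment`) and the choice of these two transforms are assumptions for this example; real pipelines usually rely on a dedicated image library.

```python
import random


def horizontal_flip(image):
    """Mirror each row of a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng, flip_prob=0.5):
    """Produce one randomly altered training view of the same image."""
    out = random_crop(image, crop_h, crop_w, rng)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return out


# A 4x4 "image" whose pixel values are just running numbers.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
rng = random.Random(0)
views = [augment(image, 3, 3, rng) for _ in range(5)]
```

Each call yields a slightly different view of the same picture, so a model sees more variety during training without any new labeled data being collected.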
We have models like Segment Anything Model (SAM) and YOLO (You Only Look Once) that have demonstrated how generalized open source models can be adapted for different and unique tasks.
Dive into advances in image and video data with these essays:
- Advances in AI vision models in the last two years
- Image and video data
The next section delves into the historical importance of tabular data and time series data. Neither has received much attention in recent years, because the deep learning revolution has not had the same impact on them. However, some trends remain widely used and very effective, such as:
- Unique approaches for individual data sets/problems
- The importance of data preprocessing and feature engineering
- The dominance of gradient boosted trees
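For readers unfamiliar with gradient boosted trees, the following is a minimal sketch of the idea for regression with squared error, using one-split trees (stumps) on a single feature. It is a teaching example under stated assumptions, not what competitors actually run: the names `fit_stump`, `fit_boosted_stumps`, and `predict` are hypothetical, and in practice people use dedicated libraries such as XGBoost or LightGBM.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to what is left over."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, learning_rate


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, learning_rate = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_boosted_stumps(xs, ys)
```

Each round corrects a fraction of the remaining error, so many weak trees add up to a strong predictor. The small learning rate trades more rounds for a smoother fit.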
Within the Kaggle community, these trends are widely recognized, and the following essays delve into them, as well as the unique challenges that tabular and time series data pose.
- Learnings from the typical tabular process
- Time series and tabular data
- Tabular data in the age of AI
Part of this Kaggle community report also analyzes Kaggle competitions, covering their developments and the community's observations on them over the last 2 years. Kaggle competitions have been very popular over the years, as the community has used the platform to test their skills, build a portfolio, and prepare for the real world.
One observed change is that techniques such as pseudo-labeling, seed averaging, and hill climbing, once considered "tricks", have now become common practice. Kaggle competitions have become more competitive in the last 2 years, and competitions like RSNA, Learning Agency and more are very popular.
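To give a feel for one of these techniques, here is a minimal sketch of hill climbing in the sense it is commonly used for ensembling: greedily adding models (with repetition allowed) to an averaged ensemble for as long as the validation score improves. This reading of the term, the function names, and the toy predictions are all assumptions for illustration.

```python
def mse(preds, targets):
    """Mean squared error between two equally long lists."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def average(prediction_lists):
    """Element-wise mean of several prediction lists."""
    return [sum(values) / len(values) for values in zip(*prediction_lists)]


def hill_climb(model_preds, targets, max_steps=20):
    """Greedy ensemble selection with replacement.

    model_preds maps a (hypothetical) model name to its validation
    predictions. Returns the chosen names and the final validation MSE.
    """
    # Start from the best single model.
    first = min(model_preds, key=lambda name: mse(model_preds[name], targets))
    chosen = [first]
    best_score = mse(model_preds[first], targets)

    for _ in range(max_steps):
        # Score every candidate ensemble obtained by adding one more model.
        scores = {
            name: mse(average([model_preds[n] for n in chosen + [name]]), targets)
            for name in model_preds
        }
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best_score:
            break  # no addition improves the score, so stop climbing
        chosen.append(candidate)
        best_score = scores[candidate]
    return chosen, best_score


targets = [1.0, 2.0, 3.0]
model_preds = {
    "a": [1.5, 2.5, 3.5],  # consistently too high
    "b": [0.5, 1.5, 2.5],  # consistently too low
    "c": [1.0, 2.0, 5.0],  # one large miss
}
chosen, score = hill_climb(model_preds, targets)
```

Seed averaging is closely related: it amounts to applying the `average` helper to predictions from the same model trained with different random seeds, which reduces the noise introduced by any single run.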
Dive into the winning hacks from Kaggle competitions:
- Towards a green AI
- How to win a Kaggle competition
- Medical Imaging Competitions
Ethics around AI is another area of concern, as many people in society have mixed emotions about the use and implementation of AI systems. Organizations are investigating the ethical principles of AI and creating new strategies to ensure they can not only understand AI systems but also monitor and mitigate risks.
It is not an academic study but a social one; many opinions matter for understanding the world of AI and how it can continue to be used while safeguarding the values of society. We have seen organizations undergo continuous audits of their AI systems with the adoption of ethics by design.
Learn more about the challenges around AI and the impact it is having on society:
- Exploring the AI ethics landscape
- Developments in AI and ethics in the last 2 years
- Ethical AI is all we need!!
The Kaggle team has created a unique report in which they have allowed their community to share their opinions and experience of the world of AI and how it has changed in the last 2 years. Let us know if there was a particular section or essay that you found very interesting!
Nisha Arya is a data scientist and freelance technical writer. She is particularly interested in providing data science career advice, tutorials, and theory-based data science insights. She also wants to explore the different ways in which artificial intelligence can benefit the longevity of human life. A keen learner, she is looking to expand her technological knowledge and writing skills while helping guide others.