Bird populations worldwide are declining at an alarming rate, with approximately 48% of existing bird species known or suspected to be experiencing population declines. For instance, the United States and Canada have reported 29% fewer birds since 1970.
Effective monitoring of bird populations is essential for developing solutions that promote conservation. Monitoring allows researchers to better understand the severity of the problem for specific bird populations and to assess whether existing interventions are working. To scale up monitoring, bird researchers have begun to analyze ecosystems remotely via passive acoustic monitoring, using recordings of bird sounds instead of in-person surveys. Researchers can collect thousands of hours of audio with remote recording devices and then use machine learning (ML) techniques to process the data. While this is an exciting development, existing ML models struggle with audio from tropical ecosystems because of the higher diversity of bird species and the degree to which their sounds overlap.
Annotated audio data is needed to understand how well a model performs in the real world. However, creating high-quality annotated datasets, especially for areas with high biodiversity, can be expensive and tedious, often requiring tens of hours of expert analyst time to annotate a single hour of audio. Furthermore, the annotated datasets that do exist are rare and each covers only a small geographic region, such as Sapsucker Woods or the Peruvian rainforest. Thousands of unique ecosystems around the world still need to be analyzed.
To help address this issue, for the past three years we have hosted ML competitions on Kaggle in partnership with specialized organizations focused on high-impact ecologies. In each competition, participants are challenged to build ML models that take sounds from an ecology-specific dataset and accurately identify bird species by sound. The best entries can train reliable classifiers with limited training data. Last year's competition focused on Hawaiian bird species, which are some of the most threatened in the world.
The BirdCLEF ML 2023 Competition
This year we partnered with the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology and NATURAL STATE to host the BirdCLEF ML 2023 Competition, focused on the birds of Kenya. The total prize pool is $50,000, the entry deadline is May 17, 2023, and the final submission deadline is May 24, 2023. See the competition website for detailed information about the dataset, deadlines, and rules.
Kenya is home to more than 1,000 species of birds covering a wide range of ecosystems, from the savannahs of the Maasai Mara to the Kakamega rainforest, and even alpine regions on Kilimanjaro and Mount Kenya. Tracking this large number of species with ML can be challenging, especially with minimal training data available for many species.
NATURAL STATE is working in pilot areas around northern Mount Kenya to test the effect of various management regimes and states of degradation on bird biodiversity in grassland systems. By using the ML algorithms developed within the scope of this competition, NATURAL STATE will be able to demonstrate the efficacy of this approach in measuring the success of restoration projects. In addition, the ability to cost-effectively monitor the impact of restoration efforts on biodiversity will allow NATURAL STATE to test and build some of the first biodiversity-focused financing mechanisms to channel much-needed investment into the restoration and protection of this landscape upon which so many people depend. These tools are necessary to scale this work cost-effectively beyond the project area and achieve NATURAL STATE's vision of restoring and protecting the planet at scale.
In past competitions, we used metrics like the F1 score, which requires choosing specific detection thresholds for the models. Thresholding requires significant effort and makes it difficult to assess the quality of the underlying model: a bad thresholding strategy on a good model can underperform. This year we are using a threshold-free model quality metric: class-wise mean average precision (cmAP). This metric treats the output for each bird species as a separate binary classifier, computes an average precision score for each, and then averages these scores. Switching to an uncalibrated metric should increase the focus on the quality of the core model by removing the need to choose a specific detection threshold.
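To make the metric concrete, here is a small sketch of how class-wise mean average precision can be computed with scikit-learn. The helper name is our own, and the official Kaggle metric may differ in details (e.g., how classes without positives are padded), so treat this as an illustration rather than the competition's scoring code:

```python
# Illustrative sketch of class-wise mean average precision (cmAP).
# Each species column is treated as an independent binary classifier,
# scored with average precision, and the per-species scores are averaged.
import numpy as np
from sklearn.metrics import average_precision_score

def class_mean_average_precision(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """y_true: (n_clips, n_species) binary labels; y_score: model scores."""
    per_class = [
        average_precision_score(y_true[:, c], y_score[:, c])
        for c in range(y_true.shape[1])
        if y_true[:, c].any()  # AP is undefined for species with no positives
    ]
    return float(np.mean(per_class))

# Toy example: 4 clips, 3 species. Note that no detection threshold is needed.
labels = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
scores = np.random.rand(4, 3)
print(class_mean_average_precision(labels, scores))
```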
How to start
This will be the first Kaggle competition in which participants can use the newly released Kaggle Models platform, which provides access to over 2,300 public pre-trained models, including most TensorFlow Hub models. This new resource will have deep integrations with the rest of Kaggle, including Kaggle Notebooks, Datasets, and Competitions.
If you are interested in participating in this competition, a great way to get started quickly is to use our recently open-sourced Bird Vocalization Classifier model, which is available on Kaggle Models. This global bird embedding and classification model provides output logits for more than 10,000 bird species and also creates embedding vectors that can be used for other tasks. Follow the steps shown in the figure below to use the Bird Vocalization Classifier model on Kaggle.
To try the model on Kaggle, navigate to the model here. 1) Click "New Notebook"; 2) click the "Copy code" button to copy the sample code needed to load the model; 3) click the "Add Model" button to add this model as a data source to your notebook; and 4) paste the sample code into the editor to load the model.
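For reference, here is a minimal sketch of what loading and running the model might look like. The TensorFlow Hub handle, the 5-second / 32 kHz input format, and the infer_tf entry point reflect our reading of the public model card; the sample code copied in step 2 above is the authoritative version:

```python
# A minimal sketch, not the official sample code; assumptions noted below.
import numpy as np
import tensorflow_hub as hub

# Assumption: the model is published under this TensorFlow Hub handle; on
# Kaggle, the "Copy code" button gives the exact path for your notebook.
model = hub.load("https://tfhub.dev/google/bird-vocalization-classifier/1")

# The model card describes 5-second mono windows at 32 kHz (160,000 samples).
sample_rate = 32_000
window = np.zeros(5 * sample_rate, dtype=np.float32)  # replace with real audio

# Assumption: infer_tf is the inference entry point from the model card; it
# returns per-species logits and an embedding vector for the window.
logits, embedding = model.infer_tf(window[np.newaxis, :])
print(logits.shape, embedding.shape)
```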
Alternatively, the competition starter notebook includes the model and additional code to more easily generate a competition submission.
We invite the research community to consider participating in the BirdCLEF competition. Through this effort, we hope to make it easier for conservation researchers and practitioners to survey bird population trends and develop effective conservation strategies.
Acknowledgements
Compiling these extensive datasets was a major undertaking, and we are very grateful to the many domain experts who helped collect and manually annotate the data for this competition. Specifically, we would like to thank (institutions and individual collaborators in alphabetical order): Julie Cattiau and Tom Denton on the Brain team, Maximilian Eibl and Stefan Kahl at Chemnitz University of Technology, Stefan Kahl and Holger Klinck from the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology, Alexis Joly and Henning Müller at LifeCLEF, Jonathan Baillie from NATURAL STATE, Hendrik Reers, Alain Jacot and Francis Cherutich from OekoPara GbR, and Willem-Pier Vellinga of xeno-canto. We would also like to thank Ian Davies from the Cornell Lab of Ornithology for allowing us to use the hero image in this post.