Many branches of biology, including ecology, evolutionary biology, and biodiversity science, are increasingly turning to digital imaging and computer vision as research tools. Modern technology has greatly improved our ability to analyze large volumes of images from museums, camera traps, and citizen science platforms. These data can then be used to delimit species, understand adaptive mechanisms, estimate population structure and abundance, and monitor and conserve biodiversity.
However, applying computer vision to a biological question still means finding and training an appropriate model for the task and manually labeling enough data for the particular species and study at hand, both of which demand substantial time and machine learning expertise.
Researchers from The Ohio State University, Microsoft, the University of California Irvine, and Rensselaer Polytechnic Institute are working toward a foundation vision model for the Tree of Life. To be generally applicable to real-world biological tasks, such a model must meet several requirements. First, it must accommodate researchers investigating a wide variety of clades, not just one, and ideally generalize to the entire Tree of Life. Second, it must learn fine-grained representations of images of organisms, because biology is full of visually similar organisms, such as closely related species within the same genus, or species that mimic the appearance of others for reasons of fitness. This level of granularity matters because the Tree of Life organizes living things into groups ranging from the very broad (such as animals, fungi, and plants) to the very fine-grained. Finally, strong results in the low-data regime (i.e., zero-shot or few-shot) are crucial because data collection and labeling in biology are expensive.
Although these goals are not new to computer vision, current domain-general vision models trained on hundreds of millions of images do not perform adequately when applied to evolutionary biology and ecology. The researchers identify two main obstacles to creating a foundation vision model for biology. First, better pre-training datasets are needed, as those currently available are inadequate in size, diversity, or label granularity. Second, since existing pre-training algorithms do not address the three main objectives well, better pre-training methods are needed that exploit the unique characteristics of the biological domain.
With these objectives and the obstacles to achieving them in mind, the team presents the following:
- TREEOFLIFE-10M, a large-scale, ML-ready biology image dataset
- BIOCLIP, a vision foundation model for the Tree of Life, trained on TREEOFLIFE-10M using its taxonomic labels
TREEOFLIFE-10M is a large, diverse, ML-ready biological image dataset. With more than 10 million photographs spanning 454,000 taxa across the Tree of Life, it is the largest ML-ready biology image dataset with taxonomic labels curated and released to date. By comparison, iNat21, the previously largest such collection, contains only 2.7 million photographs covering 10,000 taxa. TREEOFLIFE-10M incorporates existing high-quality datasets such as iNat21 and BIOSCAN-1M, but most of its data diversity comes from newly curated photographs from the Encyclopedia of Life (eol.org). Each image in TREEOFLIFE-10M is annotated with its taxonomic hierarchy and higher taxonomic ranks to the greatest extent possible. TREEOFLIFE-10M can be used to train BIOCLIP and future foundation models for biology.
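To make the hierarchical annotation concrete, here is a minimal, hypothetical sketch of what a per-image record might look like; the field names and file path are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical per-image record: each image carries as much of the taxonomic
# hierarchy as can be resolved, so some ranks may be missing for rare taxa.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

record = {
    "image_path": "images/eol/12345.jpg",  # illustrative path, not real
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "class": "Insecta",
    "order": "Lepidoptera",
    "family": "Nymphalidae",
    "genus": "Danaus",
    "species": "plexippus",  # monarch butterfly
}

def annotated_ranks(rec: dict) -> list[str]:
    """List the ranks actually annotated for this image."""
    return [rank for rank in RANKS if rec.get(rank)]

print(annotated_ranks(record))
# ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
```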
BIOCLIP is a vision foundation model for the Tree of Life. A common and simple approach to training vision models on large-scale labeled datasets like TREEOFLIFE-10M is to learn to predict taxonomic labels from images with a supervised classification objective; the ResNet50 and Swin Transformer baselines follow this strategy. However, this ignores the rich structure of the taxonomic labels: taxa are not isolated but interrelated within a comprehensive taxonomy. A model trained with plain supervised classification may therefore fail to classify unseen taxa or to generalize to taxa absent during training. Instead, the team takes a new approach, combining BIOCLIP's extensive biological taxonomy with CLIP-style multimodal contrastive learning. They “flatten” the taxonomy, from kingdom down to the most distal available rank, into a string called the taxonomic name, and use the CLIP contrastive objective to learn to match images with their corresponding taxonomic names. Given the taxonomic names of unseen taxa, BIOCLIP can then perform zero-shot classification.
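The sketch below, reusing the hypothetical record schema above, illustrates the two pieces of that recipe: flattening the hierarchy into a taxonomic name, and a standard CLIP-style symmetric contrastive loss over a batch of image and text embeddings. It is a minimal illustration of the objective, not BIOCLIP's actual training code.

```python
import torch
import torch.nn.functional as F

# Same hypothetical rank order as the record sketch above.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]

def taxonomic_name(rec: dict) -> str:
    """Flatten the hierarchy from kingdom to the most distal annotated rank."""
    return " ".join(rec[rank] for rank in RANKS if rec.get(rank))
# e.g. taxonomic_name(record) ->
# "Animalia Arthropoda Insecta Lepidoptera Nymphalidae Danaus plexippus"

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matching image/name pairs sit on the diagonal."""
    image_emb = F.normalize(image_emb, dim=-1)  # work in cosine-similarity space
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> taxonomic name
    loss_t2i = F.cross_entropy(logits.t(), targets)  # taxonomic name -> image
    return (loss_i2t + loss_t2i) / 2
```

Because the text encoder embeds whole taxonomic-name strings rather than fixed class indices, an unseen taxon can be classified at test time simply by embedding its name, which is what enables the zero-shot behavior described above.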
The team also proposes, and demonstrates the benefit of, a mixed text type training strategy: by combining multiple text types (e.g., scientific names with common names) during training, they retain the generalization that comes from taxonomic names while gaining flexibility at test time. For example, downstream users can still query BIOCLIP with common species names, and it performs exceptionally well. Their comprehensive evaluation of BIOCLIP covers ten fine-grained image classification datasets spanning plants, animals, and fungi, plus a specially curated RARE SPECIES dataset withheld from training. BIOCLIP significantly outperforms CLIP and OpenCLIP, with average absolute improvements of 17% in low-shot and 18% in zero-shot settings. Furthermore, intrinsic analysis helps explain BIOCLIP's better generalization, showing that it has learned a hierarchical representation that conforms to the Tree of Life.
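A hedged sketch of the mixed text type idea follows, reusing `taxonomic_name` and the hypothetical record schema from above (the `common_name` field is an added assumption): at each training step, the caption paired with an image is drawn at random from whichever text types are available for its taxon.

```python
import random

def sample_caption(rec: dict) -> str:
    """Pick one of the available text types for this image at each step."""
    candidates = [
        taxonomic_name(rec),                 # flattened kingdom-to-species string
        f"{rec['genus']} {rec['species']}",  # scientific (binomial) name
        rec.get("common_name"),              # e.g. "monarch butterfly", if known
    ]
    return random.choice([c for c in candidates if c])  # skip missing types
```

Because every text type is seen during training, a zero-shot query at test time can embed whichever form the user has on hand, a common name included, and match it against image embeddings.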
Although the team used the CLIP objective to effectively learn visual representations of hundreds of thousands of taxa, BIOCLIP's training remains focused on classification. In future work, they plan to incorporate the 100 million or more research-grade photographs on inaturalist.org and to collect richer textual descriptions of species' appearances, so that BIOCLIP can extract detailed trait-level representations.
Review the Paper, Project, and GitHub. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a computer science engineer with solid experience in FinTech companies covering finance, cards & payments, and banking, and a keen interest in AI applications. She is excited to explore new technologies and advancements in today's evolving world that make life easier for everyone.