Computer vision is one of the most popular fields of Artificial Intelligence. Models developed with computer vision can derive meaningful information from different types of media, be it digital images, videos, or any other visual input. The field teaches machines how to perceive and understand visual information and then act on those details. Computer vision has taken a significant leap forward with the introduction of a new model called TAPIR (Tracking Any Point with per-frame Initialization and temporal Refinement). TAPIR has been designed with the goal of effectively tracking a specified point of interest throughout a video sequence.
Developed by a team of researchers from Google DeepMind and the Visual Geometry Group (VGG) at the University of Oxford's Department of Engineering Science, the algorithm behind TAPIR consists of two stages: a matching stage and a refinement stage. In the matching stage, the model analyzes each frame of the video sequence independently to find a suitable candidate point match for the query point. This step seeks to identify the most probable location of the query point in each frame, and it is carried out frame by frame so that TAPIR can follow the query point's movement throughout the video.
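As a rough illustration of what per-frame matching could look like, here is a minimal NumPy sketch in which the query point is summarized by a feature vector and each frame by a dense feature grid. The function name, tensor shapes, and cosine-similarity scoring are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def match_query_per_frame(query_feat, frame_feats):
    """Per-frame matching sketch: for each frame, find the grid cell whose
    feature is most similar to the query feature (cosine similarity).

    query_feat:  (C,) feature vector describing the query point (assumed given).
    frame_feats: (T, H, W, C) dense feature grid for each frame (assumed given).
    Returns a (T, 2) array of (row, col) candidate positions, one per frame.
    """
    T, H, W, C = frame_feats.shape
    # Normalize so the dot product behaves like cosine similarity.
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    f = frame_feats / (np.linalg.norm(frame_feats, axis=-1, keepdims=True) + 1e-8)
    sims = np.einsum("c,thwc->thw", q, f)        # (T, H, W) similarity maps
    flat = sims.reshape(T, -1).argmax(axis=1)    # independent argmax per frame
    return np.stack(np.unravel_index(flat, (H, W)), axis=1)
```

Because each frame is scored independently, this stage produces an initial, possibly noisy, track that the next stage can refine.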
The matching step, in which candidate point matches are identified, is followed by the refinement step. At this stage, the TAPIR model updates both the trajectory, which is the path followed by the query point, and the query features based on local correlations; it therefore takes the surrounding information in each frame into account to improve the accuracy and precision of query-point tracking. By integrating these local correlations, the refinement stage improves the model's ability to track the query point's movement accurately and to adjust for variations in the video stream.
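The iterative idea can be pictured with the toy sketch below, which simply nudges each frame's estimate toward the best-matching cell in a small window of local correlations. TAPIR itself learns this update with a neural network and also refines the query features, which this stand-in omits; the window radius and iteration count are arbitrary illustrative values.

```python
import numpy as np

def refine_track(track, query_feat, frame_feats, radius=3, iters=4):
    """Refinement sketch: repeatedly move each frame's estimate to the
    best-matching cell inside a small local window (a crude stand-in for
    TAPIR's learned, correlation-based update).

    track:       (T, 2) integer (row, col) estimates from the matching stage.
    query_feat:  (C,) query feature vector.
    frame_feats: (T, H, W, C) per-frame feature grids.
    """
    T, H, W, _ = frame_feats.shape
    track = np.asarray(track, dtype=np.int64).copy()
    for _ in range(iters):
        for t in range(T):
            r, c = track[t]
            r0, r1 = max(r - radius, 0), min(r + radius + 1, H)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, W)
            patch = frame_feats[t, r0:r1, c0:c1]               # local neighbourhood
            sims = np.einsum("c,hwc->hw", query_feat, patch)   # local correlations
            dr, dc = np.unravel_index(sims.argmax(), sims.shape)
            track[t] = (r0 + dr, c0 + dc)                      # best local match
    return track
```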
For the evaluation of the TAPIR model, the team used the TAP-Vid benchmark, a standardized evaluation dataset for video point-tracking tasks. The results showed that TAPIR performs significantly better than the baseline techniques. The performance improvement is measured using a metric called Average Jaccard (AJ): on the DAVIS (Densely Annotated VIdeo Segmentation) benchmark, TAPIR achieves an approximate 20% absolute improvement in AJ over other methods.
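For readers unfamiliar with the metric, here is a hedged sketch of how an Average-Jaccard-style score can be computed for a set of tracked points, combining position accuracy with occlusion (visibility) accuracy in the spirit of the TAP-Vid benchmark; the exact thresholds and edge-case conventions below are assumptions rather than the official evaluation code.

```python
import numpy as np

def average_jaccard(pred_xy, pred_vis, gt_xy, gt_vis, thresholds=(1, 2, 4, 8, 16)):
    """Approximate Average Jaccard (AJ) sketch.

    pred_xy, gt_xy:   (N, 2) predicted / ground-truth point positions in pixels.
    pred_vis, gt_vis: (N,) boolean visibility flags.
    At threshold d, a prediction counts as a true positive only if the point
    is visible, predicted visible, and within d pixels of the ground truth.
    """
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    scores = []
    for d in thresholds:
        correct = pred_vis & gt_vis & (dist <= d)
        tp = np.sum(correct)
        fp = np.sum(pred_vis & ~correct)   # predicted visible but occluded or too far
        fn = np.sum(gt_vis & ~correct)     # visible ground-truth points that were missed
        denom = tp + fp + fn
        scores.append(tp / denom if denom else 1.0)
    return float(np.mean(scores))          # average over the distance thresholds
```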
The model has been designed to facilitate fast parallel inference on long video sequences, meaning it can process multiple frames simultaneously, improving the efficiency of tracking tasks. The team has also noted that the model can be applied online, allowing it to process and keep track of points as new video frames arrive. It can track 256 points on 256×256 video at approximately 40 frames per second (fps) and can also be extended to handle higher-resolution videos, giving flexibility in handling videos of various sizes and qualities.
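The online mode of operation can be pictured with a toy streaming loop like the one below: frames are consumed one at a time and the current estimate is updated immediately, which is the behavior that matters for live use. Everything in it, including the generator interface and the windowed correlation search, is illustrative rather than DeepMind's code; feature extraction and the actual TAPIR network are assumed away.

```python
import numpy as np

def track_online(frame_feat_stream, query_feat, radius=8):
    """Online tracking sketch: consume per-frame feature grids as they arrive
    and emit an updated (row, col) estimate for each one, instead of waiting
    for the whole video.
    """
    prev = None
    for feat_grid in frame_feat_stream:        # each item: (H, W, C) feature grid
        H, W, _ = feat_grid.shape
        sims = np.full((H, W), -np.inf)
        if prev is None:
            r0, r1, c0, c1 = 0, H, 0, W        # first frame: search everywhere
        else:
            r, c = prev                        # later frames: search near last estimate
            r0, r1 = max(r - radius, 0), min(r + radius + 1, H)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, W)
        sims[r0:r1, c0:c1] = np.einsum("c,hwc->hw", query_feat, feat_grid[r0:r1, c0:c1])
        prev = np.unravel_index(sims.argmax(), sims.shape)
        yield prev                             # latest (row, col) estimate
```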
The team has provided two online Google Colab demos so that users can try out TAPIR without installation. The first Colab demo allows users to run the model on their own videos, providing an interactive way to test and observe the model's performance. The second demo focuses on running TAPIR online. In addition, users with a modern GPU can run TAPIR live, tracking points on their own webcam feed, by cloning the provided codebase.
Check out the Paper and Project. Don't forget to join our 24k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
Tanya Malhotra is a final-year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.