Apple is sponsoring the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), which will be held in person June 17-21 in Seattle, Washington. CVPR is the annual computer vision event comprising the main conference and several co-located workshops and short courses. Below is the schedule of our workshops and sponsored events at CVPR 2024.
Schedule
Stop by the Apple booth in the Arch Building, Exhibit Hall Level 4, Booth #1905, 10:30 am to 6:30 pm PST, June 19-20; 10:00 am to 3:00 pm PST on June 21.
Monday June 17
- Image Matching Workshop: Local Features and Beyond 2024
- 1:00 pm PST – 5:45 pm PST, Summit 323
- Affine-Based Deformable Attention and Selective Fusion for Semi-Dense Matching
- Hongkai Chen, Zixin Luo, Ray Tian, Aron Wang, Lei Zhou, Xuyang Bai, Mingmin Zhen, Tian Fang, Yanghai Tsin, David McKinnon, Long Quan (Hong Kong University of Science and Technology)
Tuesday June 18
- LatinX in CV (LXCV) at CVPR 2024
- 8:30 am PST – 6:00 pm PST, Arch 203
- Marcel Santos, Conor O'Brien, and Angus Choi will represent Apple at the LXCV workshop events.
Wednesday June 19
- HUGS: Human Gaussian Splats
- 10:30 am PST – 12:00 pm PST, #32, Poster Session 1 and Exhibit Hall (Arch 4A-E)
- Muhammed Kocabas (Max Planck Institute for Intelligent Systems), Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
- Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
- 5:15 pm PST – 6:45 pm PST, #382, Poster Session 2 and Exhibit Hall (Arch 4A-E)
- Yuanxun Lu (Nanjing University), Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University)
Thursday June 20
Friday June 21
Accepted Papers
Affine-Based Deformable Attention and Selective Fusion for Semi-Dense Matching
Hongkai Chen, Zixin Luo, Ray Tian, Aron Wang, Lei Zhou, Xuyang Bai, Mingmin Zhen, Tian Fang, Yanghai Tsin, David McKinnon, Long Quan (Hong Kong University of Science and Technology)
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu (Nanjing University), Jingyang Zhang, Shiwei Li, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan (Hong Kong University of Science and Technology), Xun Cao (Nanjing University), Yao Yao (Nanjing University)
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Hubert Tsai, Tim Barfoot (University of Toronto), Jian Zhang
SAM-CLIP: Merging Vision Foundation Models Towards Semantic and Spatial Understanding
Haoxiang Wang (University of Illinois Urbana-Champaign), Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang, Anurag Ranjan, Rick Chang, Raviteja Vemulapalli, Oncel Tuzel
Diffusion Models Without Attention
Jing Nathan Yan (Cornell University), Jiatao Gu, Alexander M. Rush (Cornell University)
HUGS: Human Gaussian Splats
Muhammed Kocabas (Max Planck Institute for Intelligent Systems), Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan
HumMUSS: Human Motion Understanding Using State Space Models
Arnab Mondal (McGill University), Stefano Alletto, Denis Tome
Demos
MobileCLIP: Real-Time Image-Text Models
Wednesday, June 19 – Friday, June 21, during exhibition hours
The demo shows zero-shot scene classification running in real time on an iPhone. Because these models align the image and text modalities in a shared embedding space, they can perform zero-shot image classification and image-to-text or text-to-image retrieval in real time. The app accompanies the research paper "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training," which is also being presented at the conference. The app was created by David Koski and Megan Maher Welsh, with contributions from Hugues Thomas, Mouli Sivapurapu, and Jian Zhang.
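To make the zero-shot step concrete, here is a minimal Swift sketch of the scoring math: each class name is embedded as a text prompt, and the image is assigned to the class whose text embedding it matches best. The embeddings and class names below are placeholders of our own; in the demo they would come from MobileCLIP's image and text encoders, whose outputs live in the same embedding space.

```swift
import Foundation

// Cosine similarity between two embedding vectors.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (normA * normB)
}

// Softmax with a small temperature, as is typical for CLIP-style logits.
func softmax(_ logits: [Float], temperature: Float = 0.01) -> [Float] {
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0
    let exps = scaled.map { expf($0 - maxLogit) }  // shift for stability
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

// Zero-shot classification: rank classes by how well their text
// embeddings match the image embedding. All values are placeholders.
let classNames = ["kitchen", "office", "park"]
let imageEmbedding: [Float] = [0.1, 0.8, 0.3]
let textEmbeddings: [[Float]] = [
    [0.2, 0.1, 0.9],
    [0.1, 0.9, 0.2],
    [0.7, 0.2, 0.1],
]

let logits = textEmbeddings.map { cosineSimilarity(imageEmbedding, $0) }
for (name, p) in zip(classNames, softmax(logits)) {
    print("\(name): \(String(format: "%.3f", p))")
}
```

Because only this lightweight similarity computation runs per class, a new class vocabulary can be swapped in at runtime by re-embedding the prompts, with no retraining; that property is what makes interactive zero-shot demos of this kind possible.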
Flow Composer for Apple ML
Wednesday, June 19 – Friday, June 21, during exhibition hours
The demo shows Apple ML capabilities running on MacBook Pro and iPad, leveraging technologies such as Vision, Core ML, and Core Graphics.
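The demo's internals are not described here, but as an illustration of how those frameworks typically fit together, the sketch below runs a Core ML image classifier through Vision. The "SceneClassifier" model name is hypothetical; any compiled image-classification model bundled with an app would work the same way.

```swift
import CoreML
import Vision

// Minimal Vision + Core ML classification pipeline.
func classifyScene(in cgImage: CGImage) throws {
    // "SceneClassifier.mlmodelc" is a hypothetical compiled model.
    let modelURL = Bundle.main.url(forResource: "SceneClassifier",
                                   withExtension: "mlmodelc")!
    let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Vision returns ranked labels for classification models.
        guard let results = request.results as? [VNClassificationObservation],
              let top = results.first else { return }
        print("Top label: \(top.identifier) (confidence \(top.confidence))")
    }
    request.imageCropAndScaleOption = .centerCrop

    // Vision handles resizing and pixel-format conversion before inference.
    try VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}
```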
Acknowledgements
Alex Schwing and Philipp Kraehenbuehl are Senior Area Chairs of CVPR 2024.
Alex Toshev, Oncel Tuzel, Mehrdad Farajtabar, Hadi Pouransari, and Fartash Faghri are Area Chairs for CVPR 2024.
Fartash Faghri, Jason Ren, Jianrui Cai, Jiajia Luo, Jierui Lin, Liangchen Song, Or Dinari, Pavan Kumar Anasosalu Vasu, Peter Fu, Raviteja Vemulapalli, Haotian Zhang, Hong-You Chen, Wen Shi, Yongzhi Su, Yuyan Li, Trevine Oorloff, Yongxi Lu, and Jeff Lai are reviewers for CVPR 2024.
Anshul Shah is a co-organizer of the workshop "Learning from Videos and How-To Language: What's Next?"
Jeff Bigham is a co-organizer of the VizWiz Grand Challenge Workshop.
Pau Rodríguez López is a co-organizer of the Workshop on Continual Learning in Computer Vision.
Jeff Lai's doctoral thesis was selected for the Doctoral Consortium.