Action recognition, the task of identifying and classifying human actions from video sequences, is a crucial field within computer vision. However, its reliance on large-scale datasets containing images of people poses significant challenges related to privacy, ethics, and data protection. These issues arise from the potential identification of individuals based on personal attributes and from data collected without explicit consent. Additionally, biases related to gender, race, or the actions typical of certain groups can affect the accuracy and fairness of models trained on such datasets.
In action recognition, advances in pre-training methodologies on massive video datasets have been instrumental. However, these advances come with challenges, such as ethical considerations, privacy issues, and inherent biases in human image datasets. Existing approaches to address these issues include blurring faces, downsampling videos, or employing synthetic data for training. Despite these efforts, more analysis is needed on how well privacy-preserving pre-trained models transfer their learned representations to subsequent tasks. State-of-the-art models sometimes fail to accurately predict actions due to biases or lack of diverse representations in the training data. These challenges call for novel approaches that address privacy concerns and improve the transferability of learned representations to various action recognition tasks.
To overcome the challenges posed by privacy concerns and biases in the human-centric datasets used for action recognition, a recent NeurIPS 2023 paper proposes pre-training action recognition models on a combination of synthetic videos containing virtual humans and real-world videos with humans removed. With this pre-training strategy, called Privacy-Preserving MAE-Align (PPMA), the model learns temporal dynamics from the synthetic data and contextual features from the real, human-free videos. The approach addresses ethical and privacy concerns around human data while significantly improving the transferability of learned representations to downstream action recognition tasks, closing the performance gap between models trained with and without human-centered data.
Specifically, the proposed PPMA method follows these key steps:
- Privacy-preserving real data: The process starts with the Kinetics dataset, from which humans are removed using the HAT framework, yielding the No-Human Kinetics dataset.
- Adding synthetic data: Synthetic videos from SynAPT are included, providing virtual-human actions that encourage the model to focus on temporal dynamics.
- Downstream evaluation: Six diverse downstream tasks evaluate how well the model's learned representations transfer across various action recognition challenges.
- MAE-Align pre-training: This two-stage strategy involves:
  - Stage 1: MAE training to predict the pixel values of masked patches, learning real-world contextual features.
  - Stage 2: Supervised alignment on action labels, using both No-Human Kinetics and synthetic data.
- Privacy-Preserving MAE-Align (PPMA): Combining Stage 1 (MAE trained on No-Human Kinetics) with Stage 2 (alignment using No-Human Kinetics plus synthetic data), PPMA learns robust representations while safeguarding privacy.
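The two stages above can be sketched with toy stand-ins. This is a minimal numpy illustration, not the authors' code: `random_patch_mask` mimics MAE-style masking of video patch tokens (Stage 1, where the loss is computed on masked patches only), and `mixed_batch` mimics Stage 2's supervised alignment batches drawn from both human-free real and synthetic clips. All function names are hypothetical.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Stage 1 (MAE): boolean mask over patch tokens; True = hidden from the encoder."""
    mask = np.zeros(num_patches, dtype=bool)
    masked_idx = rng.choice(num_patches, int(num_patches * mask_ratio), replace=False)
    mask[masked_idx] = True
    return mask

def mae_loss(target_patches, predicted_patches, mask):
    """MAE reconstruction objective: mean squared error on the masked patches only."""
    return float(np.mean((predicted_patches[mask] - target_patches[mask]) ** 2))

def mixed_batch(real_human_free_clips, synthetic_clips, rng):
    """Stage 2 (alignment): shuffle No-Human Kinetics and synthetic clips into one labeled batch."""
    pool = list(real_human_free_clips) + list(synthetic_clips)
    order = rng.permutation(len(pool))
    return [pool[i] for i in order]
```

In the actual method a ViT encoder would consume the unmasked tokens in Stage 1 and be fine-tuned with a classification head on the mixed batches in Stage 2; the sketch only captures the data flow of the two stages.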
The research team conducted experiments to evaluate the proposed approach. Using ViT-B models trained from scratch without prior ImageNet training, they employed a two-stage process: MAE training for 200 epochs followed by supervised alignment for 50 epochs. Across six diverse downstream tasks, PPMA outperformed other privacy-preserving methods by 2.5% in fine-tuning (FT) and 5% in linear probing (LP). Although slightly less effective on tasks with high scene-object bias, PPMA significantly reduced the performance gap relative to models trained on human-centered real data, showing that robust representations can be learned while preserving privacy. The ablation experiments highlighted the effectiveness of MAE pre-training in learning transferable features, particularly evident when fine-tuning on downstream tasks. Furthermore, in exploring how to combine contextual and temporal features, methods such as averaging model weights and dynamically learning mixing proportions showed potential to improve representations, opening avenues for further exploration.
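The weight-averaging ablation mentioned above can be illustrated with a short sketch. This is a hypothetical numpy implementation of per-parameter checkpoint interpolation between a model capturing contextual features and one capturing temporal features, with `alpha` as the mixing proportion (which could be fixed or learned); it is not the authors' code.

```python
import numpy as np

def average_checkpoints(context_weights, temporal_weights, alpha=0.5):
    """Interpolate two checkpoints parameter-by-parameter:
    alpha * context + (1 - alpha) * temporal.
    Both checkpoints are dicts mapping parameter names to arrays of the same shape."""
    assert context_weights.keys() == temporal_weights.keys(), "checkpoints must share parameters"
    return {
        name: alpha * context_weights[name] + (1.0 - alpha) * temporal_weights[name]
        for name in context_weights
    }
```

With `alpha = 0.5` this is a plain average of the two models; learning `alpha` per layer (e.g. via a sigmoid-parameterized scalar) would correspond to the "dynamically learning mixing proportions" variant described in the ablation.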
This paper presents PPMA, a new privacy-preserving approach for pre-training action recognition models that addresses the privacy, ethics, and bias challenges of human-centered datasets. By leveraging synthetic data and human-free real-world videos, PPMA effectively transfers learned representations to various action recognition tasks, minimizing the performance gap between models trained with and without human-centered data. The experiments underline the effectiveness of PPMA in advancing action recognition while ensuring privacy and mitigating the ethical concerns and biases linked to conventional datasets.
Review the Paper and the GitHub repository. All credit for this research goes to the researchers of this project.
Mahmoud is a PhD researcher in machine learning. He also holds a Bachelor's degree in physical sciences and a Master's degree in telecommunications systems and networks. His current research areas include computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.