My Video Stays Mine: Temporally Consistent Universal Adversarial Perturbations against Video Customization

¹Sydney AI Centre, The University of Sydney, ²University of Melbourne,
³City University of Hong Kong, ⁴Wuhan University,
⁵Mohamed bin Zayed University of Artificial Intelligence
[Figure: TC-UAP teaser]

A malicious attacker may collect a user's public videos and use them to customize new videos for harmful purposes. Our method, TC-UAP, applies imperceptible perturbations to the user's videos before release, so that downstream customization pipelines fail to produce usable results, protecting the user from this threat.

Abstract

Recent diffusion-based video generation models have enabled high-quality personalized video customization through both tuning-based pipelines, which fine-tune a video diffusion model, and reference-based pipelines such as image-to-video (I2V) generation. However, these capabilities raise serious concerns about personal privacy, identity ownership, and intellectual property protection. Existing anti-diffusion protections focus on the image domain or on reference-based I2V pipelines, leaving tuning-based video customization unexplored. Protecting videos in this setting raises three challenges: (i) image-level perturbations, optimized frame by frame, are vulnerable in the video domain; (ii) a perturbation optimized on a single video fails to generalize to other videos or to videos of different lengths; and (iii) temporally inconsistent perturbations are easily removed by temporal attacks. To address these challenges, we propose Temporally Consistent Universal Adversarial Perturbations (TC-UAP), the first protection method against both reference-based and tuning-based video customization. TC-UAP learns a multi-frame universal adversarial perturbation over a set of videos of the same identity, so that a single perturbation transfers to unseen videos of that identity and to arbitrary video lengths. We further enforce temporal consistency through intrinsic temporal modeling and an extrinsic surrogate temporal-attack loss, which makes the perturbation robust against temporal attacks. Extensive quantitative and qualitative experiments show that TC-UAP degrades identity preservation more than existing baselines under both tuning-based and reference-based video customization, and remains robust under three unseen temporal attacks.
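To make the idea of a single multi-frame perturbation reused across videos of different lengths more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: this page does not describe how TC-UAP is optimized or applied, so the cyclic tiling over time, the L-infinity budget `eps`, and the `temporal_variation` measure (a simple proxy for temporal consistency, not the paper's loss) are all assumptions, and every function name is hypothetical. Frames are plain Python lists so the sketch runs without dependencies.

```python
from typing import List

Frame = List[float]  # one frame, flattened to pixel values in [0, 1]


def project_linf(delta: List[Frame], eps: float) -> List[Frame]:
    """Clip every perturbation value into [-eps, eps] (assumed budget)."""
    return [[max(-eps, min(eps, v)) for v in frame] for frame in delta]


def apply_universal(video: List[Frame], delta: List[Frame], eps: float) -> List[Frame]:
    """Add a K-frame perturbation to a T-frame video of any length."""
    delta = project_linf(delta, eps)
    k = len(delta)
    protected = []
    for t, frame in enumerate(video):
        d = delta[t % k]  # assumption: cycle the K frames to cover T frames
        # Keep the protected pixels inside the valid [0, 1] range.
        protected.append([max(0.0, min(1.0, p + v)) for p, v in zip(frame, d)])
    return protected


def temporal_variation(delta: List[Frame]) -> float:
    """Mean absolute change between consecutive perturbation frames."""
    if len(delta) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, cur in zip(delta, delta[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, cur))
        count += len(cur)
    return total / count


# Usage: a 2-frame perturbation applied to a 5-frame, 2-pixel toy video.
video = [[0.5, 0.5] for _ in range(5)]
delta = [[0.1, -0.1], [0.02, 0.0]]
protected = apply_universal(video, delta, eps=0.05)
```

In this toy setup the same `delta` can be applied to a video of any length, and a lower `temporal_variation` means the perturbation changes less from frame to frame. How TC-UAP actually achieves length generalization and temporal consistency should be taken from the paper itself.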

More Results

Each row shows the customization results when the videos are protected by different methods.

Text prompt: “The p3r5on is licking an ice-cream cone, smiling between licks while facing the camera.”

[Video grid, six example rows. Columns, left to right: clean, PhotoGuard, Mist, IDProtector, Ours]