Papers

Tag filter: video diffusion models
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Published: 12/19/2024
Tags: video diffusion models, Robotic Action Learning, Video Prediction Policy, Dynamic Visual Representations, Complex Manipulation Tasks
The Video Prediction Policy (VPP) utilizes Video Diffusion Models (VDMs) to generate visual representations that incorporate both current static and predicted dynamic information, enhancing robot action learning and achieving a 31.6% increase in success rates for complex tasks.
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Published: 12/17/2025
Tags: Real-Time Interactive World Modeling, Long-Term Geometric Consistency, video diffusion models, Memory-Aware Modeling, Dynamic Context Reconstruction
This paper introduces WorldPlay, a video diffusion model for real-time interactive world modeling with long-term geometric consistency, achieved through three innovations: Dual Action Representation, Reconstituted Context Memory, and Context Forcing.
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Published: 6/11/2025
Tags: Autoregressive Adversarial Post-Training, Real-time Video Generation, video diffusion models, Interactive Video Generation, Long Video Generation
The paper introduces Autoregressive Adversarial Post-Training (AAPT) to transform a pretrained latent video diffusion model into an efficient real-time interactive video generator. It generates one latent frame per evaluation, streams in real time, and responds to user interactions.
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Published: 12/2/2024
Tags: Customized Motion Transfer, Multimodal Large Language Model, video diffusion models, Motion Modeling, Text-to-Video Generation
MoTrans introduces a customized motion transfer method using a multimodal large language model recaptioner and an appearance injection module, effectively transferring specific human-centric motions from reference videos to new contexts and outperforming existing techniques.
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
Published: 12/4/2025
Tags: Training-Free Video Generation, Video Extrapolation Generation, Temporal Attention Mechanism, Importance-Aware KV Cache Pruning, video diffusion models
The paper presents the Deep Forcing mechanism, which addresses temporal repetition, drift, and motion deceleration in autoregressive video diffusion. Using training-free Deep Sink and Participative Compression, it achieves over 12x extrapolation, enhancing video quality and consistency.
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
Published: 9/4/2024
Tags: video diffusion models, High-Fidelity Novel View Synthesis, Point-Based Representation, Camera Trajectory Planning, 3D Reconstruction and Synthesis
This study introduces ViewCrafter, a method that synthesizes high-fidelity novel views from single or sparse images using video diffusion models, overcoming the dependence on dense multi-view captures. It incorporates coarse 3D clues and camera pose control, along with an iterative view synthesis strategy guided by camera trajectory planning.
StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation
Published: 11/11/2025
Tags: video diffusion models, Real-Time Interactive Video Generation, Streaming Content Creation, Low-Latency Video Generation, Multi-GPU Real-Time Streaming Service
StreamDiffusionV2 is a streaming system for dynamic and interactive video generation that addresses temporal consistency and low-latency requirements in live streaming. It integrates SLO-aware schedulers and other optimizations into a training-free real-time service that scales across multiple GPUs.
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Published: 6/6/2025
Tags: Efficient Inference of Diffusion Models, video diffusion models, Video Restoration, One-Step Video Restoration Model, Adaptive Window Attention Mechanism
SeedVR2 enables one-step high-resolution video restoration using diffusion adversarial post-training and adaptive window attention, enhancing quality and efficiency while stabilizing training with novel loss functions.
InfVSR: Breaking Length Limits of Generic Video Super-Resolution
Published: 10/1/2025
Tags: Video Super-Resolution, Auto-Regressive Diffusion Model, Long-Sequence Video Processing, video diffusion models, Temporal Consistency Evaluation
InfVSR reformulates video super-resolution as an autoregressive one-step diffusion model, enabling efficient, scalable processing of long videos with temporal consistency via rolling caches and patch-wise supervision.
LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation
Published: 10/9/2025
Tags: video diffusion models, Linear Attention Mechanism, Post-Training Sparse Attention Optimization, Efficient Video Generation, Distribution Matching Objective
LinVideo is a data-free post-training framework that selectively replaces self-attention with linear attention in video diffusion models, using an anytime distribution matching objective to maintain performance while achieving up to a 15.92× latency reduction and a 1.25–2× speedup.
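To illustrate the general O(n) attention idea behind methods like this (a minimal sketch, not LinVideo's actual implementation): reassociating (QKᵀ)V as Q(KᵀV) with a positive feature map drops the cost from quadratic to linear in sequence length n. The feature map φ(x) = elu(x) + 1 used here is one common choice, not necessarily the paper's.

```python
# Sketch of linear attention: O(n) in sequence length instead of O(n^2).
import numpy as np

def feature_map(x):
    # A common positive feature map: phi(x) = elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d) each
    kv = Kf.T @ V                             # (d, d), costs O(n * d^2)
    normalizer = Qf @ Kf.sum(axis=0)          # (n,), row-wise weight sums
    return (Qf @ kv) / normalizer[:, None]    # (n, d)

rng = np.random.default_rng(0)
n, d = 64, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (64, 8)
```

Because the weights are positive and normalized, each output row is a convex combination of the rows of V, as in softmax attention, but the KᵀV summary is a fixed d×d matrix regardless of sequence length.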