Tags: video diffusion models - Paper Library

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Published: 12/19/2024

video diffusion models, Robotic Action Learning, Video Prediction Policy, Dynamic Visual Representations, Complex Manipulation Tasks

The Video Prediction Policy (VPP) utilizes Video Diffusion Models (VDMs) to generate visual representations that incorporate both current static and predicted dynamic information, enhancing robot action learning and achieving a 31.6% increase in success rates for complex tasks.

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Published: 12/17/2025

Real-Time Interactive World Modeling, Long-Term Geometric Consistency, video diffusion models, Memory-Aware Modeling, Dynamic Context Reconstruction

This paper introduces WorldPlay, a video diffusion model for real-time interactive world modeling with long-term geometric consistency, achieved through three innovations: Dual Action Representation, Reconstituted Context Memory, and Context Forcing.

Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation

Published: 6/11/2025

Autoregressive Adversarial Post-Training, Real-time Video Generation, video diffusion models, Interactive Video Generation, Long Video Generation

The paper introduces Autoregressive Adversarial Post-Training (AAPT) to transform a pretrained latent video diffusion model into an efficient real-time interactive video generator. It generates one latent frame per evaluation, streams in real time, and responds to user interacti

MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models

Published: 12/2/2024

Customized Motion Transfer, Multimodal Large Language Model, video diffusion models, Motion Modeling, Text-to-Video Generation

MoTrans introduces a customized motion transfer method using a multimodal large language model recaptioner and an appearance injection module, effectively transferring specific human-centric motions from reference videos to new contexts, outperforming existing techniques.

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Published: 12/4/2025

Training-Free Video Generation, Video Extrapolation Generation, Temporal Attention Mechanism, Importance-Aware KV Cache Pruning, video diffusion models

The paper presents the Deep Forcing mechanism that addresses temporal repetition, drift, and motion deceleration in autoregressive video diffusion. Using training-free Deep Sink and Participative Compression, it achieves over 12x extrapolation, enhancing video quality and consist

ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

Published: 9/4/2024

video diffusion models, High-Fidelity Novel View Synthesis, Point-Based Representation, Camera Trajectory Planning, 3D Reconstruction and Synthesis

This study introduces ViewCrafter, a method that synthesizes high-fidelity novel views from single or sparse images using video diffusion models, overcoming the dependence on dense multi-view captures. It incorporates coarse 3D clues and camera pose control while featuring an i

StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

Published: 11/11/2025

video diffusion models, Real-Time Interactive Video Generation, Streaming Content Creation, Low-Latency Video Generation, Multi-GPU Real-Time Streaming Service

StreamDiffusionV2 is introduced as a streaming system for dynamic and interactive video generation, addressing temporal consistency and low latency issues in live streaming. It integrates SLO-aware schedulers and other optimizations for training-free real-time service, enhancing

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Published: 6/6/2025

Efficient Inference of Diffusion Models, video diffusion models, Video Restoration, One-Step Video Restoration Model, Adaptive Window Attention Mechanism

SeedVR2 enables one-step high-res video restoration using diffusion adversarial post-training and adaptive window attention, enhancing quality and efficiency while stabilizing training with novel loss functions.

InfVSR: Breaking Length Limits of Generic Video Super-Resolution

Published: 10/1/2025

Video Super-Resolution, Auto-Regressive Diffusion Model, Long-Sequence Video Processing, video diffusion models, Temporal Consistency Evaluation

InfVSR reformulates video super-resolution as an autoregressive one-step diffusion model, enabling efficient, scalable processing of long videos with temporal consistency via rolling caches and patch-wise supervision.

LinVideo: A Post-Training Framework towards O(n) Attention in Efficient Video Generation

Published: 10/9/2025

video diffusion models, Linear Attention Mechanism, Post-Training Sparse Attention Optimization, Efficient Video Generation, Distribution Matching Objective

LinVideo is a data-free post-training framework that selectively replaces self-attention with linear attention in video diffusion models, using anytime distribution matching to maintain performance and achieve up to 15.92× latency reduction and 1.25–2× speedup.
