Papers

Tag Filter: Diffusion Models
ADriver-I: A General World Model for Autonomous Driving
Published: 11/23/2023
Driving World Models, Multimodal Large Language Model, Vision-Language-Action Model, Diffusion Models, nuScenes Dataset
ADriver-I integrates vision-action pairs using an MLLM and diffusion models to autoregressively predict control signals and future scenes, enabling iterative autonomous driving and significantly improving performance.
A Survey on Generative Recommendation: Data, Model, and Tasks
Published: 10/31/2025
Generative Recommendation Systems, Large Language Model Fine-Tuning, Diffusion Models, Multimodal Large Language Model, LLM-based Recommendation Systems
This survey reviews generative recommendation via a unified framework, analyzing data augmentation, model alignment, and task design, highlighting innovations in large language and diffusion models that enable knowledge integration, natural language understanding, and personalize
dKV-Cache: The Cache for Diffusion Language Models
Published: 5/22/2025
Diffusion Models, Diffusion Language Models, KV-Cache Mechanism, Inference Acceleration, Non-Autoregressive Architecture Optimization
dKV-Cache introduces a delayed KV-cache for diffusion language models, enabling 2-10× faster inference by conditionally caching key-value states. Two variants balance speed and performance, revealing underutilized context and narrowing efficiency gaps with autoregressive models.
Effective Diffusion Transformer Architecture for Image Super-Resolution
Published: 9/29/2024
Diffusion Models, Image Super-resolution, Diffusion Transformer, Multi-Scale Hierarchical Feature Extraction, Frequency-Adaptive Time-Step Conditioning Module
DiT-SR introduces a U-shaped diffusion transformer with frequency-adaptive conditioning, enhancing multi-scale feature extraction and resource allocation, achieving superior super-resolution without pretraining compared to prior-based methods.
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Published: 3/31/2025
Diffusion Models, Diffusion Transformer, Real-World Image Super-Resolution, Low-Resolution Image Embedding Interaction, Cross-Stream Convolution Layer Design
DiT4SR integrates LR embeddings into the diffusion transformer's attention for bidirectional latent interaction and uses cross-stream convolution to enhance local detail, achieving superior real-world image super-resolution performance with diffusion transformers.