Tags: Diffusion Models - Paper Library

ADriver-I: A General World Model for Autonomous Driving

Published: 11/23/2023

Driving World Models, Multimodal Large Language Model, Vision-Language-Action Model, Diffusion Models, nuScenes Dataset

ADriver-I integrates vision-action pairs using an MLLM and diffusion models to autoregressively predict control signals and future scenes, enabling iterative autonomous driving and significantly improving performance.

A Survey on Generative Recommendation: Data, Model, and Tasks

Published: 10/31/2025

Generative Recommendation Systems, Large Language Model Fine-Tuning, Diffusion Models, Multimodal Large Language Model, LLM-based Recommendation Systems

This survey reviews generative recommendation via a unified framework, analyzing data augmentation, model alignment, and task design, highlighting innovations in large language and diffusion models that enable knowledge integration, natural language understanding, and personalize

dKV-Cache: The Cache for Diffusion Language Models

Published: 5/22/2025

Diffusion Models, Diffusion Language Models, KV-Cache Mechanism, Inference Acceleration, Non-Autoregressive Architecture Optimization

dKV-Cache introduces a delayed KV-cache for diffusion language models, enabling 2-10× faster inference by conditionally caching key-value states. Two variants balance speed and performance, revealing underutilized context and narrowing the efficiency gap with autoregressive models.

Effective Diffusion Transformer Architecture for Image Super-Resolution

Published: 9/29/2024

Diffusion Models, Image Super-resolution, Diffusion Transformer, Multi-Scale Hierarchical Feature Extraction, Frequency-Adaptive Time-Step Conditioning Module

DiT-SR introduces a U-shaped diffusion transformer with frequency-adaptive conditioning, enhancing multi-scale feature extraction and resource allocation, and achieves super-resolution superior to prior-based methods without requiring pretraining.

DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution

Published: 3/31/2025

Diffusion Models, Diffusion Transformer, Real-World Image Super-Resolution, Low-Resolution Image Embedding Interaction, Cross-Stream Convolution Layer Design

DiT4SR integrates LR embeddings into the diffusion transformer's attention for bidirectional latent interaction and uses cross-stream convolution to enhance local detail, achieving superior real-world image super-resolution performance.
