Papers
Tag Filter: Text-to-Image Generation
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Published:3/6/2024
High-Resolution Image Synthesis, Improved Diffusion Modeling Techniques, Text-to-Image Generation, Bidirectional Information Flow Architecture, Noise Sampling Technique Optimization
This study introduces a novel multimodal diffusion Transformer (MMDiT) architecture that uses rectified flow for enhanced high-resolution image synthesis. Optimized noise sampling and bidirectional information flow improve text comprehension and user preference ratings, vali
Information to Users
Published:9/1/1989
Training-Free Acceleration Methods, LLM Security Mechanism, Robotic Action Learning, Math Reasoning Benchmarks, Text-to-Image Generation
The paper examines concurrency control algorithms for real-time database systems, highlighting existing technical flaws and potential methods to enhance algorithm efficiency, contributing significantly to improving the reliability of real-time data processing.
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Published:4/13/2023
Text-to-Image Generation, Human Preference Reward Model, Reward Feedback Learning, Diffusion Model Optimization, Expert Comparison Ratings
This study introduces ImageReward, a general-purpose human preference reward model for text-to-image generation, trained on a systematic annotation process with 137,000 expert comparisons. It outperforms existing models and proposes Reward Feedback Learning (ReFL) for optimizing
Infinite-Story: A Training-Free Consistent Text-to-Image Generation
Published:11/17/2025
Text-to-Image Generation, Training-Free Text-to-Image Generation, Consistent Generation Framework, Multi-Prompt Storytelling Scenarios, Autoregressive Modeling
Infinite-Story is a training-free framework for consistent text-to-image generation in multi-prompt scenarios, addressing identity and style inconsistencies. With key techniques, it achieves state-of-the-art performance and is over 6x faster during inference than existing models,
HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation
Published:5/10/2025
Text-to-Image Generation, Hierarchical Cross-Model Alignment, Multimodal Generation, MS-COCO Dataset, Diffusion Models
The paper introduces the Hierarchical Cross-Modal Alignment (HCMA) framework, addressing the conflict between semantic fidelity and spatial control in text-to-image generation. HCMA combines global and local alignment modules to achieve high-quality results in complex scenes, sur
Qwen-Image Technical Report
Published:8/4/2025
Text-to-Image Generation, Image Generation Model, Image Editing Techniques, Dual-Encoding Mechanism, Data Pipeline Optimization
Qwen-Image enhances text rendering and image editing using a comprehensive data pipeline and progressive training, while a dual-encoding mechanism balances semantic consistency and fidelity, excelling particularly in Chinese text generation.
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Published:5/5/2025
Multimodal Understanding and Generation Models, Generative Adversarial Models, Fusion of Diffusion and Autoregressive Models, Text-to-Image Generation, Multimodal Datasets and Benchmarks
This paper provides a comprehensive survey on unified multimodal understanding and generation models, exploring the challenges posed by architectural differences between autoregressive and diffusion models. It highlights three main unified framework paradigms and offers tailored
FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time
Published:10/28/2025
Multi-Subject LoRA Fusion, Training-Free Fusion Method, Cross-Attention Dynamic Masking, Text-to-Image Generation, Diffusion Model Inference Optimization
FreeFuse enables multi-subject LoRA fusion via automatic, context-aware masks from cross-attention weights at test time, requiring no extra training or auxiliary models, improving multi-subject text-to-image generation quality and usability over existing methods.
DreamAnime: Learning Style-Identity Textual Disentanglement for Anime and Beyond
Published:5/7/2024
Text-to-Image Generation, Style-Identity Disentanglement, Anime Character Generation, Text Embedding Space Learning, Few-Shot Concept Learning
DreamAnime disentangles style and identity into separate embeddings using few-shot images, enabling flexible anime character and style synthesis with superior concept fidelity and compositional creativity versus existing methods.