Page 3 - Paper Library - AiPaper

This paper surveys over 150 methods in personalized content synthesis (PCS) using diffusion models, categorizing them into test-time fine-tuning and pre-trained adaptation frameworks, while addressing challenges such as overfitting and proposing future research directions.

Qwen3 Technical Report

Published: 5/14/2025

Large Language Model Series, Mixture-of-Expert Architecture, Dynamic Model Switching, Thinking Budget Mechanism, Expanded Multilingual Support

Qwen3 introduces a unified framework integrating thinking and non-thinking modes for dynamic switching, enhancing performance and multilingual support. It also features a thinking budget mechanism for adaptive resource allocation, expanding language capabilities from 29 to 119 la

UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding

Published: 6/29/2025

Urban Intelligence Multi-Modal Large Language Model, Urban Instruction Dataset, Spatial Reasoning Enhancement, Multi-Stage Training Framework, Urban Task Performance Evaluation

UrbanLLaVA is a multimodal language model designed for urban intelligence, processing four data types to enhance urban task performance. It leverages a diverse instruction dataset and a multi-stage training framework, achieving strong cross-city generalization.

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

Published: 6/19/2023

Remote Sensing Vision-Language Model, Self-Supervised Learning and Image Modeling, Multitask Remote Sensing Applications, Remote Sensing Object Counting, Unified Image-Text Data Format

RemoteCLIP is the first vision-language foundation model for remote sensing, overcoming limitations of existing models by integrating heterogeneous annotations into a unified image-text format, resulting in a 12x larger pretraining dataset that enhances zero-shot and multitask a

A Survey on Remote Sensing Foundation Models: From Vision to Multimodality

Published: 3/28/2025

Remote Sensing Foundation Models, Multimodal Data Fusion, Remote Sensing Task Analysis, Optical and Radar Data, Large-Scale Annotated Datasets

This paper reviews advancements in remote sensing foundation models, highlighting vision and multimodal approaches that integrate diverse data types, enhancing geospatial data analysis. Despite significant improvements in task performance, challenges remain in data diversity, res

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Published: 12/18/2025

Autoregressive Video Diffusion Models, Self-Resampling Training Method, Long-Horizon Generation Capability, Temporal Causal Mask, Parameter-Free History Retrieval Mechanism

This paper presents a teacher-free self-resampling method for training autoregressive video diffusion models, addressing exposure bias and enabling efficient long-horizon generation with competitive performance and improved temporal consistency.

Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology

Published: 6/6/2025

Multimodal Clinical Decision Support Systems, Application of Vision Transformers in Oncology, Integration of Precision Oncology Tools, Autonomous AI Clinical Agent, Application of GPT-4 in Medical Decision Making

This study developed an autonomous AI agent that integrates GPT-4 with multimodal precision oncology tools. Evaluated on 20 real cases, it achieved 87.5% tool accuracy and 91.0% correct conclusions, significantly boosting decision accuracy to 87.2%, laying the groundwork for pers

Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation

Published: 12/21/2025

Real-time Video Generation, Video Generation Framework, Historical Memory Retention, Memory Compression and Generation, Autoregressive Modeling

The MAG framework decouples memory compression from frame generation to enhance long-term consistency in real-time video generation. It utilizes a dedicated memory model for compressing historical data and a generator model for frame synthesis, achieving improved scene consistenc

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Published: 10/1/2025

Humanoid Robot Loco-Manipulation Data Generation, Interaction-Preserving Data Generation Engine, Dynamic Motion Retargeting, Long-Horizon Task Execution for Robots, Motion Capture Datasets

The paper presents OmniRetarget, an engine addressing the embodiment gap in humanoid robots by preserving key interactive relationships. It generates high-quality trajectories for reinforcement learning through an interaction mesh, enabling complex task execution for durations up

Quantum Subgradient Estimation for Conditional Value-at-Risk Optimization

Published: 10/6/2025

Conditional Value-at-Risk Optimization, Quantum Subgradient Estimation, Monte Carlo Simulation Complexity, Stochastic Projected Subgradient Descent, Amplitude Estimation Quantum Algorithm

This study introduces a quantum subgradient oracle for Conditional Value-at-Risk (CVaR) optimization, achieving O(1/ε) query complexity, a significant improvement over the O(1/ε^2) of classical Monte Carlo methods, with robustness demonstrated through simulations.
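To make the gap between the two scalings in this summary concrete, here is a minimal sketch that tabulates query counts for a few target precisions. It is an illustration only: the function names are hypothetical, all constant and logarithmic factors are dropped, and the precision is passed as the integer 1/ε to avoid floating-point rounding. It does not reproduce the paper's algorithm.

```python
# Illustrative comparison of asymptotic query counts (constants dropped).
# Function names and the integer inv_eps = 1/eps parameter are assumptions
# made for this sketch; they do not come from the paper.

def classical_queries(inv_eps: int) -> int:
    """Classical Monte Carlo scaling, O(1/eps^2)."""
    return inv_eps ** 2


def quantum_queries(inv_eps: int) -> int:
    """Quantum subgradient oracle scaling, O(1/eps)."""
    return inv_eps


def speedup(inv_eps: int) -> int:
    """Ratio of the two counts; grows linearly in 1/eps."""
    return classical_queries(inv_eps) // quantum_queries(inv_eps)


if __name__ == "__main__":
    for inv_eps in (10, 100, 1000):
        print(
            f"1/eps={inv_eps:>5}  "
            f"classical~{classical_queries(inv_eps):>8}  "
            f"quantum~{quantum_queries(inv_eps):>5}  "
            f"ratio~{speedup(inv_eps)}"
        )
```

Under these simplifications the ratio equals 1/ε, so tightening the target precision by a factor of ten widens the gap by the same factor.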

EvoLM: In Search of Lost Language Model Training Dynamics

Published: 6/19/2025

Dynamics of Large Language Model Training, Supervised Fine-Tuning and Reinforcement Learning, Importance of Continued Pre-Training, Dimensionality Reduction and Generalization Analysis, Large-Scale Language Model Suite

EvoLM offers a systematic analysis of language model training dynamics across pre-training, continued training, fine-tuning, and reinforcement learning. Key findings include diminishing returns from excessive training and the importance of continued training in bridging stages, w

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Published: 5/12/2025

Sequence Policy Optimization, Reinforcement Learning in Reasoning Models, Chain-of-Thought Generation Length Extension, LLM Reasoning Capacity Enhancement, Test-Time Scaling

This study introduces S-GRPO, a novel reinforcement learning method that allows reasoning models to exit early during chain-of-thought generation, improving efficiency by evaluating intermediate reasoning steps and reducing redundancy compared to existing approaches.

A Survey of Controllable Learning: Methods and Applications in Information Retrieval

Published: 7/4/2024

Survey of Controllable Learning Methods, Applications of Controllable Learning in Information Retrieval, Dynamic Target Adaptation Strategies, Multi-Objective Optimization Methods, User Portrait and Scenario Adaptation

Controllable learning is essential for trustworthy machine learning, allowing dynamic adaptation to complex information needs. This survey defines controllable learning, explores its applications in information retrieval, identifies challenges, and suggests future research direct

LocationSpark: In-memory Distributed Spatial Query Processing and Optimization

Published: 7/9/2019

Distributed Spatial Query Processing, In-Memory Optimization, Query Scheduler, Spatial Indexing Techniques, Query Skew Handling

LocationSpark is a distributed in-memory system that addresses scalability in processing massive spatial data by introducing a query scheduler and a new spatial indexing technique, improving performance significantly.

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Published: 12/14/2023

6D Object Pose Estimation and Tracking, Neural Implicit Representation, Transformer-based Architecture, Contrastive Learning Methods, Large-Scale Synthetic Training

FoundationPose is a unified framework for 6D pose estimation and tracking, supporting both model-based and model-free setups. It uses large-scale synthetic training and contrastive learning to efficiently handle novel objects without fine-tuning, outperforming existing specialize

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Published: 12/2/2016

Point Cloud Classification and Segmentation, Deep Learning on Point Sets, Permutation Invariance, Geometric Data Structure, Network Performance Evaluation

PointNet is the first deep learning architecture to directly process raw point clouds, avoiding data bloat from voxelization or image conversion. It respects permutation invariance and excels in object classification and segmentation, outperforming existing methods.

Generalizable Humanoid Manipulation with 3D Diffusion Policies

Published: 10/15/2024

Humanoid Robot Manipulation, 3D Diffusion Models, Robot Data Acquisition System, Autonomous Operation in Dynamic Environments, Human-like Data Collection

This study introduces a humanoid manipulation system that integrates teleoperation control with an improved 3D diffusion policy, enabling the robot to autonomously perform tasks in unfamiliar environments based on data from a single scene, overcoming previous training limitations.

NetLLM: Adapting Large Language Models for Networking

Published: 2/4/2024

LLM Adaptation for Networking Tasks, Multimodal Data Processing, Adaptive Bitrate Streaming, Networking Prediction and Optimization, Low-Cost Fine-Tuning Framework

This study introduces the NetLLM framework, which adapts large language models to efficiently solve networking tasks, reducing engineering costs and improving generalization. In three specific applications, NetLLM outperforms existing state-of-the-art algorithms.

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Published: 6/13/2024

Attention Compression for Diffusion Transformers, Window Attention with Residual Sharing, Attention Sharing across Timesteps, Conditional Generation Redundancy Skipping, High-Resolution Image Generation Acceleration

DiTFastAttn is presented as a post-training compression method to address computational bottlenecks in Diffusion Transformers. It effectively reduces spatial, temporal, and conditional redundancies, achieving up to 76% reduction in attention FLOPs and 1.8x acceleration in generat

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

Published: 10/10/2025

Robotic Manipulation via Imitation Learning, Spatially Generalized Data Generation, Real-to-Real 3D Data Generation, Point Cloud Observation-Action Pair Augmentation, Complex Multi-Object Task Handling

The R2RGen framework generates real-to-real 3D data to enhance robots' operational capabilities across varied spatial configurations. It augments point cloud observation-action pairs, improving generalization in complex multi-object tasks without simulator limitations.
