Papers

A Survey on Personalized Content Synthesis with Diffusion Models
Published:5/9/2024
Personalized Content Synthesis, Diffusion Models, Test-Time Fine-Tuning Methods, Pre-Trained Adaptation Methods, Object Personalization
This paper surveys over 150 methods in personalized content synthesis (PCS) using diffusion models, categorizing them into test-time fine-tuning and pre-trained adaptation frameworks, while addressing challenges like overfitting and proposing future research directions.
02
Qwen3 Technical Report
Published:5/14/2025
Large Language Model Series, Mixture-of-Expert Architecture, Dynamic Model Switching, Thinking Budget Mechanism, Expanded Multilingual Support
Qwen3 introduces a unified framework integrating thinking and non-thinking modes for dynamic switching, enhancing performance and multilingual support. It also features a thinking budget mechanism for adaptive resource allocation, expanding language capabilities from 29 to 119 la
02
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
Published:6/29/2025
Urban Intelligence Multi-Modal Large Language Model, Urban Instruction Dataset, Spatial Reasoning Enhancement, Multi-Stage Training Framework, Urban Task Performance Evaluation
UrbanLLaVA is a multimodal language model designed for urban intelligence, processing four data types to enhance urban task performance. It leverages a diverse instruction dataset and a multi-stage training framework, achieving strong cross-city generalization.
01
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
Published:6/19/2023
Remote Sensing Vision-Language Model, Self-Supervised Learning and Image Modeling, Multitask Remote Sensing Applications, Remote Sensing Object Counting, Unified Image-Text Data Format
RemoteCLIP is the first vision-language foundation model for remote sensing, overcoming limitations of existing models by integrating heterogeneous annotations into a unified image-text format, resulting in a 12x larger pre-training dataset that enhances zero-shot and multitask a
02
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Published:3/28/2025
Remote Sensing Foundation Models, Multimodal Data Fusion, Remote Sensing Task Analysis, Optical and Radar Data, Large-Scale Annotated Datasets
This paper reviews advancements in remote sensing foundation models, highlighting vision and multimodal approaches that integrate diverse data types, enhancing geospatial data analysis. Despite significant improvements in task performance, challenges remain in data diversity, res
08
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
Published:12/18/2025
Autoregressive Video Diffusion Models, Self-Resampling Training Method, Long-Horizon Generation Capability, Temporal Causal Mask, Parameter-Free History Retrieval Mechanism
This paper presents a teacher-free self-resampling method for training autoregressive video diffusion models, addressing exposure bias and enabling efficient long-horizon generation with competitive performance and improved temporal consistency.
04
Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology
Published:6/6/2025
Multimodal Clinical Decision Support Systems, Application of Vision Transformers in Oncology, Integration of Precision Oncology Tools, Autonomous AI Clinical Agent, Application of GPT-4 in Medical Decision Making
This study developed an autonomous AI agent that integrates GPT-4 with multimodal precision oncology tools. Evaluated on 20 real cases, it achieved 87.5% tool accuracy and 91.0% correct conclusions, significantly boosting decision accuracy to 87.2%, laying the groundwork for pers
02
Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation
Published:12/21/2025
Real-time Video Generation, Video Generation Framework, Historical Memory Retention, Memory Compression and Generation, Autoregressive Modeling
The MAG framework decouples memory compression from frame generation to enhance long-term consistency in real-time video generation. It utilizes a dedicated memory model for compressing historical data and a generator model for frame synthesis, achieving improved scene consistenc
03
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Published:10/1/2025
Humanoid Robot Loco-Manipulation Data Generation, Interaction-Preserving Data Generation Engine, Dynamic Motion Retargeting, Long-Horizon Task Execution for Robots, Motion Capture Datasets
The paper presents OmniRetarget, an engine addressing the embodiment gap in humanoid robots by preserving key interactive relationships. It generates high-quality trajectories for reinforcement learning through an interaction mesh, enabling complex task execution for durations up
03
Quantum Subgradient Estimation for Conditional Value-at-Risk Optimization
Published:10/6/2025
Conditional Value-at-Risk Optimization, Quantum Subgradient Estimation, Monte Carlo Simulation Complexity, Stochastic Projected Subgradient Descent, Amplitude Estimation Quantum Algorithm
This study introduces a quantum subgradient oracle for Conditional Value-at-Risk (CVaR) optimization, achieving O(1/ε) query complexity, a significant improvement over the O(1/ε^2) of classical Monte Carlo methods, with robustness demonstrated through simulations.
02
EvoLM: In Search of Lost Language Model Training Dynamics
Published:6/19/2025
Dynamics of Large Language Model Training, Supervised Fine-Tuning and Reinforcement Learning, Importance of Continued Pre-Training, Dimensionality Reduction and Generalization Analysis, Large-Scale Language Model Suite
EvoLM offers a systematic analysis of language model training dynamics across pre-training, continued training, fine-tuning, and reinforcement learning. Key findings include diminishing returns from excessive training and the importance of continued training in bridging stages, w
03
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Published:5/12/2025
Sequence Policy Optimization, Reinforcement Learning in Reasoning Models, Chain-of-Thought Generation Length Extension, LLM Reasoning Capacity Enhancement, Test-Time Scaling
This study introduces S-GRPO, a novel reinforcement learning method that allows reasoning models to exit early during chain-of-thought generation, improving efficiency by evaluating intermediate reasoning steps and reducing redundancy compared to existing approaches.
02
A Survey of Controllable Learning: Methods and Applications in Information Retrieval
Published:7/4/2024
Survey of Controllable Learning Methods, Applications of Controllable Learning in Information Retrieval, Dynamic Target Adaptation Strategies, Multi-Objective Optimization Methods, User Portrait and Scenario Adaptation
Controllable learning is essential for trustworthy machine learning, allowing dynamic adaptation to complex information needs. This survey defines controllable learning, explores its applications in information retrieval, identifies challenges, and suggests future research direct
02
LocationSpark: In-memory Distributed Spatial Query Processing and Optimization
Published:7/9/2019
Distributed Spatial Query Processing, In-Memory Optimization, Query Scheduler, Spatial Indexing Techniques, Query Skew Handling
LocationSpark is a distributed in-memory system that addresses scalability in processing massive spatial data by introducing a query scheduler and a new spatial indexing technique, improving performance significantly.
02
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Published:12/14/2023
6D Object Pose Estimation and Tracking, Neural Implicit Representation, Transformer-based Architecture, Contrastive Learning Methods, Large-Scale Synthetic Training
FoundationPose is a unified framework for 6D pose estimation and tracking, supporting both model-based and model-free setups. It uses large-scale synthetic training and contrastive learning to efficiently handle novel objects without fine-tuning, outperforming existing specialize
03
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Published:12/2/2016
Point Cloud Classification and Segmentation, Deep Learning on Point Sets, Permutation Invariance, Geometric Data Structure, Network Performance Evaluation
PointNet is the first deep learning architecture to directly process raw point clouds, avoiding data bloat from voxelization or image conversion. It respects permutation invariance and excels in object classification and segmentation, outperforming existing methods.
04
Generalizable Humanoid Manipulation with 3D Diffusion Policies
Published:10/15/2024
Humanoid Robot Manipulation, 3D Diffusion Models, Robot Data Acquisition System, Autonomous Operation in Dynamic Environments, Human-like Data Collection
This study introduces a humanoid manipulation system that integrates teleoperation control with an improved 3D diffusion policy, enabling the robot to autonomously perform tasks in unfamiliar environments based on data from a single scene, overcoming previous training limitations.
02
NetLLM: Adapting Large Language Models for Networking
Published:2/4/2024
LLM Adaptation for Networking Tasks, Multimodal Data Processing, Adaptive Bitrate Streaming, Networking Prediction and Optimization, Low-Cost Fine-Tuning Framework
This study introduces the NetLLM framework, which adapts large language models to efficiently solve networking tasks, reducing engineering costs and improving generalization. In three specific applications, NetLLM outperforms existing state-of-the-art algorithms.
04
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Published:6/13/2024
Attention Compression for Diffusion Transformers, Window Attention with Residual Sharing, Attention Sharing across Timesteps, Conditional Generation Redundancy Skipping, High-Resolution Image Generation Acceleration
DiTFastAttn is presented as a post-training compression method to address computational bottlenecks in Diffusion Transformers. It effectively reduces spatial, temporal, and conditional redundancies, achieving up to 76% reduction in attention FLOPs and 1.8x acceleration in generat
02
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Published:10/10/2025
Robotic Manipulation via Imitation Learning, Spatially Generalized Data Generation, Real-to-Real 3D Data Generation, Point Cloud Observation-Action Pair Augmentation, Complex Multi-Object Task Handling
The R2RGen framework generates real-to-real 3D data to enhance robots' operational capabilities across varied spatial configurations. It augments point cloud observation-action pairs, improving generalization in complex multi-object tasks without simulator limitations.
03