Papers

HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation
Published: 5/10/2025
Text-to-Image Generation, Hierarchical Cross-Model Alignment, Multimodal Generation, MS-COCO Dataset, Diffusion Models
The paper introduces the Hierarchical Cross-Modal Alignment (HCMA) framework, addressing the conflict between semantic fidelity and spatial control in text-to-image generation. HCMA combines global and local alignment modules to achieve high-quality results in complex scenes.
ATOMAS: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Molecule-Text Cross-Modal Representation Learning, Hierarchical Adaptive Alignment Model, SMILES String Representation Learning, Molecule Understanding and Generation, Cross-Modal Fragment Correspondence Learning
Atomas is a hierarchical framework for jointly learning SMILES and text representations, using a Hierarchical Adaptive Alignment model to capture fine-grained fragment correspondences. It demonstrates superior performance across various molecule understanding and generation tasks, showcasing its effectiveness and versatility.
Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception
Published: 9/19/2025
Adaptive Vision Models, Active Visual Perception, Representation Learning with Reinforcement Learning, Large-Scale Visual Recognition Benchmarks, Efficient Inference
The paper presents a framework that shifts machine vision from passive to active, adaptive perception. It formulates visual perception as a sequential decision-making process that flexibly adapts to the input, significantly reducing inference costs (by up to 28×).
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Published: 5/7/2023
Zero-Shot Chain-of-Thought Enhancement, Improvement of LLM Reasoning Ability, Plan-and-Solve Prompting Strategy, Multi-Step Reasoning Tasks, PS+ Prompting Extension
This paper introduces a Plan-and-Solve prompting strategy to enhance zero-shot chain-of-thought reasoning in large language models. By first devising a plan that breaks a task into subtasks and then carrying it out, the strategy addresses missing-step errors and improves reasoning quality, showing significant performance gains across various datasets.
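The mechanics of the strategy are easy to see in a prompt template. The sketch below is a minimal illustration assuming any text-completion function (`call_llm` is a hypothetical stand-in), and the trigger phrase paraphrases the plan-then-solve instruction described in the paper.

```python
# Minimal sketch of Plan-and-Solve style zero-shot prompting.
# `call_llm` is a hypothetical stand-in for whatever chat/completion API is used.

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def plan_and_solve(question: str, call_llm) -> str:
    """Replace the plain "Let's think step by step" trigger with a
    plan-then-execute instruction to reduce missing-step errors."""
    prompt = f"Q: {question}\nA: {PS_TRIGGER}"
    return call_llm(prompt)

# Usage: answer = plan_and_solve("A store sold 14 cakes ...", call_llm=my_model.complete)
```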
Knowledge-aware Diffusion-Enhanced Multimedia Recommendation
Published: 7/22/2025
Knowledge-aware Diffusion Model for Recommendation Systems, Graph Neural Networks with Attention Mechanism, Contrastive Learning for Multimedia Recommendation, User-Item Interaction Graph Modeling, Experiments on Multimedia Datasets
The KDiffE framework enhances multimedia recommendation systems by integrating attention-aware user-item interactions within graph neural networks and employing a guided diffusion model to generate relevant knowledge graphs, significantly improving semantic information and recommendation performance.
Accurate and scalable exchange-correlation with deep learning
Published: 6/17/2025
Deep Learning Exchange-Correlation Functional, Atomization Energy Prediction at Chemical Accuracy, Data-Driven Approaches in Density Functional Theory, Generation of High-Accuracy Reference Datasets, Skala Model
The paper introduces Skala, a deep learning-based exchange-correlation functional that achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency of semi-local DFT, leveraging extensive high-accuracy reference data.
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Published: 12/3/2025
Multi-Shot Video Generation Framework, Controllable Video Generation, Application of RoPE Techniques in Video Generation, Enhanced Multi-Shot Narrative Capability, Automated Data Annotation Pipeline
The MultiShotMaster framework addresses current limitations in multi-shot narrative video generation by integrating two novel RoPE variants, enabling flexible shot arrangement and coherent storytelling, while an automated data annotation pipeline enhances controllability and output quality.
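The summary does not spell out the RoPE variants themselves; as background, the sketch below shows standard rotary position embedding in NumPy, which such variants extend by changing how positions (e.g., shot and frame indices) are fed in. Shapes and names are illustrative, not taken from the paper.

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Standard rotary position embedding (RoPE).

    x: (seq_len, dim) query/key features, dim even.
    positions: (seq_len,) position indices; a multi-shot variant could
    derive these from shot index and within-shot frame index instead.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)       # per-pair rotation frequencies
    angles = positions[:, None] * freqs[None, :]    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```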
OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback
Published: 10/27/2025
Foundation Model-based Dexterous Grasping, Grasping Tasks and Control Strategies, Human Demonstration to Robot Action Transfer, Force-aware Adaptive Grasp Strategy, Generalizable Dexterous Robot Manipulation Framework
This paper introduces OmniDexGrasp, a framework that enables generalizable dexterous grasping by integrating foundation models and force feedback. It includes modules for generating human grasp images, transferring human demonstrations to robot actions, and ensuring robust execution through a force-aware adaptive grasp strategy.
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Published: 3/19/2025
Self-Supervised Reinforcement Learning, Deep Network Architecture, Goal-Conditioned Tasks, Unsupervised Reinforcement Learning Algorithm, Simulated Environment Experiments
The paper examines scalability in self-supervised reinforcement learning by increasing network depth to 1024 layers, yielding 2–50× performance improvements on unsupervised, reward-free goal-conditioned tasks and enabling new goal-reaching capabilities.
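Networks this deep are normally only trainable with residual connections and normalization; the PyTorch sketch below shows the kind of very deep residual MLP implied by the summary, with illustrative widths rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual MLP block; the skip connection keeps gradients
    usable even when hundreds of blocks are stacked."""
    def __init__(self, width: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.fc = nn.Linear(width, width)

    def forward(self, x):
        return x + self.fc(torch.relu(self.norm(x)))

def make_deep_encoder(in_dim: int, width: int = 256, depth: int = 1024) -> nn.Sequential:
    layers = [nn.Linear(in_dim, width)]
    layers += [ResidualBlock(width) for _ in range(depth)]
    return nn.Sequential(*layers)

# e.g. a goal-conditioned critic could encode (state, goal) pairs with such an encoder
encoder = make_deep_encoder(in_dim=32, depth=1024)
```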
I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models
Published: 9/19/2025
Vision-Language Model Failure Detection, Semantic Misalignment Error Detection, Failure Detection in Robotic Manipulation, I-FailSense Framework, Open-World Robotic Applications
The I-FailSense framework is presented for detecting failures in robotic manipulation, focusing on semantic misalignment errors. It builds datasets for detecting these failures and post-trains a VLM with classification heads attached at multiple layers, showing superior detection performance.
RoboFail: Analyzing Failures in Robot Learning Policies
Published: 12/4/2024
Robot Failure Analysis, Deep Reinforcement Learning Framework, Robot Manipulation Policies, Failure Mode Probability Identification, Robotic Model Generalization
RoboFail is a deep reinforcement learning framework that uses a PPO agent to manipulate environment parameters in order to identify and quantify failure modes in robot policies. Findings show that small environmental changes can significantly increase failure probabilities, informing targeted training.
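One way to picture the setup: an outer RL agent whose actions are environment-parameter perturbations and whose reward is the manipulation policy's observed failure rate. The sketch below is schematic, with a hypothetical parameterized-environment factory and an old-Gym-style step interface; it is not the paper's implementation.

```python
def rollout_failure_rate(policy, make_env, params, episodes: int = 10) -> float:
    """Estimate how often `policy` fails when the environment is rebuilt with
    perturbed parameters (friction, object mass, pose noise, ...).

    `make_env(params)` and the (obs, reward, done, info) step contract are
    hypothetical placeholders for the framework's own interfaces.
    """
    failures = 0
    for _ in range(episodes):
        env = make_env(params)
        obs, done, info = env.reset(), False, {}
        while not done:
            obs, reward, done, info = env.step(policy(obs))
        failures += int(info.get("task_failed", False))
    return failures / episodes

# An outer PPO agent would treat `params` as its action and the returned
# failure rate as its reward, steering the search toward small perturbations
# that reliably break the policy.
```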
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
Published: 12/2/2025
LLM-guided Reinforcement Learning Optimization, Matrix Multiplication Performance Optimization, Automated Optimization of HGEMM CUDA Kernels, CUDA Execution Speed Improvement
The paper introduces CUDA-L2, which combines large language models and reinforcement learning to optimize HGEMM CUDA kernels, outperforming major baselines such as torch.matmul and cuBLAS by over 11% across 1,000 configurations.
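Speedups like the quoted 11% are measured as kernel throughput against a vendor baseline on fixed shapes; a minimal FP16 matmul timing harness of that kind (PyTorch CUDA events; the generated kernel itself is not shown) is sketched below.

```python
import torch

def time_fp16_matmul(m: int, n: int, k: int, iters: int = 50) -> float:
    """Average milliseconds for an HGEMM-shaped torch.matmul on the GPU."""
    a = torch.randn(m, k, dtype=torch.float16, device="cuda")
    b = torch.randn(k, n, dtype=torch.float16, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(5):                      # warm-up iterations
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# A candidate kernel would be timed identically on the same (m, n, k) shapes,
# with speedup reported as baseline_ms / candidate_ms per configuration.
```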
LLM-REDIAL: A Large-Scale Dataset for Conversational Recommender Systems Created from User Behaviors with LLMs
Published: 8/1/2024
Conversational Recommender Systems, Large-Scale Conversational Recommendation Dataset, Integration of User Behavior Data and Dialogue Templates, Multi-Domain Conversational Recommendation, LLM-Generated Dialogues
LLM-REDIAL is a large-scale dataset for conversational recommender systems that addresses limitations of existing datasets. It combines historical user behavior with dialogue templates and LLM-generated dialogues, featuring 47.6k multi-turn dialogues with consistent semantics, validated by human evaluation.
Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems
Published: 4/2/2024
Safety Verification for Deep Neural Networks, Unified Framework for Qualitative and Quantitative Verification, Neural Barrier Certificates Synthesis, Safety in Reinforcement Learning Systems, Stochastic Behavior Modeling
This paper presents a novel framework for unifying qualitative and quantitative safety verification of DNN-controlled systems, addressing challenges posed by stochastic behavior. By synthesizing neural barrier certificates, it establishes almost-sure safety guarantees together with precise quantitative bounds on safety probabilities.
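A barrier certificate here is a scalar function B trained to be non-positive on initial states, positive on unsafe states, and non-increasing along system steps. The PyTorch sketch below shows those three conditions in soft-penalty form; the samplers and the exact probabilistic conditions used in the paper are assumptions here.

```python
import torch

def barrier_losses(B, x_init, x_unsafe, x_step_pairs, margin: float = 0.1):
    """Soft penalties for standard barrier-certificate conditions.

    B: network mapping states to a scalar.
    x_init, x_unsafe: batches sampled from the initial and unsafe sets.
    x_step_pairs: (x_t, x_next) batches of consecutive system states.
    """
    init_loss = torch.relu(B(x_init)).mean()               # want B(x) <= 0 on initial states
    unsafe_loss = torch.relu(margin - B(x_unsafe)).mean()  # want B(x) >= margin on unsafe states
    x_t, x_next = x_step_pairs
    decrease_loss = torch.relu(B(x_next) - B(x_t)).mean()  # want B non-increasing along steps
    return init_loss + unsafe_loss + decrease_loss
```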
Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs
Published: 3/12/2025
LLM-based Recommendation Systems, Personalized Recommendation Assistant Benchmark, Recommendation System Performance Evaluation, Complex User Query Handling, LLM Capability Assessment
The paper introduces RecBench, a benchmark dataset for assessing how well LLMs handle complex personalized recommendation tasks, revealing that while LLMs show initial capabilities as recommendation assistants, they struggle with reasoning and with misleading queries.
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Published: 5/26/2025
Large Language Model Compression Benchmark, Agentic Capability Assessment, Workflow Generation in Compressed Models, Long-Context Retrieval, Quantization and Pruning Techniques
The paper introduces the Agent Compression Benchmark (ACBench) to evaluate the impact of compression on LLMs' agentic capabilities across 12 tasks and 4 abilities. Results show that 4-bit quantization minimally affects workflow generation and tool use but degrades real-world application accuracy.
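The 4-bit setting usually means weight-only quantization of an existing checkpoint; a typical loading step with Hugging Face transformers and bitsandbytes is sketched below (the model name is a placeholder, and this is not necessarily the benchmark's exact configuration).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-7b-model"   # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # weight-only 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # keep activations/compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# The quantized model is then run through the same agentic evaluations
# (workflow generation, tool use, long-context retrieval) as the fp16 baseline.
```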
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
Published: 11/9/2025
Adaptive Humanoid Control, Multi-Behavior Distillation, Reinforced Fine-Tuning, Humanoid Locomotion Skills, Multi-Skill Controller
This paper introduces an Adaptive Humanoid Control (AHC) framework that learns adaptive locomotion controllers through multi-behavior distillation and reinforced fine-tuning, showing strong adaptability across various skills and terrains.
Vision Bridge Transformer at Scale
Published: 11/28/2001
Diffusion Transformer, Image and Video Editing Tasks, Large-Scale Data Processing, Bridge Models, Input-to-Output Trajectory Modeling
The Vision Bridge Transformer (ViBT) is a large-scale implementation of Brownian Bridge Models for efficient conditional generation. It enhances data translation by modeling input-to-output trajectories, achieving robust performance on large-scale image and video editing tasks.
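Bridge models replace the usual noise-to-data diffusion with a stochastic path pinned at a concrete input and output. A minimal Brownian-bridge interpolation of the kind such models are trained on is sketched below; it illustrates the trajectory modeling, not ViBT's exact formulation.

```python
import torch

def brownian_bridge_sample(x0: torch.Tensor, x1: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t on a Brownian bridge from x0 (source image) to x1 (target image).

    t in (0, 1); the mean interpolates linearly between the endpoints and the
    noise variance t * (1 - t) vanishes at both ends, pinning the trajectory.
    """
    t = t.view(-1, *([1] * (x0.dim() - 1)))      # broadcast t over image dimensions
    mean = (1.0 - t) * x0 + t * x1
    std = (t * (1.0 - t)).sqrt()
    return mean + std * torch.randn_like(x0)

# A network conditioned on x_t and t would then be trained to recover the
# target endpoint (or the bridge drift), depending on the parameterization.
```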
MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion
Published: 11/13/2025
Multimodal Urban Traffic Profiling, Frequency Domain Feature Learning, Visual Augmentation for Traffic Signals, Text Augmentation Techniques, Hierarchical Contrastive Learning
This paper introduces MTP, a multimodal framework for urban traffic profiling that learns features from numeric, visual, and textual perspectives, overcoming the limitations of unimodal approaches and demonstrating superior performance across six real-world datasets.
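The frequency-domain view starts from a spectral transform of the numeric traffic series; a minimal NumPy feature extraction in that spirit is sketched below (window length and feature choice are illustrative, not the paper's).

```python
import numpy as np

def spectral_features(series: np.ndarray, top_k: int = 8) -> np.ndarray:
    """Magnitudes of the top-k frequency components of a traffic series.

    series: (T,) readings such as hourly flow or speed; the dominant
    magnitudes summarize periodic structure (daily/weekly cycles) that can
    be fused with visual and textual views of the same signal.
    """
    spectrum = np.fft.rfft(series - series.mean())   # remove the DC offset first
    magnitudes = np.abs(spectrum)
    top = np.argsort(magnitudes)[::-1][:top_k]
    return magnitudes[top]
```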
Q-BERT4Rec: Quantized Semantic-ID Representation Learning for Multimodal Recommendation
Published: 12/2/2025
Multimodal Sequential Recommendation Systems, Q-BERT Model, Semantic Representation and Quantization Learning, Cross-Modal Feature Fusion, Sequential Recommendation Optimization
Q-BERT4Rec is a multimodal sequential recommendation framework that integrates semantic representation learning and quantization modeling. It enhances generalization and interpretability through cross-modal semantic injection, semantic quantization, and multi-mask pretraining.
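Semantic-ID quantization in general maps continuous item embeddings to discrete codes that a sequential model can treat as tokens. The k-means sketch below (scikit-learn) illustrates that general idea only; Q-BERT4Rec's actual quantizer is not described here.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_to_semantic_ids(item_embeddings: np.ndarray, codebook_size: int = 256) -> np.ndarray:
    """Assign each item embedding a discrete semantic ID (its nearest centroid).

    item_embeddings: (num_items, dim) fused multimodal item features.
    Returns one integer code per item, usable as a token ID downstream.
    """
    kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
    return kmeans.fit_predict(item_embeddings)

# Multi-level (residual) quantizers repeat this on the residuals to produce
# several codes per item, which is closer to typical semantic-ID schemes.
```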