Papers

HCMA: Hierarchical Cross-model Alignment for Grounded Text-to-Image Generation
Published: 5/10/2025
Text-to-Image Generation, Hierarchical Cross-Model Alignment, Multimodal Generation, MS-COCO Dataset, Diffusion Models
The paper introduces the Hierarchical Cross-Modal Alignment (HCMA) framework, addressing the conflict between semantic fidelity and spatial control in text-to-image generation. HCMA combines global and local alignment modules to achieve high-quality results in complex scenes.
ATOMAS: Hierarchical Adaptive Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Molecule-Text Cross-Modal Representation Learning, Hierarchical Adaptive Alignment Model, SMILES String Representation Learning, Molecule Understanding and Generation, Cross-Modal Fragment Correspondence Learning
Atomas is a hierarchical framework for jointly learning SMILES and text representations, using a Hierarchical Adaptive Alignment model to capture fine-grained fragment correspondences. It demonstrates superior performance across various molecule understanding and generation tasks, showcasing its effectiveness and versatility.
Emulating Human-like Adaptive Vision for Efficient and Flexible Machine Visual Perception
Published: 9/19/2025
Adaptive Vision Models, Active Visual Perception, Representation Learning with Reinforcement Learning, Large-Scale Visual Recognition Benchmarks, Efficient Inference
The paper presents a framework that shifts machine vision from passive to active, adaptive perception. It formulates visual perception as a sequential decision-making process that flexibly adapts to the input, significantly reducing inference costs (by up to 28×).
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Published: 5/7/2023
Zero-Shot Chain-of-Thought Enhancement, Improvement of LLM Reasoning Ability, Plan-and-Solve Prompting Strategy, Multi-Step Reasoning Tasks, PS+ Prompting Extension
This paper introduces a Plan-and-Solve prompting strategy to enhance zero-shot chain-of-thought reasoning in large language models. By first devising a plan that breaks a task into subtasks and then carrying it out, the strategy addresses missing-step errors and improves reasoning quality, showing significant performance gains across various datasets.
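The mechanics of the strategy are easy to see in a prompt template. The sketch below is a minimal illustration assuming any text-completion function (`call_llm` is a hypothetical stand-in), and the trigger phrase paraphrases the plan-then-solve instruction described in the paper.

```python
# Minimal sketch of Plan-and-Solve style zero-shot prompting.
# `call_llm` is a hypothetical stand-in for whatever chat/completion API is used.

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve it. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def plan_and_solve(question: str, call_llm) -> str:
    """Replace the plain "Let's think step by step" trigger with a
    plan-then-execute instruction to reduce missing-step errors."""
    prompt = f"Q: {question}\nA: {PS_TRIGGER}"
    return call_llm(prompt)

# Usage: answer = plan_and_solve("A store sold 14 cakes ...", call_llm=my_model.complete)
```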
Knowledge-aware Diffusion-Enhanced Multimedia Recommendation
Published: 7/22/2025
Knowledge-aware Diffusion Model for Recommendation Systems, Graph Neural Networks with Attention Mechanism, Contrastive Learning for Multimedia Recommendation, User-Item Interaction Graph Modeling, Experiments on Multimedia Datasets
The KDiffE framework enhances multimedia recommendation systems by integrating attention-aware user-item interactions within graph neural networks and employing a guided diffusion model to generate relevant knowledge graphs, significantly improving semantic information and recommendation performance.
Accurate and scalable exchange-correlation with deep learning
Published: 6/17/2025
Deep Learning Exchange-Correlation Functional, Atomization Energy Prediction at Chemical Accuracy, Data-Driven Approaches in Density Functional Theory, Generation of High-Accuracy Reference Datasets, Skala Model
The paper introduces Skala, a deep learning-based exchange-correlation functional that achieves chemical accuracy for atomization energies of small molecules while retaining the computational efficiency of semi-local DFT, leveraging extensive high-accuracy reference data.
MultiShotMaster: A Controllable Multi-Shot Video Generation Framework
Published: 12/3/2025
Multi-Shot Video Generation Framework, Controllable Video Generation, Application of RoPE Techniques in Video Generation, Enhanced Multi-Shot Narrative Capability, Automated Data Annotation Pipeline
The MultiShotMaster framework addresses current limitations in multi-shot narrative video generation by integrating two novel RoPE variants, enabling flexible shot arrangement and coherent storytelling, while an automated data annotation pipeline enhances controllability and output quality.
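The summary does not spell out the RoPE variants themselves; as background, the sketch below shows standard rotary position embedding in NumPy, which such variants extend by changing how positions (e.g., shot and frame indices) are fed in. Shapes and names are illustrative, not taken from the paper.

```python
import numpy as np

def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Standard rotary position embedding (RoPE).

    x: (seq_len, dim) query/key features, dim even.
    positions: (seq_len,) position indices; a multi-shot variant could
    derive these from shot index and within-shot frame index instead.
    """
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)       # per-pair rotation frequencies
    angles = positions[:, None] * freqs[None, :]    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1, x2) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```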
OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback
Published: 10/27/2025
Foundation Model-based Dexterous Grasping, Grasping Tasks and Control Strategies, Human Demonstration to Robot Action Transfer, Force-aware Adaptive Grasp Strategy, Generalizable Dexterous Robot Manipulation Framework
This paper introduces OmniDexGrasp, a framework that enables generalizable dexterous grasping by integrating foundation models and force feedback. It includes modules for generating human grasp images, transferring human demonstrations to robot actions, and ensuring robust execution through a force-aware adaptive grasp strategy.
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Published: 3/19/2025
Self-Supervised Reinforcement Learning, Deep Network Architecture, Goal-Conditioned Tasks, Unsupervised Reinforcement Learning Algorithm, Simulated Environment Experiments
The paper examines scalability in self-supervised reinforcement learning by increasing network depth to 1024 layers, yielding 2–50× performance improvements on unsupervised, reward-free goal-conditioned tasks and enabling new goal-reaching capabilities.
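Networks this deep are normally only trainable with residual connections and normalization; the PyTorch sketch below shows the kind of very deep residual MLP implied by the summary, with illustrative widths rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Pre-norm residual MLP block; the skip connection keeps gradients
    usable even when hundreds of blocks are stacked."""
    def __init__(self, width: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.fc = nn.Linear(width, width)

    def forward(self, x):
        return x + self.fc(torch.relu(self.norm(x)))

def make_deep_encoder(in_dim: int, width: int = 256, depth: int = 1024) -> nn.Sequential:
    layers = [nn.Linear(in_dim, width)]
    layers += [ResidualBlock(width) for _ in range(depth)]
    return nn.Sequential(*layers)

# e.g. a goal-conditioned critic could encode (state, goal) pairs with such an encoder
encoder = make_deep_encoder(in_dim=32, depth=1024)
```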
I-FailSense: Towards General Robotic Failure Detection with Vision-Language Models
Published: 9/19/2025
Vision-Language Model Failure Detection, Semantic Misalignment Error Detection, Failure Detection in Robotic Manipulation, I-FailSense Framework, Open-World Robotic Applications
The I-FailSense framework is presented for detecting failures in robotic manipulation, focusing on semantic misalignment errors. It builds datasets for detecting these failures and post-trains a VLM with classification heads attached at multiple layers, showing superior detection performance.
RoboFail: Analyzing Failures in Robot Learning Policies
Published: 12/4/2024
Robot Failure Analysis, Deep Reinforcement Learning Framework, Robot Manipulation Policies, Failure Mode Probability Identification, Robotic Model Generalization
RoboFail is a deep reinforcement learning framework that uses a PPO agent to manipulate environment parameters in order to identify and quantify failure modes in robot policies. Findings show that small environmental changes can significantly increase failure probabilities, informing targeted training.
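One way to picture the setup: an outer RL agent whose actions are environment-parameter perturbations and whose reward is the manipulation policy's observed failure rate. The sketch below is schematic, with a hypothetical parameterized-environment factory and an old-Gym-style step interface; it is not the paper's implementation.

```python
def rollout_failure_rate(policy, make_env, params, episodes: int = 10) -> float:
    """Estimate how often `policy` fails when the environment is rebuilt with
    perturbed parameters (friction, object mass, pose noise, ...).

    `make_env(params)` and the (obs, reward, done, info) step contract are
    hypothetical placeholders for the framework's own interfaces.
    """
    failures = 0
    for _ in range(episodes):
        env = make_env(params)
        obs, done, info = env.reset(), False, {}
        while not done:
            obs, reward, done, info = env.step(policy(obs))
        failures += int(info.get("task_failed", False))
    return failures / episodes

# An outer PPO agent would treat `params` as its action and the returned
# failure rate as its reward, steering the search toward small perturbations
# that reliably break the policy.
```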
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
Published: 12/2/2025
LLM-guided Reinforcement Learning Optimization, Matrix Multiplication Performance Optimization, Automated Optimization of HGEMM CUDA Kernels, CUDA Execution Speed Improvement
The paper introduces CUDA-L2, which combines large language models and reinforcement learning to optimize HGEMM CUDA kernels, outperforming major baselines such as torch.matmul and cuBLAS by over 11% across 1,000 configurations.
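Speedups like the quoted 11% are measured as kernel throughput against a vendor baseline on fixed shapes; a minimal FP16 matmul timing harness of that kind (PyTorch CUDA events; the generated kernel itself is not shown) is sketched below.

```python
import torch

def time_fp16_matmul(m: int, n: int, k: int, iters: int = 50) -> float:
    """Average milliseconds for an HGEMM-shaped torch.matmul on the GPU."""
    a = torch.randn(m, k, dtype=torch.float16, device="cuda")
    b = torch.randn(k, n, dtype=torch.float16, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(5):                      # warm-up iterations
        torch.matmul(a, b)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# A candidate kernel would be timed identically on the same (m, n, k) shapes,
# with speedup reported as baseline_ms / candidate_ms per configuration.
```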
LLM-REDIAL: A Large-Scale Dataset for Conversational Recommender Systems Created from User Behaviors with LLMs
Published: 8/1/2024
Conversational Recommender Systems, Large-Scale Conversational Recommendation Dataset, Integration of User Behavior Data and Dialogue Templates, Multi-Domain Conversational Recommendation, LLM-Generated Dialogues
LLM-REDIAL is a large-scale dataset for conversational recommender systems that addresses limitations of existing datasets. It combines historical user behavior with dialogue templates and LLM-generated dialogues, featuring 47.6k multi-turn dialogues with consistent semantics, validated by human evaluation.
Unifying Qualitative and Quantitative Safety Verification of DNN-Controlled Systems
Published: 4/2/2024
Safety Verification for Deep Neural Networks, Unified Framework for Qualitative and Quantitative Verification, Neural Barrier Certificates Synthesis, Safety in Reinforcement Learning Systems, Stochastic Behavior Modeling
This paper presents a novel framework for unifying qualitative and quantitative safety verification of DNN-controlled systems, addressing challenges posed by stochastic behavior. By synthesizing neural barrier certificates, it establishes almost-sure safety guarantees together with precise quantitative bounds on safety probabilities.
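A barrier certificate here is a scalar function B trained to be non-positive on initial states, positive on unsafe states, and non-increasing along system steps. The PyTorch sketch below shows those three conditions in soft-penalty form; the samplers and the exact probabilistic conditions used in the paper are assumptions here.

```python
import torch

def barrier_losses(B, x_init, x_unsafe, x_step_pairs, margin: float = 0.1):
    """Soft penalties for standard barrier-certificate conditions.

    B: network mapping states to a scalar.
    x_init, x_unsafe: batches sampled from the initial and unsafe sets.
    x_step_pairs: (x_t, x_next) batches of consecutive system states.
    """
    init_loss = torch.relu(B(x_init)).mean()               # want B(x) <= 0 on initial states
    unsafe_loss = torch.relu(margin - B(x_unsafe)).mean()  # want B(x) >= margin on unsafe states
    x_t, x_next = x_step_pairs
    decrease_loss = torch.relu(B(x_next) - B(x_t)).mean()  # want B non-increasing along steps
    return init_loss + unsafe_loss + decrease_loss
```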
Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs
Published: 3/12/2025
LLM-based Recommendation Systems, Personalized Recommendation Assistant Benchmark, Recommendation System Performance Evaluation, Complex User Query Handling, LLM Capability Assessment
The paper introduces RecBench, a benchmark dataset for assessing how well LLMs handle complex personalized recommendation tasks, revealing that while LLMs show initial capabilities as recommendation assistants, they struggle with reasoning and with misleading queries.
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression
Published: 5/26/2025
Large Language Model Compression Benchmark, Agentic Capability Assessment, Workflow Generation in Compressed Models, Long-Context Retrieval, Quantization and Pruning Techniques
The paper introduces the Agent Compression Benchmark (ACBench) to evaluate the impact of compression on LLMs' agentic capabilities across 12 tasks and 4 abilities. Results show that 4-bit quantization minimally affects workflow generation and tool use but degrades real-world application accuracy.
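The 4-bit setting usually means weight-only quantization of an existing checkpoint; a typical loading step with Hugging Face transformers and bitsandbytes is sketched below (the model name is a placeholder, and this is not necessarily the benchmark's exact configuration).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-7b-model"   # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # weight-only 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,  # keep activations/compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# The quantized model is then run through the same agentic evaluations
# (workflow generation, tool use, long-context retrieval) as the fp16 baseline.
```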
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
Published: 11/9/2025
Adaptive Humanoid Control, Multi-Behavior Distillation, Reinforced Fine-Tuning, Humanoid Locomotion Skills, Multi-Skill Controller
This paper introduces an Adaptive Humanoid Control (AHC) framework that learns adaptive locomotion controllers through multi-behavior distillation and reinforced fine-tuning, showing strong adaptability across various skills and terrains.
Vision Bridge Transformer at Scale
Published: 11/28/2001
Diffusion Transformer, Image and Video Editing Tasks, Large-Scale Data Processing, Bridge Models, Input-to-Output Trajectory Modeling
The Vision Bridge Transformer (ViBT) is a large-scale implementation of Brownian Bridge Models for efficient conditional generation. It enhances data translation by modeling input-to-output trajectories, achieving robust performance on large-scale image and video editing tasks.
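Bridge models replace the usual noise-to-data diffusion with a stochastic path pinned at a concrete input and output. A minimal Brownian-bridge interpolation of the kind such models are trained on is sketched below; it illustrates the trajectory modeling, not ViBT's exact formulation.

```python
import torch

def brownian_bridge_sample(x0: torch.Tensor, x1: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t on a Brownian bridge from x0 (source image) to x1 (target image).

    t in (0, 1); the mean interpolates linearly between the endpoints and the
    noise variance t * (1 - t) vanishes at both ends, pinning the trajectory.
    """
    t = t.view(-1, *([1] * (x0.dim() - 1)))      # broadcast t over image dimensions
    mean = (1.0 - t) * x0 + t * x1
    std = (t * (1.0 - t)).sqrt()
    return mean + std * torch.randn_like(x0)

# A network conditioned on x_t and t would then be trained to recover the
# target endpoint (or the bridge drift), depending on the parameterization.
```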
MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion
Published: 11/13/2025
Multimodal Urban Traffic Profiling, Frequency Domain Feature Learning, Visual Augmentation for Traffic Signals, Text Augmentation Techniques, Hierarchical Contrastive Learning
This paper introduces MTP, a multimodal framework for urban traffic profiling that learns features from numeric, visual, and textual perspectives, overcoming the limitations of unimodal approaches and demonstrating superior performance across six real-world datasets.
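The frequency-domain view starts from a spectral transform of the numeric traffic series; a minimal NumPy feature extraction in that spirit is sketched below (window length and feature choice are illustrative, not the paper's).

```python
import numpy as np

def spectral_features(series: np.ndarray, top_k: int = 8) -> np.ndarray:
    """Magnitudes of the top-k frequency components of a traffic series.

    series: (T,) readings such as hourly flow or speed; the dominant
    magnitudes summarize periodic structure (daily/weekly cycles) that can
    be fused with visual and textual views of the same signal.
    """
    spectrum = np.fft.rfft(series - series.mean())   # remove the DC offset first
    magnitudes = np.abs(spectrum)
    top = np.argsort(magnitudes)[::-1][:top_k]
    return magnitudes[top]
```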
Q-BERT4Rec: Quantized Semantic-ID Representation Learning for Multimodal Recommendation
Published: 12/2/2025
Multimodal Sequential Recommendation Systems, Q-BERT Model, Semantic Representation and Quantization Learning, Cross-Modal Feature Fusion, Sequential Recommendation Optimization
Q-BERT4Rec is a multimodal sequential recommendation framework that integrates semantic representation learning and quantization modeling. It enhances generalization and interpretability through cross-modal semantic injection, semantic quantization, and multi-mask pretraining.
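Semantic-ID quantization in general maps continuous item embeddings to discrete codes that a sequential model can treat as tokens. The k-means sketch below (scikit-learn) illustrates that general idea only; Q-BERT4Rec's actual quantizer is not described here.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_to_semantic_ids(item_embeddings: np.ndarray, codebook_size: int = 256) -> np.ndarray:
    """Assign each item embedding a discrete semantic ID (its nearest centroid).

    item_embeddings: (num_items, dim) fused multimodal item features.
    Returns one integer code per item, usable as a token ID downstream.
    """
    kmeans = KMeans(n_clusters=codebook_size, n_init=10, random_state=0)
    return kmeans.fit_predict(item_embeddings)

# Multi-level (residual) quantizers repeat this on the residuals to produce
# several codes per item, which is closer to typical semantic-ID schemes.
```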