Page 25 - Paper Library - AiPaper

NAGphormer is proposed to address the computational complexity of existing graph Transformers for node classification in large graphs. By introducing the Hop2Token module, it aggregates multi-hop neighborhood features as sequences, enhancing node representation effectiveness and…
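As a rough illustration of the Hop2Token idea summarized above, the sketch below turns each node into a sequence of K + 1 tokens, where token k is that node's row of Â^k X (hop 0 is the node's own features). It assumes symmetric normalization Â = D^-1/2 A D^-1/2 without self-loops and uses dense pure-Python matrices for readability. The names `hop2token`, `normalized_adjacency`, and `matmul` are hypothetical, and NAGphormer's actual preprocessing may differ in normalization and implementation.

```python
import math

def normalized_adjacency(edges, n):
    """Dense D^-1/2 A D^-1/2 for an undirected graph (assumed: no self-loops)."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                # Isolated nodes never reach here, so no division by zero.
                adj[i][j] /= math.sqrt(deg[i] * deg[j])
    return adj

def matmul(a, b):
    """Plain list-of-lists matrix product (n x n times n x d)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hop2token(edges, features, num_hops):
    """For every node v, return [X_v, (ÂX)_v, ..., (Â^K X)_v] as K + 1 tokens."""
    n = len(features)
    a_hat = normalized_adjacency(edges, n)
    hop_feats = [features]                 # hop 0: raw node features
    for _ in range(num_hops):
        hop_feats.append(matmul(a_hat, hop_feats[-1]))  # one more hop of propagation
    # Re-index from hop-major to node-major: tokens[v][k] is node v's k-hop aggregate.
    return [[hop_feats[k][v] for k in range(num_hops + 1)] for v in range(n)]

if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with a one-hot feature on node 0.
    tokens = hop2token([(0, 1), (1, 2)], [[1.0], [0.0], [0.0]], num_hops=2)
    for v, seq in enumerate(tokens):
        print(v, seq)
```

Because the sequences are computed once before training, each node's token sequence can be fed to a Transformer independently of the rest of the graph, which is the property that makes mini-batch training over large graphs practical in this style of model.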

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

Published: 9/14/2021

Single-Stage Keypoint-Based Object Pose Estimation · Category-Level Object Pose Estimation from RGB Images · 6-DoF Pose Estimation for Unknown Instances · Information Propagation using convGRU · Objectron Benchmark

This paper introduces a single-stage, keypoint-based method for category-level 6-DoF object pose estimation from a single RGB image. It leverages convGRU for effective information propagation, achieving superior results on the Objectron benchmark compared to state-of-the-art methods.

Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects

Published: 9/28/2018

Deep Object Pose Estimation · Synthetic Data in Robotic Manipulation · 6-DoF Pose Estimation · Domain Randomization and Real Images · Real-Time Object Pose Estimation System

The DOPE system uses synthetic data to train a deep network for 6-DoF pose estimation, effectively bridging the reality gap. It achieves state-of-the-art performance in real-world applications, enabling effective robotic grasping.

K*-Means: A Parameter-free Clustering Algorithm

Published: 5/17/2025

Parameter-Free Clustering Algorithm · Minimum Description Length Principle · K*-Means Clustering · Clustering Optimization Algorithm

The K*-means algorithm introduced here is parameter-free, using the minimum description length (MDL) principle to automatically determine the optimal number of clusters. It outperforms existing methods in scenarios where the number of clusters k is unknown and demonstrates competitive runtime and scalability.

Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)

Recommendation as Language Processing · Unified Pretraining for Recommendation · Personalized Prompt Generation · Knowledge Transfer in Recommendation Systems · Enhanced Task Generalization

The paper introduces P5, a unified framework that reformulates recommendation tasks as language processing problems using natural language sequences. It leverages personalized prompts and a Transformer-based architecture to enable shared training across multiple tasks, demonstrating…

Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization

Published: 10/29/2025

Fine-Tuning of Vision-Language-Action Models · Visual Representation Alignment Methods · Out-of-Distribution Generalization Capability · Analysis of Vision-Language Model Performance · Retention of Visual Action Knowledge

This paper investigates how fine-tuning Vision-Language-Action (VLA) models degrades visual representations. It reveals that naive fine-tuning harms visual knowledge, affecting performance in out-of-distribution scenarios. A visual representation alignment method is introduced to…

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Published: 1/16/2025

Survey on Agentic RAG Architectures · Adaptive Retrieval-Augmented Generation · Multi-Agent Collaboration Strategies · Dynamic Task Management · Real-Time Data Retrieval and Context Understanding

Agentic Retrieval-Augmented Generation (RAG) enhances traditional RAG by embedding autonomous AI agents, overcoming limitations in flexibility and context-awareness. This survey reviews its principles, taxonomy, and applications in healthcare, finance, and education, and addresses challenges…

ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency

Published: 4/11/2025

RAW-to-sRGB Mapping · Texture-Aware Diffusion Models · Histogram-Guided Color Consistency · Image Signal Processing Simulation · Smartphone Sensor Image Processing

ISPDiffuser introduces a framework using texture-aware diffusion models for RAW-to-sRGB mapping, addressing detail and color consistency issues. It incorporates a texture enrichment loss and a histogram-guided color consistency module, outperforming existing methods in quantitative…

DiffRAW: Leveraging Diffusion Model to Generate DSLR-Comparable Perceptual Quality sRGB from Smartphone RAW Images

Published: 3/24/2024

Diffusion Model-based Image Generation · Smartphone RAW Image Processing · DSLR Image Quality Enhancement · Perceptual Quality Optimization · Image Alignment and Mapping

DiffRAW is a novel method that uses diffusion models to transform smartphone RAW images into sRGB images with DSLR-comparable perceptual quality, enhancing detail while maintaining structural integrity and color alignment, and achieving state-of-the-art performance across various evaluation metrics.

Recommender Systems in the Era of Large Language Models (LLMs)

Published: 7/5/2023

LLM-based Recommendation Systems · Large Language Model Fine-Tuning · Generative Recommendation Systems · Pre-training and Fine-tuning in Recommender Systems · Prompting Methods for Large Language Models

This paper reviews techniques for enhancing recommender systems using Large Language Models (LLMs), focusing on pre-training, fine-tuning, and prompting. It highlights LLMs' potential in feature encoding and their future applications in recommender system research.

What Matters to Student Success: A Review of the Literature

Published: 1/1/2006

Factors Influencing Student Success · Higher Education Student Experience · Student Engagement Framework · Alignment of Educational Policy with Student Needs · Student Retention and Achievement

This report reviews factors affecting student success in higher education, identifying key themes and effective practices for retention and achievement. It highlights the significance of institutional support, student engagement, and aligning educational policy with student needs

Can LLMs Address Mental Health Questions? A Comparison with Human Therapists

Published: 9/16/2025

LLM-based Mental Health Question Answering · Comparison Study between Human Therapists and LLMs · Emotional and Readability Analysis · Human-Computer Interaction in Mental Health · Limitations of LLMs in Mental Health

This study compares responses generated by large language models (LLMs) and human therapists in mental health contexts. LLMs provided longer, clearer, and more positive answers, yet users preferred support from human therapists, highlighting both the potential and the limitations of LLMs in…

Digital Image Noise Estimation Using DWT Coefficients

Published: 1/1/2021

Image Noise Estimation Using Discrete Wavelet Transform · Digital Image Processing · Gaussian Noise Strength Estimation · Image Denoising Applications

This study presents a novel hybrid algorithm combining Discrete Wavelet Transform (DWT) and edge information removal to accurately estimate Gaussian noise strength in digital images, demonstrating significant performance improvements over existing methods.
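For context, DWT-based noise estimators typically build on Donoho's robust median estimator, sigma = median(|HH1|) / 0.6745, computed on the finest diagonal (HH) detail subband. The sketch below implements only that classic baseline with a one-level orthonormal Haar transform in pure Python; it does not include the paper's edge-information removal step, and the function names `haar_hh_subband` and `estimate_noise_sigma` are hypothetical.

```python
import random
import statistics

def haar_hh_subband(img):
    """One-level orthonormal 2-D Haar transform, diagonal (HH) subband only.

    Each non-overlapping 2x2 block [[a, b], [c, d]] yields (a - b - c + d) / 2.
    A trailing odd row or column is ignored.
    """
    rows = len(img) // 2 * 2
    cols = len(img[0]) // 2 * 2
    coeffs = []
    for i in range(0, rows, 2):
        for j in range(0, cols, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            coeffs.append((a - b - c + d) / 2.0)
    return coeffs

def estimate_noise_sigma(img):
    """Robust median estimate of additive white Gaussian noise std from the HH subband."""
    hh = haar_hh_subband(img)
    # 0.6745 is the median of |Z| for Z ~ N(0, 1), so this rescales the
    # median absolute HH coefficient into a standard-deviation estimate.
    return statistics.median([abs(x) for x in hh]) / 0.6745

if __name__ == "__main__":
    rng = random.Random(0)
    # Planar ramp plus Gaussian noise with std 10.
    noisy = [[0.5 * i + 0.25 * j + rng.gauss(0, 10) for j in range(256)]
             for i in range(256)]
    print(estimate_noise_sigma(noisy))
```

On a planar ramp the clean image contributes exactly zero to every HH coefficient, so the estimate reflects only the noise. On real images, edges also leak energy into the HH subband and inflate the estimate, which is the bias that an edge-removal step like the one described in this paper is meant to reduce.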

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

Published: 3/21/2024

Tuning-Free Framework for Video Editing Tasks · Image-to-Video Generation · Prompt-Based Video Editing · Temporal Feature Injection · Visual Consistency Evaluation

This paper presents AnyV2V, a tuning-free framework for video editing that addresses quality and control issues in current generative models. Its approach includes modifying the first frame with an existing image editing model and generating a video through temporal feature injection…

Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

Published: 10/23/2025

3D Asset Generation from Single Images · Simulation-Ready Assets for Physics Engines · High-Fidelity 3D Scene Generation · Scalable Content Creation for Robotic Manipulation · Seed3D 1.0 Foundation Model

Seed3D 1.0 is a foundation model that generates high-fidelity, simulation-ready 3D assets from single images, effectively balancing content diversity and physics accuracy for scalable training environments in embodied AI development.

Qwen-Image Technical Report

Published: 8/4/2025

Text-to-Image Generation · Image Generation Model · Image Editing Techniques · Dual-Encoding Mechanism · Data Pipeline Optimization

Qwen-Image enhances text rendering and image editing using a comprehensive data pipeline and progressive training, while a dual-encoding mechanism balances semantic consistency and fidelity, excelling particularly in Chinese text generation.

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Published: 11/6/2025

Vision-Language Models · Video Generation Models · Multimodal Reasoning · Video Thinking Benchmark

The "Thinking with Video" paradigm enhances multimodal reasoning by integrating video generation models, validated through the Video Thinking Benchmark, showing performance improvements in both vision and text tasks while addressing static constraints and modality separation.

Grounding Computer Use Agents on Human Demonstrations

Published: 11/10/2025

Large-Scale Desktop Grounding Dataset · Desktop UI Element Mapping · Instruction-to-UI Element Conversion Model · GroundNext Model

The study introduces a large-scale desktop grounding dataset based on expert human demonstrations. It enables models to achieve state-of-the-art performance in mapping instructions to UI elements, highlighting the critical role of high-quality data in…

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

Published: 5/28/2025

Reinforcement Learning for Robotic Control · Flow Matching Policy Fine-Tuning · Online Reinforcement Learning Framework · Long-Horizon Planning with Visual Input · Sparse Reward Task Benchmarking

ReinFlow is an online reinforcement learning framework for fine-tuning flow matching policies in robotic control, enhancing exploration and training stability. Experiments show significant improvements in reward and success rates while reducing computation time in challenging tasks.

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation

Gaussian World Model · Robotic Manipulation · Model-Based Reinforcement Learning · Offline Imitation Learning · Policy Network

This paper presents the Gaussian World Model (GWM) for robotic manipulation, addressing the lack of 3D geometric understanding in existing models. GWM uses 3D Gaussian primitives and combines a Diffusion Transformer with a 3D VAE to reconstruct future states, enhancing imitation…
