Papers
NAGphormer: A Tokenized Graph Transformer for Node Classification in Large Graphs
Published: 6/10/2022
Node Classification in Graphs, Graph Transformer Architecture, Neighborhood Aggregation Method, Hop2Token Module, Large Graph Processing
NAGphormer is proposed to address the computational complexity of existing graph Transformers for node classification in large graphs. By introducing the Hop2Token module, it aggregates multi-hop neighborhood features as sequences, enhancing node representation effectiveness and
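As a rough illustration of the multi-hop aggregation mentioned in the summary above, the sketch below builds one token per hop for every node by repeatedly multiplying the features with a normalized adjacency matrix. The function name `hop2token`, the self-loops, and the symmetric normalization are assumptions made for this example, not details taken from the summary.

```python
import math

def hop2token(adj, features, num_hops):
    """Return tokens[v][k]: the features of node v aggregated over k hops.

    Hypothetical sketch. adj is a dense 0/1 adjacency matrix (list of lists),
    features is a list of per-node feature vectors.
    """
    n = len(adj)
    # Assumption: add self-loops and normalize as D^{-1/2} (A + I) D^{-1/2}.
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    a_norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

    dim = len(features[0])
    current = [[float(x) for x in row] for row in features]
    tokens = [[row[:]] for row in current]  # hop 0 is the node's own features
    for _ in range(num_hops):
        # One more propagation step: current <- a_norm @ current.
        current = [
            [sum(a_norm[i][j] * current[j][c] for j in range(n)) for c in range(dim)]
            for i in range(n)
        ]
        for v in range(n):
            tokens[v].append(current[v][:])
    return tokens

# Usage: a 3-node path graph 0 - 1 - 2 with 2-dimensional features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1, 0], [0, 1], [1, 1]]
tokens = hop2token(adj, features, num_hops=2)
```

Each node ends up with a sequence of `num_hops + 1` vectors, which could then be fed to a Transformer encoder one node at a time instead of attending over the whole graph.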
Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image
Published: 9/14/2021
Single-Stage Keypoint-Based Object Pose Estimation, Category-Level Object Pose Estimation from RGB Images, 6-DoF Pose Estimation for Unknown Instances, Information Propagation using convGRU, Objectron Benchmark
This paper introduces a single-stage, keypoint-based method for category-level 6-DoF object pose estimation from a single RGB image. It leverages convGRU for effective information propagation, achieving superior results on the Objectron benchmark compared to state-of-the-art meth
Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects
Published: 9/28/2018
Deep Object Pose Estimation, Synthetic Data in Robotic Manipulation, 6-DoF Pose Estimation, Domain Randomization and Real Images, Real-Time Object Pose Estimation System
The DOPE system uses synthetic data to train a deep network for 6-DoF pose estimation, effectively bridging the reality gap. It achieves state-of-the-art performance in real-world applications, enabling effective robotic grasping.
K*-Means: A Parameter-free Clustering Algorithm
Published: 5/17/2025
Parameter-Free Clustering Algorithm, Minimum Description Length Principle, K*-Means Clustering, Clustering Optimization Algorithm
The K*-Means algorithm introduced here is parameter-free, using the minimum description length principle to automatically determine the optimal number of clusters. It outperforms existing methods in scenarios with unknown k and demonstrates competitive runtime and scalability.
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
Recommendation as Language Processing, Unified Pretraining for Recommendation, Personalized Prompt Generation, Knowledge Transfer in Recommendation Systems, Enhanced Task Generalization
The paper introduces P5, a unified framework that reformulates recommendation tasks as language processing problems using natural language sequences. It leverages personalized prompts and a Transformer-based architecture to enable shared training across multiple tasks, demonstrat
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
Published: 10/29/2025
Fine-Tuning of Vision-Language-Action Models, Visual Representation Alignment Methods, Out-of-Distribution Generalization Capability, Analysis of Vision-Language Model Performance, Retention of Visual Action Knowledge
This paper investigates how fine-tuning Vision-Language-Action (VLA) models degrades visual representations. It reveals that naive fine-tuning harms visual knowledge, affecting performance in out-of-distribution scenarios. A visual representation alignment method is introduced to
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
Published: 1/16/2025
Survey on Agentic RAG Architectures, Adaptive Retrieval-Augmented Generation, Multi-Agent Collaboration Strategies, Dynamic Task Management, Real-Time Data Retrieval and Context Understanding
Agentic Retrieval-Augmented Generation (RAG) enhances traditional RAG by embedding autonomous AI agents, overcoming limitations in flexibility and context-awareness. This survey reviews its principles, taxonomy, applications in healthcare, finance, and education, and addresses ch
ISPDiffuser: Learning RAW-to-sRGB Mappings with Texture-Aware Diffusion Models and Histogram-Guided Color Consistency
Published: 4/11/2025
RAW-to-sRGB Mapping, Texture-Aware Diffusion Models, Histogram-Guided Color Consistency, Image Signal Processing Simulation, Smartphone Sensor Image Processing
ISPDiffuser introduces a framework using texture-aware diffusion models for RAW-to-sRGB mapping, addressing detail and color consistency issues. It incorporates texture enrichment loss and a histogram-guided color consistency module, outperforming existing methods in quantitative
DiffRAW: Leveraging Diffusion Model to Generate DSLR-Comparable Perceptual Quality sRGB from Smartphone RAW Images
Published: 3/24/2024
Diffusion Model-based Image Generation, Smartphone RAW Image Processing, DSLR Image Quality Enhancement, Perceptual Quality Optimization, Image Alignment and Mapping
DiffRAW is a novel method that utilizes diffusion models to transform smartphone RAW images into sRGB with DSLR-quality perception, enhancing detail while maintaining structural integrity and color alignment, achieving state-of-the-art performance across various evaluation metric
Recommender Systems in the Era of Large Language Models (LLMs)
Published: 7/5/2023
LLM-based Recommendation Systems, Large Language Model Fine-Tuning, Generative Recommendation Systems, Pre-training and Fine-tuning in Recommender Systems, Prompting Methods for Large Language Models
This paper reviews techniques for enhancing recommender systems using Large Language Models (LLMs), focusing on pre-training, fine-tuning, and prompting. It highlights LLMs' potential in feature encoding and their future applications in recommender system research.
What Matters to Student Success: A Review of the Literature
Published: 1/1/2006
Factors Influencing Student Success, Higher Education Student Experience, Student Engagement Framework, Alignment of Educational Policy with Student Needs, Student Retention and Achievement
This report reviews factors affecting student success in higher education, identifying key themes and effective practices for retention and achievement. It highlights the significance of institutional support, student engagement, and aligning educational policy with student needs
Can LLMs Address Mental Health Questions? A Comparison with Human Therapists
Published: 9/16/2025
LLM-based Mental Health Question Answering, Comparison Study between Human Therapists and LLMs, Emotional and Readability Analysis, Human-Computer Interaction in Mental Health, Limitations of LLMs in Mental Health
This study compares responses generated by large language models (LLMs) and human therapists in mental health contexts. LLMs provided longer, clearer, and more positive answers, yet users preferred human therapist support, highlighting both the potential and limitations of LLMs i
Digital Image Noise Estimation Using DWT Coefficients
Published: 1/1/2021
Image Noise Estimation Using Discrete Wavelet Transform, Digital Image Processing, Gaussian Noise Strength Estimation, Image Denoising Applications
This study presents a novel hybrid algorithm combining Discrete Wavelet Transform (DWT) and edge information removal to accurately estimate Gaussian noise strength in digital images, demonstrating significant performance improvements over existing methods.
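To make the idea of wavelet-based noise estimation concrete, here is a minimal sketch of the classic median-absolute-deviation estimator applied to the diagonal-detail (HH) coefficients of a one-level Haar transform. This is a well-known baseline, not the hybrid algorithm the study proposes: the edge-information removal step is omitted, and the function name `estimate_noise_sigma` is hypothetical.

```python
import statistics

def estimate_noise_sigma(image):
    """Estimate the Gaussian noise standard deviation of a grayscale image.

    Baseline sketch only. image is a list of rows of numbers.
    """
    # Crop to even dimensions so the image splits into 2x2 blocks.
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    hh = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            # Orthonormal Haar diagonal detail of one 2x2 block.
            hh.append(
                (image[r][c] - image[r][c + 1] - image[r + 1][c] + image[r + 1][c + 1]) / 2.0
            )
    # Median absolute value, scaled by 0.6745 (the Gaussian 75th-percentile constant).
    return statistics.median(abs(v) for v in hh) / 0.6745

# Usage: a flat image has no diagonal detail, so the estimate is zero.
flat = [[5.0] * 8 for _ in range(8)]
sigma_flat = estimate_noise_sigma(flat)
```

On real photographs, edges and texture also leak into the HH subband and bias this estimate upward, which is the kind of error an edge-removal step is meant to reduce.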
AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks
Published: 3/21/2024
Tuning-Free Framework for Video Editing Tasks, Image-to-Video Generation, Prompt-Based Video Editing, Temporal Feature Injection, Visual Consistency Evaluation
This paper presents AnyV2V, a tuning-free framework for video editing that addresses quality and control issues in current generative models. Its approach includes modifying the first frame with an existing image editing model and generating a video through temporal feature injec
Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets
Published: 10/23/2025
3D Asset Generation from Single Images, Simulation-Ready Assets for Physics Engines, High-Fidelity 3D Scene Generation, Scalable Content Creation for Robotic Manipulation, Seed3D 1.0 Foundation Model
Seed3D 1.0 is a foundation model that generates high-fidelity, simulation-ready 3D assets from single images, effectively balancing content diversity and physics accuracy for scalable training environments in embodied AI development.
Qwen-Image Technical Report
Published: 8/4/2025
Text-to-Image Generation, Image Generation Model, Image Editing Techniques, Dual-Encoding Mechanism, Data Pipeline Optimization
Qwen-Image enhances text rendering and image editing using a comprehensive data pipeline and progressive training, while a dual-encoding mechanism balances semantic consistency and fidelity, excelling particularly in Chinese text generation.
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
Published: 11/6/2025
Vision-Language Models, Video Generation Models, Multimodal Reasoning, Video Thinking Benchmark
The "Thinking with Video" paradigm enhances multimodal reasoning by integrating video generation models, validated through the Video Thinking Benchmark, showing performance improvements in both vision and text tasks while addressing static constraints and modality separation.
Grounding Computer Use Agents on Human Demonstrations
Published: 11/10/2025
Large-Scale Desktop Grounding Dataset, Desktop UI Element Mapping, Instruction-to-UI Element Conversion Model, GroundNext Model
The study introduces a large-scale desktop grounding dataset based on expert human demonstrations. It enables the models to achieve state-of-the-art performance in mapping instructions to UI elements, highlighting the critical role of high-quality data i
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Published: 5/28/2025
Reinforcement Learning for Robotic Control, Flow Matching Policy Fine-Tuning, Online Reinforcement Learning Framework, Long-Horizon Planning with Visual Input, Sparse Reward Task Benchmarking
ReinFlow is an online reinforcement learning framework for fine-tuning flow matching policies in robotic control, enhancing exploration and training stability. Experiments show significant improvements in reward and success rates while reducing computation time in challenging tas
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
Gaussian World Model, Robotic Manipulation, Model-Based Reinforcement Learning, Offline Imitation Learning, Policy Network
This paper presents the Gaussian World Model (GWM) for robotic manipulation, addressing the lack of 3D geometric understanding in existing models. GWM uses 3D Gaussian primitives and combines a Diffusion Transformer with a 3D VAE to reconstruct future states, enhancing imitation