Page 9 - Paper Library - AiPaper

The Video Prediction Policy (VPP) utilizes Video Diffusion Models (VDMs) to generate visual representations that incorporate both current static and predicted dynamic information, enhancing robot action learning and achieving a 31.6% increase in success rates for complex tasks.

Real-World Reinforcement Learning of Active Perception Behaviors

Published: 12/1/2025

Reinforcement Learning for Active Perception Behaviors, Asymmetric Advantage Weighted Regression (AAWR), Robot Learning under Partial Observability, Privileged Value Function Estimation, Robot Manipulation Task Evaluation

The paper introduces Asymmetric Advantage Weighted Regression (AAWR) to train active perception policies for robots facing partial observability. Utilizing privileged sensors allows for high-quality value function training, significantly enhancing task performance across various

A Learned Cache Eviction Framework with Minimal Overhead

Published: 1/27/2023

Machine Learning Cache Eviction Framework, Integration of Traditional Cache Systems with Machine Learning, Efficient Caching Algorithms, Production Workload Evaluation, Low-Overhead Cache Decision Making

The MAT framework uses a heuristic as a filter to reduce the number of ML predictions per cache eviction from 63 to 2 while keeping miss ratios comparable to those of state-of-the-art ML systems, which makes it more practical for high-throughput environments.
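The filtering idea can be illustrated with a minimal sketch. This is not the MAT implementation: the class `FilteredEvictionCache`, the `predict_keep` callback, and the `max_checks` budget are hypothetical names chosen for illustration, and LRU is assumed as the heuristic. The heuristic proposes the eviction candidate, and the (stand-in) ML predictor is consulted only for that candidate, so the number of predictions per eviction stays bounded.

```python
from collections import OrderedDict


class FilteredEvictionCache:
    """LRU proposes eviction candidates; a predictor may veto a few of them.

    Hypothetical sketch: predict_keep stands in for an ML model that
    returns True when an object is worth keeping in the cache.
    """

    def __init__(self, capacity, predict_keep, max_checks=2):
        self.capacity = capacity
        self.predict_keep = predict_keep
        self.max_checks = max_checks  # prediction budget per eviction
        self.items = OrderedDict()    # least recently used entry first
        self.predictions = 0          # total predictor calls so far

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)   # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items[key] = value
            self.items.move_to_end(key)
            return
        if len(self.items) >= self.capacity:
            self._evict_one()
        self.items[key] = value

    def _evict_one(self):
        for _ in range(self.max_checks):
            candidate = next(iter(self.items))   # LRU tail chosen by the heuristic
            self.predictions += 1
            if not self.predict_keep(candidate):
                del self.items[candidate]
                return
            self.items.move_to_end(candidate)    # predictor vetoed: second chance
        self.items.popitem(last=False)           # budget spent: plain LRU eviction


cache = FilteredEvictionCache(capacity=2, predict_keep=lambda k: k == "hot")
cache.put("hot", 1)
cache.put("a", 2)
cache.put("b", 3)  # triggers one eviction with at most two predictions
```

In this toy run the predictor is called twice: once to spare `"hot"` and once to approve evicting `"a"`.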

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Published: 12/17/2025

Real-Time Interactive World Modeling, Long-Term Geometric Consistency, Video Diffusion Models, Memory-Aware Modeling, Dynamic Context Reconstruction

This paper introduces WorldPlay, a video diffusion model for real-time interactive world modeling with long-term geometric consistency, achieved through three innovations: Dual Action Representation, Reconstituted Context Memory, and Context Forcing.

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Published: 4/13/2023

Text-to-Image Generation, Human Preference Reward Model, Reward Feedback Learning, Diffusion Model Optimization, Expert Comparison Ratings

This study introduces ImageReward, a general-purpose human preference reward model for text-to-image generation, trained on 137,000 expert comparisons collected through a systematic annotation process. It outperforms existing models and proposes Reward Feedback Learning (ReFL) for optimizing

A Biologically Plausible Parser

Published: 8/5/2021

Biologically Plausible Parser, Assembly Calculus, Language Parsing, Computational Framework for Cognitive Functions, English Sentence Parsing

The paper presents a biologically plausible parser using Assembly Calculus, demonstrating that simple neural mechanisms can effectively parse complex sentences in English and Russian, highlighting the potential of biological models in advanced language processing.

Detailed balance in large language model-driven agents

Published: 12/11/2025

LLM Generative Dynamics, Application of Least Action Principle, Transition Probability Statistical Analysis, Macroscopic Dynamics Theory, Complex AI Systems

This paper introduces a method based on the least action principle to uncover detailed balance in LLM-driven agents, highlighting that their generative processes depend on potential functions rather than generic rules, marking a significant theoretical advance in AI dynamics.
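For readers unfamiliar with the term, detailed balance is the standard Markov-chain condition pi[i] * P[i][j] == pi[j] * P[j][i] for every pair of states. The sketch below checks that textbook condition on a small transition matrix; it does not reproduce the paper's method, and the function name and the example matrices are illustrative assumptions.

```python
def satisfies_detailed_balance(pi, P, tol=1e-9):
    """Return True if pi[i] * P[i][j] == pi[j] * P[j][i] for all state pairs."""
    n = len(pi)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Two-state chain: reversible with respect to its stationary distribution.
P_reversible = [[0.9, 0.1],
                [0.2, 0.8]]
pi_reversible = [2 / 3, 1 / 3]

# Deterministic 3-cycle: uniform stationary distribution, but probability
# flows in one direction only, so detailed balance fails.
P_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]
pi_cycle = [1 / 3, 1 / 3, 1 / 3]

balanced = satisfies_detailed_balance(pi_reversible, P_reversible)
cyclic = satisfies_detailed_balance(pi_cycle, P_cycle)
```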

A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation

Published: 12/12/2025

RL Training for Large Language Models, Markov Decision Process Modeling, Automated Policy Generation, Verifiable Stage-wise Modeling, Advanced Reinforcement Learning Applications

The A-LAMP framework automates the transition from natural language task descriptions to MDP modeling and policy generation. By decomposing modeling, coding, and training into verifiable stages, A-LAMP enhances policy generation capabilities, outpacing traditional large language

MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation

Published: 8/27/2025

Vision-Language-Action Model, Robotic Manipulation, Long-Term Memory and Anticipatory Action, Memory-Conditioned Diffusion Models, Short-Term Memory and Cognition Fusion

MemoryVLA is a memory-centric Vision-Language-Action framework for non-Markovian robotic manipulation, integrating working memory and episodic memory. It significantly enhances performance in 150 tasks, achieving up to a 26% success rate increase across simulations and real-worl

SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Published: 11/13/2025

Disentangled Spatial Representation Model, Robotic Manipulation, Semantic-guided Geometric Module, Multitask Evaluation, Spatial Transformer

The paper presents the SpatialActor model to enhance robustness in robotic manipulation by decoupling semantic and geometric information. It employs a semantic-guided geometric module and a spatial transformer. The model demonstrates superior performance across various tasks unde

SCB-Dataset: A Dataset for Detecting Student and Teacher Classroom Behavior

Published: 4/5/2023

Classroom Behavior Detection Dataset, Student-Teacher Behavior Analysis, Deep Learning Applications in Education, Benchmarking YOLO Series Algorithms, Vision-Language Models

The paper presents SCB-Dataset, the first large-scale dataset covering 19 classroom behavior classes for students and teachers, addressing data scarcity in education. It includes 13,330 images and 122,977 labels, designed for object detection and image classification, establishin

MiMo-Audio: Audio Language Models are Few-Shot Learners

Audio Language Models, Few-Shot Learning Capabilities, Speech Intelligence Benchmarks, Audio Understanding Benchmarks, Task Generation and Conversion

MiMo-Audio demonstrates strong few-shot learning abilities in audio tasks, leveraging over 100 million hours of pretraining data. It achieved state-of-the-art performance in speech intelligence and audio understanding benchmarks while effectively generalizing to new tasks.

Hearts in Tune, Love in Step: Couple Similarity and Marital Satisfaction (心相应，爱相随：夫妻相似性与婚姻满意度)

Couple Similarity Research, Marital Satisfaction, Psychological Research Methods, Sociological Research, Impact of Family of Origin

This study examines the impact of couple similarity on marital satisfaction using a couple-centered approach with 638 Chinese couples. It finds real spouses are more similar, particularly in family of origin, with effects on satisfaction varying by gender and marriage stage, whil

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Published: 2/28/2024

1-bit Large Language Models, BitNet Architecture, Cost-Effectiveness Optimization, Model Compression and High Performance, Custom Hardware Design

This study introduces BitNet b1.58, a 1-bit LLM variant using ternary weights {-1, 0, 1}. It matches the performance of full-precision models while being more cost-effective in latency, memory, throughput, and energy, paving the way for new training methods and hardware design.
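A weight restricted to three values carries log2(3) ≈ 1.58 bits, which is where the name comes from. The sketch below shows one way to map real-valued weights to {-1, 0, 1}: scale by the mean absolute value, round, and clip. The scaling rule is an assumption made for illustration (the summary above does not state it), and `ternary_quantize` is a hypothetical helper name, not a library function.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize a 2-D list of floats to values in {-1, 0, 1}.

    Assumed scheme: divide by the mean absolute weight, round to the
    nearest integer, then clip to the range [-1, 1].
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) + eps  # eps avoids division by zero
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return quantized, scale


W = [[0.9, -0.05],
     [-1.2, 0.3]]
W_q, scale = ternary_quantize(W)
# W_q is [[1, 0], [-1, 0]]; scale is about 0.6125
```

Multiplying `W_q` by `scale` gives a coarse approximation of the original matrix, which is why the scale is returned alongside the ternary values.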

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

Published: 7/1/2024

Diffusion Model for Sequence Generation, Enhanced Sampling with Causal Prediction, Multi-Stage Generation Optimization, Performance Enhancement in Decision-Making and Planning Tasks, Variable-Length Generation and Diffusion Guidance

This paper introduces Diffusion Forcing, a novel training paradigm that combines next-token prediction and full-sequence diffusion, enabling denoising of tokens with independent noise levels. It supports variable-length generation and offers significant performance improvements i

Toward Full-Immersive Multiuser Virtual Reality With Redirected Walking

Published: 1/1/2023

Multiplayer Virtual Reality, Redirected Walking Algorithms, Head-Mounted Displays, Virtual Environment Performance Evaluation, Full-Immersive Virtual Reality

This study addresses the challenge of achieving continuous full-immersive multiuser experiences in VR by proposing Redirected Walking (RDW) algorithms. A modular framework was developed for performance evaluation, demonstrating that proposed enhancements significantly improve use

Redirected Walking for Multi-User eXtended Reality Experiences with Confined Physical Spaces

Published: 9/30/2025

Multi-User Redirected Walking in Virtual Reality, Exploration of Virtual Environments in Confined Spaces, Virtual Reality Maze Game Design, Motion Evaluation in Networked Virtual Reality Environments, Cybersickness Research and Assessment

This paper presents a novel redirected walking algorithm combining Artificial Potential Fields and Steer-to-Orbit techniques, supporting multiuser XR experiences in a confined 6 m × 6 m space. Tests show an 80% reduction in cybersickness while enhancing walking efficiency and user c

Incident Diagnosing and Reporting System Based on Retrieval Augmented Large Language Model

Published: 4/11/2025

Retrieval-Augmented LLM for Incident Diagnosis and Reporting, Anomaly Analysis of IoT Sensor Records, Automated Incident Report Generation, Diagnosis of Complex Events, IoT Maintenance and Troubleshooting Support

The study introduces RAIDR, a retrieval-augmented language model for diagnosing and reporting incidents in IoT. It retrieves relevant documentation and uses an LLM to analyze anomalies and generate reports, streamlining maintenance and troubleshooting.

Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting

Published: 12/16/2025

Application of Large Language Models in Ontology Engineering, Ontology for Parkinson's Disease Monitoring and Alerting, Human-LLM Collaborative Ontology Construction, One Shot and Chain of Thought Prompt Techniques, X-HCOME and SimX-HCOME+ Methodologies

The paper examines four methods for using LLMs in constructing a Parkinson's Disease monitoring ontology, revealing that while LLMs can generate ontologies, human-LLM collaboration significantly improves their comprehensiveness and accuracy.

HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data

Published: 10/13/2025

Unsupervised Hierarchical Manipulation Concept Learning, Cross-Modal Data Correlation Analysis, Cross-Modal Perception Network, Robotic Manipulation Policy Optimization, Hierarchical Temporal Abstraction Modeling

HiMaCon is a self-supervised framework that learns hierarchical manipulation concepts from unlabeled multimodal robot demonstrations, enhancing imitation learning by capturing cross-modal correlations and structuring concepts across temporal horizons, significantly improving gen
