Papers
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
Published: 12/19/2024
video diffusion models, Robotic Action Learning, Video Prediction Policy, Dynamic Visual Representations, Complex Manipulation Tasks
The Video Prediction Policy (VPP) utilizes Video Diffusion Models (VDMs) to generate visual representations that incorporate both current static and predicted dynamic information, enhancing robot action learning and achieving a 31.6% increase in success rates for complex tasks.
Real-World Reinforcement Learning of Active Perception Behaviors
Published: 12/1/2025
Reinforcement Learning for Active Perception Behaviors, Asymmetric Advantage Weighted Regression (AAWR), Robot Learning under Partial Observability, Privileged Value Function Estimation, Robot Manipulation Task Evaluation
The paper introduces Asymmetric Advantage Weighted Regression (AAWR) to train active perception policies for robots facing partial observability. Utilizing privileged sensors allows for high-quality value function training, significantly enhancing task performance across various
A Learned Cache Eviction Framework with Minimal Overhead
Published: 1/27/2023
Machine Learning Cache Eviction Framework, Integration of Traditional Cache Systems with Machine Learning, Efficient Caching Algorithms, Production Workload Evaluation, Low-Overhead Cache Decision Making
The MAT framework reduces the number of ML predictions for cache eviction from 63 to 2 by using a heuristic as a filter, maintaining low miss ratios similar to state-of-the-art ML systems, which enhances practicality for high-throughput environments.
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
Published: 12/17/2025
Real-Time Interactive World Modeling, Long-Term Geometric Consistency, video diffusion models, Memory-Aware Modeling, Dynamic Context Reconstruction
This paper introduces WorldPlay, a video diffusion model for real-time interactive world modeling with long-term geometric consistency, achieved through three innovations: Dual Action Representation, Reconstituted Context Memory, and Context Forcing.
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Published: 4/13/2023
Text-to-Image Generation, Human Preference Reward Model, Reward Feedback Learning, Diffusion Model Optimization, Expert Comparison Ratings
This study introduces ImageReward, a general-purpose human preference reward model for text-to-image generation, trained on a systematic annotation process with 137,000 expert comparisons. It outperforms existing models and proposes Reward Feedback Learning (ReFL) for optimizing
A Biologically Plausible Parser
Published: 8/5/2021
Biologically Plausible Parser, Assembly Calculus, Language Parsing, Computational Framework for Cognitive Functions, English Sentence Parsing
The paper presents a biologically plausible parser using Assembly Calculus, demonstrating that simple neural mechanisms can effectively parse complex sentences in English and Russian, highlighting the potential of biological models in advanced language processing.
Detailed balance in large language model-driven agents
Published: 12/11/2025
LLM Generative Dynamics, Application of Least Action Principle, Transition Probability Statistical Analysis, Macroscopic Dynamics Theory, Complex AI Systems
This paper introduces a method based on the least action principle to uncover detailed balance in LLM-driven agents, highlighting that their generative processes depend on potential functions rather than generic rules, marking a significant theoretical advance in AI dynamics.
A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation
Published: 12/12/2025
RL Training for Large Language Models, Markov Decision Process Modeling, Automated Policy Generation, Verifiable Stage-wise Modeling, Advanced Reinforcement Learning Applications
The A-LAMP framework automates the transition from natural language task descriptions to MDP modeling and policy generation. By decomposing modeling, coding, and training into verifiable stages, A-LAMP enhances policy generation capabilities, outpacing traditional large language
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
Published: 8/27/2025
Vision-Language-Action Model, Robotic Manipulation, Long-Term Memory and Anticipatory Action, Memory-Conditioned Diffusion Models, Short-Term Memory and Cognition Fusion
MemoryVLA is a memory-centric Vision-Language-Action framework for non-Markovian robotic manipulation, integrating working memory and episodic memory. It significantly enhances performance in 150 tasks, achieving up to a 26% success rate increase across simulations and real-worl
SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation
Published: 11/13/2025
Disentangled Spatial Representation Model, Robotic Manipulation, Semantic-guided Geometric Module, Multitask Evaluation, Spatial Transformer
The paper presents the SpatialActor model to enhance robustness in robotic manipulation by decoupling semantic and geometric information. It employs a semantic-guided geometric module and a spatial transformer. The model demonstrates superior performance across various tasks unde
SCB-Dataset: A Dataset for Detecting Student and Teacher Classroom Behavior
Published: 4/5/2023
Classroom Behavior Detection Dataset, Student-Teacher Behavior Analysis, Deep Learning Applications in Education, Benchmarking YOLO Series Algorithms, Vision-Language Models
The paper presents SCB-Dataset, the first large-scale dataset covering 19 classroom behavior classes for students and teachers, addressing data scarcity in education. It includes 13,330 images and 122,977 labels, designed for object detection and image classification, establishin
MiMo-Audio: Audio Language Models are Few-Shot Learners
Audio Language Models, Few-Shot Learning Capabilities, Speech Intelligence Benchmarks, Audio Understanding Benchmarks, Task Generation and Conversion
MiMo-Audio demonstrates strong few-shot learning abilities in audio tasks, leveraging over 100 million hours of pretraining data. It achieved state-of-the-art performance in speech intelligence and audio understanding benchmarks while effectively generalizing to new tasks.
Hearts in Tune, Love Follows: Couple Similarity and Marital Satisfaction (心相应,爱相随:夫妻相似性与婚姻满意度)
Couple Similarity Research, Marital Satisfaction, Psychological Research Methods, Sociological Research, Impact of Family of Origin
This study examines the impact of couple similarity on marital satisfaction using a couple-centered approach with 638 Chinese couples. It finds real spouses are more similar, particularly in family of origin, with effects on satisfaction varying by gender and marriage stage, whil
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Published: 2/28/2024
1-bit Large Language Models, BitNet Architecture, Cost-Effectiveness Optimization, Model Compression and High Performance, Custom Hardware Design
This study introduces BitNet b1.58, a 1-bit LLM variant using ternary weights {-1, 0, 1}. It matches the performance of full-precision models while being more cost-effective in latency, memory, throughput, and energy, paving the way for new training methods and hardware design.
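
As a rough illustration of what ternary weights mean in practice, the sketch below maps a small weight matrix to values in {-1, 0, 1} by dividing by the mean absolute weight, rounding, and clipping. The function name `absmean_ternary_quantize`, the `eps` default, and the sample matrix are assumptions made for this example; it is not taken from the BitNet code.

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Map a 2-D list of floats to ternary values {-1, 0, 1}.

    Illustrative sketch only: scale by the mean absolute weight,
    round to the nearest integer, then clip to [-1, 1].
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes) + eps
    quantized = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return quantized, scale


# Hypothetical weights chosen for the example.
sample = [[0.9, -0.05, -1.2], [0.3, 0.0, 2.0]]
ternary, scale = absmean_ternary_quantize(sample)
print(ternary)
```

Each quantized entry times `scale` gives a coarse approximation of the original weight, which is why the scale is returned alongside the ternary matrix.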
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Published: 7/1/2024
Diffusion Model for Sequence Generation, Enhanced Sampling with Causal Prediction, Multi-Stage Generation Optimization, Performance Enhancement in Decision-Making and Planning Tasks, Variable-Length Generation and Diffusion Guidance
This paper introduces Diffusion Forcing, a novel training paradigm that combines next-token prediction and full-sequence diffusion, enabling denoising of tokens with independent noise levels. It supports variable-length generation and offers significant performance improvements i
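
To make "independent noise levels" concrete, here is a minimal sketch of the noising step only: every token in a sequence draws its own noise level instead of sharing one level across the sequence. Scalar tokens, the three-entry `alpha_bar` schedule, and the name `noise_sequence` are assumptions for illustration, not the paper's implementation.

```python
import math
import random


def noise_sequence(tokens, alpha_bar, rng):
    """Noise each token at its own independently sampled level.

    tokens: list of floats (scalar stand-ins for token embeddings).
    alpha_bar: hypothetical cumulative signal-retention schedule,
               where alpha_bar[k] == 1.0 means "no noise".
    """
    noisy, levels = [], []
    for x in tokens:
        k = rng.randrange(len(alpha_bar))  # per-token noise level
        eps = rng.gauss(0.0, 1.0)          # standard Gaussian noise
        signal = math.sqrt(alpha_bar[k]) * x
        noise = math.sqrt(1.0 - alpha_bar[k]) * eps
        noisy.append(signal + noise)
        levels.append(k)
    return noisy, levels


rng = random.Random(0)
schedule = [1.0, 0.5, 0.0]  # clean, half-noised, pure noise (assumed)
noisy, levels = noise_sequence([0.2, -1.0, 0.7, 1.5], schedule, rng)
print(levels)
```

A denoiser trained on such inputs would receive both the noisy tokens and their per-token levels, so it can treat some positions as nearly clean context and others as nearly pure noise.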
Toward Full-Immersive Multiuser Virtual Reality With Redirected Walking
Published: 1/1/2023
Multiplayer Virtual Reality, Redirected Walking Algorithms, Head-Mounted Displays, Virtual Environment Performance Evaluation, Full-Immersive Virtual Reality
This study addresses the challenge of achieving continuous full-immersive multiuser experiences in VR by proposing Redirected Walking (RDW) algorithms. A modular framework was developed for performance evaluation, demonstrating that proposed enhancements significantly improve use
Redirected Walking for Multi-User eXtended Reality Experiences with Confined Physical Spaces
Published: 9/30/2025
Multi-User Redirected Walking in Virtual Reality, Exploration of Virtual Environments in Confined Spaces, Virtual Reality Maze Game Design, Motion Evaluation in Networked Virtual Reality Environments, Cybersickness Research and Assessment
This paper presents a novel redirected walking algorithm combining Artificial Potential Fields and Steer-to-Orbit techniques, supporting multi-user XR experiences in a confined 6x6m² space. Tests show an 80% reduction in cybersickness while enhancing walking efficiency and user c
Incident Diagnosing and Reporting System Based on Retrieval Augmented Large Language Model
Published: 4/11/2025
Retrieval-Augmented LLM for Incident Diagnosis and Reporting, Anomaly Analysis of IoT Sensor Records, Automated Incident Report Generation, Diagnosis of Complex Events, IoT Maintenance and Troubleshooting Support
The study introduces RAIDR, a retrieval-augmented language model for diagnosing and reporting incidents in IoT. It retrieves relevant documentation and uses an LLM to analyze anomalies and generate reports, streamlining maintenance and troubleshooting.
Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting
Published: 12/16/2025
Application of Large Language Models in Ontology Engineering, Ontology for Parkinson's Disease Monitoring and Alerting, Human-LLM Collaborative Ontology Construction, One Shot and Chain of Thought Prompt Techniques, X-HCOME and SimX-HCOME+ Methodologies
The paper examines four methods for using LLMs in constructing a Parkinson's Disease monitoring ontology, revealing that while LLMs can generate ontologies, human-LLM collaboration significantly improves their comprehensiveness and accuracy.
HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data
Published: 10/13/2025
Unsupervised Hierarchical Manipulation Concept Learning, Cross-Modal Data Correlation Analysis, Cross-Modal Perception Network, Robotic Manipulation Policy Optimization, Hierarchical Temporal Abstraction Modeling
HiMaCon is a self-supervised framework that learns hierarchical manipulation concepts from unlabeled multimodal robot demonstrations, enhancing imitation learning by capturing cross-modal correlations and structuring concepts across temporal horizons, significantly improving gen