Papers

Riemannian Flow Matching Policy for Robot Motion Learning
Published: 3/16/2024
Flow Matching Policies, Robotic Action Learning, Visuomotor Policies, Riemannian Flow Matching Policy, Geometric-Aware Robot Control
The paper presents Riemannian Flow Matching Policies (RFMP), a model for learning robot visuomotor strategies that excels in efficient training and inference. RFMP effectively manages high-dimensional, multimodal distributions and incorporates geometric awareness, outperforming e
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Published: 6/11/2025
Autoregressive Adversarial Post-Training, Real-time Video Generation, video diffusion models, Interactive Video Generation, Long Video Generation
The paper introduces Autoregressive Adversarial Post-Training (AAPT) to transform a pretrained latent video diffusion model into an efficient real-time interactive video generator. It generates one latent frame per evaluation, streams in real time, and responds to user interacti
Fast and Robust Visuomotor Riemannian Flow Matching Policy
Published: 12/14/2024
Riemannian Flow Matching Policy, Visuomotor Policies, Stable Riemannian Flow Matching Policy, Robotic Task Learning, Geometric Constraints
The paper introduces the Riemannian Flow Matching Policy (RFMP) for visuomotor tasks, offering fast inference and easy training. It incorporates geometric constraints for robustness and outperforms traditional diffusion policies in real and simulated tasks.
GentleHumanoid: Learning Upper-body Compliance for Contact-rich Human and Object Interaction
Published: 11/7/2025
Upper-body Compliance Learning for Humanoid Robots, Spring-based Impedance Control, Contact-rich Human-Robot Interaction, Safe Object Manipulation, Whole-body Motion Tracking Policy
GentleHumanoid integrates impedance control into a whole-body motion tracking policy for humanoid robots, achieving upper-body compliance. It employs a spring-based model to adapt to diverse human-robot interactions, reducing contact forces while ensuring successful task executio
Learning Human-Humanoid Coordination for Collaborative Object Carrying
Published: 10/16/2025
Human-Humanoid Collaboration, Proprioceptive Reinforcement Learning, Collaborative Carrying Tasks, Dynamic Object Interaction, Closed-Loop Training Environment
The COLA method enables effective human-humanoid collaboration in complex carrying tasks using proprioception-only reinforcement learning. It predicts object motion and human intent, achieving a 24.7% reduction in human effort while maintaining stability, validated across various
Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning
Published: 11/14/2025
Humanoid Whole-Body Control, Reinforcement Learning Training Pipeline, Action Generation in Dynamic Environments, Badminton Motion Control, Multistage Reinforcement Learning
This paper presents a reinforcement learning training pipeline to develop a unified whole-body controller for humanoid badminton, enabling coordinated footwork and striking without reliance on motion priors or expert demonstrations. The training is validated in both simulated and
IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models
Published: 5/22/2025
Instruction-Following Capability in Audio LLMs, IFEval-Audio Benchmark Dataset, Multimodal Model Evaluation, Audio Instruction Generation, Audio-Text Instruction Pairing
The study introduces IFEval-Audio, a novel dataset for assessing instruction-following capabilities in audio-based large language models, comprising 280 audio-instruction-answer triples across six dimensions, and benchmarks state-of-the-art audio LLMs.
AHELM: A Holistic Evaluation of Audio-Language Models
Published: 8/29/2025
Evaluation of Audio-Language Models, AHELM Benchmark, PARADE Dataset, Multimodal Model Performance Assessment, Speech Recognition and Language Model Integration
AHELM is a benchmark introduced to holistically assess Audio-Language Models (ALMs), integrating multiple datasets and introducing PARADE and CoReBench. It covers ten key evaluation aspects and standardizes methods for equitable model comparisons.
AudioBench: A Universal Benchmark for Audio Large Language Models
Published: 4/1/2025
Benchmark for Audio Large Language Models, Evaluation of Audio Understanding Tasks, Speech Understanding and Scene Recognition, Datasets for Speech and Voice Understanding, Evaluation Toolkit for Audio Large Language Models
AudioBench is introduced as a universal benchmark for Audio Large Language Models, covering 8 tasks and 26 datasets, including 7 new ones. It evaluates speech and audio scene understanding, addressing gaps in existing benchmarks for instruction-following capabilities. Five models
Prototype memory and attention mechanisms for few shot image generation
Published: 10/6/2021
Few-Shot Image Generation, Prototype Memory Mechanism, Memory Concept Attention (MoCA), Neural Network Visual Processing, Momentum Online Clustering
This study explores the role of "grandmother cells" in the primary visual cortex in image generation, proposing them as prototype memory priors. These are learned via momentum online clustering and utilized through Memory Concept Attention (MoCA), significantly improving synthesi
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Published: 9/11/2024
Speech Interaction with Large Language Models, LLaMA-Omni Speech Model Architecture, Low-Latency Speech Generation, InstructS2S-200K Dataset, Real-Time Speech Response
LLaMA-Omni is a novel speech interaction model that enables low-latency, high-quality interaction with large language models. Utilizing a unique architecture and the InstructS2S-200K dataset, it generates text and speech responses within 226 ms without requiring transcription.
Deanonymizing Ethereum Validators: The P2P Network Has a Privacy Issue
Published: 9/6/2024
Ethereum Validator Deanonymization, Privacy Issues in Blockchain P2P Networks, Validator Geographic Distribution Analysis, Deanonymization Methodology and Experiments, Security Vulnerabilities in Ethereum Network
This study reveals a significant privacy vulnerability in Ethereum's P2P network, demonstrating its failure to protect validator anonymity. The proposed method enables nodes to identify validators on connected peers, locating over 15% of them through data analysis. The paper disc
Active Visual Perception: Opportunities and Challenges
Published: 12/3/2025
Active Visual Perception, Visual Perception in Complex Environments, Robotic Active Perception, Dynamic Decision-Making and Multimodal Inputs, Real-Time Visual Data Processing
Active visual perception enables systems to dynamically interact with the environment for better data acquisition. This paper reviews its potential and challenges, underscoring its significance in robotics, autonomous vehicles, and surveillance, while highlighting issues like rea
Personalized Generation In Large Model Era: A Survey
Published: 3/4/2025
Personalized Content Generation Research, Personalized Generation in Large Model Era, Evaluation Metrics for Personalized Generation Systems, Multimodal Personalized Generation Techniques, Datasets for Personalized Generation
This survey comprehensively investigates Personalized Generation (PGen) in the era of large models, conceptualizing its key components and objectives. A multi-level taxonomy reviews technical advancements and datasets while envisioning PGen's applications and future challenges, p
Large Language Models for Power System Applications: A Comprehensive Literature Survey
Published: 12/15/2025
Large Language Models in Power Systems, Fault Diagnosis in Power Systems, Load Forecasting, Optimization and Control in Power Systems, Simulation and Planning in Power Systems
This review analyzes the applications of Large Language Models (LLMs) in power systems from 2020 to 2025, covering areas like fault diagnosis and load forecasting. It notes the potential of LLMs while highlighting challenges such as data scarcity and safety. Future research shoul
Utilizing LLMs for Industrial Process Automation: A Case Study on Modifying RAPID Programs
Published: 11/14/2025
LLMs in Industrial Process Automation, RAPID Program Modification, Few-Shot Prompting Method, Domain-Specific Programming Languages, Sensitive Data Protection
This study explores using existing Large Language Models for industrial process automation, specifically modifying RAPID programming. It finds that few-shot prompting can effectively address simple issues without extensive model training, while ensuring the security of sensitive
Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought
Published: 5/21/2025
Multimodal Chain-of-Thought, Large Vision-Language Models, Forms of Visual Thought Expressions, Image-Text Interleaved Generation, Multimodal Task Performance Enhancement
This paper investigates the mechanisms of Multimodal Chain-of-Thought (MCoT) in Large Vision-Language Models (LVLMs), highlighting how visual thoughts enhance performance and interpretability. Four forms of visual thought expressions are defined, demonstrating their impact on MCo
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Published: 12/2/2024
Customized Motion Transfer, Multimodal Large Language Model, video diffusion models, Motion Modeling, Text-to-Video Generation
MoTrans introduces a customized motion transfer method using a multimodal large language model recaptioner and an appearance injection module, effectively transferring specific human-centric motions from reference videos to new contexts, outperforming existing techniques.
Motion Prompting: Controlling Video Generation with Motion Trajectories
Published: 12/4/2024
Motion Trajectory Control in Video Generation, Conditioned Training of Video Generation Models, Motion Prompt Expansion Method, Modeling Dynamic Actions and Temporal Compositions, Interactive Applications of Video Models
This paper introduces motion prompting to control video generation via motion trajectories, addressing limitations of text-based prompts. It demonstrates converting high-level requests into detailed motion prompts, showcasing versatility in motion control and image editing with i
ShotDirector: Directorially Controllable Multi-Shot Video Generation with Cinematographic Transitions
Published: 12/11/2025
Multi-Shot Video Generation, Shot Transition Design, Camera Control Module, Hierarchical Editing Patterns, ShotWeaver40K Dataset
The paper introduces ShotDirector, an efficient framework combining parameter-level camera control and hierarchical editing-pattern-aware prompting, enhancing shot transition design in multi-shot video generation and improving narrative coherence through fine-grained control.