Papers
PIPEMESH: Achieving Memory-Efficient Computation-Communication Overlap for Training Large Language Models
Published: 1/1/2025
Training Efficiency Optimization for Large Language Models, Elastic Pipeline Scheduling, Mixed Sharding Strategy, Communication-Computational Overlap, Memory Optimization Techniques
PIPEMESH introduces an elastic pipeline scheduling method to enhance the efficiency of computation-communication overlap in training large language models. It utilizes mixed sharding and selective recomputation, achieving a 20.1% to 33.8% increase in throughput while reducing mem
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Published: 6/10/2025
Frequency Bias Mitigation, Deepfake Detection, Frequency Feature Augmentation, Consistency Regularization, Cross-Domain Generalization
The paper introduces FreqDebias, a framework addressing spectral bias in deepfake detection by leveraging Forgery Mixup and dual consistency regularization, significantly enhancing cross-domain generalization and outperforming state-of-the-art methods.
Universal Method for Enhancing Dynamics in Neural Networks via Memristor and Application in IoT-Based Robot Navigation
Published: 1/1/2025
IoT-Based Robot Navigation with Memristor Neural Networks, Dynamic Enhancement in Multimodal Neural Networks, Central Cyclic Neural Networks, Memristive Central Cyclic Neural Networks, Robot Motion Performance Evaluation
This study presents a universal method for enhancing the dynamics of memristive neural networks, improving IoT-based robots' navigation and security in complex environments through various dynamic models and experimental validations.
GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields
Published: 9/1/2023
Multi-Task Robotic Manipulation, Generalizable Neural Feature Fields, Visual Behavior Cloning, Perceiver Transformer, Stable Diffusion Model
GNFactor proposes a behavior cloning agent using Generalizable Neural Feature Fields, enhancing robots' multi-task manipulation in complex environments by optimizing reconstruction and decision-making modules. It significantly improves 3D structure understanding and semantic comp
Multi-User Redirected Walking in Separate Physical Spaces for Online VR Scenarios
Published: 3/2/2023
Multi-User Redirected Walking, Online Virtual Reality Scenarios, User Fairness Strategy, Virtual Environment Coordination, Immersive Experience Optimization
This paper introduces a novel multi-user redirected walking method to address locomotion fairness issues in online multiplayer VR, significantly reducing reset occurrences while enhancing immersive experiences for users.
A Study on Multi-User Interaction-based Redirected Walking
Published: 10/13/2023
Multi-User Interaction, Redirected Walking, Virtual Reality User Experience
This study investigates integrating Redirected Walking (RDW) in multi-user VR, analyzing how user interactions can mask discrete manipulations. Findings reveal 81% of participants were unaware of translations, providing developers with practical guidance for effective RDW use in
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Published: 9/13/2022
Multi-Task Transformer for Robotic Manipulation, Perceiver Transformer, 6-DoF Manipulation, Language-Conditioned Behavior Cloning, RGB-D Voxel Observations
The PerAct framework enhances robotic manipulation efficiency under data scarcity by transforming RGB-D observations into voxel representations with a Perceiver Transformer. It demonstrates strong performance on 18 simulated and 7 real-world tasks using few demonstrations, outper
CONCURRENCY CONTROL IN REAL TIME DATABASE SYSTEMS: ISSUES AND CHALLENGES
Concurrency Control in Real-Time Database Systems, Transaction Prioritization in Real-Time Databases, Challenges in Real-Time Database Systems, Research on Concurrency Control Techniques
Real-Time Database Systems (RTDBS) face unique challenges requiring prioritized transaction execution within strict time constraints. Existing probabilistic concurrency control techniques are unsuitable for RTDBS. This paper explores these issues and proposes new adaptive control
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Published: 8/30/2024
Real-Time Speech Interaction Model, Text-Instructed Speech Generation, VoiceAssistant-400K Dataset, Streaming Inference Methods, End-to-End Conversational System
The paper presents Mini-Omni, an end-to-end, open-source, real-time speech interaction model that generates text and audio simultaneously using text-instructed speech generation and batch-parallel inference. It also introduces the VoiceAssistant-400K dataset to enhance voice assist
Information to Users
Published: 9/1/1989
Training-Free Acceleration Methods, LLM Security Mechanism, Robotic Action Learning, Math Reasoning Benchmarks, Text-to-Image Generation
The paper examines concurrency control algorithms for real-time database systems, highlighting existing technical flaws and potential methods to enhance algorithm efficiency, contributing significantly to improving the reliability of real-time data processing.
Spatial Intention Maps for Multi-Agent Mobile Manipulation
Published: 5/30/2021
Multi-Agent Mobile Manipulation, Spatial Intention Maps, Vision-Based Deep Reinforcement Learning, Decentralized Collaboration, Multi-Robot Cooperative Behavior
This paper introduces spatial intention maps for enhancing coordination in multi-agent mobile manipulation, converting each agent's intentions into a 2D overhead map aligned with visual input. Experiments show significant performance improvements and enhanced cooperative behavior
Recent Advances in Discrete Speech Tokens: A Review
Published: 2/10/2025
Discrete Speech Tokens, Review of Speech Representation Technologies, Acoustic and Semantic Tokens, Integration of Speech into Large Language Models, Discrete Speech Tokenization
This review establishes a classification for discrete speech tokens in large language models, examining acoustic and semantic tokens through systematic comparisons. It highlights the importance of discretization for text-free speech modeling and outlines ongoing challenges and fu
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Published: 5/7/2025
VITA-Audio Multimodal Language Model, Fast Audio-Text Generation, Lightweight Cross-Modal Token Prediction Module, Real-Time Conversational Capability, Speech Recognition and Synthesis Tasks
VITA-Audio is an end-to-end speech-language model that reduces latency in audio token generation using a lightweight Multiple Cross-modal Token Prediction module, achieving a 3 to 5 times inference speedup and enabling real-time conversation capabilities.
LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models
Published: 10/17/2025
Speech Large Language Model, Audio Tokenization and Detokenization, Multi-Stage Training Strategy, Low-Bitrate High-Quality Speech Synthesis, Acoustic Feature Extraction
LongCat-Audio-Codec is an audio tokenizer and detokenizer solution for industrial speech large language models, utilizing a decoupled architecture and multi-stage training. It achieves high intelligibility and quality synthesis at ultra-low frame rates and bitrates.
Constrained Style Learning from Imperfect Demonstrations under Task Optimality
Published: 7/13/2025
Constrained Style Learning, Learning from Imperfect Demonstrations, Task Optimality in Reinforcement Learning, Robot Style Imitation, Adaptive Lagrangian Multiplier
The study proposes ConsMimic, a method that models style imitation from imperfect demonstrations as a constrained Markov Decision Process, ensuring high task performance while capturing stylistic nuances. An adaptive Lagrangian multiplier enables selective imitation, achieving a
AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control
Published: 4/6/2021
Adversarial Imitation Learning, Physics-Based Character Control, Motion Prior Mechanism, Dynamic Selection in Reinforcement Learning, Unstructured Motion Dataset
The paper presents a fully automated method called Adversarial Motion Priors (AMP) for generating graceful and realistic motions in physically simulated characters, utilizing adversarial imitation learning to simplify task objectives and learn behavior styles from unstructured mo
Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions
Published: 3/29/2022
Adversarial Motion Priors, Substitution for Complex Reward Functions, Style Reward Learning, Simulated Reinforcement Learning, Transfer of Naturalistic Strategies
The study proposes using 'style rewards' learned from motion capture data to replace complex reward functions for training agents, promoting natural and energy-efficient behaviors and leveraging Adversarial Motion Priors for effective real-world transfer without complex rewards.
Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs
Published: 8/26/2024
3D Reconstruction from Uncalibrated Image Pairs, Gaussian Splatting Algorithm, Novel View Synthesis, Extension of MASt3R Model, ScanNet++ Dataset
Splatt3R is a pose-free, feed-forward method for 3D reconstruction and novel view synthesis using uncalibrated image pairs, predicting 3D Gaussian parameters effectively. It employs a two-stage training strategy for geometry and synthesis, achieving real-time rendering and excell
Grounding Image Matching in 3D with MASt3R
Published: 6/14/2024
3D Image Matching, DUSt3R Framework, Dense Local Feature Learning, Fast Matching Scheme, Map-free Localization Dataset
MASt3R enhances 3D image matching accuracy by incorporating dense local feature regression and matching loss into the DUSt3R framework, while introducing a fast reciprocal matching scheme. It achieved state-of-the-art performance, improving VCRE AUC by 30% in map-free localizatio
DUSt3R: Geometric 3D Vision Made Easy
Published: 12/22/2023
Geometric 3D Vision, Multi-View Stereo Reconstruction, Unconstrained Stereo Reconstruction, Pointmap Regression, Transformer-based Network Architecture
DUSt3R introduces a novel paradigm for 3D reconstruction, eliminating the need for camera calibration. By regressing pointmaps from images, it simplifies processes and achieves state-of-the-art performance in depth and pose estimation.