PIPEMESH: Achieving Memory-Efficient Computation-Communication Overlap for Training Large Language Models

Published:1/1/2025

Training Efficiency Optimization for Large Language Models | Elastic Pipeline Scheduling | Mixed Sharding Strategy | Communication-Computational Overlap | Memory Optimization Techniques

PIPEMESH introduces an elastic pipeline scheduling method to enhance the efficiency of computation-communication overlap in training large language models. It utilizes mixed sharding and selective recomputation, achieving a 20.1% to 33.8% increase in throughput while reducing mem…

FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing

Published:6/10/2025

Frequency Bias Mitigation | Deepfake Detection | Frequency Feature Augmentation | Consistency Regularization | Cross-Domain Generalization

The paper introduces FreqDebias, a framework that addresses spectral bias in deepfake detection by leveraging Forgery Mixup and dual consistency regularization, significantly enhancing cross-domain generalization and outperforming state-of-the-art methods.

Universal Method for Enhancing Dynamics in Neural Networks via Memristor and Application in IoT-Based Robot Navigation

Published:1/1/2025

IoT-Based Robot Navigation with Memristor Neural Networks | Dynamic Enhancement in Multimodal Neural Networks | Central Cyclic Neural Networks | Memristive Central Cyclic Neural Networks | Robot Motion Performance Evaluation

This study presents a universal method for enhancing the dynamics of memristive neural networks, improving IoT-based robots' navigation and security in complex environments, supported by various dynamic models and experimental validations.

GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Published:9/1/2023

Multi-Task Robotic Manipulation | Generalizable Neural Feature Fields | Visual Behavior Cloning | Perceiver Transformer | Stable Diffusion Model

GNFactor proposes a behavior cloning agent using Generalizable Neural Feature Fields, enhancing robots' multi-task manipulation in complex environments by jointly optimizing reconstruction and decision-making modules. It significantly improves 3D structure understanding and semantic comp…

Multi-User Redirected Walking in Separate Physical Spaces for Online VR Scenarios

Published:3/2/2023

Multi-User Redirected Walking | Online Virtual Reality Scenarios | User Fairness Strategy | Virtual Environment Coordination | Immersive Experience Optimization

This paper introduces a novel multi-user redirected walking method to address locomotion fairness issues in online multiplayer VR, significantly reducing reset occurrences while enhancing immersive experiences for users.

A Study on Multi-User Interaction-based Redirected Walking

Published:10/13/2023

Multi-User Interaction | Redirected Walking | Virtual Reality User Experience

This study investigates integrating Redirected Walking (RDW) into multi-user VR, analyzing how user interactions can mask discrete manipulations. Findings reveal that 81% of participants were unaware of translations, providing developers with practical guidance for effective RDW use in…

Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation

Published:9/13/2022

Multi-Task Transformer for Robotic Manipulation | Perceiver Transformer | 6-DoF Manipulation | Language-Conditioned Behavior Cloning | RGB-D Voxel Observations

The PerAct framework enhances robotic manipulation efficiency under data scarcity by transforming RGB-D observations into voxel representations processed with a Perceiver Transformer. It demonstrates strong performance on 18 simulated and 7 real-world tasks using few demonstrations, outper…

CONCURRENCY CONTROL IN REAL TIME DATABASE SYSTEMS: ISSUES AND CHALLENGES

Concurrency Control in Real-Time Database Systems | Transaction Prioritization in Real-Time Databases | Challenges in Real-Time Database Systems | Research on Concurrency Control Techniques

Real-Time Database Systems (RTDBS) face unique challenges that require prioritized transaction execution within strict time constraints. Existing probabilistic concurrency control techniques are unsuitable for RTDBS. This paper explores these issues and proposes new adaptive control…
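Prioritized execution under deadlines is often illustrated with Earliest-Deadline-First (EDF) ordering, a standard real-time scheduling policy. The sketch below is not the paper's proposed technique: it only shows, assuming each transaction carries a single absolute deadline, how EDF orders pending transactions. The `Transaction` class and `edf_order` function are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Transaction:
    # Only the deadline takes part in comparisons, so the heap orders by deadline.
    deadline: float
    name: str = field(compare=False)


def edf_order(transactions):
    """Return transaction names in Earliest-Deadline-First order."""
    heap = list(transactions)
    heapq.heapify(heap)  # min-heap keyed on deadline
    return [heapq.heappop(heap).name for _ in range(len(heap))]


pending = [Transaction(5.0, "T1"), Transaction(2.0, "T2"), Transaction(9.0, "T3")]
print(edf_order(pending))  # ['T2', 'T1', 'T3']
```

A real RTDBS must also resolve data conflicts between transactions (for example, deciding whether a high-priority transaction may abort a lower-priority lock holder); ordering alone does not address that.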

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Published:8/30/2024

Real-Time Speech Interaction Model | Text-Instructed Speech Generation | VoiceAssistant-400K Dataset | Streaming Inference Methods | End-to-End Conversational System

The paper presents Mini-Omni, an end-to-end, open-source, real-time speech interaction model that generates text and audio simultaneously using text-instructed speech generation and batch-parallel inference. It also introduces the VoiceAssistant-400K dataset to enhance voice assist…

Information to Users

Published:9/1/1989

Training-Free Acceleration Methods | LLM Security Mechanism | Robotic Action Learning | Math Reasoning Benchmarks | Text-to-Image Generation

The paper examines concurrency control algorithms for real-time database systems, highlighting existing technical flaws and potential methods to improve algorithm efficiency, contributing to the reliability of real-time data processing.

Spatial Intention Maps for Multi-Agent Mobile Manipulation

Published:5/30/2021

Multi-Agent Mobile Manipulation | Spatial Intention Maps | Vision-Based Deep Reinforcement Learning | Decentralized Collaboration | Multi-Robot Cooperative Behavior

This paper introduces spatial intention maps to enhance coordination in multi-agent mobile manipulation, encoding each agent's intentions into a 2D overhead map aligned with its visual input. Experiments show significant performance improvements and enhanced cooperative behavior…

Recent Advances in Discrete Speech Tokens: A Review

Published:2/10/2025

Discrete Speech Tokens | Review of Speech Representation Technologies | Acoustic and Semantic Tokens | Integration of Speech into Large Language Models | Discrete Speech Tokenization

This review establishes a classification of discrete speech tokens for large language models, examining acoustic and semantic tokens through systematic comparisons. It highlights the importance of discretization for text-free speech modeling and outlines ongoing challenges and fu…
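Discrete speech tokens are commonly produced by vector quantization: each continuous feature frame from an encoder is replaced by the index of its nearest codebook vector (a learned VQ codebook, or k-means centroids for many semantic tokenizers). The minimal sketch below assumes a single, already-trained codebook and squared Euclidean distance; it does not capture multi-codebook or residual schemes covered by the review, and the function names and toy values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(frames, codebook):
    """Map each continuous feature frame to the index of its nearest codebook vector."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k])) for f in frames]


def detokenize(tokens, codebook):
    """Look up the codebook vector for each token (a lossy reconstruction)."""
    return [codebook[t] for t in tokens]


codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # toy 3-entry codebook
frames = [[0.1, 0.2], [4.8, 5.1], [0.9, 1.2]]
tokens = tokenize(frames, codebook)
print(tokens)  # [0, 2, 1]
```

The token sequence is what a language model consumes; `detokenize` only recovers codebook centroids, so a separate decoder or vocoder is still needed to produce audio.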

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Published:5/7/2025

VITA-Audio Multimodal Language Model | Fast Audio-Text Generation | Lightweight Cross-Modal Token Prediction Module | Real-Time Conversational Capability | Speech Recognition and Synthesis Tasks

VITA-Audio is an end-to-end speech-language model that reduces audio token generation latency using a lightweight Multiple Cross-modal Token Prediction module, achieving a 3x to 5x inference speedup and enabling real-time conversation.

LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models

Published:10/17/2025

Speech Large Language Model | Audio Tokenization and Detokenization | Multi-Stage Training Strategy | Low-Bitrate High-Quality Speech Synthesis | Acoustic Feature Extraction

LongCat-Audio-Codec is an audio tokenizer and detokenizer solution for industrial speech large language models, built on a decoupled architecture and multi-stage training. It achieves high-intelligibility, high-quality synthesis at ultra-low frame rates and bitrates.
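The link between frame rate and bitrate in a token-based codec is simple arithmetic: each frame emits one token per codebook, and a token drawn from a codebook of size K costs log2(K) bits. The sketch below assumes fixed-length token indices with no entropy coding; `token_bitrate` is a hypothetical helper, and the frame rates and codebook sizes in the example are illustrative, not LongCat-Audio-Codec's actual settings.

```python
import math


def token_bitrate(frame_rate_hz, codebook_sizes):
    """Bits per second of a token stream.

    Each frame emits one token per codebook, and a token from a codebook of
    size K carries log2(K) bits, so bitrate = frame rate x sum of log2(K).
    """
    bits_per_frame = sum(math.log2(k) for k in codebook_sizes)
    return frame_rate_hz * bits_per_frame


# Hypothetical configurations, not taken from the paper.
print(token_bitrate(25.0, [1024]))        # 250.0
print(token_bitrate(12.5, [1024, 1024]))  # 250.0
```

The two configurations reach the same bitrate by trading frame rate against the number of codebooks, which is why a low frame rate alone does not determine a low bitrate.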

Constrained Style Learning from Imperfect Demonstrations under Task Optimality

Published:7/13/2025

Constrained Style Learning | Learning from Imperfect Demonstrations | Task Optimality in Reinforcement Learning | Robot Style Imitation | Adaptive Lagrangian Multiplier

The study proposes ConsMimic, a method that models style imitation from imperfect demonstrations as a constrained Markov Decision Process, ensuring high task performance while capturing stylistic nuances. An adaptive Lagrangian multiplier enables selective imitation, achieving a…
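A common way to handle a constrained MDP of this kind is a Lagrangian relaxation: the policy maximizes the style objective plus λ times the task objective, while λ itself is adjusted by projected dual ascent on the constraint violation. The sketch below shows that generic update, assuming the constraint is "expected task return ≥ threshold"; it is not ConsMimic's exact formulation, and `update_multiplier`, `combined_reward`, `lr`, and `threshold` are hypothetical names.

```python
def update_multiplier(lam, task_return, threshold, lr=0.1):
    """Projected dual-ascent step for the constraint task_return >= threshold.

    If the task return falls short of the threshold, lambda grows so the
    task term gets more weight; once the constraint is met, lambda shrinks,
    but never below zero.
    """
    return max(0.0, lam - lr * (task_return - threshold))


def combined_reward(r_style, r_task, lam):
    """Per-step reward: style-imitation term plus the lambda-weighted task term."""
    return r_style + lam * r_task


lam = 0.5
lam = update_multiplier(lam, task_return=0.6, threshold=0.8)  # constraint violated -> lambda rises
print(round(lam, 4))  # 0.52
```

Because λ only rises while the task constraint is violated, style imitation is pushed aside exactly where it conflicts with task performance, which is the intuition behind selective imitation.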

AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

Published:4/6/2021

Adversarial Imitation Learning | Physics-Based Character Control | Motion Prior Mechanism | Dynamic Selection in Reinforcement Learning | Unstructured Motion Dataset

The paper presents a fully automated method called Adversarial Motion Priors (AMP) for generating graceful and realistic motions in physically simulated characters, using adversarial imitation learning to simplify task objectives and learn behavior styles from unstructured mo…

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Published:3/29/2022

Adversarial Motion Priors | Substitution for Complex Reward Functions | Style Reward Learning | Simulated Reinforcement Learning | Transfer of Naturalistic Strategies

The study proposes using 'style rewards' learned from motion capture data in place of complex reward functions for training agents, promoting natural and energy-efficient behaviors. It leverages Adversarial Motion Priors to achieve effective real-world transfer without complex rewards.

Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs

Published:8/26/2024

3D Reconstruction from Uncalibrated Image Pairs | Gaussian Splatting Algorithm | Novel View Synthesis | Extension of MASt3R Model | ScanNet++ Dataset

Splatt3R is a pose-free, feed-forward method for 3D reconstruction and novel view synthesis from uncalibrated image pairs that predicts 3D Gaussian parameters. It employs a two-stage training strategy for geometry and synthesis, achieving real-time rendering and excell…

Grounding Image Matching in 3D with MASt3R

Published:6/14/2024

3D Image Matching | DUSt3R Framework | Dense Local Feature Learning | Fast Matching Scheme | Map-free Localization Dataset

MASt3R enhances 3D image matching accuracy by incorporating dense local feature regression and a matching loss into the DUSt3R framework, while introducing a fast reciprocal matching scheme. It achieved state-of-the-art performance, improving VCRE AUC by 30% in map-free localizatio…
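Reciprocal (mutual nearest-neighbour) matching keeps a correspondence only when each descriptor is the other's nearest neighbour, which filters out many one-sided, ambiguous matches. The brute-force sketch below costs O(N·M) distance evaluations and uses squared Euclidean distance on toy 2-D descriptors; the fast reciprocal matching scheme mentioned above is designed to be cheaper than this exhaustive version, and its details are not reproduced here. Function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def reciprocal_matches(desc_a, desc_b):
    """Brute-force mutual nearest-neighbour matching between two descriptor sets.

    A pair (i, j) is kept only if j is i's nearest neighbour in B and
    i is j's nearest neighbour in A.
    """
    nn_ab = [min(range(len(desc_b)), key=lambda j: sq_dist(a, desc_b[j])) for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: sq_dist(b, desc_a[i])) for b in desc_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


desc_a = [[0.0, 0.0], [10.0, 10.0], [5.0, 0.0]]
desc_b = [[0.1, 0.0], [9.0, 9.0]]
print(reciprocal_matches(desc_a, desc_b))  # [(0, 0), (1, 1)]
```

In the example, descriptor 2 of A has B[0] as its nearest neighbour, but B[0] prefers A[0], so that one-sided match is discarded.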

DUSt3R: Geometric 3D Vision Made Easy

Published:12/22/2023

Geometric 3D Vision | Multi-View Stereo Reconstruction | Unconstrained Stereo Reconstruction | Pointmap Regression | Transformer-based Network Architecture

DUSt3R introduces a novel paradigm for 3D reconstruction that eliminates the need for camera calibration. By regressing pointmaps directly from images, it simplifies the reconstruction pipeline and achieves state-of-the-art performance in depth and pose estimation.
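A pointmap assigns a 3D point to every pixel. Because of this, camera parameters such as focal length can be recovered from a pointmap after the fact instead of being required up front. The sketch below illustrates that idea with a plain least-squares focal estimate, assuming a pinhole camera, a known principal point, and a pointmap expressed in that camera's frame; it builds a synthetic pointmap from depth only to have ground truth to check against. It is a simplified stand-in, not DUSt3R's own estimator, and all names are hypothetical.

```python
def pointmap_from_depth(depth, f, cx, cy):
    """Build an H x W pointmap: pixel (u, v) -> 3D point (X, Y, Z) in camera coordinates (pinhole model)."""
    return [
        [((u - cx) * z / f, (v - cy) * z / f, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def estimate_focal(pointmap, cx, cy):
    """Least-squares focal length from a camera-frame pointmap, given the principal point.

    Under a pinhole model, u - cx = f * X / Z and v - cy = f * Y / Z,
    so f is the 1-D least-squares fit of pixel offsets against X/Z and Y/Z.
    """
    num = den = 0.0
    for v, row in enumerate(pointmap):
        for u, (x, y, z) in enumerate(row):
            px, py = x / z, y / z
            num += (u - cx) * px + (v - cy) * py
            den += px * px + py * py
    return num / den


depth = [[2.0, 2.5, 3.0, 3.5] for _ in range(4)]  # toy 4 x 4 depth map
pm = pointmap_from_depth(depth, f=500.0, cx=1.5, cy=1.5)
print(round(estimate_focal(pm, cx=1.5, cy=1.5), 6))  # 500.0
```

With a noisy, network-predicted pointmap, a robust estimator would be preferable to plain least squares, since a few bad pixels can otherwise bias the result.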
