Papers

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Published: 6/16/2024
Query-Aware Sparsity Optimization, Long-Context Inference in Large Language Models, Self-Attention Acceleration, KV Cache Selection Algorithm, Long-Dependency Task Performance Optimization
This paper presents Quest, a query-aware KV cache selection algorithm that improves the efficiency of long-context LLM inference. By tracking which parts of the KV cache are critical for the current query, Quest achieves up to 7.03x speedup in self-attention while maintaining high accuracy on long-dependency tasks.
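A minimal sketch of the query-aware selection idea (shapes and names are illustrative, not the paper's implementation): each KV cache page keeps per-channel min/max key statistics, and an upper bound on the current query's attention score decides which pages are worth attending to.

```python
import torch

def select_critical_pages(query, key_max, key_min, top_k):
    # query:   (head_dim,)            current query vector
    # key_max: (num_pages, head_dim)  per-page, per-channel max of keys
    # key_min: (num_pages, head_dim)  per-page, per-channel min of keys
    # Upper-bound q.k over any key in a page by taking, per channel,
    # whichever extreme maximizes the product with the query.
    upper_bound = torch.maximum(query * key_max, query * key_min).sum(dim=-1)
    # Keep only the pages with the highest bounds for this query's attention.
    return torch.topk(upper_bound, k=min(top_k, upper_bound.numel())).indices
```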
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Published: 9/6/2025
Acceleration of Vision-Language-Action Models, Action-Aware Self-Speculative Pruning, Training-Free Pruning Methods, Dynamic Layer-Level Pruning, LIBERO Benchmark
SpecPrune-VLA accelerates Vision-Language-Action models by integrating local and global information for efficient pruning. It employs static and dynamic pruning strategies, achieving a 1.46x speedup on NVIDIA A800 and 1.57x on RTX 3090 with minimal success-rate loss.
Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Published: 12/1/2025
Ranking Relevance Optimization in Xiaohongshu Search, Reinforcement Learning-based Generative Relevance Models, Multi-Step Reasoning Prompt Design, Stepwise Advantage Masking Strategy, Business-Specific Relevance Criteria
This study reformulates ranking relevance in Xiaohongshu search as a reasoning task, using a reinforcement learning framework to enhance generative relevance models. Key innovations include multi-step reasoning prompts and a stepwise advantage masking strategy, which significantly improve ranking relevance.
HierPrompt: Zero-Shot Hierarchical Text Classification with LLM-Enhanced Prototypes
Published: 1/1/2025
Zero-Shot Hierarchical Text Classification, LLM-Enhanced Prototypes, Hierarchical Prototype Refinement, Example Text Prototype (ETP), Maximum Similarity Propagation (MSP)
HierPrompt is proposed for zero-shot hierarchical text classification, enhancing prototype representation and informativeness. It introduces Example Text Prototypes and Category Name Prototypes and uses Maximum Similarity Propagation to improve prototype construction, showing strong zero-shot performance.
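As a rough illustration of prototype-based zero-shot classification (the embedding model and function names are assumptions, not the paper's code), each document is assigned to the category whose prototype embedding it is most similar to:

```python
import torch
import torch.nn.functional as F

def classify_by_prototype(doc_embs, prototype_embs):
    # doc_embs:       (batch, dim)        encoded input documents
    # prototype_embs: (num_classes, dim)  e.g. embeddings of LLM-generated
    #                                     example texts per category
    sims = F.cosine_similarity(doc_embs.unsqueeze(1),
                               prototype_embs.unsqueeze(0), dim=-1)
    # Each document gets the label of its most similar prototype.
    return sims.argmax(dim=-1)
```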
Interactive Design of Stylized Walking Gaits for Robotic Characters
Published: 7/19/2024
Robotic Action Learning, Locomotion Skill Training for Humanoid Robots, Dynamic Motion Generation, Interactive Robot Design, Gait Generation Models
This paper presents an interactive system for creating stylized bipedal gaits for robotic characters, combining artist-directed tools with a model-based control stack to generate physically constrained motions in real time.
Design and Control of a Bipedal Robotic Character
Published: 1/9/2025
Bipedal Robot Control, Dynamic Gait Generation, Reinforcement Learning for Robotic Control, Entertainment Robot Design, Human-Robot Interaction Interface
This study introduces a bipedal robot that integrates expressive artistic movements with robust dynamic mobility for entertainment applications, using a reinforcement learning control architecture to perform complex actions from command signals, enhanced by an animation engine.
Real-Time Machine Learning: The Missing Pieces
Published: 3/11/2017
Real-Time Machine Learning Framework, Dynamic Decision Feedback Loop, High Throughput Distributed Execution, Adaptive Task Graph Construction, Heterogeneous Kernel Execution
The paper discusses the evolution of machine learning applications as they shift from static model predictions to real-time feedback loops. It identifies new challenges for existing distributed execution frameworks and proposes a novel architecture that achieves a 63x performance improvement.
Ray: A Distributed Framework for Emerging AI Applications
Published: 12/16/2017
Distributed System Framework, Reinforcement Learning Applications, Dynamic Scheduler, Task-Parallel Computation, Actor Model
Ray is a distributed framework designed for emerging AI applications, especially reinforcement learning, providing a unified interface for task-parallel and actor-based computation and achieving over 1.8 million tasks per second with performance superior to specialized systems.
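A small, self-contained example of the task-and-actor programming model described above; `ray.init`, `@ray.remote`, and `ray.get` are Ray's standard primitives, while the functions themselves are only illustrative:

```python
import ray

ray.init()  # starts a local Ray instance if no cluster is given

@ray.remote
def square(x):            # stateless task, scheduled by Ray
    return x * x

@ray.remote
class Counter:            # stateful actor with its own process
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

futures = [square.remote(i) for i in range(4)]   # tasks run in parallel
counter = Counter.remote()
print(ray.get(futures), ray.get(counter.incr.remote()))
```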
HybridFlow: A Flexible and Efficient RLHF Framework
Published: 9/28/2024
RL Training for Large Language Models, Hybrid Controller RL Framework, Reinforcement Learning from Human Feedback, Dataflow Computation Model, Distributed Computation Optimization
HybridFlow is a hybrid framework that integrates single- and multi-controller paradigms to improve the efficiency and flexibility of RLHF systems. It features hierarchical APIs and a 3D-HybridEngine for efficient model-weight repartitioning, achieving 1.53x to 20.57x throughput improvement.
Olaf: Bringing an Animated Character to Life in the Physical World
Published: 12/19/2025
Animated Character Mechanical Design, Reinforcement Learning Control Mechanism, Animation Performance in the Physical World, Sound and Heat Optimization Strategies, Robotic Smooth Motion Learning
This paper brings the animated character Olaf to the physical world, using reinforcement learning for control. It introduces a compact mechanical design with hidden asymmetrical legs, along with strategies for noise reduction and temperature control, validating the model's effectiveness.
Revisiting Feature Prediction for Learning Visual Representations from Video
Published: 2/16/2024
Video Feature Prediction, Self-Supervised Visual Representation Learning, V-JEPA Model, Video-Based Model Training, Vision Transformer
The V-JEPA model is trained on 2 million videos with a standalone feature-prediction objective, demonstrating versatile visual representations and strong performance on motion and appearance tasks without relying on pretrained encoders.
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Published: 12/12/2025
Vision-Language Models, Joint Embedding Predictive Architecture, Open-Vocabulary Classification, Text-to-Video Retrieval, Selective Decoding
VL-JEPA introduces a vision-language model built on the Joint Embedding Predictive Architecture that predicts continuous text embeddings, outperforming traditional models with 50% fewer parameters. It also supports selective decoding, improving efficiency for tasks such as open-vocabulary classification and text-to-video retrieval.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Published: 6/12/2025
Video Understanding and Planning, Self-Supervised Video Models, V-JEPA 2 Architecture, Machine Action Prediction, Large Language Model for Video QA
V-JEPA 2 is a self-supervised video model that combines vast video data with limited robot interaction data, achieving state-of-the-art performance in motion understanding and human action prediction while also excelling in video question-answering tasks.
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Published: 1/20/2023
Self-Supervised Learning from Images, Image-based Joint-Embedding Predictive Architecture, Vision Transformer, ImageNet Dataset, Semantic Image Representations
The Image-based Joint-Embedding Predictive Architecture (I-JEPA) is introduced for efficient self-supervised learning, predicting representations of target blocks from a single context block. It performs strongly on ImageNet without hand-crafted data augmentation, demonstrating high computational efficiency.
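A hedged sketch of the joint-embedding predictive objective (the module interfaces here are assumptions; in the paper the encoders are Vision Transformers and the target encoder is an EMA copy): the context encoder sees only context patches, and the predictor regresses the target encoder's representations of masked target blocks.

```python
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor,
              patches, context_idx, target_idx):
    # patches: (batch, num_patches, dim) patchified image tokens
    context = context_encoder(patches[:, context_idx])           # visible context only
    with torch.no_grad():                                        # EMA target encoder, no gradients
        targets = target_encoder(patches)[:, target_idx]
    predictions = predictor(context, target_idx)                 # predict masked block representations
    return F.smooth_l1_loss(predictions, targets)                # regression in latent space, no pixel loss
```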
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Published: 6/18/2025
Generative Optimization Framework, Parallel Program Performance Improvement, Agent-System Interface, Domain-Specific Language, High-Performance Mapper Development
This study introduces a generative optimization framework that automates high-performance mapper development through an Agent-System Interface, achieving a 3.8x improvement in parallel program performance in just 10 iterations.
SAM 3D Body: Robust Full-Body Human Mesh Recovery
Single-Image Full-Body 3D Human Mesh Recovery, Momentum Human Rig, User-Guided Inference, Multi-Stage Annotation Pipeline, High-Quality Annotation Generation
The SAM 3D Body (3DB) model achieves state-of-the-art performance in single-image 3D human mesh recovery, using the Momentum Human Rig for parametric representation and enabling user-guided inference. It improves data quality through a multi-stage annotation pipeline.
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning
Published: 7/8/2022
Multi-Agent Reinforcement Learning, Open-Source Benchmarking Tool, Fast Simulation Framework, Vectorized Physics Engine, Proximal Policy Optimization Algorithm
VMAS is an open-source simulator that enhances scalability and efficiency in multi-agent reinforcement learning, using a PyTorch-based vectorized 2D physics engine for parallel simulations and achieving over a 100x speed improvement compared to OpenAI MPE.
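The core vectorization trick can be illustrated with a toy batched physics step (the shapes and dynamics below are made up for illustration; VMAS's actual engine also handles collisions, shapes, and joints): one set of tensor operations advances every parallel environment at once, which is what enables GPU scaling.

```python
import torch

def step_all_envs(pos, vel, actions, dt=0.05):
    # pos, vel, actions: (num_envs, num_agents, 2), batched over environments
    vel = vel + actions * dt      # integrate acceleration for every env at once
    pos = pos + vel * dt          # integrate position for every env at once
    return pos, vel

pos = torch.zeros(4096, 3, 2)     # 4096 parallel envs, 3 agents, 2D world
vel = torch.zeros_like(pos)
pos, vel = step_all_envs(pos, vel, torch.randn_like(pos))
```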
InfoDCL: Informative Noise Enhanced Diffusion Based Contrastive Learning
Published: 12/18/2025
Diffusion Model Contrastive Learning, Contrastive Learning in Recommendation Systems, User Preference Modeling, Graph Convolutional Network Inference, Informative Noise Enhancement
InfoDCL introduces a novel framework that combines a single-step diffusion process with auxiliary semantic information to generate authentic user preferences, enhancing contrastive learning. It transforms the interference between generation and preference learning into collaboration.
Training-Free Efficient Video Generation via Dynamic Token Carving
Published: 5/23/2025
Efficient Inference of Video Diffusion Models, Dynamic Token Carving, Progressive Resolution Generation, Block-Wise Attention Mechanism, Video Generation Acceleration
The paper presents Jenga, a training-free method for efficient video generation that addresses the computational bottlenecks of Video Diffusion Transformers. Jenga achieves an 8.83x speedup while maintaining generation quality, significantly enhancing practical application efficiency.
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training
Published: 2/28/2025
Long-Context Modeling, Large Language Model Training, Weight Pipeline Parallelism, Distributed Training Optimization, Communication Efficiency Enhancement
WeiPipe is a weight pipeline parallelism method that effectively reduces communication costs in large model training by overlapping communication and computation, significantly enhancing scalability and throughput compared to existing methods.
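As a rough sketch of the overlap idea (not WeiPipe's actual ring-style weight schedule; the functional layer interface, weight buffers, and an already-initialized process group are assumptions), the transfer of the next layer's weights can be issued asynchronously while the current layer computes:

```python
import torch.distributed as dist

def forward_with_weight_prefetch(x, layers, weight_buffers, src_rank=0):
    # Issue the transfer of the first layer's weights, then keep exactly one
    # transfer in flight while the previous layer is computing.
    handle = dist.broadcast(weight_buffers[0], src=src_rank, async_op=True)
    for i, layer in enumerate(layers):
        handle.wait()                                   # weights for layer i have arrived
        if i + 1 < len(layers):
            handle = dist.broadcast(weight_buffers[i + 1], src=src_rank,
                                    async_op=True)      # prefetch next layer's weights
        x = layer(x, weight_buffers[i])                 # computation overlaps the transfer
    return x
```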