Papers

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Published: 6/16/2024
Query-Aware Sparsity Optimization, Long-Context Inference in Large Language Models, Self-Attention Acceleration, KV Cache Selection Algorithm, Long-Dependency Task Performance Optimization
This paper presents Quest, a query-aware KV cache selection algorithm that improves the efficiency of long-context LLM inference. By tracking which parts of the KV cache are critical for the current query, Quest achieves up to 7.03x speedup in self-attention while maintaining high accuracy on long-dependency tasks.
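A minimal sketch of the query-aware selection idea (shapes and names are illustrative, not the paper's implementation): each KV cache page keeps per-channel min/max key statistics, and an upper bound on the current query's attention score decides which pages are worth attending to.

```python
import torch

def select_critical_pages(query, key_max, key_min, top_k):
    # query:   (head_dim,)            current query vector
    # key_max: (num_pages, head_dim)  per-page, per-channel max of keys
    # key_min: (num_pages, head_dim)  per-page, per-channel min of keys
    # Upper-bound q.k over any key in a page by taking, per channel,
    # whichever extreme maximizes the product with the query.
    upper_bound = torch.maximum(query * key_max, query * key_min).sum(dim=-1)
    # Keep only the pages with the highest bounds for this query's attention.
    return torch.topk(upper_bound, k=min(top_k, upper_bound.numel())).indices
```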
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Published: 9/6/2025
Acceleration of Vision-Language-Action Models, Action-Aware Self-Speculative Pruning, Training-Free Pruning Methods, Dynamic Layer-Level Pruning, LIBERO Benchmark
SpecPrune-VLA accelerates Vision-Language-Action models by integrating local and global information for efficient pruning. It employs static and dynamic pruning strategies, achieving a 1.46x speedup on NVIDIA A800 and 1.57x on RTX 3090 with minimal success-rate loss.
Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search
Published: 12/1/2025
Ranking Relevance Optimization in Xiaohongshu Search, Reinforcement Learning-based Generative Relevance Models, Multi-Step Reasoning Prompt Design, Stepwise Advantage Masking Strategy, Business-Specific Relevance Criteria
This study reformulates ranking relevance in Xiaohongshu search as a reasoning task, using a reinforcement learning framework to enhance generative relevance models. Key innovations include multi-step reasoning prompts and a stepwise advantage masking strategy, which significantly improve ranking relevance.
HierPrompt: Zero-Shot Hierarchical Text Classification with LLM-Enhanced Prototypes
Published: 1/1/2025
Zero-Shot Hierarchical Text Classification, LLM-Enhanced Prototypes, Hierarchical Prototype Refinement, Example Text Prototype (ETP), Maximum Similarity Propagation (MSP)
HierPrompt is proposed for zero-shot hierarchical text classification, enhancing prototype representation and informativeness. It introduces Example Text Prototypes and Category Name Prototypes and uses Maximum Similarity Propagation to improve prototype construction, showing strong zero-shot performance.
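As a rough illustration of prototype-based zero-shot classification (the embedding model and function names are assumptions, not the paper's code), each document is assigned to the category whose prototype embedding it is most similar to:

```python
import torch
import torch.nn.functional as F

def classify_by_prototype(doc_embs, prototype_embs):
    # doc_embs:       (batch, dim)        encoded input documents
    # prototype_embs: (num_classes, dim)  e.g. embeddings of LLM-generated
    #                                     example texts per category
    sims = F.cosine_similarity(doc_embs.unsqueeze(1),
                               prototype_embs.unsqueeze(0), dim=-1)
    # Each document gets the label of its most similar prototype.
    return sims.argmax(dim=-1)
```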
Interactive Design of Stylized Walking Gaits for Robotic Characters
Published: 7/19/2024
Robotic Action Learning, Locomotion Skill Training for Humanoid Robots, Dynamic Motion Generation, Interactive Robot Design, Gait Generation Models
This paper presents an interactive system for creating stylized bipedal gaits for robotic characters, combining artist-directed tools with a model-based control stack to generate physically constrained motions in real time.
Design and Control of a Bipedal Robotic Character
Published: 1/9/2025
Bipedal Robot Control, Dynamic Gait Generation, Reinforcement Learning for Robotic Control, Entertainment Robot Design, Human-Robot Interaction Interface
This study introduces a bipedal robot that integrates expressive artistic movements with robust dynamic mobility for entertainment applications, using a reinforcement learning control architecture to perform complex actions from command signals, enhanced by an animation engine.
Real-Time Machine Learning: The Missing Pieces
Published: 3/11/2017
Real-Time Machine Learning Framework, Dynamic Decision Feedback Loop, High Throughput Distributed Execution, Adaptive Task Graph Construction, Heterogeneous Kernel Execution
The paper discusses the evolution of machine learning applications as they shift from static model predictions to real-time feedback loops. It identifies new challenges for existing distributed execution frameworks and proposes a novel architecture that achieves a 63x performance improvement.
Ray: A Distributed Framework for Emerging AI Applications
Published: 12/16/2017
Distributed System Framework, Reinforcement Learning Applications, Dynamic Scheduler, Task-Parallel Computation, Actor Model
Ray is a distributed framework designed for emerging AI applications, especially reinforcement learning, providing a unified interface for task-parallel and actor-based computation and achieving over 1.8 million tasks per second with performance superior to specialized systems.
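A small, self-contained example of the task-and-actor programming model described above; `ray.init`, `@ray.remote`, and `ray.get` are Ray's standard primitives, while the functions themselves are only illustrative:

```python
import ray

ray.init()  # starts a local Ray instance if no cluster is given

@ray.remote
def square(x):            # stateless task, scheduled by Ray
    return x * x

@ray.remote
class Counter:            # stateful actor with its own process
    def __init__(self):
        self.n = 0
    def incr(self):
        self.n += 1
        return self.n

futures = [square.remote(i) for i in range(4)]   # tasks run in parallel
counter = Counter.remote()
print(ray.get(futures), ray.get(counter.incr.remote()))
```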
HybridFlow: A Flexible and Efficient RLHF Framework
Published: 9/28/2024
RL Training for Large Language Models, Hybrid Controller RL Framework, Reinforcement Learning from Human Feedback, Dataflow Computation Model, Distributed Computation Optimization
HybridFlow is a hybrid framework that integrates single- and multi-controller paradigms to improve the efficiency and flexibility of RLHF systems. It features hierarchical APIs and a 3D-HybridEngine for efficient model-weight repartitioning, achieving 1.53x to 20.57x throughput improvement.
Olaf: Bringing an Animated Character to Life in the Physical World
Published: 12/19/2025
Animated Character Mechanical Design, Reinforcement Learning Control Mechanism, Animation Performance in the Physical World, Sound and Heat Optimization Strategies, Robotic Smooth Motion Learning
This paper brings the animated character Olaf to the physical world, using reinforcement learning for control. It introduces a compact mechanical design with hidden asymmetrical legs, along with strategies for noise reduction and temperature control, validating the model's effectiveness.
Revisiting Feature Prediction for Learning Visual Representations from Video
Published: 2/16/2024
Video Feature Prediction, Self-Supervised Visual Representation Learning, V-JEPA Model, Video-Based Model Training, Vision Transformer
The V-JEPA model is trained on 2 million videos with a standalone feature-prediction objective, demonstrating versatile visual representations and strong performance on motion and appearance tasks without relying on pretrained encoders.
VL-JEPA: Joint Embedding Predictive Architecture for Vision-language
Published: 12/12/2025
Vision-Language Models, Joint Embedding Predictive Architecture, Open-Vocabulary Classification, Text-to-Video Retrieval, Selective Decoding
VL-JEPA introduces a vision-language model built on the Joint Embedding Predictive Architecture that predicts continuous text embeddings, outperforming traditional models with 50% fewer parameters. It also supports selective decoding, improving efficiency for tasks such as open-vocabulary classification and text-to-video retrieval.
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning
Published: 6/12/2025
Video Understanding and Planning, Self-Supervised Video Models, V-JEPA 2 Architecture, Machine Action Prediction, Large Language Model for Video QA
V-JEPA 2 is a self-supervised video model that combines vast video data with limited robot interaction data, achieving state-of-the-art performance in motion understanding and human action prediction while also excelling in video question-answering tasks.
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Published: 1/20/2023
Self-Supervised Learning from Images, Image-based Joint-Embedding Predictive Architecture, Vision Transformer, ImageNet Dataset, Semantic Image Representations
The Image-based Joint-Embedding Predictive Architecture (I-JEPA) is introduced for efficient self-supervised learning, predicting representations of target blocks from a single context block. It performs strongly on ImageNet without hand-crafted data augmentation, demonstrating high computational efficiency.
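A hedged sketch of the joint-embedding predictive objective (the module interfaces here are assumptions; in the paper the encoders are Vision Transformers and the target encoder is an EMA copy): the context encoder sees only context patches, and the predictor regresses the target encoder's representations of masked target blocks.

```python
import torch
import torch.nn.functional as F

def jepa_loss(context_encoder, target_encoder, predictor,
              patches, context_idx, target_idx):
    # patches: (batch, num_patches, dim) patchified image tokens
    context = context_encoder(patches[:, context_idx])           # visible context only
    with torch.no_grad():                                        # EMA target encoder, no gradients
        targets = target_encoder(patches)[:, target_idx]
    predictions = predictor(context, target_idx)                 # predict masked block representations
    return F.smooth_l1_loss(predictions, targets)                # regression in latent space, no pixel loss
```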
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Published: 6/18/2025
Generative Optimization Framework, Parallel Program Performance Improvement, Agent-System Interface, Domain-Specific Language, High-Performance Mapper Development
This study introduces a generative optimization framework that automates high-performance mapper development through an Agent-System Interface, achieving a 3.8x improvement in parallel program performance in just 10 iterations.
SAM 3D Body: Robust Full-Body Human Mesh Recovery
Single-Image Full-Body 3D Human Mesh Recovery, Momentum Human Rig, User-Guided Inference, Multi-Stage Annotation Pipeline, High-Quality Annotation Generation
The SAM 3D Body (3DB) model achieves state-of-the-art performance in single-image 3D human mesh recovery, using the Momentum Human Rig for parametric representation and enabling user-guided inference. It improves data quality through a multi-stage annotation pipeline.
VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning
Published: 7/8/2022
Multi-Agent Reinforcement Learning, Open-Source Benchmarking Tool, Fast Simulation Framework, Vectorized Physics Engine, Proximal Policy Optimization Algorithm
VMAS is an open-source simulator that enhances scalability and efficiency in multi-agent reinforcement learning, using a PyTorch-based vectorized 2D physics engine for parallel simulations and achieving over a 100x speed improvement compared to OpenAI MPE.
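The core vectorization trick can be illustrated with a toy batched physics step (the shapes and dynamics below are made up for illustration; VMAS's actual engine also handles collisions, shapes, and joints): one set of tensor operations advances every parallel environment at once, which is what enables GPU scaling.

```python
import torch

def step_all_envs(pos, vel, actions, dt=0.05):
    # pos, vel, actions: (num_envs, num_agents, 2), batched over environments
    vel = vel + actions * dt      # integrate acceleration for every env at once
    pos = pos + vel * dt          # integrate position for every env at once
    return pos, vel

pos = torch.zeros(4096, 3, 2)     # 4096 parallel envs, 3 agents, 2D world
vel = torch.zeros_like(pos)
pos, vel = step_all_envs(pos, vel, torch.randn_like(pos))
```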
InfoDCL: Informative Noise Enhanced Diffusion Based Contrastive Learning
Published: 12/18/2025
Diffusion Model Contrastive Learning, Contrastive Learning in Recommendation Systems, User Preference Modeling, Graph Convolutional Network Inference, Informative Noise Enhancement
InfoDCL introduces a novel framework that combines a single-step diffusion process with auxiliary semantic information to generate authentic user preferences, enhancing contrastive learning. It transforms the interference between generation and preference learning into collaboration.
Training-Free Efficient Video Generation via Dynamic Token Carving
Published: 5/23/2025
Efficient Inference of Video Diffusion Models, Dynamic Token Carving, Progressive Resolution Generation, Block-Wise Attention Mechanism, Video Generation Acceleration
The paper presents Jenga, a training-free method for efficient video generation that addresses the computational bottlenecks of Video Diffusion Transformers. Jenga achieves an 8.83x speedup while maintaining generation quality, significantly enhancing practical application efficiency.
WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training
Published: 2/28/2025
Long-Context Modeling, Large Language Model Training, Weight Pipeline Parallelism, Distributed Training Optimization, Communication Efficiency Enhancement
WeiPipe is a weight pipeline parallelism method that effectively reduces communication costs in large model training by overlapping communication and computation, significantly enhancing scalability and throughput compared to existing methods.
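As a rough sketch of the overlap idea (not WeiPipe's actual ring-style weight schedule; the functional layer interface, weight buffers, and an already-initialized process group are assumptions), the transfer of the next layer's weights can be issued asynchronously while the current layer computes:

```python
import torch.distributed as dist

def forward_with_weight_prefetch(x, layers, weight_buffers, src_rank=0):
    # Issue the transfer of the first layer's weights, then keep exactly one
    # transfer in flight while the previous layer is computing.
    handle = dist.broadcast(weight_buffers[0], src=src_rank, async_op=True)
    for i, layer in enumerate(layers):
        handle.wait()                                   # weights for layer i have arrived
        if i + 1 < len(layers):
            handle = dist.broadcast(weight_buffers[i + 1], src=src_rank,
                                    async_op=True)      # prefetch next layer's weights
        x = layer(x, weight_buffers[i])                 # computation overlaps the transfer
    return x
```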