Papers
RLPIR: Reinforcement Learning with Prefix and Intrinsic Reward
Published: 10/8/2025
RL Training for Large Language Models · Sequence Policy Optimization · Training-Free Acceleration Methods · LLM Reasoning Capacity Enhancement
RLPIR introduces a verifier-free RL framework using prefix rollout and intrinsic rewards, matching RLVR performance with 7× faster training and 45% shorter reasoning sequences, improving LLM efficiency without relying on ground-truth labels.
JURY-RL: Votes Propose, Proofs Dispose for Label-Free RLVR
Published: 10/8/2025
RL Training for Large Language Models · Training-Free Acceleration Methods · Reinforcement Learning for Math Reasoning · Sequence Policy Optimization
JURY-RL separates answer proposal via voting from reward disposal via theorem proving, and uses ResZero for unverifiable cases, stabilizing RL training and outperforming label-free baselines on reasoning and code tasks, rivaling supervised training.
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards
Published: 10/8/2025
RL Training for Large Language Models · Sequence Policy Optimization · Reinforcement Learning for Math Reasoning
The ROVER algorithm exploits the special MDP structure of RLVR in math reasoning, recovering optimal actions from the valuation of a fixed random policy and bypassing complex policy iteration, improving LLM reasoning quality and diversity with a simple, efficient procedure.
Tree Search for LLM Agent Reinforcement Learning
Published: 10/8/2025
RL Training for Large Language Models · Sequence Policy Optimization · Tree Search Reinforcement Learning · Group Relative Advantage Estimation
This work introduces Tree-GRPO, a tree-search method that improves rollout efficiency and generates step-wise supervision to strengthen multi-turn RL for LLM agents, outperforming chain-based approaches across diverse QA datasets.
Stratified GRPO: Handling Structural Heterogeneity in Reinforcement Learning of LLM Search Agents
Published: 10/8/2025
RL Training for Large Language Models · Sequence Policy Optimization · Cross-Stratum Bias Correction · Stratified Advantage Normalization · Reinforcement Learning with Structural Heterogeneity
Stratified GRPO with Stratified Advantage Normalization eliminates cross-stratum bias in heterogeneous LLM search-agent trajectories, yielding unbiased, stable credit assignment and superior multi-step RL performance.
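The paper's exact normalization is its own; as a generic illustration of within-stratum (group-relative) advantage normalization — the function name, stratum labels, and epsilon term are illustrative assumptions — a minimal sketch:

```python
import numpy as np

def stratified_advantages(rewards, strata, eps=1e-8):
    """Normalize each reward against the mean/std of its own stratum only,
    so trajectories are never compared across structurally different groups."""
    rewards = np.asarray(rewards, dtype=float)
    strata = np.asarray(strata)
    adv = np.empty_like(rewards)
    for s in np.unique(strata):
        mask = strata == s
        mu, sigma = rewards[mask].mean(), rewards[mask].std()
        adv[mask] = (rewards[mask] - mu) / (sigma + eps)
    return adv

# Two strata with very different reward scales still yield comparable advantages:
adv = stratified_advantages([1.0, 3.0, 10.0, 30.0], strata=[0, 0, 1, 1])
```

Normalizing within each stratum keeps a high-reward stratum from dominating the advantage signal of every other trajectory group.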
Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks
Published: 10/8/2025
Sequence Policy Optimization · Hierarchy-of-Groups Policy Optimization · Long-Horizon Reinforcement Learning · RL Training for Large Language Models · Historical Context Consistency Modeling
This paper proposes HGPO to address context inconsistency in long-horizon tasks through hierarchical grouping and adaptive advantage aggregation, improving the bias-variance tradeoff and outperforming existing RL methods without extra models.
Octo: An Open-Source Generalist Robot Policy
Published: 5/21/2024
Generalist Robot Policies · Multi-modal Action Representation and Modeling · Transformer Architecture · Large-Scale Robot Demonstration Dataset · Robotic Action Learning
Octo is an open-source transformer-based generalist robot policy pretrained on 800K trajectories, enabling fast fine-tuning across diverse sensors and robots, guided by language or goal images, and demonstrating strong generalization on nine platforms.
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
Published: 6/10/2025
Language-Guided Audio-Visual Learning · Multimodal Action Knowledge Graph · Action-Music Consistency Evaluation · Long-Term Sports Assessment · Audio-Visual Cross-Modal Fusion
Proposes a language-guided audio-visual learning framework using action knowledge graphs and cross-modal fusion, achieving state-of-the-art long-term sports assessment at low computational cost on four public benchmarks.
MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
Published: 11/21/2024
Long Video Generation for Autonomous Driving · Multi-View Video Generation · Spatio-Temporal Conditional Encoding · Diffusion Model Video Synthesis · Geometric Control Methods
MagicDrive-V2 uses MVDiT blocks and spatio-temporal conditional encoding to generate high-resolution, multi-view autonomous-driving videos with precise geometric and textual control, achieving 3.3× higher resolution and 4× higher frame rate through progressive training and enabling broader applications.
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Published: 11/1/2024
Vision-Language-Action Model · Generalist Robot Policies · Multimodal Robot Learning · LLM-guided Motion Planning
This work introduces $π_0$, which combines a pre-trained vision-language model with flow matching for precise multi-robot control, enabling zero-shot language-driven dexterous tasks and improved generalization across diverse platforms.
CARE: Contextual Adaptation of Recommenders for LLM-based Conversational Recommendation
Published: 8/19/2025
LLM-based Recommendation Systems · Conversational Recommender Systems · Context-Aware Recommendation · Entity-Level Recommendation Enhancement · Recommendation Reranking
The CARE framework integrates external recommenders with LLMs, enabling domain adaptation and leveraging contextual and collaborative relationships to improve the accuracy and diversity of conversational recommendations.
Towards Lightweight and Robust Machine Learning for CDN Caching
CDN Caching Optimization · Reinforcement Learning for Network Optimization · Lightweight Decision Tree Models · Delayed Reward Mechanism · Domain-Specific Modeling
By explicitly modeling optimal caching, this work simplifies CDN caching optimization and uses lightweight decision trees to outperform heuristics, addressing reinforcement learning’s delayed reward challenges for efficient, robust caching.
TV-Rec: Time-Variant Convolutional Filter for Sequential Recommendation
Published: 10/29/2025
Sequential Recommender Systems · Time-Variant Convolutional Filters · User Behavior Sequence Modeling · Graph Signal Processing · Recommendation Inference Acceleration
TV-Rec introduces time-variant convolutional filters inspired by graph signal processing, replacing fixed filters and self-attention to model temporal variations in user behavior, improving expressiveness, reducing computation, and boosting accuracy by 7.49% on six benchmarks.
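The paper's filter parameterization is its own; as a minimal NumPy sketch of the general idea — one causal filter per time step instead of a single shared convolution kernel, with names and shapes assumed for illustration:

```python
import numpy as np

def time_variant_conv(x, kernels):
    """Causal convolution where each time step t applies its own filter kernels[t]."""
    # x: (T, d) sequence of item embeddings; kernels: (T, k), one filter per step
    T, d = x.shape
    k = kernels.shape[1]
    pad = np.vstack([np.zeros((k - 1, d)), x])  # left-pad so filters stay causal
    out = np.empty_like(x)
    for t in range(T):
        window = pad[t : t + k]                 # the k steps ending at position t
        out[t] = kernels[t] @ window            # position-specific filtering
    return out
```

A shared-filter convolution is the special case where every row of `kernels` is identical; letting the rows differ is what makes the filter time-variant.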
Machine learning and deep learning based predictive quality in manufacturing: a systematic review
Published: 5/28/2022
Predictive Quality in Manufacturing · Manufacturing Process Data Analysis · Machine Learning-Based Quality Prediction · Deep Learning for Quality Inspection · Data-Driven Decision Making in Manufacturing
This review systematically analyzes 2012–2021 studies on ML and DL for predictive quality in manufacturing, categorizing methods and data usage, identifying challenges, and outlining future research directions to advance data-driven quality assurance.
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Published: 7/30/2025
Discrete Autoregressive Image Generation · Reinforcement Learning for Image Generation Optimization · Semantic Image Tokenizer · Unified Language-Image Autoregressive Modeling · Offline Diffusion Decoder
X-Omni employs reinforcement learning to enhance discrete autoregressive image generation, integrating a semantic tokenizer, a unified language-image model, and an offline diffusion decoder to improve visual fidelity and instruction adherence.
Bi-Level Optimization for Generative Recommendation: Bridging Tokenization and Generation
Published: 10/24/2025
Generative Recommendation Systems · Bi-Level Optimization Framework · Joint Optimization of Tokenizer and Recommender · Meta-Learning Approach · Gradient Conflict Mitigation
BLOGER unifies tokenizer and recommender optimization via bi-level learning and gradient surgery, improving the quality of item identifiers and aligning them with recommendation goals for stronger generative recommendation performance.
Pctx: Tokenizing Personalized Context for Generative Recommendation
Published: 10/24/2025
Generative Recommendation Systems · Personalized Context Tokenization · Autoregressive Recommendation Models · User Interaction History Modeling · Semantic ID Representation in Recommendation
This paper introduces a personalized context-aware tokenizer that generates context-dependent semantic IDs, enhancing personalization in generative recommendation and improving NDCG@10 by up to 11.44% across datasets.
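For reference, NDCG@10 — the metric quoted above — can be computed as follows; this is a standard textbook implementation, not code from the paper:

```python
import math

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k for one ranked list of per-item relevance scores:
    DCG of the predicted order divided by DCG of the ideal order."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevances[:k]))
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0
```

A score of 1.0 means the relevant items are ranked as well as possible within the top k; relative gains like the 11.44% above compare this value across models.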
Scaling Laws for Neural Language Models
Published: 1/23/2020
Scaling Laws for Neural Language Models · Language Model Performance Modeling · Compute Resource Allocation Optimization · Model Capacity and Training Efficiency
The paper shows that language-model loss follows power-law scaling with model size, data, and compute, while depth and width have minimal effect. It derives quantitative laws for overfitting and training speed, enabling optimal compute allocation that favors large, sample-efficient models.
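The power laws referenced above take the form below, each holding when the other two factors are not the bottleneck; the paper's fitted exponents are approximately $\alpha_N \approx 0.076$, $\alpha_D \approx 0.095$, and $\alpha_C \approx 0.050$:

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) = \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}
```

Here $N$ is the number of non-embedding parameters, $D$ the dataset size in tokens, and $C_{\min}$ the minimal compute budget, with $N_c$, $D_c$, $C_c$ fitted constants.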
Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
Published: 2/11/2025
Instruction-based Video Editing Dataset · High-Quality Video Editing Pairs · End-to-End Video Editing Methods · Pre-trained Video Generation Models · Video Editing Filtering Pipeline
Señorita-2M offers 2M high-quality video-editing pairs produced by four specialist models, with a filtering pipeline that improves data quality, advancing end-to-end video editing with faster inference and superior results.
Ambiguity, Nondeterminism and State Complexity of Finite Automata
Ambiguity Analysis of Finite Automata · Nondeterminism Measures in Finite Automata · State Complexity of Finite Automata · Comparisons of Nondeterministic Finite Automata
This paper surveys measures of ambiguity and nondeterminism in finite automata, focusing on their impact on state complexity, and reveals how increased ambiguity or nondeterminism can reduce the number of states needed in NFAs.
……