Tags: LLM Reasoning Capacity Enhancement - Paper Library

SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation

Published: 1/1/2025

LLM Reasoning Capacity Enhancement, Training-Free Acceleration Methods, Training-Independent Inference Optimization, Utilization of Internal and External Speculation

SPECTRA is a novel framework that accelerates large language model inference through optimized internal and external speculation, requiring no additional training. It achieves up to 4.08x speedup over state-of-the-art methods across various benchmarks, with its implementation pub

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Published: 5/12/2025

Sequence Policy Optimization, Reinforcement Learning in Reasoning Models, Chain-of-Thought Generation Length Extension, LLM Reasoning Capacity Enhancement, Test-Time Scaling

This study introduces S-GRPO, a novel reinforcement learning method that allows reasoning models to exit early during chain-of-thought generation, improving efficiency by evaluating intermediate reasoning steps and reducing redundancy compared to existing approaches.

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Published: 12/9/2025

RL Training for Large Language Models, LLM Reasoning Capacity Enhancement, Sequence Policy Optimization, Long-Context Modeling, Reinforcement Learning for Math Reasoning

This study examines whether reinforcement learning (RL) truly enhances reasoning capabilities in language models, offering a transparent framework. Key findings include RL's effectiveness at the model's competence edge, with minimal pre-training seed needed for transfer, while mi

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Published: 3/22/2022

Chain-of-Thought Reasoning, Self-Consistency Decoding Strategy, LLM Reasoning Capacity Enhancement, Complex Reasoning Tasks, Math Reasoning Benchmarks

This paper introduces self-consistency, a decoding strategy that enhances chain-of-thought reasoning in large language models by sampling diverse reasoning paths, significantly improving performance on benchmarks like GSM8K by 17.9%.
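
The aggregation step of self-consistency can be sketched as a majority vote over the final answers of several sampled reasoning paths. The snippet below is only an illustration: the function name `self_consistent_answer` and the sample answers are hypothetical, and the sampling of reasoning paths from a model is assumed to have happened elsewhere.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent final answer and the share of samples that agree.

    `sampled_answers` holds one final answer per sampled reasoning path
    (hypothetical input; extracting answers from model output is not shown).
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(sampled_answers)


# Made-up final answers from five sampled chains of thought.
sampled = ["18", "18", "26", "18", "20"]
answer, agreement = self_consistent_answer(sampled)
print(answer, agreement)
```

The agreement ratio is not part of the summary above; it is included here only as a convenient by-product of the vote.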

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Published: 4/19/2025

Reinforcement Learning with Verifiable Rewards, LLM Reasoning Capacity Enhancement, Math Reasoning Benchmarks, Programming Task Reasoning Ability, Comparison of RL Algorithms

This study investigates the effectiveness of Reinforcement Learning with Verifiable Rewards (RLVR) in enhancing reasoning capabilities of large language models (LLMs). It finds that current setups fail to elicit new reasoning patterns, with base models performing better at larger

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

RL Training for Large Language Models, Long-Context Modeling, LLM Reasoning Capacity Enhancement, Sparse Attention Mechanism

DeepSeek-V3.2 is presented, balancing computational efficiency and reasoning capabilities through three innovations: a sparse attention mechanism that reduces complexity, a scalable reinforcement learning framework rivaling GPT-5, and a synthesis pipeline enhancing generalization

ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Published: 8/30/2025

LLM Reasoning Capacity Enhancement, Reinforcement Learning for Math Reasoning

The paper introduces 'ParaThinker,' a novel paradigm for scaling LLMs that utilizes native thought parallelism to overcome the bottleneck of 'Tunnel Vision' in test-time computation, significantly enhancing reasoning capabilities by synthesizing multiple diverse reasoning paths.

Inference Performance of Large Language Models on a 64-core RISC-V CPU with Silicon-Enabled Vectors

LLM Reasoning Capacity Enhancement, RISC-V Based Hardware Optimization, Silicon-Enabled Vector Computing, Energy-Efficient Computing Architectures, Matrix Multiplication Performance Benchmark

This study evaluates LLM inference performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors, revealing significant throughput and energy efficiency improvements, particularly for smaller models. It offers practical insights for deploying LLMs on future heterogeneous compu

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Published: 4/11/2025

Large Language Model Fine-Tuning, Retrieval-Augmented Reasoning, LLM Security Mechanism, Credibility-Aware Attention Modification, LLM Reasoning Capacity Enhancement

CrAM dynamically adjusts influential attention heads in LLMs to reduce low-credibility document impact in RAG, improving misinformation resistance by over 20%, outperforming supervised fine-tuning across datasets and models.

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

Published: 9/30/2025

RL Training for Large Language Models, LLM Reasoning Capacity Enhancement, Sequence Policy Optimization, Memory Mechanisms for LLMs, Test-Time Scaling Techniques

ReasoningBank distills self-judged experiences into general reasoning strategies, enabling agents to retrieve and update memories for continual improvement. Combined with MaTTS, it enhances learning efficiency and performance in continuous multitask scenarios.

Self-Improving LLM Agents at Test-Time

Published: 10/8/2025

Large Language Model Fine-Tuning, RL Training for Large Language Models, LLM Reasoning Capacity Enhancement, LLM Confidence Calibration, Self-Improving Large Language Models

This work introduces a test-time self-improvement method for LLM agents using uncertainty detection, self-generated data augmentation, and fine-tuning, achieving higher accuracy with fewer samples and enhancing robustness in complex tasks through distillation.

Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations

Published: 10/8/2025

Large Language Model Fine-Tuning, Sequence Policy Optimization, RL Training for Large Language Models, Long-Horizon Consistency Modeling, LLM Reasoning Capacity Enhancement

Hindsight Supervised Learning relabels LLM agent trajectories with actual achieved goals, using masking and reweighting to enhance fine-tuning in long-horizon tasks, showing improved performance and sample efficiency over baselines in ALFWorld and WebShop.

RLPIR: Reinforcement Learning with Prefix and Intrinsic Reward

Published: 10/8/2025

RL Training for Large Language Models, Sequence Policy Optimization, Training-Free Acceleration Methods, LLM Reasoning Capacity Enhancement

RLPIR introduces a verifier-free RL framework using prefix rollout and intrinsic rewards, achieving comparable performance to RLVR with 7× faster training and 45% shorter reasoning sequences, enhancing LLM efficiency without relying on ground truth.

Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning

Published: 6/9/2025

Causal Attention Mechanism, Token Pruning, Gradient-Guided Knowledge Distillation, LLM Reasoning Capacity Enhancement, Long-Context Modeling

LeaF uses gradient-guided token pruning to remove confounding tokens, aligning student attention with causal focus from teachers, improving reasoning accuracy and interpretability across multiple benchmarks.

PEARL: Towards Permutation-Resilient LLMs

Published: 2/20/2025

RL Training for Large Language Models, LLM Reasoning Capacity Enhancement, Sequence Policy Optimization, Training-Free Acceleration Methods

PEARL uses distributionally robust optimization and a permutation-proposal network to enhance LLMs' resilience against worst-case input orderings, effectively mitigating permutation attacks and boosting performance across varied contexts.

EdgeShard: Efficient LLM Inference via Collaborative Edge Computing

Published: 5/23/2024

LLM Reasoning Capacity Enhancement, Training-Free Acceleration Methods, Collaborative Edge Computing Inference, Model Sharding Deployment, Dynamic Programming Optimization

EdgeShard uses collaborative edge computing to shard LLMs across distributed devices, optimizing latency and throughput via dynamic programming, reducing inference delay by 50% and doubling throughput while addressing cloud dependency challenges.

MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization

Published: 10/25/2025

Multimodal Large Language Model, Static Quantization, Post-Training Quantization Framework, Modality-Specific Quantization, LLM Reasoning Capacity Enhancement

MQuant introduces a post-training static quantization framework for multimodal large language models, reducing latency and outliers via modality-specific quantization, flexible attention switching, and rotation suppression, boosting inference efficiency across major models.

EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration

Published: 4/25/2025

LLM Reasoning Capacity Enhancement, LLM Inference Scheduling, Multi-Instance Coordinated Scheduling, Partially Disaggregated Inference Strategy, Large-Scale GPU Cluster Serving

EcoServe introduces partial disaggregation with temporal decoupling and rolling activation, proactively orchestrating instances to reduce interference, enhance throughput and latency, and enable cost-effective LLM serving on commodity clusters with superior performance to existin

dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching

Published: 5/17/2025

Diffusion Model Fine-Tuning, Efficient Inference of Diffusion Models, LLM Reasoning Capacity Enhancement, Training-Free Acceleration Methods, Auto-Regressive Diffusion Model

dLLM-Cache is a training-free adaptive caching method accelerating diffusion large language models by reusing intermediate computations, achieving up to 9.1× speedup on LLaDA 8B and Dream 7B without degrading output quality.

Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory

Published: 4/11/2025

LLM Reasoning Capacity Enhancement, Test-Time Learning, Persistent Adaptive Memory Mechanism, Unsupervised Reasoning Enhancement

Dynamic Cheatsheet equips black-box LLMs with persistent adaptive memory, enabling test-time learning that reuses strategies and code snippets, significantly boosting performance without labeled data or retraining.
