Papers

Tag filter: LLM Reasoning Capacity Enhancement
SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation
Published: 1/1/2025
LLM Reasoning Capacity Enhancement · Training-Free Acceleration Methods · Training-Independent Inference Optimization · Utilization of Internal and External Speculation
SPECTRA is a novel framework that accelerates large language model inference through optimized internal and external speculation, requiring no additional training. It achieves up to 4.08x speedup over state-of-the-art methods across various benchmarks, and its implementation is publicly available.
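Speculative decoding is the core mechanism behind this family of methods. The sketch below shows a generic draft-and-verify loop, not SPECTRA's specific internal/external speculation scheme; `draft_model` and `target_model` are hypothetical greedy-decoding callables.

```python
# Generic draft-and-verify speculation, a minimal sketch (not SPECTRA's
# exact algorithm). A cheap draft model proposes k tokens; the target
# model checks them and keeps the longest agreeing prefix. Real systems
# verify all k positions in one batched forward pass; this loop checks
# token by token for clarity.

def speculative_step(target_model, draft_model, prefix, k=4):
    ctx = list(prefix)
    draft = []
    for _ in range(k):                      # cheap autoregressive draft
        tok = draft_model(ctx + draft)
        draft.append(tok)

    accepted = []
    for tok in draft:                       # expensive verification
        expected = target_model(ctx + accepted)
        if expected != tok:
            accepted.append(expected)       # target overrides the draft
            break
        accepted.append(tok)
    return accepted                         # 1..k tokens per round
```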
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Published: 5/12/2025
Sequence Policy Optimization · Reinforcement Learning in Reasoning Models · Chain-of-Thought Generation Length Extension · LLM Reasoning Capacity Enhancement · Test-Time Scaling
This study introduces S-GRPO, a novel reinforcement learning method that allows reasoning models to exit early during chain-of-thought generation, improving efficiency by evaluating intermediate reasoning steps and reducing redundancy compared to existing approaches.
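The early-exit idea can be made concrete with a small sketch. The reward decay below is illustrative only (S-GRPO's actual serial-group formulation and decay schedule differ), and `answer_at` / `is_correct` are hypothetical helpers.

```python
# Illustrative early-exit reward shaping (not S-GRPO's exact scheme):
# force an answer at several cut points of one chain of thought and
# reward correct answers more when they are reached with fewer tokens.

def early_exit_rewards(chain_tokens, answer_at, is_correct,
                       cuts=(0.25, 0.5, 0.75, 1.0)):
    n = len(chain_tokens)
    rewards = []
    for frac in cuts:
        prefix = chain_tokens[: max(1, int(frac * n))]
        ans = answer_at(prefix)             # hypothetical: force an answer
        # Decaying reward: correct-and-early beats correct-and-verbose.
        r = (1.0 - 0.5 * frac) if is_correct(ans) else 0.0
        rewards.append((frac, r))
    return rewards
```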
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
Published: 12/9/2025
RL Training for Large Language Models · LLM Reasoning Capacity Enhancement · Sequence Policy Optimization · Long-Context Modeling · Reinforcement Learning for Math Reasoning
This study examines whether reinforcement learning (RL) truly enhances reasoning capabilities in language models, offering a transparent evaluation framework. Key findings include RL's effectiveness at the model's competence edge, with only a minimal pre-training seed needed for transfer.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Published: 3/22/2022
Chain-of-Thought Reasoning · Self-Consistency Decoding Strategy · LLM Reasoning Capacity Enhancement · Complex Reasoning Tasks · Math Reasoning Benchmarks
This paper introduces self-consistency, a decoding strategy that enhances chain-of-thought reasoning in large language models by sampling diverse reasoning paths and taking the majority answer, improving performance on benchmarks such as GSM8K by 17.9%.
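The decoding strategy itself is simple enough to sketch; `sample_cot` and `extract_answer` below are hypothetical stand-ins for a temperature-sampled LLM call and an answer parser.

```python
from collections import Counter

# Self-consistency in a few lines: sample several chain-of-thought
# completions, parse each final answer, and return the majority answer
# (marginalizing over reasoning paths instead of trusting one chain).

def self_consistency(sample_cot, extract_answer, question, n=10):
    answers = [extract_answer(sample_cot(question)) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```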
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Published: 4/19/2025
Reinforcement Learning with Verifiable Rewards · LLM Reasoning Capacity Enhancement · Math Reasoning Benchmarks · Programming Task Reasoning Ability · Comparison of RL Algorithms
This study investigates the effectiveness of Reinforcement Learning with Verifiable Rewards (RLVR) in enhancing reasoning capabilities of large language models (LLMs). It finds that current setups fail to elicit genuinely new reasoning patterns, with base models matching or surpassing RL-trained models at larger sampling budgets (pass@k).
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
RL Training for Large Language Models · Long-Context Modeling · LLM Reasoning Capacity Enhancement · Sparse Attention Mechanism
DeepSeek-V3.2 is presented, balancing computational efficiency and reasoning capability through three innovations: a sparse attention mechanism that reduces complexity, a scalable reinforcement learning framework rivaling GPT-5, and a synthesis pipeline that enhances generalization.
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Published: 8/30/2025
LLM Reasoning Capacity Enhancement · Reinforcement Learning for Math Reasoning
The paper introduces 'ParaThinker,' a novel paradigm for scaling LLMs that uses native thought parallelism to overcome the 'Tunnel Vision' bottleneck in test-time computation, significantly enhancing reasoning capabilities by synthesizing multiple diverse reasoning paths.
Inference Performance of Large Language Models on a 64-core RISC-V CPU with Silicon-Enabled Vectors
LLM Reasoning Capacity Enhancement · RISC-V Based Hardware Optimization · Silicon-Enabled Vector Computing · Energy-Efficient Computing Architectures · Matrix Multiplication Performance Benchmark
This study evaluates LLM inference performance on a 64-core RISC-V CPU with silicon-enabled vectors, revealing significant throughput and energy-efficiency improvements, particularly for smaller models, and offers practical insights for deploying LLMs on future heterogeneous computing platforms.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Published: 4/11/2025
Large Language Model Fine-Tuning · Retrieval-Augmented Reasoning · LLM Security Mechanism · Credibility-Aware Attention Modification · LLM Reasoning Capacity Enhancement
CrAM dynamically adjusts influential attention heads in LLMs to reduce the impact of low-credibility documents in RAG, improving misinformation resistance by over 20% and outperforming supervised fine-tuning across datasets and models.
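A rough sketch of the idea in numpy, assuming per-token credibility scores are already available; CrAM's head-selection step and exact reweighting rule are more involved.

```python
import numpy as np

# Credibility-aware attention reweighting, a minimal sketch: within a
# selected head, damp attention mass flowing to tokens that came from
# low-credibility retrieved documents, then renormalize each row.

def reweight_attention(attn, token_credibility):
    """attn: (seq, seq) post-softmax weights for one head;
    token_credibility: (seq,) scores in [0, 1], low = dubious source."""
    scaled = attn * token_credibility[None, :]
    return scaled / scaled.sum(axis=-1, keepdims=True)
```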
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Published: 9/30/2025
RL Training for Large Language Models · LLM Reasoning Capacity Enhancement · Sequence Policy Optimization · Memory Mechanisms for LLMs · Test-Time Scaling Techniques
ReasoningBank distills self-judged experiences into general reasoning strategies, enabling agents to retrieve and update memories for continual improvement. Combined with MaTTS (memory-aware test-time scaling), it enhances learning efficiency and performance in continuous multi-task scenarios.
Self-Improving LLM Agents at Test-Time
Published: 10/8/2025
Large Language Model Fine-Tuning · RL Training for Large Language Models · LLM Reasoning Capacity Enhancement · LLM Confidence Calibration · Self-Improving Large Language Models
This work introduces a test-time self-improvement method for LLM agents using uncertainty detection, self-generated data augmentation, and fine-tuning, achieving higher accuracy with fewer samples and enhancing robustness on complex tasks through distillation.
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
Published: 10/8/2025
Large Language Model Fine-Tuning · Sequence Policy Optimization · RL Training for Large Language Models · Long-Horizon Consistency Modeling · LLM Reasoning Capacity Enhancement
Hindsight Supervised Learning relabels LLM agent trajectories with the goals actually achieved, using masking and reweighting to enhance fine-tuning on long-horizon tasks, showing improved performance and sample efficiency over baselines in ALFWorld and WebShop.
RLPIR: Reinforcement Learning with Prefix and Intrinsic Reward
Published: 10/8/2025
RL Training for Large Language Models · Sequence Policy Optimization · Training-Free Acceleration Methods · LLM Reasoning Capacity Enhancement
RLPIR introduces a verifier-free RL framework using prefix rollout and intrinsic rewards, achieving performance comparable to RLVR with 7× faster training and 45% shorter reasoning sequences, enhancing LLM efficiency without relying on ground truth.
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Published: 6/9/2025
Causal Attention Mechanism · Token Pruning · Gradient-Guided Knowledge Distillation · LLM Reasoning Capacity Enhancement · Long-Context Modeling
LeaF uses gradient-guided token pruning to remove confounding tokens, aligning student attention with the teacher's causal focus and improving reasoning accuracy and interpretability across multiple benchmarks.
PEARL: Towards Permutation-Resilient LLMs
Published: 2/20/2025
RL Training for Large Language Models · LLM Reasoning Capacity Enhancement · Sequence Policy Optimization · Training-Free Acceleration Methods
PEARL uses distributionally robust optimization and a permutation-proposal network to enhance LLMs' resilience against worst-case input orderings, effectively mitigating permutation attacks and boosting performance across varied contexts.
EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
Published: 5/23/2024
LLM Reasoning Capacity Enhancement · Training-Free Acceleration Methods · Collaborative Edge Computing Inference · Model Sharding Deployment · Dynamic Programming Optimization
EdgeShard uses collaborative edge computing to shard LLMs across distributed devices, optimizing latency and throughput via dynamic programming, reducing inference latency by 50% and doubling throughput while reducing reliance on the cloud.
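The partitioning step can be framed as a classic linear-partition dynamic program. The sketch below minimizes the bottleneck stage time under hypothetical per-layer costs and ignores the communication and memory terms EdgeShard also models.

```python
import functools

# Layer sharding as a linear-partition DP, a minimal sketch: split L
# contiguous layer costs into D stages so that the slowest stage (the
# pipeline bottleneck) is as fast as possible.

def min_bottleneck(costs, devices):
    costs = tuple(costs)

    @functools.lru_cache(maxsize=None)
    def best(i, d):
        if d == 1:                           # last device takes the rest
            return sum(costs[i:])
        return min(max(sum(costs[i:j]), best(j, d - 1))
                   for j in range(i + 1, len(costs) - d + 2))

    return best(0, devices)

# min_bottleneck([3, 1, 4, 1, 5, 9], 3) -> 9 (stage [9] is the bottleneck)
```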
MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Static Quantization
Published: 10/25/2025
Multimodal Large Language Model · Static Quantization · Post-Training Quantization Framework · Modality-Specific Quantization · LLM Reasoning Capacity Enhancement
MQuant introduces a post-training static quantization framework for multimodal large language models, reducing latency and outliers via modality-specific quantization, flexible attention switching, and rotation suppression, boosting inference efficiency across major models.
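Static, calibration-time scales are the crux. The sketch below computes a fixed int8 scale per modality from synthetic calibration activations; this is the simplified essence of modality-specific static quantization, not MQuant's full pipeline.

```python
import numpy as np

# Per-modality static int8 quantization, a minimal sketch: scales are
# fixed offline from calibration data, so inference does no dynamic
# range estimation. Calibration tensors here are synthetic stand-ins.

def calibrate_scale(samples):
    return max(float(np.abs(s).max()) for s in samples) / 127.0

def quantize(x, scale):
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
text_calib = [rng.normal(0.0, 1.0, 64) for _ in range(8)]
vision_calib = [rng.normal(0.0, 4.0, 64) for _ in range(8)]  # wider range

# Separate static scales because text and vision activations differ.
scales = {"text": calibrate_scale(text_calib),
          "vision": calibrate_scale(vision_calib)}
```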
EcoServe: Enabling Cost-effective LLM Serving with Proactive Intra- and Inter-Instance Orchestration
Published: 4/25/2025
LLM Reasoning Capacity Enhancement · LLM Inference Scheduling · Multi-Instance Coordinated Scheduling · Partially Disaggregated Inference Strategy · Large-Scale GPU Cluster Serving
EcoServe introduces partial disaggregation with temporal decoupling and rolling activation, proactively orchestrating instances to reduce interference, improve throughput and latency, and enable cost-effective LLM serving on commodity clusters, outperforming existing systems.
dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
Published: 5/17/2025
Diffusion Model Fine-Tuning · Efficient Inference of Diffusion Models · LLM Reasoning Capacity Enhancement · Training-Free Acceleration Methods · Auto-Regressive Diffusion Model
dLLM-Cache is a training-free adaptive caching method that accelerates diffusion large language models by reusing intermediate computations, achieving up to 9.1× speedup on LLaDA 8B and Dream 7B without degrading output quality.
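The caching idea generalizes to any iterative decoder. A minimal sketch, assuming a relative-drift test on block inputs (dLLM-Cache's actual similarity criterion and per-component granularity are more refined):

```python
import numpy as np

# Adaptive feature caching for an iterative decoder, a minimal sketch:
# recompute a block only when its input has drifted past a tolerance
# since the last step; otherwise reuse the cached output.

class CachedBlock:
    def __init__(self, block_fn, tol=1e-2):
        self.block_fn, self.tol = block_fn, tol
        self.last_in = self.last_out = None

    def __call__(self, x):
        if self.last_in is not None:
            drift = (np.linalg.norm(x - self.last_in)
                     / (np.linalg.norm(self.last_in) + 1e-8))
            if drift < self.tol:
                return self.last_out       # cache hit: input barely moved
        self.last_in, self.last_out = x, self.block_fn(x)
        return self.last_out
```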
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
Published: 4/11/2025
LLM Reasoning Capacity Enhancement · Test-Time Learning · Persistent Adaptive Memory Mechanism · Unsupervised Reasoning Enhancement
Dynamic Cheatsheet equips black-box LLMs with a persistent adaptive memory, enabling test-time learning that reuses strategies and code snippets, significantly boosting performance without labeled data or retraining.
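Since the memory lives entirely in the prompt, a sketch is short; the template and curation policy below are hypothetical, not the paper's exact prompts.

```python
# A persistent test-time memory, a minimal sketch: lessons distilled
# from earlier solves are prepended to future prompts; no weights change.

class Cheatsheet:
    def __init__(self, max_entries=20):
        self.max_entries = max_entries
        self.entries = []

    def augment(self, question):
        notes = "\n".join(f"- {e}" for e in self.entries)
        return (f"Reusable strategies from earlier problems:\n{notes}\n\n"
                f"Q: {question}")

    def update(self, lesson):
        """Store a self-extracted lesson (e.g. a rule or code snippet)."""
        self.entries.append(lesson)
        self.entries = self.entries[-self.max_entries:]  # bounded memory
```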