Tags: Large Language Model Fine-Tuning - Paper Library

ModuLoRA is a memory-efficient fine-tuning algorithm that enables 2/3/4-bit precision tuning of 65B LLMs on a 24GB consumer GPU, integrating any weight quantizer for improved performance across various tasks with significantly reduced memory usage.

Jenga: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity

Large Language Model Fine-Tuning, Long-Context Modeling, Sparse Attention Mechanism

Jenga is a novel LLM fine-tuning system that optimizes activation memory usage in long-context applications using Contextual Token Sparsity. It employs token elimination, pattern prediction, and kernel optimization, achieving up to 1.93x memory reduction and 1.36x acceleration ov

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

Published: 4/9/2021

RL Training for Large Language Models, Large Language Model Fine-Tuning, Transformer-Based Efficient Forward Prediction, GPU Cluster Training, Pipeline Parallel Training

This paper introduces a novel interleaved pipeline parallelism schedule, combining tensor, pipeline, and data parallelism, to enhance the training efficiency of large language models on GPU clusters, achieving 502 petaFLOP/s on 3072 GPUs with over 10% throughput improvement.

Recommender Systems in the Era of Large Language Models (LLMs)

Published: 7/5/2023

LLM-based Recommendation Systems, Large Language Model Fine-Tuning, Generative Recommendation Systems, Pre-training and Fine-tuning in Recommender Systems, Prompting Methods for Large Language Models

This paper reviews techniques for enhancing recommender systems using Large Language Models (LLMs), focusing on pre-training, fine-tuning, and prompting. It highlights LLMs' potential in feature encoding and their future applications in recommender system research.

LoRA: Low-Rank Adaptation of Large Language Models

Published: 6/18/2021

Low-Rank Adaptation for Large Language Models, Transformer architecture, Large Language Model Fine-Tuning, Parameter Efficiency Optimization, RoBERTa and Its Derivatives

LoRA introduces a low-rank adaptation method for fine-tuning large language models, significantly reducing trainable parameters by injecting rank decomposition matrices while freezing the model weights. It achieves comparable or better performance on RoBERTa, DeBERTa, GPT-2, and
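The low-rank update mentioned in the LoRA summary can be illustrated with a small pure-Python sketch. It is not the authors' code: the helper names (`matmul`, `lora_forward`), the toy dimensions, and the `scale` factor are assumptions made for illustration. The frozen weight `W` is left untouched, and only the two small factors `A` and `B` would be trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, W, A, B, scale=1.0):
    """Compute x @ W + scale * (x @ A) @ B.

    W is the frozen pretrained weight (d_in x d_out).
    A (d_in x r) and B (r x d_out) are the trainable low-rank factors.
    """
    base = matmul(x, W)
    update = matmul(matmul(x, A), B)
    return [[b + scale * u for b, u in zip(rb, ru)] for rb, ru in zip(base, update)]

# Toy sizes chosen for illustration only.
d_in, d_out, r = 8, 8, 2
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]   # frozen
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]    # trainable
B = [[0.0] * d_out for _ in range(r)]  # trainable, zero-initialized: adapter starts as a no-op
x = [[1.0] * d_in]

full_params = d_in * d_out          # parameters updated by full fine-tuning
lora_params = r * (d_in + d_out)    # parameters updated by the low-rank adapter
```

With these toy sizes the adapter trains 32 values instead of 64; the gap widens quickly as `d_in` and `d_out` grow while `r` stays small.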

Scaling Large Language Models for Next-Generation Single-Cell Analysis

Published: 4/17/2025

Large Language Model Fine-Tuning, Single-Cell RNA Sequencing, Cell Text Modeling, Biological Information Synthesis, Multicellular Context Reasoning

This study introduces a novel approach using the Cell2Sentence framework to convert single-cell RNA sequencing data into textual 'cell sentences,' training large language models on over a billion tokens. Scaling to 27 billion parameters resulted in enhanced performance in multice

Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

Published: 10/11/2025

Harmful Fine-Tuning Mitigation, LLM Security Mechanism, Large Language Model Fine-Tuning

Pharmacist curates high-quality, safety-critical alignment data, enhancing defense and inference in large language models against harmful fine-tuning while reducing training time, outperforming current alignment-stage defenses.

Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning

Published: 8/19/2024

Harmful Fine-Tuning Mitigation, Large Language Model Fine-Tuning, LLM Security Mechanism

Antidote mitigates harmful fine-tuning attacks on large language models by one-shot pruning harmful parameters post-fine-tuning, independent of hyperparameters, effectively reducing harmful outputs while preserving task accuracy.

CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG

Published: 4/11/2025

Large Language Model Fine-Tuning, Retrieval-Augmented Reasoning, LLM Security Mechanism, Credibility-Aware Attention Modification, LLM Reasoning Capacity Enhancement

CrAM dynamically adjusts influential attention heads in LLMs to reduce low-credibility document impact in RAG, improving misinformation resistance by over 20%, outperforming supervised fine-tuning across datasets and models.

A Survey on Generative Recommendation: Data, Model, and Tasks

Published: 10/31/2025

Generative Recommendation Systems, Large Language Model Fine-Tuning, Diffusion Models, Multimodal Large Language Model, LLM-based Recommendation Systems

This survey reviews generative recommendation via a unified framework, analyzing data augmentation, model alignment, and task design, highlighting innovations in large language and diffusion models that enable knowledge integration, natural language understanding, and personalize

Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation

Published: 9/3/2024

Harmful Fine-Tuning Mitigation, Large Language Model Fine-Tuning, LLM Security Mechanism, Weight Perturbation Mitigation, Alignment-Stage Optimization

Booster introduces a loss regularizer during alignment to attenuate harmful weight perturbations from fine-tuning, effectively reducing harmful outputs while preserving downstream task performance in large language models.

Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs

Published: 10/29/2025

RL Training for Large Language Models, Sequence Policy Optimization, Large Language Model Fine-Tuning

Learn-to-Ask learns proactive LLMs from offline expert logs without simulators by leveraging observed future data to infer turn-by-turn rewards, decomposing long-horizon tasks for effective training and deployment in real-world, high-stakes domains.

Large Language Models as Realistic Microservice Trace Generators

Published: 12/16/2024

Large Language Model Fine-Tuning, Microservice Call Graph Generation, Synthetic Workload Trace Generation, Recursive Generation Method, Instruction Tuning

This study fine-tunes large language models with recursive generation and instruction tuning to create accurate, diverse synthetic microservice traces, effectively replacing real data and supporting downstream tasks like feature prediction and data completion.

MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation

Published: 10/28/2025

Generative Recommendation Systems, Large Language Model Fine-Tuning, RL Training for Large Language Models, Sequence Policy Optimization, Residual Quantized Variational Autoencoder (RQ-VAE)

MiniOneRec, the first open-source generative recommendation framework, uses Residual Quantized VAE for SID and post-trains 0.5B–7B parameter Qwen models, confirming scaling benefits and improving ranking accuracy and diversity via aligned SID processing and constrained RL.

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

Published: 11/1/2023

Large Language Model Fine-Tuning, RL Training for Large Language Models, LLM-guided motion planning, Dialogue Policy Planning, Self-Play Reinforcement Learning

PPDPP introduces a tunable dialogue policy planner enhancing LLMs' proactive dialogue capabilities via supervised fine-tuning and reinforcement learning, achieving superior generalization and performance across diverse applications.

Training LLM Agents to Empower Humans

Published: 10/8/2025

Large Language Model Fine-Tuning, LLM-guided motion planning, Training-Free Acceleration Methods, Mechanism of RL Preserving Prior Knowledge

This work introduces an unsupervised LLM fine-tuning method maximizing human empowerment, improving assistive agent effectiveness without extra human feedback, validated by user studies and coding benchmarks with higher acceptance and success rates.

Self-Improving LLM Agents at Test-Time

Published: 10/8/2025

Large Language Model Fine-Tuning, RL Training for Large Language Models, LLM Reasoning Capacity Enhancement, LLM Confidence Calibration, Self-Improving Large Language Models

This work introduces a test-time self-improvement method for LLM agents using uncertainty detection, self-generated data augmentation, and fine-tuning, achieving higher accuracy with fewer samples and enhancing robustness in complex tasks through distillation.

Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations

Published: 10/8/2025

Large Language Model Fine-Tuning, Sequence Policy Optimization, RL Training for Large Language Models, Long-Horizon Consistency Modeling, LLM Reasoning Capacity Enhancement

Hindsight Supervised Learning relabels LLM agent trajectories with actual achieved goals, using masking and reweighting to enhance fine-tuning in long-horizon tasks, showing improved performance and sample efficiency over baselines in ALFWorld and WebShop.

Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter

Published: 3/7/2025

Sequence Policy Optimization, Large Language Model Fine-Tuning, Emotional Support Conversations, Preference Bias Mitigation, MCTS-Based Strategy Dataset Construction

The study introduces Chain-of-Strategy Optimization, using MCTS to build ESC-Pro for fine-grained strategy tuning, improving LLMs' strategy accuracy, bias mitigation, and empathetic response in emotional support conversations.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Published: 2/6/2024

RL Training for Large Language Models, Math Reasoning Benchmarks, Group Relative Policy Optimization, Large Language Model Fine-Tuning, Public Data-Driven Pretraining

DeepSeekMath 7B continues pre-training on 120B math tokens plus code and natural language data, and introduces Group Relative Policy Optimization (GRPO) to enhance reasoning and memory efficiency, achieving 51.7% on the MATH benchmark, approaching GPT-4 performance.
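The group-relative idea behind GRPO can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: the function name, the `eps` term, and the use of the population standard deviation are assumptions. Each sampled response to the same prompt is scored against the mean and spread of its own group, so no separately learned value model is needed as a baseline.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    rewards: scalar rewards for several responses to the same prompt.
    eps: small constant (an assumption here) to avoid division by zero
         when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled answers to one math problem
# (1.0 = correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers in this toy group receive positive advantages and incorrect ones negative; a group where all answers score the same yields zero advantage for every sample, so it contributes no policy gradient signal.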
