Papers

Tag Filter: Large Language Model Fine-Tuning
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Published: 9/28/2023
Large Language Model Fine-Tuning · Low-Rank Adaptation Finetuning · Quantization Methods · 2-Bit LLMs Training · Consumer GPU Optimization
ModuLoRA is a memory-efficient fine-tuning algorithm that enables 2-, 3-, and 4-bit precision tuning of 65B LLMs on a 24GB consumer GPU, integrating any weight quantizer for improved performance across various tasks with significantly reduced memory usage.
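The core recipe can be sketched in a few lines of NumPy: the base weights live in memory as low-bit codes and are dequantized on the fly, while only the low-rank adapter factors are trained in full precision. The round-to-nearest quantizer below is a toy stand-in for whichever modular quantizer backend is plugged in; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))                  # full-precision pretrained weight

# Toy symmetric round-to-nearest quantizer standing in for any modular backend.
def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1               # e.g. 1 for 2-bit signed codes
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

Q, s = quantize(W, bits=2)                   # base weights stored as 2-bit codes
A = rng.normal(scale=0.01, size=(r, d))      # trainable low-rank factor (down)
B = np.zeros((d, r))                         # trainable low-rank factor (up), zero-init

def forward(x):
    # Base weights stay quantized in memory; only A and B receive gradients.
    return dequantize(Q, s) @ x + B @ (A @ x)

x = rng.normal(size=d)
y = forward(x)
```

With B zero-initialized, the adapted layer initially reproduces the (dequantized) base model exactly, which is what keeps low-bit fine-tuning stable at the start.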
Jenga: Enhancing LLM Long-Context Fine-tuning with Contextual Token Sparsity
Large Language Model Fine-Tuning · Long-Context Modeling · Sparse Attention Mechanism
Jenga is a novel LLM fine-tuning system that optimizes activation memory usage in long-context applications using Contextual Token Sparsity. It employs token elimination, pattern prediction, and kernel optimization, achieving up to 1.93x memory reduction and 1.36x acceleration over existing systems.
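The token-elimination idea can be illustrated in a few lines: drop the tokens that contribute least to the context before the expensive layers run. The importance criterion below (total attention a token receives) is a rough stand-in for Jenga's actual contextual-sparsity predictor; shapes and the keep ratio are illustrative.

```python
import numpy as np

def prune_tokens(hidden, attn_scores, keep_ratio=0.5):
    """Drop the tokens that receive the least total attention, a rough
    proxy for a contextual-token-sparsity criterion."""
    importance = attn_scores.sum(axis=0)           # attention each token receives
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])    # keep top-k, preserve order
    return hidden[keep], keep

rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 4))                   # 8 tokens, hidden dim 4
attn = rng.random(size=(8, 8))                     # toy attention matrix
pruned, kept = prune_tokens(hidden, attn, keep_ratio=0.5)
```

Activation memory for downstream layers then scales with the kept tokens rather than the full context length.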
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Published: 4/9/2021
RL Training for Large Language Models · Large Language Model Fine-Tuning · Transformer-Based Efficient Forward Prediction · GPU Cluster Training · Pipeline Parallel Training
This paper introduces a novel interleaved pipeline parallelism schedule, combining tensor, pipeline, and data parallelism, to enhance the training efficiency of large language models on GPU clusters, achieving 502 petaFLOP/s on 3072 GPUs with over 10% throughput improvement.
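The benefit of the interleaved schedule can be seen in the paper's pipeline-bubble analysis: with p pipeline stages and m microbatches, the idle-time fraction of a 1F1B schedule is (p−1)/m, and assigning v interleaved model chunks per device shrinks it by a factor of v (at the cost of more communication). A minimal sketch of that arithmetic:

```python
def pipeline_bubble_fraction(p, m, v=1):
    """Idle ('bubble') time as a fraction of ideal compute time for a
    1F1B pipeline: p stages, m microbatches, v interleaved model chunks
    per device (v=1 recovers the non-interleaved schedule)."""
    return (p - 1) / (v * m)

# Interleaving with v=4 cuts the bubble by 4x for the same p and m.
base = pipeline_bubble_fraction(8, 32)        # 7/32
interleaved = pipeline_bubble_fraction(8, 32, v=4)  # 7/128
```

This is why the schedule must be combined with tensor and data parallelism: interleaving attacks only the bubble term, not per-device memory or communication volume.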
Recommender Systems in the Era of Large Language Models (LLMs)
Published: 7/5/2023
LLM-based Recommendation Systems · Large Language Model Fine-Tuning · Generative Recommendation Systems · Pre-training and Fine-tuning in Recommender Systems · Prompting Methods for Large Language Models
This paper reviews techniques for enhancing recommender systems using Large Language Models (LLMs), focusing on pre-training, fine-tuning, and prompting. It highlights LLMs' potential in feature encoding and their future applications in recommender system research.
LoRA: Low-Rank Adaptation of Large Language Models
Published: 6/18/2021
Low-Rank Adaptation for Large Language Models · Transformer architecture · Large Language Model Fine-Tuning · Parameter Efficiency Optimization · RoBERTa and Its Derivatives
LoRA introduces a low-rank adaptation method for fine-tuning large language models, significantly reducing trainable parameters by injecting trainable rank-decomposition matrices while freezing the pretrained model weights. It achieves comparable or better performance on RoBERTa, DeBERTa, GPT-2, and GPT-3.
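The injected update is simple enough to sketch directly: a frozen weight W plus a trainable low-rank product BA, scaled by alpha/r. Sizes here are illustrative (the paper applies this to the attention projection matrices), and B is zero-initialized so training starts from the pretrained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 16, 4, 8          # illustrative sizes; rank r << d

W = rng.normal(size=(d_out, d_in))            # pretrained weight, frozen
A = rng.normal(scale=0.01, size=(r, d_in))    # trainable down-projection
B = np.zeros((d_out, r))                      # trainable up-projection, zero-init

def lora_forward(x):
    # h = Wx + (alpha/r) * B A x; gradients flow only through A and B.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)
```

Trainable parameters drop from d_out * d_in to r * (d_in + d_out), and at deployment BA can be merged into W so inference adds no latency.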
Scaling Large Language Models for Next-Generation Single-Cell Analysis
Published: 4/17/2025
Large Language Model Fine-Tuning · Single-Cell RNA Sequencing · Cell Text Modeling · Biological Information Synthesis · Multicellular Context Reasoning
This study introduces a novel approach using the Cell2Sentence framework to convert single-cell RNA sequencing data into textual 'cell sentences,' training large language models on over a billion tokens. Scaling to 27 billion parameters resulted in enhanced performance in multicellular reasoning tasks.
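The cell-sentence encoding itself is a rank transform: genes are ordered by descending expression and their names emitted as a token sequence an LLM can read. A toy reimplementation (gene names and counts below are made up for illustration):

```python
def cell_to_sentence(expression, gene_names, top_k=5):
    """Order gene names by descending expression to form a 'cell sentence',
    the rank-based encoding behind Cell2Sentence (toy version)."""
    ranked = sorted(range(len(expression)), key=lambda i: -expression[i])
    return " ".join(gene_names[i] for i in ranked[:top_k])

# Hypothetical marker genes and expression counts for one cell.
genes = ["CD3D", "MS4A1", "NKG7", "LYZ", "GNLY"]
counts = [5.0, 0.0, 9.0, 2.0, 7.0]
sentence = cell_to_sentence(counts, genes, top_k=3)
```

Because rank order is what survives, the encoding is robust to the heavy-tailed magnitudes of raw counts while remaining (approximately) invertible back to expression profiles.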
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
Published: 10/11/2025
Harmful Fine-Tuning Mitigation · LLM Security Mechanism · Large Language Model Fine-Tuning
Pharmacist curates high-quality, safety-critical alignment data, enhancing defense and inference in large language models against harmful fine-tuning while reducing training time, outperforming current alignment-stage defenses.
Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning
Published: 8/19/2024
Harmful Fine-Tuning Mitigation · Large Language Model Fine-Tuning · LLM Security Mechanism
Antidote mitigates harmful fine-tuning attacks on large language models by one-shot pruning of harmful parameters after fine-tuning, independent of training hyperparameters, effectively reducing harmful outputs while preserving task accuracy.
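One-shot safety pruning can be sketched as a scoring-and-masking pass. The importance score below (|weight × gradient of a harmful loss|) is an assumed proxy, not necessarily the paper's exact criterion; the point is that the mask is computed once after fine-tuning, with no retraining loop.

```python
import numpy as np

def safety_prune(weights, harmful_grads, prune_ratio=0.05):
    """One-shot removal of the weights most implicated in harmful behavior,
    scored here (as an assumed proxy) by |w * grad of a harmful loss|."""
    score = np.abs(weights * harmful_grads)
    k = int(score.size * prune_ratio)
    idx = np.argsort(score.ravel())[-k:]          # top-k most implicated weights
    pruned = weights.copy().ravel()
    pruned[idx] = 0.0                             # zero them out in one shot
    return pruned.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))
g = rng.normal(size=(10, 10))                     # toy harmful-loss gradients
wp = safety_prune(w, g, prune_ratio=0.05)
```

Keeping prune_ratio small is what preserves downstream accuracy: only a sparse set of parameters is removed.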
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Published: 4/11/2025
Large Language Model Fine-Tuning · Retrieval-Augmented Reasoning · LLM Security Mechanism · Credibility-Aware Attention Modification · LLM Reasoning Capacity Enhancement
CrAM dynamically adjusts influential attention heads in LLMs to reduce the impact of low-credibility documents in RAG, improving misinformation resistance by over 20% and outperforming supervised fine-tuning across datasets and models.
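The attention modification can be sketched on a single head: shrink the attention mass directed at tokens from low-credibility retrieved documents, then renormalize. Whether the scaling is applied pre- or post-softmax, and which heads get it, are details this toy version glosses over.

```python
import numpy as np

def credibility_scaled_attention(scores, low_cred_mask, scale=0.1):
    """Down-weight post-softmax attention toward tokens from low-credibility
    retrieved documents, then renormalize each row."""
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)            # ordinary softmax
    attn = attn * np.where(low_cred_mask, scale, 1.0)   # shrink suspect tokens
    return attn / attn.sum(axis=-1, keepdims=True)      # rows sum to 1 again

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 6))                        # 4 queries, 6 key tokens
mask = np.array([False, False, True, True, False, False])  # tokens 2-3 suspect
attn = credibility_scaled_attention(scores, mask)
```

Because only attention weights change, the intervention is training-free, which is the contrast with the supervised fine-tuning baselines in the summary above.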
A Survey on Generative Recommendation: Data, Model, and Tasks
Published: 10/31/2025
Generative Recommendation Systems · Large Language Model Fine-Tuning · Diffusion Models · Multimodal Large Language Model · LLM-based Recommendation Systems
This survey reviews generative recommendation via a unified framework, analyzing data augmentation, model alignment, and task design, highlighting innovations in large language and diffusion models that enable knowledge integration, natural language understanding, and personalized recommendation.
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
Published: 9/3/2024
Harmful Fine-Tuning Mitigation · Large Language Model Fine-Tuning · LLM Security Mechanism · Weight Perturbation Mitigation · Alignment-Stage Optimization
Booster introduces a loss regularizer during alignment to attenuate harmful weight perturbations from fine-tuning, effectively reducing harmful outputs while preserving downstream task performance in large language models.
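The shape of such an alignment-stage regularizer can be sketched on a 1-D toy problem: alongside the task loss, penalize how much one simulated harmful fine-tuning step would reduce a harmful loss. The scalar losses and the exact regularizer form below are illustrative assumptions, not the paper's precise objective.

```python
def attenuated_alignment_loss(theta, task_loss, harmful_loss, harmful_grad,
                              lam=1.0, alpha=0.1):
    """Toy scalar alignment objective: the regularizer penalizes the drop in
    harmful loss after one simulated (normalized) harmful gradient step."""
    g = harmful_grad(theta)
    step = alpha * g / (abs(g) + 1e-8)            # normalized simulated step
    drop = harmful_loss(theta) - harmful_loss(theta - step)
    return task_loss(theta) + lam * max(drop, 0.0)

# Hypothetical 1-D losses: task prefers theta near 0, harm decreases in theta.
task = lambda t: t ** 2
harm = lambda t: -t
harm_grad = lambda t: -1.0
loss = attenuated_alignment_loss(1.0, task, harm, harm_grad)
```

Minimizing this loss steers the aligned weights toward regions where a later harmful fine-tuning step gains little, which is the "attenuating harmful perturbation" idea in miniature.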
Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs
Published: 10/29/2025
RL Training for Large Language Models · Sequence Policy Optimization · Large Language Model Fine-Tuning
Learn-to-Ask learns proactive LLMs from offline expert logs without simulators by leveraging observed future data to infer turn-by-turn rewards, decomposing long-horizon tasks for effective training and deployment in real-world high-stakes domains.
Large Language Models as Realistic Microservice Trace Generators
Published: 12/16/2024
Large Language Model Fine-Tuning · Microservice Call Graph Generation · Synthetic Workload Trace Generation · Recursive Generation Method · Instruction Tuning
This study fine-tunes large language models with recursive generation and instruction tuning to create accurate, diverse synthetic microservice traces, effectively replacing real data and supporting downstream tasks like feature prediction and data completion.
MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation
Published: 10/28/2025
Generative Recommendation Systems · Large Language Model Fine-Tuning · RL Training for Large Language Models · Sequence Policy Optimization · Residual Quantized Variational Autoencoder (RQ-VAE)
MiniOneRec, the first open-source generative recommendation framework, uses a Residual Quantized VAE for SIDs and post-trains 0.5B–7B-parameter Qwen models, confirming scaling benefits and improving ranking accuracy and diversity via aligned SID processing and constrained RL.
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
Published: 11/1/2023
Large Language Model Fine-Tuning · RL Training for Large Language Models · LLM-guided motion planning · Dialogue Policy Planning · Self-Play Reinforcement Learning
PPDPP introduces a tunable dialogue policy planner that enhances LLMs' proactive dialogue capabilities via supervised fine-tuning and reinforcement learning, achieving superior generalization and performance across diverse applications.
Training LLM Agents to Empower Humans
Published: 10/8/2025
Large Language Model Fine-Tuning · LLM-guided motion planning · Training-Free Acceleration Methods · Mechanism of RL Preserving Prior Knowledge
This work introduces an unsupervised LLM fine-tuning method that maximizes human empowerment, improving assistive agent effectiveness without extra human feedback, validated by user studies and coding benchmarks with higher acceptance and success rates.
Self-Improving LLM Agents at Test-Time
Published: 10/8/2025
Large Language Model Fine-Tuning · RL Training for Large Language Models · LLM Reasoning Capacity Enhancement · LLM Confidence Calibration · Self-Improving Large Language Models
This work introduces a test-time self-improvement method for LLM agents using uncertainty detection, self-generated data augmentation, and fine-tuning, achieving higher accuracy with fewer samples and enhancing robustness in complex tasks through distillation.
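The uncertainty-detection step that gates self-improvement can be sketched with predictive entropy: inputs whose next-token distributions are high-entropy are the ones flagged for self-generated training data. The threshold and the averaging rule here are assumptions for illustration, not the paper's exact detector.

```python
import math

def token_entropy(probs):
    """Predictive entropy (nats) of one next-token distribution; high values
    mark the uncertain steps that trigger self-generated training data."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def needs_self_improvement(prob_dists, threshold=1.0):
    # Flag an input when the mean per-step entropy exceeds the threshold.
    mean_h = sum(map(token_entropy, prob_dists)) / len(prob_dists)
    return mean_h > threshold

confident = [[0.97, 0.01, 0.01, 0.01]] * 3   # peaked distributions, low entropy
uncertain = [[0.25, 0.25, 0.25, 0.25]] * 3   # uniform distributions, ln(4) nats
```

Only the flagged inputs enter the augment-and-fine-tune loop, which is how the method keeps the sample count low.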
Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations
Published: 10/8/2025
Large Language Model Fine-Tuning · Sequence Policy Optimization · RL Training for Large Language Models · Long-Horizon Consistency Modeling · LLM Reasoning Capacity Enhancement
Hindsight Supervised Learning relabels LLM agent trajectories with the goals they actually achieved, using masking and reweighting to enhance fine-tuning on long-horizon tasks, showing improved performance and sample efficiency over baselines in ALFWorld and WebShop.
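The relabeling step itself is small: a failed trajectory becomes a successful demonstration of whatever goal it actually reached, HER-style, and can then be fed to supervised fine-tuning. The trajectory schema below is a made-up illustration; the masking and reweighting from the summary are omitted.

```python
def hindsight_relabel(trajectory, achieved_goal):
    """Relabel a (possibly failed) trajectory as a successful demonstration
    of the goal it actually achieved, for supervised fine-tuning."""
    return {
        "goal": achieved_goal,            # swap intended goal for achieved one
        "steps": trajectory["steps"],     # actions are kept verbatim
        "success": True,                  # true by construction under the new goal
    }

# Hypothetical WebShop-style episode that missed its intended goal.
traj = {"goal": "buy a red mug",
        "steps": ["search mug", "click blue mug", "buy"],
        "success": False}
demo = hindsight_relabel(traj, "buy a blue mug")
```

Every trajectory thus yields a usable positive example, which is where the sample-efficiency gain over success-only filtering comes from.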
Chain of Strategy Optimization Makes Large Language Models Better Emotional Supporter
Published: 3/7/2025
Sequence Policy Optimization · Large Language Model Fine-Tuning · Emotional Support Conversations · Preference Bias Mitigation · MCTS-Based Strategy Dataset Construction
The study introduces Chain-of-Strategy Optimization, using MCTS to build ESC-Pro for fine-grained strategy tuning, improving LLMs' strategy accuracy, bias mitigation, and empathetic responses in emotional support conversations.
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Published: 2/6/2024
RL Training for Large Language Models · Math Reasoning Benchmarks · Group Relative Policy Optimization · Large Language Model Fine-Tuning · Public Data-Driven Pretraining
DeepSeekMath 7B continues pretraining on 120B math tokens plus code and natural-language data, introducing Group Relative Policy Optimization (GRPO) to enhance reasoning while reducing memory use, achieving 51.7% on the MATH benchmark, approaching GPT-4 performance.
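GRPO's memory saving comes from dropping the learned value network: each sampled completion's advantage is computed relative to its own group of samples for the same prompt. A minimal sketch of that group-relative normalization (reward values are illustrative):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's reward
    against its group's mean and std, removing the need for a critic model."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0   # guard against zero spread
    return [(r - mu) / sigma for r in rewards]

# Four completions sampled for one prompt, scored 1 (correct) / 0 (incorrect).
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

The advantages then plug into a PPO-style clipped objective; only the policy (and a frozen reference model) must be held in memory during RL.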