Papers
LLM Quantization
OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting
Published: 1/23/2025
LLM Quantization · Orthogonal and Scaling Transformations · Quantization Space Utilization Rate · KL-Top Loss Function · Post-Training Quantization
OSTQuant optimizes large language model quantization using orthogonal and scaling transformations, addressing uneven and heavy-tailed data distributions. It introduces the Quantization Space Utilization Rate (QSUR) to assess quantizability, together with a KL-Top loss function.
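To make the core idea concrete, here is a minimal numpy sketch, not the paper's implementation: the int4 quantizer, the injected outlier channels, and the random orthogonal matrix are all illustrative assumptions. It shows why an invertible orthogonal rotation can shrink quantization error on heavy-tailed activations by spreading outlier energy across channels before quantizing.

```python
import numpy as np

def quantize_int4(x):
    # Symmetric per-tensor int4 quantization, returning dequantized values.
    scale = np.abs(x).max() / 7.0
    q = np.clip(np.round(x / scale), -8, 7)
    return q * scale

rng = np.random.default_rng(0)
# Heavy-tailed "activations": mostly Gaussian, with a few outlier channels
# that dominate the per-tensor quantization range.
X = rng.standard_normal((512, 512))
X[:, :4] *= 50.0

# Random orthogonal matrix via QR decomposition (a stand-in for the learned
# orthogonal transform in OSTQuant, which is optimized, not random).
Q, _ = np.linalg.qr(rng.standard_normal((512, 512)))

err_plain = np.linalg.norm(quantize_int4(X) - X)
# Rotate, quantize, rotate back: Q is orthogonal, so (X @ Q) @ Q.T == X.
err_rot = np.linalg.norm(quantize_int4(X @ Q) @ Q.T - X)

print(f"quantization error, plain:   {err_plain:.1f}")
print(f"quantization error, rotated: {err_rot:.1f}")  # typically much lower
```

Note the rotation flattens the distribution so the quantization grid is used more evenly, which is the intuition QSUR quantifies; OSTQuant additionally learns per-channel scaling, which this sketch omits.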
FlatQuant: Flatness Matters for LLM Quantization
Published: 10/12/2024
LLM Quantization · Post-Training Quantization Methods · Weights and Activations Flattening · Kronecker Product Matrix Optimization · LLaMA-3-70B Model Evaluation
FlatQuant introduces a post-training quantization method that optimizes the flatness of weights and activations, significantly reducing quantization error. It sets a new benchmark for the LLaMA-3-70B model, with less than a 1% accuracy drop and up to a 2.3x speed improvement.
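The Kronecker-product angle is about cost: a full d×d affine transform is expensive, but a transform factored as A ⊗ B (with A of size n1×n1, B of size n2×n2, n1·n2 = d) can be applied with two small matmuls. A minimal numpy sketch, where the sizes and random matrices are illustrative assumptions rather than FlatQuant's learned transforms, using the row-major identity (A ⊗ B) vec(X) = vec(A X Bᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 16, 32            # 16 * 32 = 512, a hypothetical hidden size
A = rng.standard_normal((n1, n1))
B = rng.standard_normal((n2, n2))
x = rng.standard_normal(n1 * n2)

# Naive: materialize the full 512x512 Kronecker matrix (quadratic memory).
y_full = np.kron(A, B) @ x

# Factored: reshape, two small matmuls, reshape back; the big matrix is
# never formed, which is what makes Kronecker-structured transforms cheap.
y_fast = (A @ x.reshape(n1, n2) @ B.T).reshape(-1)

print(np.allclose(y_full, y_fast))  # True
```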