Papers
LLM Inference Acceleration
PLAIN: Leveraging High Internal Bandwidth in PIM for Accelerating Large Language Model Inference via Mixed-Precision Quantization
Published: 10/26/2025
Tags: DRAM-PIM Deep Learning Acceleration · LLM Inference Acceleration · Mixed-Precision Quantization Algorithm · High Bandwidth Memory Optimization · PIM-Based Computation Scheduling
PLAIN is a novel software/hardware co-design framework for accelerating large language model inference through mixed-precision quantization. It optimizes parameter quantization and leverages PIM characteristics, achieving up to 5.03x and 1.69x performance improvements with negligible accuracy loss.
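The abstract does not spell out PLAIN's quantization algorithm, but the core idea of mixed-precision quantization can be illustrated with a minimal sketch: quantize each layer's weights at a bit-width chosen by a simple sensitivity score. The function names, the 4/8-bit budgets, and the error-based allocation rule below are illustrative assumptions, not PLAIN's published method.

```python
# Hypothetical sketch of mixed-precision weight quantization: each layer is
# uniformly quantized at a bit-width chosen by its quantization error.
# Not PLAIN's actual algorithm; bit budgets and threshold are assumptions.
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of w to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def assign_bits(layers: dict[str, np.ndarray], budgets=(4, 8)) -> dict[str, int]:
    """Give the low bit-width to layers whose low-bit quantization error is small."""
    lo, hi = budgets
    err = {name: np.mean((w - quantize(w, lo)) ** 2) for name, w in layers.items()}
    threshold = np.median(list(err.values()))
    return {name: (lo if err[name] <= threshold else hi) for name in layers}

# Toy usage: half the layers end up at 4 bits, half at 8 bits.
layers = {f"layer{i}": np.random.randn(256, 256) for i in range(4)}
bits = assign_bits(layers)
quantized = {name: quantize(w, bits[name]) for name, w in layers.items()}
```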
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
Published: 6/17/2024
Tags: Adaptive Structured Sparse Attention · LLM Inference Acceleration · Long-Context Modeling · Near-Lossless Sparse Attention
SampleAttention is introduced as an adaptive, near-lossless sparse attention method for long-context LLMs, significantly reducing Time-to-First-Token (TTFT) latency while maintaining accuracy and achieving up to a 2.42x TTFT reduction compared to FlashAttention.
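Loosely in the spirit of this abstract, sampling-based sparse attention can be sketched as: score keys with a small sample of query rows, keep only the highest-scoring key/value positions, then run full attention over that subset. The sampling ratio, top-k selection, and function name below are illustrative assumptions, not SampleAttention's published algorithm.

```python
# Minimal sketch of sampling-based sparse attention (illustrative only):
# a random sample of query rows estimates which keys matter, and full
# attention is then restricted to those positions.
import numpy as np

def sampled_sparse_attention(q, k, v, sample_ratio=0.05, keep_ratio=0.2):
    n, d = q.shape
    # 1. Score all keys using a small random sample of query rows.
    idx = np.random.choice(n, max(1, int(n * sample_ratio)), replace=False)
    sample_scores = q[idx] @ k.T / np.sqrt(d)      # (samples, n)
    importance = sample_scores.max(axis=0)         # per-key importance
    # 2. Keep only the top-scoring key/value positions.
    keep = np.argsort(importance)[-max(1, int(n * keep_ratio)):]
    # 3. Softmax attention restricted to the kept positions.
    scores = q @ k[keep].T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v[keep]

# Toy usage: attends to ~20% of the 1024 positions per query.
n, d = 1024, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
out = sampled_sparse_attention(q, k, v)            # shape (1024, 64)
```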