Papers

Tag: Efficient Attention Mechanism
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Published: 11/6/2024
Tags: Transformer Architecture, Efficient Attention Mechanism, In-Memory Lookup Tables, Reduction of Computational Complexity, Multi-Head Attention Operation
MemoryFormer is a novel transformer architecture that reduces computational complexity by eliminating most fully-connected layers while retaining the necessary multi-head attention operations, using in-memory lookup tables and hash algorithms for dynamic vector retrieval.
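
To make the lookup idea concrete, here is a minimal sketch, not the paper's exact design: the input is split into chunks, each chunk is hashed to a bucket via the sign pattern of a fixed random projection (a simple locality-sensitive hash), and the bucket selects a learned vector from that chunk's table, replacing the dense matrix multiply of a fully-connected layer. All names, sizes, and the hashing scheme below are illustrative assumptions; gradients flow only into the tables, since the hash itself is fixed.

```python
import torch
import torch.nn as nn

class HashedLookupLayer(nn.Module):
    """Sketch: approximate y = Wx with table reads instead of a matmul.

    Illustrative stand-in for the hash-and-lookup idea; not
    MemoryFormer's exact construction.
    """

    def __init__(self, d_in: int, d_out: int, num_chunks: int = 8, bits: int = 8):
        super().__init__()
        assert d_in % num_chunks == 0
        self.num_chunks = num_chunks
        # Fixed random hyperplanes acting as the hash function (not trained).
        self.register_buffer(
            "planes", torch.randn(num_chunks, d_in // num_chunks, bits)
        )
        # One table of 2**bits learned output vectors per chunk.
        self.tables = nn.Parameter(0.02 * torch.randn(num_chunks, 2 ** bits, d_out))
        self.register_buffer("bit_weights", 2 ** torch.arange(bits))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B = x.shape[0]
        chunks = x.view(B, self.num_chunks, -1)                     # (B, K, d_in/K)
        proj = torch.einsum("bkc,kch->bkh", chunks, self.planes)    # (B, K, bits)
        buckets = ((proj > 0).long() * self.bit_weights).sum(-1)    # (B, K)
        # Gather one learned vector per chunk and sum: no matmul with x.
        vecs = self.tables[torch.arange(self.num_chunks), buckets]  # (B, K, d_out)
        return vecs.sum(dim=1)

layer = HashedLookupLayer(d_in=64, d_out=32)
y = layer(torch.randn(4, 64))   # -> shape (4, 32)
```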
STORE: Semantic Tokenization, Orthogonal Rotation and Efficient Attention for Scaling Up Ranking Models
Published: 11/24/2025
Tags: Scalable Ranking Models, Semantic Tokenization, Orthogonal Rotation Transformation, High-Dimensional Feature Sparsity, Efficient Attention Mechanism
The paper introduces STORE, a scalable ranking framework addressing representation and computational bottlenecks in personalized recommendation systems through semantic tokenization, efficient attention, and orthogonal rotation.
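
The orthogonal-rotation ingredient can be illustrated with PyTorch's built-in orthogonal parametrization. This is only a sketch of that one component, with illustrative names and dimensions; it says nothing about how STORE combines the rotation with semantic tokenization or its attention mechanism.

```python
import torch
import torch.nn as nn

d = 64  # embedding width (illustrative)

# A learnable linear map constrained to stay orthogonal, so it re-mixes
# embedding dimensions without distorting norms or pairwise angles.
rotation = nn.utils.parametrizations.orthogonal(nn.Linear(d, d, bias=False))

item_emb = torch.randn(32, d)   # a batch of hypothetical item embeddings
rotated = rotation(item_emb)    # same geometry, new coordinate basis

R = rotation.weight             # orthogonality holds by construction
print(torch.allclose(R @ R.T, torch.eye(d), atol=1e-5))  # True
```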
Fast Video Generation with Sliding Tile Attention
Published: 2/7/2025
Tags: Sliding Tile Attention Mechanism, Video Diffusion Generation Models, Efficient Attention Mechanism, HunyuanVideo, Computational Efficiency Optimization
The study introduces Sliding Tile Attention (STA) to reduce computational bottlenecks in video generation, achieving 58.79% Model FLOPs Utilization while decreasing latency to 501 seconds without quality loss, demonstrating significant efficiency improvements over existing methods.
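
A rough 1-D sketch of tile-local attention follows (STA itself operates on 3-D video latents): each query tile attends only to key tiles within a fixed window of itself. For clarity this sketch applies the tile mask to a dense attention call, which does not realize the FLOP savings; the paper's speedup comes from skipping masked tiles entirely with a tile-contiguous layout. The `tile` and `window` values are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def sliding_tile_attention_1d(q, k, v, tile: int = 16, window: int = 1):
    """Tile-local attention, 1-D illustrative version.

    Each tile of `tile` queries attends only to key tiles within
    `window` tiles on either side, so the attended region grows
    linearly with sequence length.
    """
    B, H, L, D = q.shape
    assert L % tile == 0
    num_tiles = L // tile
    idx = torch.arange(num_tiles, device=q.device)
    tile_mask = (idx[None, :] - idx[:, None]).abs() <= window      # (T, T)
    # Expand the tile-level mask to token level: True = may attend.
    mask = tile_mask.repeat_interleave(tile, 0).repeat_interleave(tile, 1)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 4, 128, 32)    # (batch, heads, length, head_dim)
out = sliding_tile_attention_1d(q, k, v)  # -> shape (1, 4, 128, 32)
```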