Papers

Tag filter: Transformer architecture
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
Published: 11/6/2024
Tags: Transformer architecture · Efficient Attention Mechanism · In-Memory Lookup Tables · Reduction of Computational Complexity · Multi-Head Attention Operation
MemoryFormer is a novel transformer architecture that reduces computational complexity by eliminating most fully-connected layers while retaining the necessary multi-head attention operations, using in-memory lookup tables and hash algorithms for dynamic vector retrieval.
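To make the lookup-table idea concrete, here is a rough, hypothetical sketch (not the paper's actual Memory Layer design; all names and hyperparameters are illustrative): the input is split into chunks, each chunk is hashed with a fixed random projection to a bucket index, and a learned vector is retrieved from that bucket instead of computing a dense projection.

```python
import torch
import torch.nn as nn

class HashMemoryLayer(nn.Module):
    """Rough sketch: replace a dense projection with hashed table lookups.

    Each input chunk is hashed with a fixed random signed projection (a simple
    locality-sensitive hash) to a bucket index, and the learned vector stored
    in that bucket is retrieved. Retrieved vectors are summed, so no large
    matrix multiply is needed at inference. Only the tables are trained here;
    the paper uses additional machinery to make the whole layer trainable.
    """

    def __init__(self, d_in: int, d_out: int, num_chunks: int = 8, bits: int = 8):
        super().__init__()
        assert d_in % num_chunks == 0
        chunk_dim = d_in // num_chunks
        self.num_chunks = num_chunks
        # fixed random hyperplanes acting as the hash function
        self.register_buffer("hyperplanes", torch.randn(num_chunks, chunk_dim, bits))
        self.register_buffer("powers", 2 ** torch.arange(bits))
        # one table of 2**bits learned output vectors per chunk
        self.tables = nn.Parameter(torch.randn(num_chunks, 2 ** bits, d_out) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:             # x: (B, d_in)
        chunks = x.view(x.shape[0], self.num_chunks, -1)             # (B, C, d_in/C)
        signs = (torch.einsum("bcd,cdk->bck", chunks, self.hyperplanes) > 0).long()
        idx = (signs * self.powers).sum(-1)                          # (B, C) bucket ids
        chunk_ids = torch.arange(self.num_chunks, device=x.device)
        retrieved = self.tables[chunk_ids, idx]                      # (B, C, d_out)
        return retrieved.sum(dim=1)                                  # (B, d_out)
```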
Scalable Diffusion Models with Transformers
Published: 12/20/2022
Tags: Diffusion Models · Transformer architecture · Image Generation · Scalable Diffusion Models · Class-Conditional Image Generation
This study introduces Diffusion Transformers (DiTs), which replace the U-Net with a transformer architecture for image generation. Higher Gflops correlate with better performance (lower FID), with the largest model achieving state-of-the-art results on ImageNet benchmarks.
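A toy, hypothetical illustration of the core swap (heavily simplified; the paper's DiT blocks condition on timestep and class via adaLN-Zero and use positional embeddings, both omitted here): the denoiser is a plain transformer operating on patchified latents rather than a U-Net.

```python
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Toy transformer denoiser over patchified latents (not the paper's exact block)."""

    def __init__(self, latent_ch=4, patch=2, dim=256, depth=4, heads=4, num_classes=1000):
        super().__init__()
        self.patch = patch
        self.proj_in = nn.Linear(latent_ch * patch * patch, dim)    # "patchify" projection
        self.t_embed = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.c_embed = nn.Embedding(num_classes, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.proj_out = nn.Linear(dim, latent_ch * patch * patch)

    def forward(self, z, t, y):
        # z: (B, C, H, W) noisy latent, t: (B,) timestep, y: (B,) class label
        B, C, H, W = z.shape
        p = self.patch
        patches = z.unfold(2, p, p).unfold(3, p, p)                  # (B, C, H/p, W/p, p, p)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)
        x = self.proj_in(patches)                                    # positional emb. omitted
        cond = self.t_embed(t.float()[:, None]) + self.c_embed(y)    # simplistic conditioning
        x = self.blocks(x + cond[:, None, :])
        return self.proj_out(x)    # predicted noise per patch; unpatchify omitted for brevity
```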
LoRA: Low-Rank Adaptation of Large Language Models
Published: 6/18/2021
Tags: Low-Rank Adaptation for Large Language Models · Transformer architecture · Large Language Model Fine-Tuning · Parameter Efficiency Optimization · RoBERTa and Its Derivatives
LoRA introduces a low-rank adaptation method for fine-tuning large language models, significantly reducing trainable parameters by injecting rank decomposition matrices while freezing the pretrained weights. It achieves comparable or better performance on RoBERTa, DeBERTa, GPT-2, and GPT-3.
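A minimal sketch of the mechanism, assuming a PyTorch setting (hypothetical module, not the authors' released implementation): the pretrained weight is frozen and a trainable low-rank product B·A is added to its output, so only the two small matrices receive gradients.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer augmented with a trainable low-rank update (illustrative)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():           # freeze the pretrained weight and bias
            p.requires_grad = False
        # rank-decomposition matrices: A starts small-random, B starts at zero
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # output = frozen projection + scaled low-rank residual (B @ A) applied to x
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale
```

Because `lora_B` is initialized to zero, the adapted layer starts out identical to the frozen model, and only the two small matrices accumulate gradients during fine-tuning.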
Octo: An Open-Source Generalist Robot Policy
Published: 5/21/2024
Tags: Generalist Robot Policies · Multi-modal action representation and modeling · Transformer architecture · Large-Scale Robot Demonstration Dataset · Robotic Action Learning
Octo is an open-source, transformer-based generalist robot policy pretrained on 800K trajectories. It supports fast fine-tuning across diverse sensors and robots, can be guided by language or images, and demonstrates strong generalization on nine robot platforms.
Large Language Diffusion Models
Published: 2/14/2025
Tags: Large Language Diffusion Models · Auto-Regressive Diffusion Model · Large Language Model Fine-Tuning · Transformer architecture · Probabilistic Inference Generation
LLaDA, a diffusion-based large language model, uses masking and reverse generation with Transformers to predict tokens, optimizing a likelihood bound. It matches autoregressive baselines on diverse tasks and excels at in-context learning, demonstrating the promise of diffusion models for scalable language modeling.
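A hedged sketch of the general masked-diffusion training recipe described in the summary (the exact loss weighting and schedule used by LLaDA may differ; `model`, `mask_id`, and the 1/t weighting here are illustrative): sample a masking ratio, corrupt the sequence, and train the Transformer to recover the masked tokens.

```python
import torch
import torch.nn.functional as F

def masked_diffusion_loss(model, tokens, mask_id, eps=1e-3):
    """One training step of a masked-diffusion language model (generic recipe).

    Sample a masking ratio t, replace that fraction of tokens with [MASK], and
    train the bidirectional Transformer `model` (assumed to return per-token
    logits) to recover the originals at masked positions, weighting the
    cross-entropy by 1/t as in the diffusion likelihood bound.
    """
    B, L = tokens.shape
    t = torch.rand(B, 1, device=tokens.device).clamp(min=eps)   # masking ratio per sequence
    is_masked = torch.rand(B, L, device=tokens.device) < t      # mask each token w.p. t
    corrupted = torch.where(is_masked, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                                    # (B, L, vocab)
    token_loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)), tokens.view(-1), reduction="none"
    ).view(B, L)
    weighted = token_loss * is_masked / t                        # only masked positions count
    return weighted.sum() / (B * L)
```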
The Devil in Linear Transformer
Published: 10/19/2022
Tags: Transformer architecture · Long-Context Modeling · Sparse Attention Efficiency · Transformer-Based Efficient Forward Prediction
This paper identifies unbounded gradients and attention dilution as key flaws in kernel-based linear transformers, then introduces TransNormer, which stabilizes training via normalized attention and enhances local focus with diagonal attention, achieving superior accuracy and efficiency.
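A rough sketch of the normalized-attention idea under stated assumptions (non-causal, ReLU feature map; `NormLinearAttention` is a hypothetical name, not the paper's code): the per-query denominator of linear attention is dropped and the output is passed through LayerNorm instead, which is the stabilization the summary refers to; the diagonal (local) attention branch is omitted.

```python
import torch
import torch.nn as nn

class NormLinearAttention(nn.Module):
    """Sketch of normalized linear attention (non-causal, single block).

    Standard linear attention divides by the sum of kernelized keys, which the
    paper links to unbounded gradients; here that division is dropped and the
    output is normalized with LayerNorm instead.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dh = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.norm = nn.LayerNorm(dim)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, L, dim)
        B, L, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, L, self.heads, self.dh).transpose(1, 2) for t in (q, k, v))
        q, k = torch.relu(q), torch.relu(k)                       # non-negative feature map
        kv = k.transpose(-2, -1) @ v                              # (B, H, dh, dh) summary
        y = (q @ kv).transpose(1, 2).reshape(B, L, -1)            # no per-query denominator
        return self.out(self.norm(y))                             # LayerNorm replaces scaling
```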