Papers
Tag filter: Transformer architecture
Octo: An Open-Source Generalist Robot Policy
Published: 5/21/2024
Tags: Generalist Robot Policies, Multi-modal action representation and modeling, Transformer architecture, Large-Scale Robot Demonstration Dataset, Robotic Action Learning
Octo is an open-source, transformer-based generalist robot policy pretrained on 800K trajectories. It can be fine-tuned quickly to new sensor setups and robots, accepts tasks specified by language instructions or goal images, and demonstrates strong generalization across nine robot platforms.
Large Language Diffusion Models
Published: 2/14/2025
Tags: Large Language Diffusion Models, Auto-Regressive Diffusion Model, Large Language Model Fine-Tuning, Transformer architecture, Probabilistic Inference Generation
LLaDA, a diffusion-based large language model, combines a forward masking process with a reverse generation process in which a Transformer predicts the masked tokens, and it is trained by optimizing a bound on the likelihood. It matches autoregressive baselines across diverse tasks and excels at in-context learning, demonstrating diffusion models’ promise for s…
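A minimal sketch of the masking idea summarized above, assuming the common masked-diffusion setup: sample a masking ratio t in (0, 1], mask each token independently with probability t, and score only the masked positions with a cross-entropy weighted by 1/t (the quantity that bounds the negative log-likelihood). MASK_ID, the toy token ids, and uniform_log_prob (a stand-in for the Transformer mask predictor) are hypothetical; this is an illustration, not LLaDA's implementation.

```python
import math
import random

MASK_ID = -1  # hypothetical [MASK] id; assumes real token ids are non-negative


def forward_mask(tokens, t, rng):
    """Forward process: replace each token with MASK_ID independently with probability t."""
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def masked_nll(x0, xt, t, log_prob):
    """1/t-weighted cross-entropy summed over masked positions only.

    log_prob(xt, i, token) stands in for the model's log p(x0[i] = token | xt).
    """
    total = 0.0
    for i, (orig, noisy) in enumerate(zip(x0, xt)):
        if noisy == MASK_ID:  # unmasked positions contribute nothing
            total -= log_prob(xt, i, orig)
    return total / t


def uniform_log_prob(vocab_size):
    """Hypothetical 'model' that assigns equal probability to every token."""
    return lambda xt, i, token: -math.log(vocab_size)


rng = random.Random(0)
x0 = [5, 17, 3, 42, 8, 9]
t = rng.uniform(1e-3, 1.0)  # keep t away from 0 to avoid dividing by zero
xt = forward_mask(x0, t, rng)
loss = masked_nll(x0, xt, t, uniform_log_prob(vocab_size=100))
print(xt, round(loss, 3))
```

The 1/t weight compensates for the fact that, at small t, only a few positions are masked and therefore contribute to the sum.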
The Devil in Linear Transformer
Published: 10/19/2022
Tags: Transformer architecture, Long-Context Modeling, Sparse Attention Efficiency, Transformer-Based Efficient Forward Prediction
This paper identifies unbounded gradients and attention dilution as key flaws of kernel-based linear transformers, then introduces TransNormer, which stabilizes training with normalized attention and strengthens local focus with diagonal attention, achieving superior accuracy and eff…
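The following small pure-Python sketch contrasts classic kernel-based linear attention, which divides by a data-dependent denominator, with a normalized variant that drops the denominator and instead normalizes the output, in the spirit of the normalized attention summarized above. The feature map elu(x) + 1, the gain-free RMSNorm, and all function names are assumptions chosen for illustration, not TransNormer's exact formulation. Diagonal attention is not shown.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def elu_plus_one(m):
    """Assumed feature map phi(x) = elu(x) + 1, which keeps every entry positive."""
    return [[x + 1.0 if x > 0 else math.exp(x) for x in row] for row in m]


def rms_norm(m, eps=1e-6):
    """Row-wise RMSNorm without a learned gain (a simplification)."""
    out = []
    for row in m:
        rms = math.sqrt(sum(x * x for x in row) / len(row) + eps)
        out.append([x / rms for x in row])
    return out


def kernel_linear_attention(q, k, v):
    """Kernel linear attention: phi(Q)(phi(K)^T V), divided row-wise by phi(Q)(phi(K)^T 1)."""
    fq, fk = elu_plus_one(q), elu_plus_one(k)
    kv = matmul(transpose(fk), v)            # d x d_v summary, size independent of sequence length
    num = matmul(fq, kv)
    k_sum = [sum(col) for col in zip(*fk)]   # phi(K)^T 1
    den = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / d for x in row] for row, d in zip(num, den)]


def norm_attention(q, k, v):
    """Normalized variant: drop the denominator and normalize the output rows instead."""
    fq, fk = elu_plus_one(q), elu_plus_one(k)
    return rms_norm(matmul(fq, matmul(transpose(fk), v)))


q = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
k = [[0.2, 0.1], [-0.3, 0.5], [0.4, -0.1]]
v = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
print(kernel_linear_attention(q, k, v))
print(norm_attention(q, k, v))
```

Both functions compute phi(K)^T V before multiplying by phi(Q), so their cost grows linearly rather than quadratically with sequence length. The denominator in the first version is the data-dependent scaling that the normalized variant replaces.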