Tags: Multimodal Contrastive Learning - Paper Library

SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing

Published: 7/23/2025

Spatial Audio Language Model, Multimodal Contrastive Learning, Structured Audio Embeddings, Spatial Audio Understanding and Editing, Zero-Shot Direction Classification

The paper presents SALM, a model that aligns spatial audio with natural language through multimodal contrastive learning. It uses structured embeddings to represent semantic and spatial information both separately and jointly, supporting zero-shot direction classification and t
