Papers
Multimodal Contrastive Learning
Learning Spatially-Aware Language and Audio Embeddings
Published: 9/18/2024
Spatially-Aware Audio and Text Embedding Model, Multimodal Contrastive Learning, Audio Event Localization and Detection, Open Vocabulary Text Descriptions, Non-Spatial Audio and Text Mapping
The paper introduces ELSA, a multimodal contrastive learning model whose audio embeddings capture both the semantic and the spatial features of a sound. Trained on synthetic spatial audio, ELSA outperforms existing models in semantic retrieval and 3D localization.
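Models of this kind are usually trained with a CLIP/CLAP-style symmetric contrastive objective: matched audio/text pairs are pulled together and mismatched pairs in the batch are pushed apart. The sketch below shows that generic objective in plain Python. It is an illustration under stated assumptions, not ELSA's training code: the function names, the temperature of 0.07, and the toy 2-D embeddings are all hypothetical, and any extra spatial loss terms the paper may use are omitted.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    # sims[i][j] = cosine similarity between audio item i and caption j.
    an = [l2_normalize(v) for v in a]
    bn = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in bn] for u in an]


def cross_entropy(logits: List[float], target: int) -> float:
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def symmetric_contrastive_loss(audio: List[Vector], text: List[Vector],
                               temperature: float = 0.07) -> float:
    # Hypothetical generic loss: row i of `audio` is paired with row i of `text`.
    sims = cosine_matrix(audio, text)
    n = len(sims)
    logits = [[s / temperature for s in row] for row in sims]
    # Audio -> text: each audio row should select its own caption.
    a2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> audio: each caption column should select its own audio clip.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2a = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return 0.5 * (a2t + t2a)


audio = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]
aligned = symmetric_contrastive_loss(audio, text)
shuffled = symmetric_contrastive_loss(audio, text[::-1])
print(aligned < shuffled)
```

Averaging the two directions keeps the objective symmetric, so neither modality's encoder dominates the gradient. Correctly paired toy embeddings yield a much lower loss than the same embeddings with the captions swapped.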
SALM: Spatial Audio Language Model with Structured Embeddings for Understanding and Editing
Published: 7/23/2025
Spatial Audio Language Model, Multimodal Contrastive Learning, Structured Audio Embeddings, Spatial Audio Understanding and Editing, Zero-Shot Direction Classification
The paper presents SALM, a model that aligns spatial audio with natural language through multimodal contrastive learning. It uses structured embeddings that represent semantic and spatial information both separately and jointly, supporting zero-shot direction classification and t
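Zero-shot direction classification in a shared audio/text space typically works like this: embed one text prompt per candidate direction, then pick the prompt whose embedding is most similar to the audio's spatial embedding. The sketch below illustrates that idea. It assumes a hypothetical structured layout in which the first `semantic_dim` values of the embedding are semantic and the rest are spatial. The direction labels, the prompt vectors, and the helper names are placeholders, not SALM's actual interface.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def split_structured(embedding: Sequence[float],
                     semantic_dim: int) -> Tuple[List[float], List[float]]:
    # Hypothetical layout: [semantic part | spatial part].
    return list(embedding[:semantic_dim]), list(embedding[semantic_dim:])


def zero_shot_direction(audio_spatial: Sequence[float],
                        prompt_embeddings: Dict[str, Sequence[float]]) -> str:
    # Choose the direction label whose text embedding best matches the audio.
    return max(prompt_embeddings,
               key=lambda label: cosine(audio_spatial, prompt_embeddings[label]))


# Placeholder text embeddings for prompts such as "a sound from the left".
prompts = {
    "left": [1.0, 0.0],
    "right": [-1.0, 0.0],
    "front": [0.0, 1.0],
    "back": [0.0, -1.0],
}

embedding = [0.3, 0.5, 0.1, 0.9, 0.2]  # 3 semantic dims + 2 spatial dims
semantic, spatial = split_structured(embedding, semantic_dim=3)
print(zero_shot_direction(spatial, prompts))  # prints "left"
```

Keeping the semantic and spatial parts in separate slices means direction can be scored without interference from what the sound is. The full vector remains available wherever a joint representation is needed.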