AI · Neutral · arXiv – CS AI · 3h ago · 6/10
LoSATok: Low-dimensional Semantic-Acoustic Tokenizer for Cross-Domain Audio Understanding and Generation
Researchers introduce LoSATok, an audio tokenizer that compresses high-dimensional semantic features into 128-dimensional representations while preserving both understanding and generation capabilities. The method combines semantic bottleneck compression with dual-level supervision to improve speech, music, and audio generation across diffusion transformer models.