Formalizes modality collapse in multimodal LLMs as a mismatched-decoder problem; an information-theoretic bound shows that the accessible information is limited by decoder sensitivity and distributional mismatch.
Code
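A toy illustration of the distributional-mismatch term (not the paper's actual bound, which also involves decoder sensitivity): a decoder tuned to distribution q but evaluated on data from p pays an extra KL(p‖q) nats beyond the true entropy, so the information it can access shrinks as the mismatch grows. The distributions below are made up.

```python
import numpy as np

# Mismatched decoding: cross-entropy = H(p) + KL(p || q).
p = np.array([0.7, 0.2, 0.1])  # true (e.g. audio-conditioned) distribution
q = np.array([0.4, 0.4, 0.2])  # decoder's assumed (e.g. text-trained) distribution

entropy_p = -(p * np.log(p)).sum()      # H(p): best achievable rate
cross_entropy = -(p * np.log(q)).sum()  # E_p[-log q]: mismatched decoder's rate
kl_pq = (p * np.log(p / q)).sum()       # KL(p || q): the mismatch penalty

assert np.isclose(cross_entropy, entropy_p + kl_pq)
print(f"H(p)={entropy_p:.3f}  E_p[-log q]={cross_entropy:.3f}  KL={kl_pq:.3f}")
```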
Shows that speech LLMs often reduce to implicit ASR→LLM cascades, with mechanistic evidence (logit lens, concept erasure) that text representations are causally required for task performance.
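A minimal logit-lens sketch of the kind used for the mechanistic evidence, with GPT-2 as a stand-in for the speech LLMs studied in the paper: intermediate hidden states are projected through the final layer norm and unembedding to see which tokens each layer is writing toward.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (embeddings, layer_1, ..., layer_N)
for layer, h in enumerate(out.hidden_states):
    # Project the last position's hidden state into vocabulary space.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1]))
    print(f"layer {layer:2d} -> {tok.decode(logits.argmax(-1))!r}")
```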
Introduces a structured family of local-linearity metrics that predict where linear interpretability tools succeed and how they fail.
Code · Demo
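One plausible instantiation of a local-linearity metric (the paper's metric family may be defined differently): compare a network around a point to its first-order Taylor expansion and report the relative residual over a small Gaussian neighborhood, where values near 0 mean the map is locally linear at that scale.

```python
import torch

def local_linearity_residual(f, x, eps=1e-2, n_samples=64):
    """Relative deviation of f from its tangent-plane approximation at x."""
    x = x.detach()
    y = f(x)
    jac = torch.autograd.functional.jacobian(f, x)  # (out_dim, in_dim)
    deltas = eps * torch.randn(n_samples, x.shape[0])
    with torch.no_grad():
        exact = torch.stack([f(x + d) for d in deltas])  # true outputs
        linear = y + deltas @ jac.T                      # Taylor prediction
        rel = (exact - linear).norm(dim=1) / (exact - y).norm(dim=1).clamp_min(1e-12)
        return rel.mean()

# A tanh MLP looks linear at small eps and increasingly nonlinear at large eps.
net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 4))
x0 = torch.randn(8)
for eps in (1e-3, 1e-1, 1.0):
    print(eps, float(local_linearity_residual(lambda z: net(z), x0, eps=eps)))
```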
Controlled studies of representation geometry during training, revealing task-specific collapse floors, top-down layer reorganization, and RankMe as a precursor to capability acquisition.
Code
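For reference, RankMe (Garrido et al., 2023) is a smooth effective-rank measure: the exponentiated entropy of the normalized singular values of an (n_samples, dim) embedding matrix. Low values indicate collapsed representations; the paper tracks its rise ahead of capability gains. A minimal sketch:

```python
import torch

def rankme(embeddings: torch.Tensor, eps: float = 1e-7) -> float:
    """Effective rank: exp of the entropy of the normalized singular values."""
    s = torch.linalg.svdvals(embeddings)
    p = s / s.sum() + eps
    return float(torch.exp(-(p * p.log()).sum()))

# Collapsed embeddings (rank ~1) vs. spread-out ones (rank ~dim).
z_collapsed = torch.randn(512, 1) @ torch.randn(1, 64)
z_full = torch.randn(512, 64)
print(rankme(z_collapsed), rankme(z_full))  # ~1 vs. close to 64
```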
Dataset and evaluation showing that audio-LLMs systematically follow conflicting text over spoken audio across 8 languages, even under explicit instructions to follow the audio.
Code · Demo
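A sketch of the conflict-evaluation protocol (field names and the model interface are hypothetical; the released harness may differ): each item pairs spoken audio asserting one answer with text asserting a conflicting one, and we measure how often the model's reply matches the audio despite an explicit instruction to follow it.

```python
def audio_following_rate(model, items):
    """items: dicts with 'audio', 'conflicting_text', 'audio_answer', 'text_answer'."""
    follows_audio = 0
    for ex in items:
        prompt = (
            "Answer based only on the spoken audio, ignoring any "
            "conflicting written text.\n" + ex["conflicting_text"]
        )
        reply = model(audio=ex["audio"], text=prompt)  # hypothetical interface
        if ex["audio_answer"].lower() in reply.lower():
            follows_audio += 1
        # Replies matching ex["text_answer"] instead indicate text dominance.
    return follows_audio / len(items)
```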
Layer-wise decomposition of how fine-tuning updates distribute across transformer depth, with implications for parameter-efficient adaptation.
Code
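A simple way to compute such a decomposition (a sketch, not the paper's exact procedure): the relative L2 change of each layer's parameters between a base and a fine-tuned checkpoint. The regex assumes HuggingFace-style names like "model.layers.12.mlp..."; adjust for other architectures.

```python
import re
from collections import defaultdict

def per_layer_drift(base_state, tuned_state):
    """Relative L2 parameter change per transformer layer."""
    num, den = defaultdict(float), defaultdict(float)
    for name, w0 in base_state.items():
        m = re.search(r"layers\.(\d+)\.", name)
        if m is None:
            continue  # embeddings, final norm, lm_head, etc.
        layer = int(m.group(1))
        delta = tuned_state[name].float() - w0.float()
        num[layer] += delta.pow(2).sum().item()
        den[layer] += w0.float().pow(2).sum().item()
    return {k: (num[k] / den[k]) ** 0.5 for k in sorted(num)}

# Usage: drift = per_layer_drift(base.state_dict(), tuned.state_dict())
# A skewed profile suggests placing adapters only at the most-changed layers.
```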
Cross-lingual transfer methods for improving ASR in low-resource languages using non-target language data.
Interspeech 2021
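One standard recipe in this family, shown as a hedged sketch (the paper's method may differ in detail): initialize from XLSR-53, a wav2vec2 encoder pretrained on 53 languages of unlabeled, non-target speech, then fine-tune a fresh CTC head on limited target-language data.

```python
from transformers import Wav2Vec2ForCTC

# The vocab_size here is an example; set it to the target language's
# character inventory when building the CTC tokenizer.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    vocab_size=40,
    ctc_loss_reduction="mean",
)
model.freeze_feature_encoder()  # keep low-level acoustic features fixed

# Training then follows the usual CTC recipe:
# loss = model(input_values=audio_batch, labels=char_label_batch).loss
```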