
Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation

arXiv – CS AI | Asim Mohamed, Martin Gubri

AI Summary

Researchers demonstrate that current multilingual watermarking methods for LLMs fail to maintain robustness across medium- and low-resource languages, particularly under translation attacks. They introduce STEAM, a new detection method using Bayesian optimization that improves watermark detection across 133 languages with significant performance gains.

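The core idea can be illustrated with a highly simplified sketch: back-translate a suspect text through a pivot language before running watermark detection, and choose the pivot that best recovers the signal. The helpers `back_translate` and `watermark_score` below are placeholder stubs (not the paper's implementation), and the exhaustive loop stands in for STEAM's actual Bayesian optimization over 133 candidate languages.

```python
# Hedged sketch only: stubs stand in for a real translation model and a
# real watermark detector; STEAM itself uses Bayesian optimization, not
# the exhaustive search shown here.

def back_translate(text: str, pivot: str) -> str:
    """Stub: translate `text` into `pivot` and back to its source language."""
    return f"[{pivot}] {text}"  # placeholder round-trip

def watermark_score(text: str) -> float:
    """Stub: per-text watermark detection score (e.g. a z-score)."""
    return float(len(text))  # placeholder scoring

def steam_style_detect(text: str, candidate_langs: list[str]) -> tuple[str, float]:
    """Return the pivot language whose back-translation maximizes the score."""
    best_lang, best_score = "", float("-inf")
    for lang in candidate_langs:
        score = watermark_score(back_translate(text, lang))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang, best_score
```

Because the search only reads detection scores, this wrapper is agnostic to which watermarking scheme produced them, matching the compatibility claim below.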
Key Takeaways
  • Existing multilingual watermarking methods for LLMs are not truly multilingual and fail on medium- and low-resource languages.
  • The failure stems from semantic clustering issues when tokenizers lack sufficient full-word tokens for specific languages.
  • STEAM uses Bayesian optimization to search over 133 candidate back-translation languages for the one that best recovers watermark strength.
  • The method is compatible with any watermarking approach and works across different tokenizers and languages.
  • STEAM achieves average gains of +0.23 AUC and +37% TPR@1% compared to existing methods.
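For readers unfamiliar with the reported metrics, the snippet below shows how AUC and TPR@1% (true-positive rate at a 1% false-positive rate) are typically computed from detection scores. The score lists are made-up illustrations, not the paper's data.

```python
# Illustrative metric computation: `pos` are detection scores on
# watermarked text, `neg` on clean text. Numbers are invented.

def tpr_at_fpr(pos: list[float], neg: list[float], max_fpr: float = 0.01) -> float:
    """True-positive rate at the highest threshold keeping FPR <= max_fpr."""
    neg_sorted = sorted(neg, reverse=True)
    k = int(max_fpr * len(neg))  # number of false positives allowed
    threshold = neg_sorted[k] if k < len(neg) else float("-inf")
    return sum(s > threshold for s in pos) / len(pos)

def auc(pos: list[float], neg: list[float]) -> float:
    """Probability a random positive outscores a random negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

pos = [0.9, 0.8, 0.7, 0.2]  # watermarked-text scores (invented)
neg = [0.1, 0.3, 0.4, 0.6]  # clean-text scores (invented)
print(auc(pos, neg))          # 0.8125
print(tpr_at_fpr(pos, neg))   # 0.75
```

A +0.23 AUC gain on this scale is large: AUC ranges from 0.5 (chance) to 1.0 (perfect separation of watermarked from clean text).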