y0news
🧠 AI · Neutral · Importance 6/10

Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data

arXiv – CS AI | Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Takeshi Ashizawa, Kazuhiro Nakadai
🤖 AI Summary

Researchers developed DSMIL-LocNet, a weakly supervised machine learning framework that automates both the detection and the temporal localization of whale calls in long-duration underwater recordings, using only recording-level labels rather than frame-by-frame annotations. The system achieves F1 scores of 0.88-0.91 on recordings of up to 30 minutes, well above fully supervised baselines, whose F1 scores fall to 0.19-0.64 on 5-30 minute recordings.

Analysis

DSMIL-LocNet addresses a fundamental bottleneck in marine bioacoustics: the heavy annotation burden that limits deployment of automated whale monitoring systems. Traditional approaches require two kinds of annotation: binary presence labels, which are quick to produce, and precise temporal boundaries for each call, which are extremely time-consuming. This places a practical ceiling on operational scale, because expert annotators cannot feasibly timestamp every call across months of continuous recordings.

The research builds on multiple instance learning (MIL), a weakly supervised technique in which labels are attached to bags of instances rather than to individual instances; here a whole recording serves as the bag, and the instances are shorter stretches of audio within it. By combining spectral and temporal feature streams, the dual-stream architecture handles long recordings (2-30 minutes) without the temporal compression that limits standard CNNs, which collapse information along the time axis. Evaluation on the AcousticTrends BlueFinLibrary shows a large gap: baselines achieve only 0.19-0.64 F1 on 5-30 minute recordings, while DSMIL-LocNet maintains 0.88-0.91 F1 across all durations.
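The core MIL idea, learning from a recording-level label and still recovering where calls occur, can be sketched in a few lines. This is a generic max-pooling MIL sketch, not DSMIL-LocNet's dual-stream architecture, whose internals the summary does not describe. It assumes a recording has already been cut into fixed-length segments and that some hypothetical model (not shown) emits one raw call logit per segment; the function names `bag_probability`, `bag_bce_loss` and `localize`, the 10-second segment length, the 0.5 threshold and the example logits are all illustrative.

```python
import math
from typing import List, Tuple

# Generic max-pooling MIL sketch (NOT DSMIL-LocNet's dual-stream model).
# Assumption: a recording is cut into fixed-length segments and a hypothetical
# model (not shown) emits one raw call logit per segment.

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bag_probability(instance_logits: List[float]) -> float:
    """Recording-level call probability: the score of the most call-like segment."""
    return max(sigmoid(z) for z in instance_logits)

def bag_bce_loss(instance_logits: List[float], bag_label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy against the recording-level label only (no frame labels)."""
    p = min(max(bag_probability(instance_logits), eps), 1.0 - eps)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1.0 - p))

def localize(instance_logits: List[float], seg_seconds: float,
             threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Merge runs of consecutive above-threshold segments into (start_s, end_s) intervals."""
    intervals: List[Tuple[float, float]] = []
    start = None
    for i, z in enumerate(instance_logits):
        active = sigmoid(z) >= threshold
        if active and start is None:
            start = i  # a run of call segments begins
        elif not active and start is not None:
            intervals.append((start * seg_seconds, i * seg_seconds))
            start = None  # the run ended at the previous segment
    if start is not None:  # a run extends to the end of the recording
        intervals.append((start * seg_seconds, len(instance_logits) * seg_seconds))
    return intervals

# Usage with made-up logits for ten hypothetical 10-second segments.
logits = [-3.0, -2.5, 2.0, 3.1, -1.0, -4.0, 1.5, -2.0, -3.0, -3.5]
print(round(bag_probability(logits), 3))   # 0.957
print(localize(logits, seg_seconds=10.0))  # [(20.0, 40.0), (60.0, 70.0)]
```

Max pooling encodes the standard MIL assumption that a recording is positive when at least one of its segments contains a call. Training only ever sees the pooled score, yet the per-segment scores it shapes are what allow localization without frame labels; how closely they track true call boundaries depends on the model.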

This advance could help passive acoustic monitoring (PAM) systems scale from research pilots to operational deployment. Marine conservation organizations could process continuous PAM data with minimal manual effort, supporting whale population monitoring, ship strike risk assessment, and behavioral research. The open-source release also makes the approach available for other marine species and for acoustic ecology research beyond whales.

Future developments may extend this framework to multiclass species detection, real-time processing pipelines, and integration with underwater sensor networks. The efficiency gains could redirect limited expert resources from annotation toward data interpretation and conservation action.

Key Takeaways
  • DSMIL-LocNet performs both whale call detection and temporal localization using only recording-level labels, eliminating the need for expensive frame-level annotation
  • The dual-stream architecture achieves 0.88-0.91 F1 on recordings of up to 30 minutes, far outperforming fully supervised CNN baselines, which degrade to 0.19-0.64
  • The weakly supervised, dual-stream design avoids the temporal compression that limits standard CNNs on long-duration bioacoustic data
  • The framework enables operational-scale deployment of passive acoustic monitoring for marine conservation and whale population research
  • The open-source code release allows researchers to apply the method to other species and to acoustic ecology applications beyond marine mammals
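The F1 figures above combine precision P and recall R as F1 = 2PR/(P+R). The summary does not say how predicted calls are matched to annotated ones, so the sketch below assumes one common event-based convention: a predicted interval is a true positive if it covers at least a hypothetical `min_overlap` fraction of a still-unmatched ground-truth call. The function name `event_f1`, the greedy matching rule and the example intervals are illustrative assumptions, not the paper's evaluation protocol.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)

def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two time intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def event_f1(pred: List[Interval], truth: List[Interval], min_overlap: float = 0.5) -> float:
    """Event-level F1 with greedy one-to-one matching (an assumed convention).

    A prediction is a true positive if it overlaps an unmatched ground-truth
    call by at least min_overlap of that call's duration.
    """
    matched = set()
    tp = 0
    for p in pred:
        for j, t in enumerate(truth):
            if j in matched:
                continue
            ov = overlap(p, t)
            if ov > 0 and ov >= min_overlap * (t[1] - t[0]):
                matched.add(j)  # each true call can be matched only once
                tp += 1
                break
    fp = len(pred) - tp   # predictions with no matching call
    fn = len(truth) - tp  # calls that no prediction matched
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up example: two of three calls found, plus one false alarm.
truth = [(20.0, 35.0), (60.0, 68.0), (100.0, 110.0)]
pred = [(20.0, 40.0), (60.0, 70.0), (150.0, 160.0)]
print(round(event_f1(pred, truth), 3))  # 0.667
```

With TP = 2, FP = 1 and FN = 1, precision and recall are both 2/3, so F1 = 2/3; equivalently, F1 = 2TP / (2TP + FP + FN) = 4/6.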
via arXiv – CS AI