#model-debugging News & Analysis

3 articles tagged with #model-debugging. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

Who Gets Credit or Blame? Attributing Accountability in Modern AI Systems

Researchers propose a framework to attribute AI model behavior to specific development stages (pretraining, fine-tuning, alignment), enabling accountability tracking without model retraining. The method quantifies how each stage contributes to model outputs and can identify spurious correlations, advancing transparency in AI development.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

E-TCAV: Formalizing Penultimate Proxies for Efficient Concept Based Interpretability

Researchers introduce E-TCAV, an optimized version of TCAV that improves the efficiency and stability of neural network interpretability testing by leveraging penultimate layer representations. The method achieves linear speed-ups while maintaining accuracy, advancing practical tools for model debugging and real-time concept-guided training across vision and language tasks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

From Attribution to Action: A Human-Centered Application of Activation Steering

Researchers introduce an interactive workflow that combines sparse autoencoders (SAEs) with activation steering to make AI explainability actionable for practitioners. In expert interviews built around debugging tasks on CLIP, the study finds that activation steering supports hypothesis testing and intervention-based debugging. Practitioners, however, place more trust in observed model behavior than in the plausibility of an explanation, and they flag risks such as ripple effects and limited generalization.
