y0news
🧠 AI | 🔴 Bearish | Importance 7/10

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

arXiv – CS AI | David Gringras
🤖 AI Summary

A large-scale study of 62,808 AI safety evaluations across six frontier models finds that deployment scaffolding can significantly change measured safety, with map-reduce scaffolding degrading it. Evaluation format (multiple-choice vs. open-ended) affects safety scores more than scaffold architecture does, and safety rankings vary dramatically across models and configurations.

Key Takeaways
  • Map-reduce scaffolding degrades measured AI safety, with a number needed to harm (NNH) of 14; the other scaffold architectures preserve safety within acceptable margins.
  • Switching the evaluation format from multiple-choice to open-ended shifts safety scores by 5 to 20 percentage points, exceeding any scaffold effect.
  • Model-scaffold interactions vary dramatically: on the same benchmark, the change in safety performance ranges from -16.8 to +18.8 percentage points.
  • Model safety rankings reverse completely from one benchmark to another, making universal safety claims unreliable.
  • Because no composite safety index generalizes reliably, the study concludes that safety must be tested per model and per configuration.
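
For readers unfamiliar with the "number needed to harm" figure cited above, here is a minimal sketch of the conventional definition: NNH is the reciprocal of the absolute increase in the rate of bad outcomes. Whether the paper computes it in exactly this way is an assumption, and the function names below are hypothetical, chosen only for illustration.

```python
import math


def number_needed_to_harm(unsafe_rate_baseline: float, unsafe_rate_scaffold: float) -> float:
    """Conventional NNH: 1 / absolute increase in the unsafe-outcome rate.

    Both rates are fractions in [0, 1]. Returns math.inf when the scaffold
    does not increase the unsafe rate (no measurable harm).
    """
    increase = unsafe_rate_scaffold - unsafe_rate_baseline
    if increase <= 0:
        return math.inf
    return 1.0 / increase


def implied_risk_increase(nnh: float) -> float:
    """Invert NNH to recover the absolute risk increase it implies."""
    if nnh <= 0:
        raise ValueError("NNH must be positive")
    return 1.0 / nnh


# Under this definition, an NNH of 14 corresponds to an absolute increase
# of about 7.1 percentage points in the unsafe-outcome rate.
print(f"{implied_risk_increase(14) * 100:.1f} percentage points")
```

Read this way, an NNH of 14 would mean roughly one additional unsafe response for every 14 evaluations run under map-reduce scaffolding rather than the baseline.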