AI · Neutral · arXiv – CS AI · 14h ago · 6/10
PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk
Researchers introduce PRISM, a framework that detects AI behavioral risks by analyzing underlying reasoning hierarchies rather than individual harmful outputs. The system identifies 27 risk signals across value prioritization, evidence weighting, and information source trust, using forced-choice data from 7 AI models to distinguish between structurally dangerous, context-dependent, and balanced AI reasoning patterns.