y0news

#risk-detection News & Analysis

4 articles tagged with #risk-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

BEAVER: An Efficient Deterministic LLM Verifier

BEAVER is a new verification framework that computes mathematically sound probability bounds on whether large language models satisfy safety properties. It identifies 2-3x more risky outputs than existing methods while using 90% fewer computational resources. The framework addresses a critical gap in LLM deployment by providing deterministic guarantees rather than ad hoc sampling estimates.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Researchers introduce TraceSafe-Bench, a benchmark evaluating how well LLM guardrails detect safety risks across multi-step tool-using trajectories. The study reveals that guardrail effectiveness depends more on structural reasoning capabilities than semantic safety training, and that general-purpose LLMs outperform specialized safety models in detecting mid-execution vulnerabilities.

AI · Bullish · OpenAI News · May 14 · 6/10

Helping ChatGPT better recognize context in sensitive conversations

OpenAI has released safety updates to ChatGPT that improve its ability to recognize context in sensitive conversations and detect potential risks over extended interactions. These enhancements enable the model to respond more safely by better understanding conversational nuance and maintaining awareness of conversation history when evaluating harmful requests.

Tags: ChatGPT

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

Researchers introduce PRISM, a framework that detects AI behavioral risks by analyzing underlying reasoning hierarchies rather than individual harmful outputs. The system identifies 27 risk signals across value prioritization, evidence weighting, and information source trust, using forced-choice data from 7 AI models to distinguish between structurally dangerous, context-dependent, and balanced AI reasoning patterns.