#risk-detection News & Analysis

7 articles tagged with #risk-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Yuvion VL: A Multimodal Foundation Model for Adversarial Content and AI Safety

Researchers introduce Yuvion VL, a multimodal AI foundation model specifically engineered to detect and understand adversarial content and safety risks across images and text. The model achieves industry-leading safety performance while maintaining general capabilities, addressing a critical gap in AI systems' ability to handle real-world multimodal threats.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

TRACE: Trajectory Risk-Aware Compression for Long-Horizon Agent Safety

Researchers introduce TRACE, a safety detection system for long-horizon LLM agents that compresses extended trajectories into compact evidence states to better identify distributed risk signals. The method improves on baselines by up to 12.6 percentage points across multiple safety benchmarks, and its performance remains stable as context length increases.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

BEAVER: An Efficient Deterministic LLM Verifier

BEAVER is a new verification framework that computes mathematically sound probability bounds on whether large language models satisfy safety properties. It identifies 2-3x more risky outputs than existing methods while using 90% fewer computational resources. The framework addresses a critical gap in LLM deployment by providing deterministic guarantees rather than ad-hoc sampling estimates.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Researchers introduce TraceSafe-Bench, a benchmark evaluating how well LLM guardrails detect safety risks across multi-step tool-using trajectories. The study reveals that guardrail effectiveness depends more on structural reasoning capabilities than semantic safety training, and that general-purpose LLMs outperform specialized safety models in detecting mid-execution vulnerabilities.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Towards AI epidemiology: a measurement standardisation framework for prospective risk detection

Researchers propose a measurement standardization framework for detecting risks in deployed AI systems through structured expert-AI interaction analysis, without requiring access to model internals. The framework aims to establish reliable alignment scoring methodologies that could enable institutional monitoring of AI behavior and support epidemiological studies of AI-related outcomes in professional settings.

AI · Bullish · OpenAI News · May 14 · 6/10

Helping ChatGPT better recognize context in sensitive conversations

OpenAI has released safety updates to ChatGPT that improve its ability to recognize context in sensitive conversations and detect potential risks over extended interactions. These enhancements enable the model to respond more safely by better understanding conversational nuance and maintaining awareness of conversation history when evaluating harmful requests.

🧠 ChatGPT

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

Researchers introduce PRISM, a framework that detects AI behavioral risks by analyzing underlying reasoning hierarchies rather than individual harmful outputs. The system identifies 27 risk signals across value prioritization, evidence weighting, and information source trust, using forced-choice data from 7 AI models to distinguish between structurally dangerous, context-dependent, and balanced AI reasoning patterns.