🧠 AI | ⚪ Neutral | Importance 6/10

GPF-LiveNews: A Streaming Evaluation Protocol for Group-Conditioned Framing in Large Language Models

arXiv – CS AI | Mohd Ariful Haque, Fahad Rahman, Kishor Datta Gupta, Roy George | May 29, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce GPF-LiveNews, a streaming evaluation protocol that audits how large language models frame news differently based on group identities and prompts. Testing 23 models across 42 identity labels reveals that policy-oriented prompts trigger stronger semantic shifts in framing, while sentiment variation remains inconsistent, highlighting the need for continuous monitoring of LLM outputs in production environments.

Analysis

GPF-LiveNews addresses a gap in AI safety research: static bias benchmarks fail to capture how language models frame information for different audiences in real time. This matters because deployed LLMs operate amid constantly shifting inputs, retrieval systems, and safety mechanisms that static evaluations do not measure. The protocol streams fresh news from established sources through multiple identity-conditioned prompts and measures whether models subtly alter their framing, a phenomenon distinct from outright toxicity or refusal.
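The identity-conditioned prompting step described above can be pictured as a grid of identity labels crossed with prompt families. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_prompt_grid`, the placeholder labels, and the two templates are all hypothetical, and the paper's actual 42 labels and seven prompt families are not listed in this summary.

```python
from itertools import product


def build_prompt_grid(article, identity_labels, prompt_families):
    """Cross every identity label with every prompt family for one article.

    `prompt_families` maps a family name to a template string containing
    `{identity}` and `{article}` placeholders (hypothetical format).
    """
    grid = []
    for label, (family, template) in product(identity_labels, prompt_families.items()):
        grid.append(
            {
                "identity": label,
                "family": family,
                "prompt": template.format(identity=label, article=article),
            }
        )
    return grid


# Placeholder inputs, invented for illustration only.
EXAMPLE_LABELS = ["group_a", "group_b", "group_c"]
EXAMPLE_FAMILIES = {
    "summary": "Summarize this story for a reader who identifies as {identity}:\n{article}",
    "policy_action": "What actions should a reader who identifies as {identity} consider?\n{article}",
}

if __name__ == "__main__":
    grid = build_prompt_grid("Example headline and body.", EXAMPLE_LABELS, EXAMPLE_FAMILIES)
    for cell in grid:
        print(cell["identity"], cell["family"])
```

In a streaming setup, a grid like this would be rebuilt for each newly fetched article, so every model sees the same story under every identity and prompt-family combination.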

The research emerges from growing concerns about algorithmic amplification of group-based narratives. While prior work examined factual accuracy or demographic representation, GPF-LiveNews specifically tracks semantic drift and sentiment disparity, two mechanisms through which models could reinforce polarization without triggering safety filters. The pilot's finding that policy-focused prompts generate the strongest semantic movement suggests models are most sensitive to requests demanding actionable guidance tied to group identity.
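To make the two signals concrete, here is a rough sketch of how semantic drift and sentiment disparity could be scored for one article. It rests on assumptions: this summary does not specify the paper's metrics, so bag-of-words cosine distance stands in for whatever text representation the authors use, and the sentiment scores are assumed to come from some external scorer. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_distance(text_a, text_b):
    """1 - cosine similarity of token counts; 1.0 if either text has no tokens."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def semantic_drift(baseline_output, conditioned_outputs):
    """Distance of each identity-conditioned output from the unconditioned one."""
    return {
        label: cosine_distance(baseline_output, output)
        for label, output in conditioned_outputs.items()
    }


def sentiment_disparity(sentiment_by_label):
    """Spread between the most and least positive sentiment scores."""
    return max(sentiment_by_label.values()) - min(sentiment_by_label.values())
```

A small spread from `sentiment_disparity` alongside a large `semantic_drift` value would mirror the pattern reported here: wording that shifts across groups while measured sentiment stays flat. Consistent with the authors' framing, such numbers are best read as prompts for human review, not as verdicts.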

For AI developers and deployers, this framework provides a practical monitoring tool that moves beyond snapshot benchmarks toward continuous auditing. The sentiment variation findings, which were flatter across dimensions than expected, warrant deeper investigation into whether models genuinely behave consistently or whether the sentiment metrics are insensitive to subtle framing differences.

The authors deliberately frame results as audit signals for human review rather than fairness verdicts, avoiding overconfident claims. Future work should expand beyond the news domain and test whether the findings generalize to financial information, medical guidance, and policy recommendations, where group-conditioned framing carries material consequences for different populations.

Key Takeaways

→ GPF-LiveNews enables continuous monitoring of how LLMs frame news differently across 42 identity groups and seven prompt families, moving beyond static bias benchmarks.
→ Policy-and-action prompts trigger the strongest semantic shifts in model outputs, indicating sensitivity to guidance requests tied to group identity.
→ Sentiment variation proved surprisingly flat across demographic and prompt dimensions, suggesting either robust consistency or insufficiently sensitive metrics.
→ The protocol treats all scores as audit signals for human review rather than permanent fairness rankings, acknowledging the limitations of automated evaluation.
→ Fresh news streams and reproduction scripts are released as artifacts, enabling other teams to replicate and extend the evaluation methodology.

#llm-evaluation #bias-auditing #group-framing #ai-safety #streaming-benchmarks #semantic-analysis #model-monitoring

Read Original → via arXiv – CS AI
