#semantic-invariance News & Analysis

3 articles tagged with #semantic-invariance. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles

AINeutralarXiv – CS AI · May 297/10

🧠

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

FormInv introduces a measurement protocol that audits mathematical reasoning benchmarks for semantic consistency, revealing that current evaluation methods mask significant ranking volatility across AI models. The study found 3.1% semantically incorrect paraphrases in MathCheck that altered model rankings and discovered that models achieving similar accuracy scores (86-96%) exhibit drastically different consistency rates (50-82%) when tested against semantically equivalent problem restatements.

🧠 GPT-4🧠 Claude🧠 Haiku

AINeutralarXiv – CS AI · Mar 167/10

🧠

Semantic Invariance in Agentic AI

Researchers developed a testing framework to evaluate how reliably AI agents maintain consistent reasoning when inputs are semantically equivalent but differently phrased. Their study of seven foundation models across 19 reasoning problems found that larger models aren't necessarily more robust, with the smaller Qwen3-30B-A3B achieving the highest stability at 79.6% invariant responses.

AIBearisharXiv – CS AI · Mar 36/108

🧠

LLM Self-Explanations Fail Semantic Invariance

Research reveals that Large Language Model (LLM) self-explanations fail semantic invariance testing, showing that AI models' self-reports change based on how tasks are framed rather than actual task performance. Four frontier AI models demonstrated unreliable self-reporting when faced with semantically different but functionally identical tool descriptions, raising questions about using model self-reports as evidence of capability.