y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#diagnostic-framework News & Analysis

4 articles tagged with #diagnostic-framework. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

4 articles
AINeutralarXiv โ€“ CS AI ยท Apr 157/10
๐Ÿง 

Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities

Researchers propose a cognitive diagnostic framework that evaluates large language models across fine-grained ability dimensions rather than aggregate scores, enabling targeted model improvement and task-specific selection. The approach uses multidimensional Item Response Theory to estimate abilities across 35 dimensions for mathematics and generalizes to physics, chemistry, and computer science with strong predictive accuracy.

AIBullisharXiv โ€“ CS AI ยท Mar 276/10
๐Ÿง 

TRAJEVAL: Decomposing Code Agent Trajectories for Fine-Grained Diagnosis

Researchers introduce TRAJEVAL, a diagnostic framework that breaks down AI code agent performance into three stages (search, read, edit) to identify specific failure points rather than just binary pass/fail outcomes. The framework analyzed 16,758 trajectories and found that real-time feedback based on trajectory signals improved state-of-the-art models by 2.2-4.6 percentage points while reducing costs by 20-31%.

๐Ÿง  GPT-5
AINeutralarXiv โ€“ CS AI ยท Mar 54/10
๐Ÿง 

When Visual Evidence is Ambiguous: Pareidolia as a Diagnostic Probe for Vision Models

Researchers developed a framework using face pareidolia (seeing faces in non-face objects) to test how different AI vision models handle ambiguous visual information. The study found that vision-language models like CLIP and LLaVA tend to over-interpret ambiguous patterns, while pure vision models remain more uncertain and detection models are more conservative.