#evaluation-protocols News & Analysis

2 articles tagged with #evaluation-protocols. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Is Fairness Truly Fair? Towards Reliable Lipschitz Fairness in Multi-Task Learning via Fixed-δ Alignment

Researchers propose ReLiF, a framework that addresses fairness evaluation problems in multi-task machine learning by using fixed evaluation thresholds rather than model-dependent ones. The work shows how inconsistent fairness metrics can make different algorithms look misleadingly comparable, and demonstrates that proper auditing protocols reveal genuine utility-fairness trade-offs that conventional methods obscure.

🏢 Meta

AI · Neutral · arXiv – CS AI · May 12 · 6/10

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

Researchers present a new evaluation protocol for AI pentesting agents that moves beyond simplified benchmarks to assess real-world vulnerability discovery. The framework combines structured ground-truth validation with LLM-based semantic matching and includes efficiency metrics, addressing a gap in how offensive-security AI systems are currently measured.