y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#empirical-study News & Analysis

9 articles tagged with #empirical-study. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

9 articles
AIBearisharXiv – CS AI · 6d ago7/10
🧠

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Researchers conducted 400 autonomous penetration testing runs across four LLM models against a fixed vulnerable target to measure attack consistency. Results show significant variation in exploitation success rates (25-85%) and distinctive failure modes per model, with Claude and Gemini 2.5 Flash-Lite substantially outperforming GPT-4o-mini and Qwen, raising critical questions about LLM reliability in security-critical autonomous operations.

🏢 Anthropic🧠 GPT-4🧠 Claude
AINeutralarXiv – CS AI · Apr 207/10
🧠

Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning

Researchers conducted a comprehensive empirical study on scaling laws for large language models during reinforcement learning post-training, using Qwen2.5 models ranging from 0.5B to 72B parameters. The study reveals that larger models demonstrate superior learning efficiency, performance can be predicted via power-law models, and data reuse proves highly effective in constrained environments, providing practical guidelines for optimizing LLM reasoning capabilities.

AIBearisharXiv – CS AI · Apr 107/10
🧠

Beyond Functional Correctness: Design Issues in AI IDE-Generated Large-Scale Projects

Researchers evaluated Cursor, an AI-powered IDE, on its ability to generate large-scale software projects and found it achieves 91% functional correctness but produces significant design issues including code duplication, complexity violations, and framework best-practice breaches that threaten long-term maintainability.

AINeutralarXiv – CS AI · May 126/10
🧠

Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification

Researchers deployed thirteen AI agents on Moltbook, a Reddit-like social network for AI systems, to study how configuration specifications affect emergent social behavior. Results show personality specification is the dominant factor influencing agent responses, while underlying LLM models and operational rules have more moderate effects on communication style and topic engagement.

AINeutralarXiv – CS AI · May 116/10
🧠

Prompt Engineering Strategies for LLM-based Qualitative Coding of Psychological Safety in Software Engineering Communities: A Controlled Empirical Study

Researchers conducted a controlled empirical study evaluating three LLMs (Claude Haiku, DeepSeek-Chat, Gemini 2.5 Flash) for qualitative coding of psychological safety in software engineering communities. Multi-shot prompting improved Claude Haiku's performance but not the others, while all models exhibited systematic biases in coding predictions, providing evidence-based guidelines for LLM-assisted qualitative research.

🧠 Claude🧠 Gemini
AINeutralarXiv – CS AI · Apr 146/10
🧠

Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape

A large-scale survey of 457 software engineering researchers reveals that generative AI adoption is widespread in academic research, concentrated primarily in writing and early-stage tasks. While researchers perceive significant productivity gains, persistent concerns about accuracy, bias, and lack of governance frameworks highlight the need for clearer guidelines on responsible AI integration in academic practice.

AINeutralarXiv – CS AI · Apr 106/10
🧠

Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study

Researchers present the first empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting classical unlearning methods to quantum settings and introducing quantum-specific strategies. The study reveals that quantum models can effectively support unlearning, with performance varying based on circuit depth and entanglement structure, establishing baseline insights for privacy-preserving quantum machine learning systems.

AINeutralarXiv – CS AI · Mar 55/10
🧠

Beyond the Prompt: An Empirical Study of Cursor Rules

Researchers conducted a large-scale empirical study analyzing 401 open-source repositories to understand how developers use cursor rules - persistent, machine-readable directives that provide context to AI coding assistants. The study identified five key themes of project context that developers consider essential: Conventions, Guidelines, Project Information, LLM Directives, and Examples.

AINeutralarXiv – CS AI · Mar 175/10
🧠

An Empirical Investigation of Pre-Trained Deep Learning Model Reuse in the Scientific Process

Researchers conducted the first empirical study analyzing how natural scientists reuse pre-trained deep learning models across 17,511 peer-reviewed papers from 2000-2025. The study found that biochemistry and molecular biology lead in model reuse, with adaptation being the most common reuse pattern, primarily impacting the testing phase of scientific research.