#software-engineering News & Analysis

66 articles tagged with #software-engineering. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10

Beyond the Prompt: An Empirical Study of Cursor Rules

Researchers conducted a large-scale empirical study of 401 open-source repositories to understand how developers use cursor rules, which are persistent, machine-readable directives that provide context to AI coding assistants. The study identified five themes of project context that developers consider essential: Conventions, Guidelines, Project Information, LLM Directives, and Examples.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems

Researchers have developed a pattern language methodology to systematically identify and modularize crosscutting concerns in agentic AI systems, addressing issues like security, reliability, and cost management that contribute to high AI project failure rates. The approach uses goal models to discover reusable patterns and implements them through aspect-oriented programming in Rust.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks

Researchers introduce SWE-Hub, a comprehensive system for generating scalable, executable software engineering tasks for training AI agents. The platform addresses current limitations in AI software development by providing unified environment automation, bug synthesis, and diverse task generation across multiple programming languages.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

Are LLMs Reliable Code Reviewers? Systematic Overcorrection in Requirement Conformance Judgement

Research reveals that Large Language Models (LLMs) systematically fail at code review tasks, frequently misclassifying correct code as defective when matching implementations to natural language requirements. The study found that more detailed prompts actually increase misjudgment rates, raising concerns about LLM reliability in automated development workflows.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

Theory of Code Space: Do Code Agents Understand Software Architecture?

Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift' where AI agents deviate from project guidelines, creating automated compliance checks across static code analysis, runtime commands, and architectural validation.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

FastCode: Fast and Cost-Efficient Code Understanding and Reasoning

Researchers introduce FastCode, a new framework for AI-assisted software engineering that improves code understanding and reasoning efficiency. The system uses structural scouting to navigate codebases without full-text ingestion, significantly reducing computational costs while maintaining accuracy across multiple benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

Researchers introduced SWE-MiniSandbox, a container-free method for training software engineering AI agents using reinforcement learning that reduces disk usage to 5% and environment setup time to 25% of traditional container-based approaches. The system uses kernel-level isolation and lightweight pre-caching instead of bulky container images while maintaining comparable performance.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

ESAA: Event Sourcing for Autonomous Agents in LLM-Based Software Engineering

Researchers have introduced ESAA (Event Sourcing for Autonomous Agents), a new architecture that improves LLM-based autonomous agents by separating cognitive intention from state mutation using structured JSON events and deterministic orchestration. The system addresses key limitations like context degradation and execution reliability, with successful validation through multi-agent case studies using various LLMs including Claude Sonnet and GPT-5.

AI · Bullish · OpenAI News · Dec 12 · 6/10 · 8

How We Used Codex to Ship Sora for Android in 28 Days

OpenAI developed and shipped Sora for Android in 28 days with assistance from Codex. AI-powered planning, code translation, and parallel coding workflows enabled a small team to deliver reliable results quickly.

AI · Bullish · OpenAI News · Mar 6 · 6/10 · 6

Accelerating engineering cycles 20% with OpenAI

OpenAI reports that its AI tools are accelerating engineering development cycles by 20%, a significant productivity gain for software engineering workflows that integrate AI.

AI · Neutral · OpenAI News · Feb 18 · 6/10 · 6

Introducing the SWE-Lancer benchmark

A new benchmark called SWE-Lancer has been introduced to evaluate whether frontier large language models can earn $1 million through real-world freelance software engineering work. This benchmark tests AI capabilities in practical, revenue-generating programming tasks rather than traditional academic assessments.

AI · Bullish · OpenAI News · Aug 13 · 5/10 · 5

Introducing SWE-bench Verified

SWE-bench Verified is being released as a human-validated subset of the original SWE-bench benchmark. This new version aims to provide more reliable evaluation of AI models' capabilities in solving real-world software engineering problems.

AI · Neutral · arXiv – CS AI · Apr 7 · 5/10

Measuring LLM Trust Allocation Across Conflicting Software Artifacts

Researchers developed TRACE, a framework to evaluate how LLMs allocate trust between conflicting software artifacts like code, documentation, and tests. The study found that current LLMs are better at identifying natural-language specification issues than detecting subtle code-level problems, with models showing systematic blind spots when implementations drift while documentation remains plausible.

AI · Neutral · arXiv – CS AI · Mar 17 · 5/10

Describing Agentic AI Systems with C4: Lessons from Industry Projects

Researchers propose a new C4-based documentation framework specifically designed for agentic AI systems, which operate through specialized agents collaborating via artifact exchange and tool invocation. The approach provides structured modeling vocabulary and hierarchical description techniques to capture the unique architectural patterns of these systems for industrial applications.

AI · Neutral · OpenAI News · Feb 11 · 4/10 · 6

Harness engineering: leveraging Codex in an agent-first world

This appears to be a technical article by Ryan Lopopolo discussing engineering approaches for leveraging Codex (OpenAI's code generation model) in agent-first development environments. The article focuses on practical implementation strategies for integrating AI code generation tools into modern software development workflows.
