y0news

#ai-agents News & Analysis

449 articles tagged with #ai-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

SciDER: Scientific Data-centric End-to-end Researcher

Researchers have introduced SciDER, an AI-powered system that automates the entire scientific research process from data analysis to hypothesis generation and code execution. The system uses specialized AI agents that can collaboratively process raw experimental data and outperforms existing general-purpose AI models in scientific discovery tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠

S5-HES Agent: Society 5.0-driven Agentic Framework to Democratize Smart Home Environment Simulation

Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.

$NEAR
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠

ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents

Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved 47% higher task completion rates and 93% lower regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠

CeProAgents: A Hierarchical Agents System for Automated Chemical Process Development

Researchers propose CeProAgents, a hierarchical multi-agent system that automates chemical process development using AI agents specialized in knowledge, concept, and parameter tasks. The system introduces CeProBench, a comprehensive benchmark for evaluating AI capabilities in chemical engineering applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Researchers introduce FT-Dojo, an interactive environment for studying autonomous LLM fine-tuning, along with FT-Agent, an AI system that can automatically fine-tune language models without human intervention. The system achieved best performance on 10 out of 13 tasks across five domains, demonstrating the potential for fully automated machine learning workflows while revealing current limitations in AI reasoning capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Researchers introduce CoVe, a framework for training interactive tool-use AI agents that uses constraint-guided verification to generate high-quality training data. The compact CoVe-4B model achieves competitive performance with models 17 times larger on benchmark tests, with the team open-sourcing code, models, and 12K training trajectories.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

Position: AI Agents Are Not (Yet) a Panacea for Social Simulation

Researchers argue that LLM-based AI agents are not yet effective for social simulation, despite growing optimism in the field. The paper identifies systematic mismatches between what current agent systems produce and what scientific simulation requires, calling for more rigorous validation frameworks.

$OP
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠

Formal Analysis and Supply Chain Security for Agentic AI Skills

Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks like ClawHavoc that infiltrated over 1,200 malicious skills. The framework achieved 96.95% F1 score with 100% precision and zero false positives in detecting malicious AI agent skills.
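
The summary reports F1 and precision but not recall. As a rough check, and assuming the standard definition F1 = 2PR / (P + R), the implied recall can be recovered from the two reported figures. The function name below is illustrative and does not come from the paper.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2*P*R / (P + R) for R, given F1 and P as fractions in (0, 1]."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for this F1/precision pair")
    return f1 * precision / denom


# Figures quoted in the summary: 96.95% F1 at 100% precision.
recall = recall_from_f1(0.9695, 1.0)
print(f"implied recall: {recall:.4f}")  # implied recall: 0.9408
```

Under that assumption, the reported numbers correspond to a recall of roughly 94%, meaning about 6% of malicious skills in the evaluation would have gone undetected.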

AI × Crypto · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9
🤖

AESP: A Human-Sovereign Economic Protocol for AI Agents with Privacy-Preserving Settlement

Researchers have developed the Agent Economic Sovereignty Protocol (AESP), a new framework that allows AI agents to conduct autonomous financial transactions at machine speed while maintaining human control and governance boundaries. The protocol uses five key mechanisms including policy engines, human oversight, dual-signed commitments, privacy preservation, and cryptographic substrates to ensure agents remain economically capable but never fully sovereign.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠

WirelessAgent++: Automated Agentic Workflow Design and Benchmarking for Wireless Networks

Researchers propose WirelessAgent++, an automated framework for designing AI agent workflows in wireless networks using Monte Carlo Tree Search. The system achieves superior performance on wireless tasks with test scores up to 97%, outperforming existing methods by up to 31% while maintaining low computational costs under $5 per task.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

Theory of Code Space: Do Code Agents Understand Software Architecture?

Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift' where AI agents deviate from project guidelines, creating automated compliance checks across static code analysis, runtime commands, and architectural validation.

$COMP
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠

Artificial Superintelligence May be Useless: Equilibria in the Economy of Multiple AI Agents

A new research paper analyzes economic equilibria between AI and human agents in trading scenarios, finding that unless agents can at least double their marginal utility from purchases, no trading will occur. The study reveals that more powerful AI agents may contribute zero utility to less capable agents in certain equilibria.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9
🧠

SimAB: Simulating A/B Tests with Persona-Conditioned AI Agents for Rapid Design Evaluation

SimAB is a new system that uses persona-conditioned AI agents to simulate A/B tests for rapid design evaluation without requiring real user traffic. The system achieved 67% overall accuracy against 47 historical A/B tests, rising to 83% for high-confidence cases, potentially transforming how companies validate design decisions.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠

Modular Memory is the Key to Continual Learning Agents

Researchers propose combining In-Weight Learning (IWL) and In-Context Learning (ICL) through modular memory architectures to solve continual learning challenges in AI. The framework aims to enable AI agents to continuously adapt and accumulate knowledge without catastrophic forgetting, addressing key limitations of current foundation models.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5
🧠

Agentic Code Reasoning

Researchers introduce 'semi-formal reasoning' for LLM agents to analyze code semantics without execution, showing significant accuracy improvements across multiple tasks. The methodology achieves 88-93% accuracy on patch verification and 87% on code question answering, potentially enabling practical applications in automated code review and static analysis.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

Researchers have developed State-aware Reasoning (StaR), a new multimodal AI method that significantly improves AI agents' ability to interact with graphical user interfaces, particularly with toggle controls. The method enables agents to better perceive current states and execute instructions accordingly, improving toggle execution accuracy by over 30%.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3
🧠

AWARE-US: Preference-Aware Infeasibility Resolution in Tool-Calling Agents

Researchers developed AWARE-US, a system to improve AI agents' ability to handle failed database queries by intelligently relaxing the least important user constraints rather than simply returning 'no results'. The system uses three LLM-based methods to infer constraint importance from dialogue, achieving up to 56% accuracy in correct constraint relaxation.
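
The idea of relaxing the least important constraint instead of returning no results can be sketched as a simple loop. This is a toy illustration, not the AWARE-US method: the importance weights are hard-coded here, whereas the paper infers them from dialogue with LLM-based methods, and every name and data value below is hypothetical.

```python
from typing import Callable

# A constraint is (name, predicate over a row, importance weight).
# The weights are assumed inputs for this sketch.
Constraint = tuple[str, Callable[[dict], bool], float]


def query(rows: list[dict], constraints: list[Constraint]) -> list[dict]:
    """Return the rows that satisfy every active constraint."""
    return [r for r in rows if all(pred(r) for _, pred, _ in constraints)]


def relax_until_feasible(rows: list[dict], constraints: list[Constraint]):
    """Drop the least important constraint until the query returns something."""
    active = sorted(constraints, key=lambda c: c[2])  # least important first
    dropped: list[str] = []
    while True:
        results = query(rows, active)
        if results or not active:
            return results, dropped
        name, _, _ = active.pop(0)
        dropped.append(name)


# Hypothetical catalogue and user constraints.
cars = [
    {"model": "A", "price": 18000, "color": "red", "seats": 5},
    {"model": "B", "price": 25000, "color": "blue", "seats": 7},
]
user_constraints: list[Constraint] = [
    ("price<=20000", lambda r: r["price"] <= 20000, 0.9),
    ("color==blue", lambda r: r["color"] == "blue", 0.2),
    ("seats>=7", lambda r: r["seats"] >= 7, 0.6),
]

results, dropped = relax_until_feasible(cars, user_constraints)
print([r["model"] for r in results], dropped)  # ['A'] ['color==blue', 'seats>=7']
```

In this toy run no car meets all three constraints, so the loop drops color first, then seat count, and keeps the price limit that the user is assumed to care about most.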

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4
🧠

SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Researchers introduced SimuHome, a high-fidelity smart home simulator and benchmark with 600 episodes for testing LLM-based smart home agents. The system uses the Matter protocol standard and enables time-accelerated simulation to evaluate how AI agents handle device control, environmental monitoring, and workflow scheduling in smart homes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2
🧠

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

Researchers introduced SWE-MiniSandbox, a container-free method for training software engineering AI agents using reinforcement learning that reduces disk usage to 5% and environment setup time to 25% of traditional container-based approaches. The system uses kernel-level isolation and lightweight pre-caching instead of bulky container images while maintaining comparable performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠

HIMM: Human-Inspired Long-Term Memory Modeling for Embodied Exploration and Question Answering

Researchers propose HIMM, a new memory framework for AI embodied agents that separates episodic and semantic memory to improve long-term performance. The system achieves significant gains on benchmarks, with 7.3% improvement in LLM-Match and 11.4% in LLM MatchXSPL, addressing key challenges in deploying multimodal language models as embodied agent brains.

AI × Crypto · Bearish · CoinTelegraph · Mar 2 · 6/10 · 8
🤖

Energym AI dystopia goes viral as crypto projects tout user-owned AI agents

A viral Black Mirror-style 'Energym' spoof depicting 80% job losses to AI is circulating amid real-world tech layoffs and declining white-collar job openings. The dystopian scenario resonates as tech companies continue mass workforce reductions while crypto projects promote user-owned AI agents as an alternative model.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 12
🧠

An Agentic LLM Framework for Adverse Media Screening in AML Compliance

Researchers have developed an agentic LLM framework using Retrieval-Augmented Generation to automate adverse media screening for anti-money laundering compliance in financial institutions. The system addresses high false-positive rates in traditional keyword-based approaches by implementing multi-step web searches and computing Adverse Media Index scores to distinguish between high-risk and low-risk individuals.