449 articles tagged with #ai-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers have introduced SciDER, an AI-powered system that automates the entire scientific research process from data analysis to hypothesis generation and code execution. The system uses specialized AI agents that can collaboratively process raw experimental data and outperforms existing general-purpose AI models in scientific discovery tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.
$NEAR
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved 47% higher task completion rates and 93% fewer regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
Researchers propose CeProAgents, a hierarchical multi-agent system that automates chemical process development using AI agents specialized in knowledge, concept, and parameter tasks. The system introduces CeProBench, a comprehensive benchmark for evaluating AI capabilities in chemical engineering applications.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
Researchers introduce FT-Dojo, an interactive environment for studying autonomous LLM fine-tuning, along with FT-Agent, an AI system that can automatically fine-tune language models without human intervention. The system achieved the best performance on 10 out of 13 tasks across five domains, demonstrating the potential for fully automated machine learning workflows while revealing current limitations in AI reasoning capabilities.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers introduce CoVe, a framework for training interactive tool-use AI agents that uses constraint-guided verification to generate high-quality training data. The compact CoVe-4B model performs competitively with models 17 times larger on benchmark tests, and the team is open-sourcing code, models, and 12K training trajectories.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers argue that LLM-based AI agents are not yet effective for social simulation, despite growing optimism in the field. The paper identifies systematic mismatches between what current agent systems produce and what scientific simulation requires, calling for more rigorous validation frameworks.
$OP
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks like ClawHavoc that infiltrated over 1,200 malicious skills. The framework achieved a 96.95% F1 score with 100% precision (zero false positives) in detecting malicious AI agent skills.
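As a quick consistency check on the reported figures, and assuming the standard definition of F1 as the harmonic mean of precision P and recall R (the summary does not state recall directly), the two numbers together imply a recall of roughly 94%:

```latex
F_1 = \frac{2PR}{P + R}
\;\xrightarrow{\;P = 1\;}\;
F_1 = \frac{2R}{1 + R}
\quad\Longrightarrow\quad
R = \frac{F_1}{2 - F_1} = \frac{0.9695}{1.0305} \approx 0.941
```

In other words, under that assumption the framework flags nothing benign but misses roughly 6% of malicious skills.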
AI × Crypto · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9
Researchers have developed the Agent Economic Sovereignty Protocol (AESP), a new framework that allows AI agents to conduct autonomous financial transactions at machine speed while maintaining human control and governance boundaries. The protocol uses five key mechanisms including policy engines, human oversight, dual-signed commitments, privacy preservation, and cryptographic substrates to ensure agents remain economically capable but never fully sovereign.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8
Researchers propose WirelessAgent++, an automated framework for designing AI agent workflows in wireless networks using Monte Carlo Tree Search. The system achieves test scores of up to 97% on wireless tasks, outperforming existing methods by up to 31% while keeping computational costs under $5 per task.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift', where AI agents deviate from project guidelines, by creating automated compliance checks across static code analysis, runtime commands, and architectural validation.
$COMP
AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 7
A new research paper analyzes economic equilibria between AI and human agents in trading scenarios, finding that unless agents can at least double their marginal utility from purchases, no trading will occur. The study reveals that more powerful AI agents may contribute zero utility to less capable agents in certain equilibria.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9
SimAB is a new system that uses persona-conditioned AI agents to simulate A/B tests for rapid design evaluation without requiring real user traffic. The system achieved 67% overall accuracy against 47 historical A/B tests, rising to 83% for high-confidence cases, potentially transforming how companies validate design decisions.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
Researchers propose combining In-Weight Learning (IWL) and In-Context Learning (ICL) through modular memory architectures to solve continual learning challenges in AI. The framework aims to enable AI agents to continuously adapt and accumulate knowledge without catastrophic forgetting, addressing key limitations of current foundation models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5
Researchers introduce 'semi-formal reasoning' for LLM agents to analyze code semantics without execution, showing significant accuracy improvements across multiple tasks. The methodology achieves 88-93% accuracy on patch verification and 87% on code question answering, potentially enabling practical applications in automated code review and static analysis.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers have developed State-aware Reasoning (StaR), a new multimodal AI method that significantly improves AI agents' ability to interact with graphical user interfaces, particularly with toggle controls. The method enables agents to better perceive current states and execute instructions accordingly, improving toggle execution accuracy by over 30%.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4
Researchers introduced EHR-ChatQA, a new benchmark for testing AI agents that interact with Electronic Health Record databases through natural language queries. The benchmark reveals significant reliability gaps in current state-of-the-art LLMs, with success rates dropping substantially when consistency across multiple trials is required.
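The gap between "succeeds at least once" and "succeeds every time" can be illustrated with a minimal sketch. The function names, task names, and trial outcomes below are hypothetical and are not taken from the benchmark; the paper's exact metric definitions may differ.

```python
from typing import Dict, List


def any_success_rate(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved in at least one trial."""
    return sum(any(outcomes) for outcomes in trials.values()) / len(trials)


def all_success_rate(trials: Dict[str, List[bool]]) -> float:
    """Fraction of tasks solved in every trial (the consistency requirement)."""
    return sum(all(outcomes) for outcomes in trials.values()) / len(trials)


# Hypothetical outcomes: four tasks, three independent trials each.
trials = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, True, False],
    "task_d": [False, False, False],
}

print(any_success_rate(trials))  # 0.75
print(all_success_rate(trials))  # 0.25
```

With these made-up outcomes, an agent that looks 75% successful under the lenient measure is only 25% successful once every trial must pass.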
AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3
Researchers developed AWARE-US, a system to improve AI agents' ability to handle failed database queries by intelligently relaxing the least important user constraints rather than simply returning 'no results'. The system uses three LLM-based methods to infer constraint importance from dialogue, achieving up to 56% accuracy in correct constraint relaxation.
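The general idea of relaxing the least important constraint first can be sketched as follows. This is only an illustration under stated assumptions: the importance scores are supplied by hand here (the paper infers them with LLM-based methods), and `query`, `relax_until_results`, and the sample data are hypothetical names, not the system's API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# (name, predicate, importance); a higher importance matters more to the user.
Constraint = Tuple[str, Callable[[Row], bool], float]


def query(rows: List[Row], constraints: List[Constraint]) -> List[Row]:
    """Return the rows that satisfy every active constraint."""
    return [r for r in rows if all(pred(r) for _, pred, _ in constraints)]


def relax_until_results(
    rows: List[Row], constraints: List[Constraint]
) -> Tuple[List[Row], List[str]]:
    """Drop the least important constraints one at a time until rows match."""
    active = sorted(constraints, key=lambda c: c[2])  # least important first
    dropped: List[str] = []
    results = query(rows, active)
    while not results and active:
        name, _, _ = active.pop(0)
        dropped.append(name)
        results = query(rows, active)
    return results, dropped


# Hypothetical catalogue and user constraints with assumed importance scores.
rows = [
    {"name": "A", "price": 30000, "color": "red", "seats": 5},
    {"name": "B", "price": 22000, "color": "blue", "seats": 5},
]
constraints = [
    ("price", lambda r: r["price"] <= 25000, 0.9),
    ("color", lambda r: r["color"] == "red", 0.2),
    ("seats", lambda r: r["seats"] >= 5, 0.6),
]

results, dropped = relax_until_results(rows, constraints)
print([r["name"] for r in results], dropped)  # ['B'] ['color']
```

Here the full query matches nothing, so the sketch drops the colour preference (the lowest assumed importance) and returns the one car that still satisfies price and seating.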
AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4
Researchers introduced SimuHome, a high-fidelity smart home simulator and benchmark with 600 episodes for testing LLM-based smart home agents. The system uses the Matter protocol standard and enables time-accelerated simulation to evaluate how AI agents handle device control, environmental monitoring, and workflow scheduling in smart homes.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2
Researchers introduced SWE-MiniSandbox, a container-free method for training software engineering AI agents using reinforcement learning that cuts disk usage to 5% and environment setup time to 25% of what traditional container-based approaches require. The system uses kernel-level isolation and lightweight pre-caching instead of bulky container images while maintaining comparable performance.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
Researchers propose HIMM, a new memory framework for embodied AI agents that separates episodic and semantic memory to improve long-term performance. The system achieves significant gains on benchmarks, with a 7.3% improvement in LLM-Match and 11.4% in LLM-Match×SPL, addressing key challenges in deploying multimodal language models as embodied agent brains.
AI × Crypto · Bearish · CoinTelegraph · Mar 2 · 6/10 · 8
A viral Black Mirror-style 'Energym' spoof depicting 80% job losses to AI is circulating amid real-world tech layoffs and declining white-collar job openings. The dystopian scenario resonates as tech companies continue mass workforce reductions while crypto projects promote user-owned AI agents as an alternative model.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 12
Researchers have developed an agentic LLM framework using Retrieval-Augmented Generation to automate adverse media screening for anti-money laundering compliance in financial institutions. The system addresses high false-positive rates in traditional keyword-based approaches by implementing multi-step web searches and computing Adverse Media Index scores to distinguish between high-risk and low-risk individuals.