AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠ABBEL is a new recursive summarization framework that enables AI agents to maintain memory-efficient interaction histories by storing information as natural-language belief states rather than full context. The approach uses reinforcement learning techniques to improve belief generation quality, achieving 40% better performance than prior memory-constrained agents while using 67% less memory.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce Agentick, a unified benchmark for evaluating diverse AI agents—from reinforcement learning to large language models—across 37 procedurally generated tasks. Testing 27 configurations reveals no single approach dominates, with GPT-4 mini leading overall while specialized methods excel in specific domains, suggesting significant optimization potential across all agent paradigms.
🏢 Meta · 🧠 GPT-5
AI · Neutral · arXiv – CS AI · Jun 10 · 5/10
🧠Researchers introduce SCOPE, a new machine learning approach for Prescriptive Process Monitoring that optimizes sequential business interventions using causal inference rather than simulation-based reinforcement learning. The method addresses a critical gap in existing systems by accounting for how multiple interventions interact over time while working directly with observational data, demonstrated through testing on synthetic and semi-synthetic datasets.
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce MINTS (Minimalist Thompson Sampling), a Bayesian framework that simplifies sequential decision-making under uncertainty by placing priors only on optimal parameters while eliminating unnecessary variables through profile likelihood. The approach achieves near-optimal regret bounds for multi-armed bandits and automatically adapts to structural constraints, matching classical performance benchmarks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers have developed attribution techniques that explain decision-making in Markov Decision Processes (MDPs), extending explainability methods beyond static inputs to sequential decision-making systems. The approach assigns importance scores to states and execution paths, enabling more interpretable AI agents in dynamic environments.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that large language models can be effectively fine-tuned to perform sequential decision-making tasks across MDPs, POMDPs, and ambiguous environments by learning from offline trajectory data. The approach achieves stronger performance than baseline methods, particularly in complex, partially-observed scenarios, with theoretical analysis showing the fine-tuned attention mechanisms implicitly estimate optimal Q-functions.
AI · Bullish · arXiv – CS AI · May 9 · 6/10
🧠PRISM is a new AI framework that improves embodied agents by coupling Vision-Language Models with Large Language Models through dynamic question-answer interactions, addressing the perception-reasoning gap in multimodal AI systems. The framework demonstrates significant performance improvements on benchmark tasks like ALFWorld and R2R, showing that interactive, goal-oriented perception yields superior understanding compared to standalone visual analysis.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers evaluated whether large language models can function as text-only controllers for navigation and exploration in unknown environments under partial observability. Testing nine contemporary LLMs on ASCII gridworld tasks, they found reasoning-tuned models reliably complete navigation goals but remain inefficient compared to optimal paths, with few-shot prompting reducing invalid moves and improving path efficiency.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce a formal planning framework that maps LLM-based web agents to traditional search algorithms, enabling better diagnosis of failures in autonomous web tasks. The study compares different agent architectures using novel evaluation metrics and a dataset of 794 human-labeled trajectories from the WebArena benchmark.