AI Pulse News

Models, papers, tools. 39,986 articles with AI-powered sentiment analysis and key takeaways.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts

Researchers introduce Dri-MED, a machine learning algorithm designed to handle multi-armed bandit problems with personalized user preferences, drifting context distributions, and baseline performance constraints. The algorithm achieves improved regret bounds while minimizing constraint violations, demonstrating practical advantages over conservative baseline approaches in experimental settings.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Topological Neural Operators

Researchers introduce Topological Neural Operators (TNOs), a novel framework for machine learning that processes data across multi-dimensional topological structures rather than just points or edges. The approach uses Discrete Exterior Calculus to model interactions while preserving geometric and physical properties, demonstrating improved accuracy on PDE benchmarks including irregular geometry problems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws

Researchers propose PTL-Diffusion, a novel diffusion model framework that replaces single Gaussian terminal distributions with periodic families of Gaussian laws to better capture manifold structure in data. The approach embeds phase information directly into forward process dynamics rather than only in the denoising network, showing improved performance on point-cloud and facial datasets compared to standard DDPM baselines.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

An Agency-Transferring Model-Free Policy Enhancement Technique

Researchers propose a reinforcement learning technique that accelerates policy training by gradually transferring control from a baseline policy to a learnable policy, achieving faster convergence and superior performance compared to training from scratch while maintaining high success rates throughout the learning process.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Researchers introduce OmniGameArena, a comprehensive UE5-based benchmark for evaluating vision-language model agents across diverse game environments (solo, PvP, cooperative), along with the Improvement Dynamics Curve methodology that tracks agent performance evolution through iterative refinement rather than single snapshots.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Survey on Large Language Model-Based Game Agents

A comprehensive survey examines Large Language Model-based game agents (LLMGAs) as testbeds for artificial general intelligence capabilities. The research synthesizes LLM game agent design through a unified architecture covering memory, reasoning, and perception-action interfaces at single-agent levels, plus communication protocols and organizational models for multi-agent coordination across six major game genres.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

TQA-Bench: Evaluating LLMs for Multi-Table Question Answering

Researchers introduce TQA-Bench, a comprehensive benchmark for evaluating large language models on multi-table question answering tasks using real-world datasets with variable context lengths (8K-64K tokens). The evaluation of LLMs ranging from 2 billion to 671 billion parameters reveals significant performance gaps in handling complex relational data structures, addressing a critical gap in existing benchmarks that focus primarily on single-table QA.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

IDEQ -- Improving Diffusion Models for the Traveling Salesman Problem (TSP) by Leveraging the Structure of the Solution Space

Researchers introduce IDEQ, an improved diffusion model approach for solving the Traveling Salesman Problem that achieves state-of-the-art results for neural network-based methods, matching or exceeding traditional heuristics like LKH3 on benchmark instances while maintaining better scalability.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions

Researchers introduce HA-VLN 2.0, a benchmark for vision-and-language navigation that explicitly incorporates human-aware constraints in both discrete and continuous environments. The study reveals significant performance degradation in leading navigation agents when confronted with dynamic multi-human interactions, emphasizing the critical need for social-awareness modeling in autonomous navigation systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs

Researchers propose a formal temporal modeling framework using the LRMoo ontology to represent how legal norms evolve over time, enabling precise point-in-time reconstruction of legal texts. The approach treats legal amendments as event-centric chains of versioned works, addressing a critical gap in automated legal processing that could improve AI reliability in legal applications.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Discovering heuristics in a complex SAT solver with large language models

Researchers have developed AutoModSAT, a framework that leverages large language models to automatically discover and optimize heuristics in SAT solvers, achieving 40% performance improvements over baseline solvers. The approach combines modular solver design with LLM-guided function generation and evolutionary algorithms, demonstrating significant practical gains across diverse datasets.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

Researchers introduce CLPO, a curriculum learning framework that dynamically adapts training difficulty for large language models during reinforcement learning. The approach automatically identifies solved, medium, and hard problems, then strategically restructures tasks to match the model's evolving capabilities, achieving substantial improvements over existing methods on mathematical and reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science

Researchers introduced MatSciBench, a comprehensive benchmark of 1,340 college-level materials science problems designed to evaluate large language models' reasoning abilities in this specialized domain. Testing leading LLMs revealed significant limitations, with DeepSeek-R1 achieving 75.22% accuracy on text questions and GPT-4 reaching 53.02% on multimodal tasks, highlighting gaps in domain knowledge, calculation accuracy, and scientific figure interpretation.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

TempoBench: Evaluating Temporal Causal Reasoning in Large Language Models

Researchers introduce TempoBench, a formally verified benchmark for evaluating temporal causal reasoning in large language models, revealing a significant gap between forward simulation performance (96% accuracy) and causal reasoning ability (below 25%). The study demonstrates that LLMs struggle with identifying minimal causal inputs, instead over-specifying by listing all possible inputs rather than reasoning about necessity.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Geometric Unification of Concept Learning with Concept Cones

Researchers demonstrate that Concept Bottleneck Models and Sparse Autoencoders, two distinct interpretability approaches in machine learning, share an underlying geometric structure based on concept cones. This unification enables quantitative evaluation of how well unsupervised concept discovery aligns with human-defined concepts, advancing AI interpretability standards.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Geometric Theory of Cognition for Machine Intelligence

Researchers propose a geometric framework for machine intelligence where cognitive computation emerges from Riemannian gradient flow on learned latent manifolds, eliminating the need for explicit memory modules. The approach demonstrates superior robustness across reinforcement learning tasks involving partial observability, sensory disruptions, and long-horizon prediction compared to feedforward baselines.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning

Researchers propose Thinking-Based Non-Thinking (TNT), a novel approach to train hybrid reasoning models that dynamically choose between fast responses and extended reasoning without the reward hacking problems that plague existing reinforcement learning methods. The technique achieves approximately 50% token efficiency gains while maintaining or improving accuracy across mathematical benchmarks, addressing a critical bottleneck in deploying large reasoning models.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Payoff scaling shapes cooperation in LLM agents across languages

Researchers analyzed how Large Language Models behave in repeated game scenarios, finding that LLMs become more cooperative as financial stakes increase—contrary to evolutionary game theory predictions. The study reveals that alignment training and human reasoning patterns embedded in LLM training data override expected selfish behavior, with implications for designing multi-agent AI systems in high-stakes environments.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Researchers propose WMSS, a post-training optimization method that leverages weak model checkpoints to improve strong language models beyond conventional saturation points. The approach identifies and addresses learning gaps through entropy dynamics, achieving performance gains in mathematical reasoning and code generation without additional inference costs.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Web Agents Should Use Typed Actions Instead of Click-Based Browsing

A research paper proposes replacing click-based web automation with typed actions backed by semantic APIs, arguing this shift would make AI agents more reliable, auditable, and cost-effective. The authors introduce 'web verbs' as a standardized interface for web operations that could improve agent behavior and enable trustworthy automation at scale.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials

CatalyticMLLM presents a unified graph-text multimodal large language model that integrates property prediction and inverse structural design for catalytic materials within a single framework. This approach overcomes limitations of traditional decoupled systems by eliminating representation space inconsistencies and evaluator bias, enabling more stable closed-loop optimization workflows for materials discovery.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach

Researchers propose Strategic Prior-data Fitted Network (SPN), a framework addressing how tabular foundation models fail when users strategically manipulate data post-deployment. The method adapts pretrained models to strategic environments through inference-time adjustments without retraining, demonstrating improved robustness on real-world datasets.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Beyond Rational Illusion: Behaviorally Realistic Strategic Classification

Researchers introduce a new framework for strategic classification that accounts for behavioral biases rather than assuming perfect rationality from agents. The Prospect-Guided Strategic Framework (Pro-SF) incorporates psychological principles from prospect theory to better model real-world decision-making in adversarial machine learning contexts.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

Researchers demonstrate that general-purpose persona steering vectors can reduce AI model sycophancy (agreement with incorrect users) nearly as effectively as specialized steering methods, while maintaining accuracy on correct statements. This challenges the assumption that sycophancy requires targeted mitigation and suggests it operates as a persona-level property rather than a single manipulable direction.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

Researchers introduced MBABench, a new evaluation framework for testing LLM agents on end-to-end financial spreadsheet tasks—a capability increasingly demanded by enterprises but not yet adequately measured by existing benchmarks. The study found that even top-performing models like Claude fall short of professional finance standards, struggling with complex multi-step workflows and degrading sharply in quality as task difficulty increases.

🧠 Claude
