AI Pulse News

Models, papers, tools. 39,814 articles with AI-powered sentiment analysis and key takeaways.

General · Neutral · arXiv – CS AI · Jun 9 · 5/10

Transforming Police-Car Swerving for Mitigating Isolated Stop-and-Go Traffic Waves: A Practice-Oriented Jam-Absorption Driving Strategy

Researchers propose a practical jam-absorption driving (JAD) strategy inspired by police-car swerving to suppress stop-and-go traffic waves on freeways. The SD-JAD approach uses two roadside detectors to measure key parameters and guide a vehicle through strategic slow-in/fast-out maneuvers, successfully preventing wave propagation without creating secondary congestion.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Hybrid E-Assessment in Higher Education: Semi-Automated Grading of Paper-Based Written Examinations

Researchers propose a hybrid e-assessment system for higher education that combines paper-based examinations with semi-automated grading using vision-capable large language models. The approach addresses limitations of fully digital assessment while maintaining pedagogical integrity and scalability through handwritten character recognition and validation protocols.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Can the Environment Speak for Itself? $T^{2}$-GRPO: A Turn-Trajectory Group Relative Policy Optimization for Caregiver Agents

Researchers propose T²-GRPO, a reinforcement learning framework that optimizes large language models for dementia caregiver agents by balancing immediate patient feedback with long-term care outcomes. The method uses environment-grounded rewards and safety constraints to improve emotional intelligence in AI caregiving scenarios.
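For readers unfamiliar with the "group relative" part of the name: standard GRPO scores each sampled response against the other responses drawn for the same prompt. The sketch below shows only that generic normalization step. It is not the turn-trajectory variant from the paper, and the function name, the use of the population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population standard deviation; some implementations
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat their group's average get a positive advantage and the rest a negative one, so no separate value network is needed to provide a baseline.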

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting

Researchers introduce FAME, a sparse mixture-of-experts framework that dynamically routes time series forecasting tasks to specialized models based on data characteristics. Tested on a production retail dataset with 5,000+ vending machines, the system achieves 12.4% MSE improvement over single-model baselines while using only 1.92 experts per series, demonstrating practical advantages for large-scale commercial forecasting systems.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution

Researchers present OrderPlace, an AI framework that optimizes macro placement sequencing in chip design by using large language models to discover superior ordering strategies. The work demonstrates that placement order significantly impacts solution quality in physical design, with novel sequences achieving 34% wirelength reduction compared to existing methods.

General · Neutral · arXiv – CS AI · Jun 9 · 6/10

Trustworthy Smart Fabs via Professional Proxies: Scaling Safe and Sustainable by Design (SSbD) through Industrial Data Spaces

Researchers propose a zero-trust framework using AI-powered 'Professional Proxies' and hardware-isolated trust zones to help semiconductor manufacturers comply with EU sustainability regulations while protecting proprietary data. The approach enables factories to generate cryptographically signed compliance tokens without exposing manufacturing secrets, addressing a growing governance bottleneck across advanced chip production.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

RTL-BenchLS: A Large-Scale Benchmark for RTL Reasoning and Generation with Large Language Models

Researchers introduce RTL-BenchLS, a large-scale benchmark containing over 10,000 formally verified Verilog designs for evaluating large language models on hardware design tasks. The benchmark addresses limitations of existing datasets through three novel self-supervised tasks beyond specification-to-RTL generation, with top models achieving only 12-28% accuracy, demonstrating substantial room for improvement in LLM-based hardware automation.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care

Baichuan Intelligence has unveiled Baichuan-M4, a clinical-grade medical AI system designed for continuous patient care rather than isolated medical queries. The system integrates a specialized runtime environment, advanced reinforcement learning training, and clinical tools including patient memory management and multimodal medical analysis, achieving a 3.3% hallucination rate across multiple medical evaluation benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs

A new arXiv paper analyzes the sources of variability in agentic AI systems, distinguishing between token-sampling randomness intrinsic to foundation models and external factors like environmental changes and infrastructure effects. The research clarifies when AI agent outputs are genuinely stochastic versus reproducible, with implications for understanding AI reliability in production deployments.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

LATTEArena: An Evaluation Framework for LLM-powered Tabular Feature Engineering (Extended Version)

Researchers introduce LATTEArena, a standardized evaluation framework for comparing LLM-powered tabular feature engineering methods. The framework decomposes 15 representative techniques into reusable components and reveals that Tree-of-Thought combined with Monte Carlo Tree Search offers optimal cost-effectiveness, while RPN and Code formats excel at different task types.

🏢 Meta

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Multi-Agent System for IPMSM Design Optimization via an FEA-AI Hybrid Approach

Researchers propose an automated multi-agent AI system for optimizing Interior Permanent Magnet Synchronous Motor (IPMSM) design that combines retrieval-augmented generation, finite element analysis, and machine learning surrogates. The framework addresses traditional bottlenecks in motor design by automating problem setup, reducing computational costs, and improving prediction reliability through uncertainty-aware switching between AI inference and high-fidelity simulation.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

REFLECT is a new method for identifying errors in long reasoning traces produced by LLM agents, particularly addressing the challenging "silent failure" problem where outputs appear plausible but are incorrect. The approach improves upon existing error-localization techniques by using controlled replay and contrastive evidence to refine error attribution, achieving higher accuracy across multiple benchmarks without requiring ground-truth answers.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

DynaOD: Dynamic Origin-Destination Flow Generation with Discrete-to-Continuous Temporal Semantic Modeling

DynaOD is a machine learning framework that generates realistic urban mobility patterns by modeling temporal dynamics through discrete directional trends and continuous evolution, without requiring historical origin-destination data. The approach uses semantic temporal signals to condition pretrained OD generators, achieving better accuracy and distributional fidelity than existing methods with cross-city transferability.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts

Researchers propose Graph2Idea, an AI framework that uses knowledge graphs to improve scientific idea generation by converting retrieved papers into structured knowledge relationships rather than flat text. The method demonstrates significant improvements in novelty, quality, and feasibility of generated research ideas compared to existing LLM-based approaches.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

A Regret Minimization Framework on Preference Learning in Large Language Models

Researchers introduce Regret-based Preference Optimization (RePO), a new framework for training large language models that reinterprets reinforcement learning from human feedback (RLHF) through regret minimization rather than reward maximization. The approach models human preferences as behavior-conditioned assessments of relative suboptimality, showing consistent performance gains on mathematical reasoning and preference benchmarks.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

Researchers propose Dual-Path Vision Token Routing (DPVR), a framework that optimizes multimodal large language models by routing vision tokens away from deep transformer layers where they saturate early, instead fusing visual and textual information only in the final layer. The approach reduces computational overhead by 3% while maintaining competitive performance, challenging the assumption that vision tokens must traverse all deep language-model layers.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation

Researchers introduce IMUG-Bench, a comprehensive benchmark designed to evaluate unified multimodal models (UMMs) on their ability to handle multi-turn interleaved image-text dialogues. The benchmark reveals that current models struggle with exposure bias in generation tasks and that test-time scaling strategies like Chain-of-Thought can improve performance.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

MASS: Deep Research for Social Sciences with Memory-Augmented Social Simulation

Researchers introduce MASS (Memory-Augmented Social Simulation), a framework that enhances LLM-based research agents by integrating realistic social simulations rather than relying solely on literature retrieval. The system combines dynamic goal-path planning, multi-disciplinary behavior datasets, and an Ebbinghaus-inspired forgetting mechanism to improve research creativity and empirical grounding, achieving 6.81% quality improvement and 17.19% insight gains over baseline LLMs.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

FF-JEPA: Long-Horizon Planning in World Models with Latent Planners

Researchers propose FF-JEPA, a hierarchical world model architecture that enables long-horizon planning by combining action-conditioned and action-free latent planners, eliminating the need for explicit goal images and addressing computational inefficiencies in previous JEPA-based planning approaches.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

TRL-Bench introduces a standardized benchmark for evaluating tabular data encoders across different training paradigms and releases curated datasets. The framework enables fair comparison of 20 models across representation-level tasks, revealing that encoder quality is task-dependent: no single encoder dominates across all scenarios.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Leveraging Structural Constraints for Diffusion-based Neural TSP Solvers

Researchers introduce Projected Consistency Inference (PCI), a neural optimization method that solves the Traveling Salesman Problem more efficiently than gradient-based approaches by using structure-aware projections and local search instead of computationally expensive refinement. PCI achieves better optimality gaps (0.17% for 500 cities, 0.31% for 1000 cities) while reducing inference time by 30-40% compared to state-of-the-art FT2T methods.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Capability-Aligned Hierarchical Learning for Tool-Augmented LLMs

Researchers propose Capability-Aligned Hierarchical Learning (CAHL), a method that jointly optimizes high-level planning and low-level tool execution in large language models using reinforcement learning. The approach addresses a critical misalignment problem in hierarchical LLM systems where planners and executors operate independently, demonstrating improved performance across multiple tool-use benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

From Coarse to Fine: Managing Temporal Granularity in Spatio-Temporal Data for Fine-Grained Traffic Prediction

Researchers propose STRP, a machine learning framework that predicts fine-grained traffic patterns from coarse-grained historical data, addressing a critical mismatch between how traffic data is stored and how it needs to be used. The solution combines tree convolution and inverse dilated convolution to efficiently model spatial and temporal dependencies, outperforming existing approaches while reducing computational overhead.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

RunAgent SuperBrowser: A Theory of Autonomous Web Navigation Grounded in Human Browsing Behaviour

RunAgent has developed SuperBrowser, an autonomous web navigation agent that mimics human browsing behavior through selective perception and structured memory management. The system achieves 89.47% success on the Mind2Web Hard benchmark, outperforming all published open-source baselines by applying consistent cognitive principles throughout its architecture.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Correct Looks Better: Pairwise Comparisons Reveal Accuracy Rankings

A new study demonstrates that pairwise comparison methods like Elo, commonly used to evaluate generative AI models, produce rankings that correlate strongly (>0.9 Spearman correlation) with ground-truth accuracy benchmarks. The research shows these comparative evaluations substantially outperform direct judging when evaluators are weak and are largely resistant to stylistic bias and judge preference, though minor effects like answer repetition can influence outcomes.
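To make the comparison in this summary concrete, here is a minimal sketch of how an Elo ranking built from pairwise outcomes can be checked against accuracy with a Spearman correlation. It is not the study's protocol: the model names, accuracies, K-factor, and the rule that the model with the correct answer wins are all assumptions chosen for illustration.

```python
import random


def elo_ratings(matches, k=8.0, base=1000.0, epochs=10, seed=0):
    """Fit Elo ratings from a list of (winner, loser) pairs."""
    rng = random.Random(seed)
    ratings = {}
    for winner, loser in matches:
        ratings.setdefault(winner, base)
        ratings.setdefault(loser, base)
    order = list(matches)
    for _ in range(epochs):
        rng.shuffle(order)  # reduce dependence on match order
        for winner, loser in order:
            gap = (ratings[loser] - ratings[winner]) / 400.0
            expected = 1.0 / (1.0 + 10.0 ** gap)  # winner's expected score
            delta = k * (1.0 - expected)
            ratings[winner] += delta
            ratings[loser] -= delta
    return ratings


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def simulate_matches(accuracy, n_questions=500, seed=1):
    """Hypothetical judge: on each question, a correct model beats a wrong one."""
    rng = random.Random(seed)
    names = list(accuracy)
    matches = []
    for _ in range(n_questions):
        correct = {m: rng.random() < accuracy[m] for m in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if correct[a] != correct[b]:  # skip pairs with no clear winner
                    matches.append((a, b) if correct[a] else (b, a))
    return matches


# Hypothetical models and accuracies, for illustration only.
accuracy = {"model_a": 0.95, "model_b": 0.70, "model_c": 0.45, "model_d": 0.20}
ratings = elo_ratings(simulate_matches(accuracy))
rho = spearman([ratings[m] for m in accuracy], [accuracy[m] for m in accuracy])
```

Here `rho` measures how closely the Elo ordering tracks the accuracy ordering. A real evaluation would replace `simulate_matches` with actual judge decisions, which is where weak evaluators and stylistic bias enter.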
