y0news
🧠

AI

12,714 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Network Effects and Agreement Drift in LLM Debates

Researchers examining LLM agent behavior in simulated debates discovered a phenomenon called 'agreement drift,' where AI agents systematically shift toward specific positions on opinion scales in ways that don't mirror human behavior. The study reveals critical biases in using LLMs as proxies for human social systems, particularly when modeling minority groups or unbalanced social contexts.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

Researchers propose NExt, a nonlinear extrapolation framework that accelerates reinforcement learning with verifiable rewards (RLVR) for large language models by modeling low-rank parameter trajectories. The method reduces computational overhead by approximately 37.5% while remaining compatible with various RLVR algorithms, addressing a key bottleneck in scaling LLM training.
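NExt's exact trajectory model isn't given in this summary; the following is a rough sketch of the general idea (project parameter checkpoints onto a low-rank SVD basis, fit each basis coefficient's trajectory, and extrapolate ahead), where the quadratic fit and all names are assumptions of this illustration, not the paper's method:

```python
import numpy as np

def extrapolate_checkpoints(checkpoints: np.ndarray, rank: int, steps: int) -> np.ndarray:
    """Illustrative low-rank trajectory extrapolation.

    checkpoints: (T, D) array of flattened parameter snapshots.
    Projects the centered snapshots onto a rank-`rank` SVD basis, fits each
    basis coefficient's trajectory with a low-degree polynomial, and
    extrapolates `steps` checkpoints ahead.
    """
    T, _ = checkpoints.shape
    mean = checkpoints.mean(axis=0)
    U, S, Vt = np.linalg.svd(checkpoints - mean, full_matrices=False)
    coeffs = (U * S)[:, :rank]          # (T, rank) coefficient trajectories
    t = np.arange(T)
    future = np.empty(rank)
    for k in range(rank):
        poly = np.polyfit(t, coeffs[:, k], deg=min(2, T - 1))
        future[k] = np.polyval(poly, T - 1 + steps)
    return mean + future @ Vt[:rank]
```

The claimed savings come from predicting future parameters instead of computing every optimizer step; a rank far smaller than the parameter count keeps the fit cheap.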

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation

Researchers introduce SLALOM, a validation framework addressing the credibility crisis of LLM-based social simulations by shifting focus from outcome accuracy to process fidelity. The framework uses Dynamic Time Warping to compare simulated trajectories against empirical data across intermediate checkpoints, enabling quantitative assessment of whether simulations achieve realistic social mechanisms rather than merely correct endpoints.
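SLALOM's checkpoint comparison relies on Dynamic Time Warping; a minimal textbook DTW distance for 1-D trajectories (not the paper's implementation) looks like this:

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D trajectories.

    Fills a (len(a)+1) x (len(b)+1) cost table where each cell adds the local
    mismatch to the cheapest of the three admissible predecessor alignments.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

Because DTW aligns sequences of different lengths and speeds, it suits comparing a simulation's intermediate checkpoints against empirical trajectories that need not be sampled at the same rate.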

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

Researchers propose Policy Split, a novel reinforcement learning approach for LLMs that uses dual-mode entropy regularization to balance exploration with task accuracy. By bifurcating policy into normal and high-entropy modes, the method enables diverse behavioral patterns while maintaining performance, showing improvements over existing entropy-guided RL baselines.
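The summary doesn't specify how the two modes are weighted; a minimal sketch in which the high-entropy mode simply receives a larger entropy bonus (both coefficients are invented for illustration):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dual_mode_entropy_bonus(probs, explore: bool,
                            beta_normal: float = 0.01,
                            beta_high: float = 0.1) -> float:
    """Hypothetical dual-mode regularizer: the high-entropy (exploration)
    mode applies a larger entropy coefficient than the normal mode."""
    beta = beta_high if explore else beta_normal
    return beta * entropy(probs)
```

The bifurcation lets exploration-heavy rollouts tolerate flatter action distributions without dragging the accuracy-focused mode toward the same flatness.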

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment

Researchers introduce NovBench, the first large-scale benchmark for evaluating how well large language models can assess research novelty in academic papers. The benchmark comprises 1,684 paper-review pairs from a leading NLP conference and reveals that current LLMs struggle to judge scientific novelty despite their promise for peer-review support.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

A Triadic Suffix Tokenization Scheme for Numerical Reasoning

Researchers propose Triadic Suffix Tokenization (TST), a novel tokenization scheme that addresses how large language models process numbers by fragmenting digits into three-digit groups with explicit magnitude markers. The method aims to improve arithmetic and scientific reasoning in LLMs by preserving decimal structure and positional information, with two implementation variants offering scalability across 33 orders of magnitude.
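The paper's exact marker syntax isn't given in this summary; as a rough illustration of grouping digits into triads with explicit magnitude markers (the `<En>` tags and function name are inventions of this sketch):

```python
def tst_tokenize(number: str) -> list[str]:
    """Split the integer part of a decimal string into three-digit groups,
    each tagged with a hypothetical magnitude marker <En> meaning 10**n."""
    intpart, _, frac = number.partition(".")
    pad = (-len(intpart)) % 3            # left-pad so the first group is full
    padded = "0" * pad + intpart
    groups = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    tokens = []
    for i, g in enumerate(groups):
        exponent = 3 * (len(groups) - 1 - i)   # magnitude of this triad
        tokens.append(f"{g}<E{exponent}>")
    if frac:
        tokens.append(f".{frac}")
    return tokens
```

Explicit magnitude tags mean the model never has to infer place value from token position alone, which is one plausible reading of how the scheme preserves positional information across many orders of magnitude.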

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Layerwise Dynamics for In-Context Classification in Transformers

Researchers have developed a method to make transformer neural networks interpretable by studying how they perform in-context classification from few examples. By enforcing permutation equivariance constraints, they extracted an explicit algorithmic update rule that reveals how transformers dynamically adjust to new data, offering the first identifiable recursion of this kind.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

Researchers propose CUTEv2, a unified matrix extension architecture for CPUs that decouples matrix units from the pipeline to enable efficient AI workload processing across diverse architectures. The design achieves significant speedups (1.57x-2.31x) on major AI models while occupying minimal silicon area (0.53 mm² in 14nm), demonstrating practical viability for open-source CPU development.

🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents

RPA-Check introduces an automated four-stage framework for evaluating Large Language Model-based role-playing agents in complex scenarios, addressing the inability of standard NLP metrics to assess role adherence and narrative consistency. Testing across legal scenarios reveals that smaller, instruction-tuned models (8-9B parameters) outperform larger models in procedural consistency, suggesting that performance on this task doesn't track model scale.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

Researchers introduce ToM-SB, a novel challenge where AI defenders must use theory-of-mind reasoning to deceive attackers trying to extract sensitive information. Through reinforcement learning, trained models outperform frontier LLMs like GPT-4 and Gemini-Pro, revealing an emergent bidirectional relationship between belief modeling and deception capabilities.

🧠 GPT-5
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning

Researchers introduce Legal2LogicICL, an LLM-based framework that improves the conversion of natural-language legal cases into logical formulas through retrieval-augmented few-shot learning. The method addresses data scarcity in legal AI systems and introduces a new annotated dataset (Legal2Proleg) to advance interpretable legal reasoning without requiring model fine-tuning.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning

Researchers propose a geometric methodology using a Topological Auditor to detect and eliminate shortcut learning in deep neural networks, forcing models to learn fair representations. The approach reduces demographic bias vulnerabilities from 21.18% to 7.66% while operating more efficiently than existing post-hoc debiasing techniques.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Discourse Diversity in Multi-Turn Empathic Dialogue

Researchers demonstrate that large language models exhibit excessive repetition of discourse tactics in multi-turn empathic conversations, reusing communication strategies at nearly double the human rate. They introduce MINT, a reinforcement learning framework that optimizes for both empathy quality and discourse move diversity, achieving 25.3% improvements in empathy while reducing repetitive tactics by 26.3%.
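As a toy illustration of the repetition statistic such a study might track (not MINT itself), one can count the fraction of turns that reuse an already-seen discourse move:

```python
def repetition_rate(moves: list[str]) -> float:
    """Fraction of dialogue turns whose discourse move has already appeared
    earlier in the conversation.  Move labels are hypothetical."""
    seen: set[str] = set()
    repeats = 0
    for m in moves:
        if m in seen:
            repeats += 1
        seen.add(m)
    return repeats / len(moves) if moves else 0.0
```

A diversity-aware reward would penalize this rate alongside the empathy score, which is the trade-off MINT's reinforcement learning objective reportedly balances.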

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

StarVLA-α introduces a simplified baseline architecture for Vision-Language-Action robotic systems that achieves competitive performance across multiple benchmarks without complex engineering. The model demonstrates that a strong vision-language backbone combined with minimal design choices can match or exceed existing specialized approaches, suggesting the VLA field has been over-engineered.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

A Mechanistic Analysis of Looped Reasoning Language Models

Researchers conducted a mechanistic analysis of looped reasoning language models, discovering that these recurrent architectures learn inference stages similar to feedforward models but execute them iteratively. The study reveals that recurrent blocks converge to distinct fixed points with stable attention behavior, providing architectural insights for improving LLM reasoning capabilities.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Interactive Learning for LLM Reasoning

Researchers introduce ILR, a novel multi-agent learning framework that enables Large Language Models to enhance their independent reasoning through interactive training with other LLMs, then solve problems autonomously without re-executing the multi-agent system. The approach combines dynamic interaction strategies and perception calibration, delivering up to 5% performance improvements across mathematical, coding, and reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards

Researchers introduce a novel reinforcement learning approach for diffusion-based language models that uses process-level rewards during the denoising trajectory, rather than outcome-based rewards alone. This method improves reasoning stability and interpretability while enabling practical supervision at scale, advancing the capability of non-autoregressive text generation systems.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents

Researchers propose Dramaturge, a multi-agent LLM system that uses hierarchical divide-and-conquer methodology to iteratively refine narrative scripts. The approach addresses limitations in single-pass LLM generation by coordinating global structural reviews with scene-level refinements across multiple iterations, demonstrating superior output quality compared to baseline methods.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

X-SYS: A Reference Architecture for Interactive Explanation Systems

Researchers introduce X-SYS, a reference architecture for building interactive explanation systems that operationalize explainable AI (XAI) across production environments. The framework addresses the gap between XAI algorithms and deployable systems by organizing around four quality attributes (scalability, traceability, responsiveness, adaptability) and five service components, with SemanticLens as a concrete implementation for vision-language models.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

Researchers propose a human-centered framework for evaluating whether AI systems fail in ways similar to humans by measuring out-of-distribution performance across a spectrum of perceptual difficulty rather than arbitrary distortion levels. Testing this approach on vision models reveals that vision-language models show the most consistent human alignment, while CNNs and ViTs demonstrate regime-dependent performance differences depending on task difficulty.
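A hedged sketch of the binning step such a spectrum implies, grouping trials by perceptual difficulty and computing accuracy per bin (the [0, 1) difficulty scale and bin count are assumptions, not the paper's protocol):

```python
def per_bin_accuracy(difficulty: list[float], correct: list[int],
                     n_bins: int = 5) -> list:
    """Group trials into equal-width difficulty bins (difficulty in [0, 1))
    and return the accuracy in each bin, or None for empty bins."""
    bins: list[list[int]] = [[] for _ in range(n_bins)]
    for d, c in zip(difficulty, correct):
        idx = min(int(d * n_bins), n_bins - 1)   # clamp d == 1.0 edge case
        bins[idx].append(c)
    return [sum(b) / len(b) if b else None for b in bins]
```

Comparing the resulting model accuracy profile against a human profile over the same bins is one way to quantify whether the two degrade in the same regime.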

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

Researchers introduce SEARL, a self-evolving agent framework that optimizes policy and tool memory jointly to enable efficient learning in resource-constrained environments. The approach addresses limitations of existing methods by constructing structured experience memory that densifies sparse rewards and facilitates tool reuse across tasks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

Researchers introduce SciTune, a framework for fine-tuning large language models with human-curated scientific multimodal instructions from academic publications. The resulting LLaMA-SciTune model demonstrates superior performance on scientific benchmarks compared to state-of-the-art alternatives, with results suggesting that high-quality human-generated data outweighs the volume advantage of synthetic training data for specialized scientific tasks.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

An Iterative Utility Judgment Framework Inspired by Philosophical Relevance via LLMs

Researchers propose ITEM, an iterative utility judgment framework that enhances retrieval-augmented generation (RAG) systems by aligning with philosophical principles of relevance. The framework improves how large language models prioritize and process information from retrieval results, demonstrating measurable improvements across multiple benchmarks in ranking, utility assessment, and answer generation.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

The Phantom of PCIe: Constraining Generative Artificial Intelligences for Practical Peripherals Trace Synthesizing

Researchers introduce Phantom, a framework that combines generative AI with constraint-based post-processing to synthesize valid PCIe protocol traces for hardware simulation. The system addresses a critical limitation of naive AI generation—hallucination of protocol-violating sequences—achieving up to 1000x improvements in task-specific metrics compared to existing approaches.

Page 148 of 509