Models, papers, tools. 17,128 articles with AI-powered sentiment analysis and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduced TrustBench, a real-time verification framework that prevents harmful actions by AI agents before execution, achieving an 87% reduction in harmful actions across multiple tasks. The system uses domain-specific plugins for healthcare, finance, and technical domains with sub-200ms latency, marking a shift from post-execution evaluation to preventive action verification.
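The pre-execution gating pattern is easy to sketch. Nothing below comes from the paper: the plugin interface, rule, and threshold are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    domain: str   # e.g. "finance", "healthcare"
    name: str     # tool or API call the agent wants to execute
    args: dict

# A verifier returns (allowed, reason). All names and rules here are invented.
Verifier = Callable[[Action], tuple[bool, str]]

def finance_plugin(action: Action) -> tuple[bool, str]:
    if action.name == "transfer_funds" and action.args.get("amount", 0) > 10_000:
        return False, "transfer exceeds unattended limit"
    return True, "ok"

PLUGINS: dict[str, Verifier] = {"finance": finance_plugin}

def verify_before_execution(action: Action) -> bool:
    """Gate the action before it runs, rather than scoring it afterwards."""
    plugin = PLUGINS.get(action.domain)
    if plugin is None:
        return False  # fail closed for domains without a verifier
    allowed, reason = plugin(action)
    if not allowed:
        print(f"blocked {action.name}: {reason}")
    return allowed

print(verify_before_execution(Action("finance", "transfer_funds", {"amount": 50_000})))
```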
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce Logos, a compact AI model that combines multi-step logical reasoning with chemical consistency for molecular design. The model achieves strong performance in structural accuracy and chemical validity while using fewer parameters than larger language models, and provides transparent reasoning that can be inspected by humans.
AI · Bearish · arXiv – CS AI · Mar 11 · 7/10
🧠 A research paper presents a macro-financial stress test of rapid AI adoption, identifying a critical mismatch between AI-generated abundance and deficient demand, which it attributes to economic institutions anchored in human cognitive scarcity. The study finds that high-income earners face the highest AI exposure, potentially triggering explosive crises in the $2.5 trillion private credit and $13 trillion mortgage markets through displacement spirals and intermediation collapse.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed an open-source benchmark dataset to evaluate AI systems' compliance with the EU AI Act, specifically targeting NLP and RAG systems. The dataset enables automated assessment of risk classification, article retrieval, and question answering, with F1 scores of 0.87 and 0.85 reported for prohibited and high-risk scenarios, respectively.
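For context, per-scenario F1 of the kind quoted above can be computed one-vs-rest; the class names, gold labels, and predictions below are invented.

```python
from sklearn.metrics import f1_score

gold = ["prohibited", "high_risk", "minimal", "high_risk", "prohibited"]
pred = ["prohibited", "high_risk", "high_risk", "high_risk", "minimal"]

# One-vs-rest F1 per scenario class, mirroring per-class reporting.
for label in ("prohibited", "high_risk"):
    y_true = [g == label for g in gold]
    y_pred = [p == label for p in pred]
    print(label, round(f1_score(y_true, y_pred), 2))
```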
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers propose 'Curveball steering', a nonlinear method for controlling large language model behavior that outperforms traditional linear approaches. The study challenges the Linear Representation Hypothesis by showing that LLM activation spaces have substantial geometric distortions that require geometry-aware interventions.
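A rough illustration of the linear-vs-nonlinear distinction; the paper's actual intervention is not reproduced here, and both functions below are toy stand-ins.

```python
import torch

torch.manual_seed(0)
h = torch.randn(16)      # a hidden activation
v = torch.randn(16)      # a learned steering direction

def linear_steer(h, v, alpha=2.0):
    # Standard activation addition: assumes the concept is a fixed direction.
    return h + alpha * v

def nonlinear_steer(h, v, alpha=2.0):
    # Toy geometry-aware variant: gate the edit by where h sits relative
    # to v, so the intervention bends with the local activation geometry.
    gate = torch.tanh(h @ v / h.norm())
    return h + alpha * gate * v

print(linear_steer(h, v)[:3], nonlinear_steer(h, v)[:3])
```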
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have identified a phenomenon called 'merging collapse' where combining independently fine-tuned large language models leads to catastrophic performance degradation. The study reveals that representational incompatibility between tasks, rather than parameter conflicts, is the primary cause of merging failures.
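For readers unfamiliar with merging: it means combining checkpoints in parameter space. A minimal sketch of the naive weight-averaging baseline, with randomly initialized models standing in for real fine-tunes:

```python
import torch
import torch.nn as nn

def make_model():  # same architecture for both "fine-tuned" checkpoints
    return nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

m1, m2, merged = make_model(), make_model(), make_model()

# Naive parameter-space merge: average matching weights.
with torch.no_grad():
    for p, p1, p2 in zip(merged.parameters(), m1.parameters(), m2.parameters()):
        p.copy_(0.5 * p1 + 0.5 * p2)

# The paper's claim: even without parameter conflicts, the merged model
# can collapse if m1 and m2 learned incompatible internal representations.
print(merged(torch.randn(1, 8)))
```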
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers propose a new theoretical framework called the 'Third Entity' to describe the emergent cognitive formation that arises from human-AI interactions, introducing the concept of 'vibe-creation' as a pre-reflective cognitive mode. The paper argues this represents the automation of tacit knowledge with significant implications for epistemology, education, and how we understand human-AI collaboration.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce MiniAppBench, a new benchmark for evaluating Large Language Models' ability to generate interactive HTML applications rather than static text responses. The benchmark includes 500 real-world tasks and an agentic evaluation framework called MiniAppEval that uses browser automation for testing.
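Functional, browser-automated checks of this kind can be sketched with Playwright; the toy HTML app and assertion below are invented, not drawn from the benchmark.

```python
# Requires `pip install playwright` and `playwright install chromium`.
from playwright.sync_api import sync_playwright

HTML = """
<button id="inc" onclick="c.textContent = +c.textContent + 1">+</button>
<span id="c">0</span>
"""

with sync_playwright() as p:
    page = p.chromium.launch(headless=True).new_page()
    page.set_content(HTML)      # load the generated mini-app
    page.click("#inc")          # interact, rather than grading static text
    assert page.inner_text("#c") == "1", "app did not respond to the click"
    print("task passed")
```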
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce World2Mind, a training-free spatial intelligence toolkit that enhances foundation models' 3D spatial reasoning capabilities by up to 18%. The system uses 3D reconstruction and cognitive mapping to create structured spatial representations, enabling text-only models to perform complex spatial reasoning tasks.
🧠 GPT-5
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce OOD-MMSafe, a new benchmark revealing that current Multimodal Large Language Models fail to identify hidden safety risks up to 67.5% of the time. They also developed the CASPO framework, which reduces risk-identification failure rates to under 8% in consequence-driven safety scenarios.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce 'opaque serial depth' as a metric for how much reasoning large language models can perform without externalizing it as chain-of-thought. The study provides computational bounds for Gemma 3 models and releases open-source tools to calculate these bounds for any neural network architecture.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed two software techniques (OAS and MBS) that markedly improve MXFP4 quantization accuracy for Large Language Models, shrinking the performance gap with NVIDIA's NVFP4 from 10% to below 1%. This makes MXFP4 a viable alternative while retaining its 12% hardware-efficiency advantage in tensor cores.
🏢 Nvidia
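For orientation: MXFP4 stores 4-bit E2M1 values (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with a shared power-of-two scale per 32-element block. A simplified quantization round trip; OAS and MBS themselves are not shown.

```python
import numpy as np

E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # fp4 magnitudes
GRID = np.concatenate([-E2M1[:0:-1], E2M1])                # signed code points

def quantize_block(x):  # x: one 32-element block
    amax = np.abs(x).max() + 1e-12
    scale = 2.0 ** np.ceil(np.log2(amax / 6.0))  # shared power-of-two scale
    idx = np.abs(GRID[None, :] - (x / scale)[:, None]).argmin(axis=1)
    return GRID[idx] * scale                     # snap to nearest fp4 value

rng = np.random.default_rng(0)
x = rng.standard_normal(32).astype(np.float32)
print(f"mean abs error: {np.abs(quantize_block(x) - x).mean():.3f}")
```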
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed ALADIN, a framework for analyzing accuracy-latency trade-offs in AI accelerators for embedded systems. The tool enables evaluation of quantized neural networks without requiring deployment on target hardware, potentially reducing development time and costs for AI chip designers.
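The heart of such trade-off analysis is isolating non-dominated (latency, accuracy) configurations; a toy Pareto filter over invented design points:

```python
points = [  # (latency_ms, accuracy) for candidate quantized configs; invented
    (5.0, 0.71), (8.0, 0.74), (6.0, 0.70), (12.0, 0.78), (9.0, 0.73),
]

# Keep a point only if no other point is at least as fast AND as accurate.
pareto = [p for p in points
          if not any(q[0] <= p[0] and q[1] >= p[1] and q != p for q in points)]
print(sorted(pareto))  # the non-dominated frontier
```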
AI · Bearish · arXiv – CS AI · Mar 11 · 7/10
🧠 Research suggests that alignment techniques in large language models may produce collective pathological behaviors when AI agents interact under social pressure. The study found that invisible censorship and complex alignment constraints can lead to harmful group dynamics, challenging current AI safety approaches.
🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers propose ARKV, a new framework for managing memory in large language models that reduces KV cache memory usage by 4x while preserving 97% of baseline accuracy. The adaptive system dynamically allocates precision levels to cached tokens based on attention patterns, enabling more efficient long-context inference without requiring model retraining.
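A toy version of attention-guided mixed-precision caching; the threshold and fake quantizer below are invented, and ARKV's actual allocation policy is not reproduced.

```python
import torch

torch.manual_seed(0)
T, d = 128, 64
k = torch.randn(T, d)                    # cached keys for one head
attn = torch.softmax(torch.randn(T), 0)  # stand-in: accumulated attention mass

def fake_quant(x, bits=4):
    # Symmetric uniform fake-quantization, standing in for a real int4 path.
    scale = x.abs().max() / (2 ** (bits - 1) - 1)
    return (x / scale).round().clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

hot = attn > attn.quantile(0.75)         # top 25% of tokens stay full precision
k_cache = torch.where(hot[:, None], k, fake_quant(k))
print(f"{int(hot.sum())}/{T} tokens kept in full precision")
```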
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed Zipage, a new high-concurrency inference engine for large language models that uses Compressed PagedAttention to relieve memory bottlenecks. The system achieves 95% of the performance of full-KV inference engines while delivering over 2.1x speedup on mathematical reasoning tasks.
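Zipage builds on PagedAttention, whose paging substrate can be sketched as a block table over a fixed physical pool (the compression layer is omitted and all sizes are invented):

```python
BLOCK = 16                            # tokens per physical KV block
free_blocks = list(range(64))         # physical block pool
block_table = {}                      # seq_id -> list of physical block ids

def append_token(seq_id, pos):
    table = block_table.setdefault(seq_id, [])
    if pos % BLOCK == 0:              # current block is full: map a new one
        table.append(free_blocks.pop())
    return table[-1], pos % BLOCK     # where this token's K/V physically lives

for t in range(40):
    append_token("req-1", t)
print(block_table["req-1"])           # 3 blocks cover 40 tokens, no big prealloc
```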
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Research analyzes FP4 quantization sensitivity across different layers in large language models using NVFP4 and MXFP4 formats on Qwen2.5 models. The study finds MLP projection layers are most sensitive to quantization, while attention layers show substantial robustness to FP4 precision reduction.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduced HCAPO, a new framework that uses hindsight credit assignment to improve Large Language Model agents' performance in long-horizon tasks. The system leverages LLMs as post-hoc critics to refine decision-making, achieving 7.7% and 13.8% improvements over existing methods on the WebShop and ALFWorld benchmarks, respectively.
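The hindsight-credit idea in miniature: after the episode ends, a critic scores each step's contribution, turning one sparse outcome into dense per-step rewards. The `critic` below is a stand-in for an LLM call; HCAPO's prompts and update rule are not shown.

```python
def critic(trajectory, outcome, step_idx):
    # Stand-in for an LLM call answering: "how much did step i contribute
    # to this outcome?" The heuristic here is purely illustrative.
    return 1.0 if "buy" in trajectory[step_idx] else 0.1

def hindsight_credits(trajectory, outcome):
    # Post-hoc pass over the finished episode: one credit per step.
    return [critic(trajectory, outcome, i) for i in range(len(trajectory))]

traj = ["search[red mug]", "click[item-3]", "buy[item-3]"]
print(hindsight_credits(traj, outcome="success"))  # dense per-step rewards
```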
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed a framework that uses large language models (LLMs) to automate superconducting qubit experiments, potentially streamlining quantum computing research. The system successfully demonstrated autonomous resonator characterization and quantum non-demolition measurements, offering a more user-friendly approach to controlling complex quantum hardware.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 This research paper proposes rethinking safety cases for frontier AI systems by drawing on methodologies from traditional safety-critical industries such as aerospace and nuclear power. The authors critique current alignment community approaches and present a case study on Deceptive Alignment and CBRN capabilities to establish more robust safety frameworks.
AI · Bearish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers developed NetDiffuser, a framework that uses diffusion models to generate natural adversarial examples capable of deceiving AI-based network intrusion detection systems. The system achieved up to 29.93% higher attack success rates compared to baseline attacks, highlighting significant vulnerabilities in current deep learning-based security systems.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 A research study reveals that AI-powered search engines like Perplexity, SearchGPT, and Google Gemini produce highly variable citation results for identical queries, making single-run visibility metrics unreliable. The study demonstrates that citation distributions follow power-law patterns with substantial variability, and argues that uncertainty estimates are essential for accurate measurement of domain visibility in generative search.
🏢 OpenAI · 🏢 Perplexity · 🧠 Gemini
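The measurement point reproduces easily in simulation: under a heavy-tailed citation distribution, single-run visibility estimates are noisy. All data below is synthetic.

```python
import random

random.seed(0)
domains = [f"site{i}.com" for i in range(50)]
weights = [1 / (i + 1) ** 1.5 for i in range(50)]   # Zipf-like popularity

def run_query(n_citations=10):
    # One simulated AI-search answer citing 10 sources.
    return random.choices(domains, weights, k=n_citations)

shares = [run_query().count("site4.com") / 10 for _ in range(200)]
mean = sum(shares) / len(shares)
sd = (sum((s - mean) ** 2 for s in shares) / len(shares)) ** 0.5
print(f"site4.com visibility per run: {mean:.2f} +/- {sd:.2f}")
```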
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce BiCLIP, a new framework that improves vision-language models' ability to adapt to specialized domains through geometric transformations. The approach achieves state-of-the-art results across 11 benchmarks while maintaining simplicity and low computational requirements.
AI · Bearish · arXiv – CS AI · Mar 11 · 7/10
🧠 A comprehensive study reveals that multi-agent AI systems (MAS) face distinct security vulnerabilities that existing frameworks inadequately address. The research evaluated 16 AI security frameworks against 193 identified threats across 9 categories, finding that no framework achieves majority coverage in any single category, with non-determinism and data leakage being the most under-addressed areas.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers developed Pichay, a demand paging system that treats LLM context windows like computer memory with hierarchical caching. The system reduces context consumption by up to 93% in production by evicting stale content and managing memory more efficiently, addressing fundamental scalability issues in AI systems.
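The OS analogy maps directly to code: segments are evicted LRU-style when the token budget is exceeded and can be paged back in on demand. Pichay's real policy and API are not public here; this shows only the core idea.

```python
from collections import OrderedDict

class ContextPager:
    def __init__(self, budget_tokens):
        self.budget = budget_tokens
        self.resident = OrderedDict()   # segment_id -> token_count (LRU order)
        self.swapped = {}               # evicted segments, recoverable on demand

    def touch(self, seg_id, tokens):
        # Page the segment in (or mark it recently used) ...
        self.resident[seg_id] = self.swapped.pop(seg_id, tokens)
        self.resident.move_to_end(seg_id)
        # ... then evict least-recently-used segments until we fit the budget.
        while sum(self.resident.values()) > self.budget:
            victim, size = self.resident.popitem(last=False)
            self.swapped[victim] = size

pager = ContextPager(budget_tokens=100)
for i, n in enumerate([40, 40, 40]):
    pager.touch(f"seg{i}", n)
print(list(pager.resident), list(pager.swapped))   # seg0 was paged out
```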