Models, papers, tools. 16,006 articles with AI-powered sentiment analysis and key takeaways.
AI × Crypto · Bearish · crypto.news · Mar 27 · 7/10
🤖The article discusses the rising threat of AI-powered crypto scams and fake exchanges that exploit user urgency and poor verification practices. It highlights how easily fraudulent crypto platforms can mimic legitimate exchanges to drain user funds.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have identified a new vulnerability in large language models called 'natural distribution shifts' where seemingly benign prompts can bypass safety mechanisms to reveal harmful content. They developed ActorBreaker, a novel attack method that uses multi-turn prompts to gradually expose unsafe content, and proposed expanding safety training to address this vulnerability.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have published a comprehensive review of Large Language Models for Autonomous Driving (LLM4AD), introducing new benchmarks and conducting real-world experiments on autonomous vehicle platforms. The paper explores how LLMs can enhance perception, decision-making, and motion control in self-driving cars, while identifying key challenges including latency, security, and safety concerns.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Research reveals that open-source large language models (LLMs) lack hierarchical knowledge of visual taxonomies, creating a bottleneck for vision LLMs in hierarchical visual recognition tasks. The study used one million visual question answering tasks across six taxonomies to demonstrate this limitation, finding that even fine-tuning cannot overcome the underlying LLM knowledge gaps.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠A paradigm shift is occurring in software engineering as AI systems like LLMs increasingly boost development productivity. The paper presents a vision for growing symbiotic partnerships between human developers and AI, identifying key research challenges the software engineering community must address.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce DRIFT, a new security framework designed to protect AI agents from prompt injection attacks through dynamic rule enforcement and memory isolation. The system uses a three-component approach with a Secure Planner, Dynamic Validator, and Injection Isolator to maintain security while preserving functionality across diverse AI models.
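The summary names DRIFT's three components but not their internals; a minimal sketch of how such a pipeline could fit together, with all function bodies and tool names being illustrative assumptions rather than the paper's method:

```python
# Hypothetical DRIFT-style pipeline: component names come from the
# summary above; their logic here is an illustrative assumption.

def secure_planner(task: str) -> set[str]:
    """Derive an allow-list of tool calls before execution begins."""
    return {"search", "summarize"}  # plan for this (toy) task

def dynamic_validator(allowed: set[str], requested: str) -> bool:
    """Enforce the plan at runtime: block any call not in the allow-list."""
    return requested in allowed

def injection_isolator(retrieved_text: str) -> dict:
    """Quarantine retrieved content so it is treated as data,
    never appended to the agent's prompt as instructions."""
    return {"role": "data", "content": retrieved_text}

allowed = secure_planner("summarize today's news")
assert dynamic_validator(allowed, "search")
assert not dynamic_validator(allowed, "send_email")  # injected call blocked
msg = injection_isolator("IGNORE PREVIOUS INSTRUCTIONS...")
assert msg["role"] == "data"
```

The design point this illustrates is separation of concerns: planning happens before untrusted content is read, so injected text can request tools but cannot expand the allow-list.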
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers discovered significant privacy vulnerabilities in local Vision-Language Models that use Dynamic High-Resolution preprocessing. The dual-layer attack framework can exploit execution-time variations and cache patterns to infer sensitive information about processed images, even when models run locally for privacy.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers conducted the first systematic study of how weight pruning affects language model representations using Sparse Autoencoders across multiple models and pruning methods. The study reveals that rare features survive pruning better than common ones, suggesting pruning acts as implicit feature selection that preserves specialized capabilities while removing generic features.
🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers developed AD-CARE, an AI agent that uses large language models to diagnose Alzheimer's disease from incomplete medical data across multiple modalities. The system achieved 84.9% diagnostic accuracy across 10,303 cases and improved physician decision-making speed and accuracy in clinical studies.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers propose GlowQ, a new quantization technique for large language models that reduces memory overhead and latency while maintaining accuracy. The method uses group-shared low-rank approximation to optimize deployment of quantized LLMs, showing significant performance improvements over existing approaches.
🏢 Perplexity
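The summary describes GlowQ only at the level of "group-shared low-rank approximation"; a toy sketch of that general idea, assuming uniform per-group quantization plus an SVD-based low-rank correction of each group's residual (the function name and all details are hypothetical, not the paper's algorithm):

```python
import numpy as np

def quantize_group_lowrank(W, n_groups=4, bits=4, rank=2):
    """Toy sketch: per-group uniform quantization, then a shared
    rank-r correction fitted to each group's quantization error."""
    groups = np.array_split(np.arange(W.shape[0]), n_groups)
    levels = 2 ** bits - 1
    out = np.empty_like(W)
    for idx in groups:
        block = W[idx]
        lo, hi = block.min(), block.max()
        scale = (hi - lo) / levels or 1.0
        q = np.round((block - lo) / scale)          # integer codes
        deq = q * scale + lo                         # dequantized block
        resid = block - deq                          # quantization error
        U, s, Vt = np.linalg.svd(resid, full_matrices=False)
        correction = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r fit
        out[idx] = deq + correction
    return out

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
W_hat = quantize_group_lowrank(W)
```

By the Eckart–Young theorem the rank-r correction can only shrink each group's residual in Frobenius norm, so reconstruction error is never worse than plain quantization, at the cost of storing two small factor matrices per group.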
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers propose a framework for verifying AI model properties at design time rather than after deployment, using algebraic constraints over finitely generated abelian groups. The approach eliminates computational overhead of post-hoc verification by building trustworthiness into the model architecture from the start.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce CRAFT, a multi-agent benchmark that evaluates how well large language models coordinate through natural language communication under partial information constraints. The study finds that stronger reasoning abilities don't reliably translate to better coordination, with smaller open-weight models often matching or outperforming frontier systems in collaborative tasks.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠A user study with 200 participants found that while explanation correctness in AI systems affects human understanding, the relationship is not linear: performance drops significantly at 70% correctness but does not degrade further below that threshold. The research challenges the assumption that higher computational correctness metrics automatically translate to better human comprehension of AI decisions.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce ARC-AGI-3, a new benchmark for testing agentic AI systems that focuses on fluid adaptive intelligence without relying on language or external knowledge. While humans can solve 100% of the benchmark's abstract reasoning tasks, current frontier AI systems score below 1% as of March 2026.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce the Wireless World Model (WWM), a multi-modal AI framework for 6G networks that predicts wireless channel evolution by understanding electromagnetic wave propagation through 3D geometry. The model demonstrates superior performance across five downstream tasks and real-world measurements, outperforming existing foundation models.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduced WebTestBench, a new benchmark for evaluating automated web testing using AI agents and large language models. The study reveals significant gaps between current AI capabilities and industrial deployment needs, with LLMs struggling with test completeness, defect detection, and long-term interaction reliability.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers propose HIVE, a new framework for training large language models more efficiently in reinforcement learning by selecting high-utility prompts before rollout. The method uses historical reward data and prompt entropy to identify the 'learning edge' where models learn most effectively, significantly reducing computational overhead without performance loss.
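The summary says HIVE scores prompts using historical reward and entropy to find a 'learning edge'; a hypothetical scoring rule capturing that intuition (the weighting and the idea that the edge sits near a 50% success rate are assumptions, not the paper's formula):

```python
def prompt_utility(mean_reward: float, entropy: float, w: float = 0.5) -> float:
    """Toy utility: prompts the model solves ~half the time sit at the
    'learning edge'; higher output entropy signals remaining uncertainty."""
    edge = 1.0 - abs(mean_reward - 0.5) * 2  # peaks at mean_reward = 0.5
    return w * edge + (1 - w) * entropy

# (success rate, entropy) per prompt, from past rollouts
history = {"p1": (0.95, 0.1), "p2": (0.50, 0.7), "p3": (0.05, 0.2)}
ranked = sorted(history, key=lambda p: prompt_utility(*history[p]), reverse=True)
# → ['p2', 'p3', 'p1']: already-solved and never-solved prompts rank low
```

Skipping rollouts for low-utility prompts is where the compute savings would come from: prompts the model always or never solves yield near-zero gradient signal in RL fine-tuning.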
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have developed PIDP-Attack, a new cybersecurity threat that combines prompt injection with database poisoning to manipulate AI responses in Retrieval-Augmented Generation (RAG) systems. The attack method demonstrated 4-16% higher success rates than existing techniques across multiple benchmark datasets and eight different large language models.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Research reveals that large language models process instructions differently across languages due to social register variations, with imperative commands carrying different obligatory force in different speech communities. The study found that declarative rewording of instructions reduces cross-linguistic variance by 81% and suggests models treat instructions as social acts rather than technical specifications.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduce the Quantized Simplex Gossip (QSG) model to explain how multi-agent LLM systems reach consensus through 'memetic drift', in which arbitrary choices compound into collective agreement. The study reveals scaling laws for when collective intelligence operates like a lottery versus amplifying weak biases, providing a framework for understanding AI system behavior in consequential decision-making.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have identified a fundamental issue in large language models where verbalized confidence scores don't align with actual accuracy due to orthogonal encoding of these signals. They discovered a 'Reasoning Contamination Effect' where simultaneous reasoning disrupts confidence calibration, and developed a two-stage adaptive steering pipeline to improve alignment.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers introduced CPGBench, a benchmark evaluating how well Large Language Models detect and follow clinical practice guidelines in healthcare conversations. The study found that while LLMs can detect 71-90% of clinical recommendations, they only adhere to guidelines 22-63% of the time, revealing significant gaps for safe medical deployment.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Research reveals that LLM system prompt configuration creates massive security vulnerabilities, with the same model's phishing detection rates ranging from 1% to 97% based solely on prompt design. The PhishNChips study demonstrates that more specific prompts can paradoxically weaken AI security by replacing robust multi-signal reasoning with exploitable single-signal dependencies.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers have identified a new attack vector called Epistemic Bias Injection (EBI) that manipulates AI language models by injecting factually correct but biased content into retrieval-augmented generation databases. The attack steers model outputs toward specific viewpoints while evading traditional detection methods, though a new defense mechanism called BiasDef shows promise in mitigating these threats.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠Researchers propose a unified framework for AI security threats that categorizes attacks based on four directional interactions between data and models. The comprehensive taxonomy addresses vulnerabilities in foundation models through four categories: data-to-data, data-to-model, model-to-data, and model-to-model attacks.