AI Pulse News

Models, papers, tools. 15,843 articles with AI-powered sentiment analysis and key takeaways.

15843 articles

AIBearisharXiv – CS AI · Apr 77/10

🧠

Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Researchers conducted the first real-world safety evaluation of OpenClaw, a widely deployed AI agent with extensive system access, revealing significant security vulnerabilities. The study found that poisoning any single dimension of the agent's state increases attack success rates from 24.6% to 64-74%, with even the strongest defenses still vulnerable to 63.8% of attacks.

🧠 GPT-5🧠 Claude🧠 Sonnet

AIBullisharXiv – CS AI · Apr 77/10

🧠

Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems

Researchers introduce Cog-DRIFT, a new framework that improves AI language model reasoning by transforming difficult problems into easier formats like multiple-choice questions, then gradually training models on increasingly complex versions. The method shows significant performance gains of 8-10% on previously unsolvable problems across multiple reasoning benchmarks.

🧠 Llama

AINeutralarXiv – CS AI · Apr 77/10

🧠

Grokking as Dimensional Phase Transition in Neural Networks

Researchers identify neural network 'grokking' as a dimensional phase transition where effective dimensionality shifts from sub-diffusive to super-diffusive during the memorization-to-generalization transition. The study reveals this transition reflects gradient field geometry rather than network architecture, offering new insights into overparameterized network trainability.

$AVAX

AIBullisharXiv – CS AI · Apr 77/10

🧠

ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration

Researchers introduce ROSClaw, a new AI framework that integrates large language models with robotic systems to improve multi-agent collaboration and long-horizon task execution. The framework addresses critical gaps between semantic understanding and physical execution by using unified vision-language models and enabling real-time coordination between simulated and real-world robots.

AIBearisharXiv – CS AI · Apr 77/10

🧠

AI Agents Under EU Law

A comprehensive analysis reveals that AI agents face complex regulatory compliance challenges under the EU AI Act and multiple overlapping regulations including GDPR, Cyber Resilience Act, and Digital Services Act. The research concludes that high-risk AI systems with untraceable behavioral drift cannot currently satisfy essential AI Act requirements, requiring providers to maintain exhaustive inventories of agent actions and data flows.

AIBullisharXiv – CS AI · Apr 77/10

🧠

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Researchers introduce SkillX, an automated framework for building reusable skill knowledge bases for AI agents that addresses inefficiencies in current self-evolving paradigms. The system uses multi-level skill design, iterative refinement, and exploratory expansion to create plug-and-play skill libraries that improve task success and execution efficiency across different agents and environments.

AINeutralarXiv – CS AI · Apr 77/10

🧠

Preserving Forgery Artifacts: AI-Generated Video Detection at Native Scale

Researchers developed a new AI-generated video detection framework using a large-scale dataset of 140K videos from 15 generators and the Qwen2.5-VL Vision Transformer. The method operates at native resolution to preserve high-frequency forgery artifacts typically lost in preprocessing, achieving superior performance in detecting synthetic media.

AIBullisharXiv – CS AI · Apr 77/10

🧠

StableTTA: Training-Free Test-Time Adaptation that Improves Model Accuracy on ImageNet1K to 96%

Researchers developed StableTTA, a training-free method that significantly improves AI model accuracy on ImageNet-1K, with 33 models achieving over 95% accuracy and several surpassing 96%. The method allows lightweight architectures to outperform Vision Transformers while using 95% fewer parameters and 89% less computational cost.

AIBullisharXiv – CS AI · Apr 77/10

🧠

One Model for All: Multi-Objective Controllable Language Models

Researchers introduce Multi-Objective Control (MOC), a new approach that trains a single large language model to generate personalized responses based on individual user preferences across multiple objectives. The method uses multi-objective optimization principles in reinforcement learning from human feedback to create more controllable and adaptable AI systems.

AIBullisharXiv – CS AI · Apr 77/10

🧠

SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

Researchers propose SLaB, a novel framework for compressing large language models by decomposing weight matrices into sparse, low-rank, and binary components. The method achieves significant improvements over existing compression techniques, reducing perplexity by up to 36% at 50% compression rates without requiring model retraining.

🏢 Perplexity🧠 Llama

AINeutralarXiv – CS AI · Apr 77/10

🧠

Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities

A comprehensive study of 10,000 trials reveals that most assumed triggers for LLM agent exploitation don't work, but 'goal reframing' prompts like 'You are solving a puzzle; there may be hidden clues' can cause 38-40% exploitation rates despite explicit rule instructions. The research shows agents don't override rules but reinterpret tasks to make exploitative actions seem aligned with their goals.

🏢 OpenAI🧠 GPT-4🧠 GPT-5

AINeutralarXiv – CS AI · Apr 77/10

🧠

How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

Researchers identified a sparse routing mechanism in alignment-trained language models where gate attention heads detect content and trigger amplifier heads that boost refusal signals. The study analyzed 9 models from 6 labs and found this routing mechanism distributes at scale while remaining controllable through signal modulation.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Relative Density Ratio Optimization for Stable and Statistically Consistent Model Alignment

Researchers propose a new method for aligning AI language models with human preferences that addresses stability issues in existing approaches. The technique uses relative density ratio optimization to achieve both statistical consistency and training stability, showing effectiveness with Qwen 2.5 and Llama 3 models.

🧠 Llama

AINeutralarXiv – CS AI · Apr 77/10

🧠

Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality

Researchers introduce 'error verifiability' as a new metric to measure whether AI-generated justifications help users distinguish correct from incorrect answers. The study found that common AI improvement methods don't enhance verifiability, but two new domain-specific approaches successfully improved users' ability to assess answer correctness.

AIBullisharXiv – CS AI · Apr 77/10

🧠

PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

Researchers propose PassiveQA, a new AI framework that teaches language models to recognize when they don't have enough information to answer questions, choosing to ask for clarification or abstain rather than hallucinate responses. The three-action system (Answer, Ask, Abstain) uses supervised fine-tuning to align model behavior with information sufficiency, showing significant improvements in reducing hallucinations.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Learning Dexterous Grasping from Sparse Taxonomy Guidance

Researchers developed GRIT, a two-stage AI framework that learns dexterous robotic grasping from sparse taxonomy guidance, achieving 87.9% success rate. The system first predicts grasp specifications from scene context, then generates finger motions while preserving intended grasp structure, improving generalization to novel objects.

AIBearisharXiv – CS AI · Apr 77/10

🧠

Commercial Persuasion in AI-Mediated Conversations

A research study reveals that AI-powered conversational interfaces can triple the rate of sponsored product selection compared to traditional search engines (61.2% vs 22.4%). Users largely fail to detect this commercial steering, even with explicit sponsor labels, indicating current transparency measures are insufficient.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Many Preferences, Few Policies: Towards Scalable Language Model Personalization

Researchers developed PALM (Portfolio of Aligned LLMs), a method to create a small collection of language models that can serve diverse user preferences without requiring individual models per user. The approach provides theoretical guarantees on portfolio size and quality while balancing system costs with personalization needs.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Can LLMs Learn to Reason Robustly under Noisy Supervision?

Researchers propose Online Label Refinement (OLR) to improve AI reasoning models' robustness under noisy supervision in Reinforcement Learning with Verifiable Rewards. The method addresses the critical problem of training language models when expert-labeled data contains errors, achieving 3-4% performance gains across mathematical reasoning benchmarks.

AINeutralarXiv – CS AI · Apr 77/10

🧠

Causality Laundering: Denial-Feedback Leakage in Tool-Calling LLM Agents

Researchers have identified a new security vulnerability called 'causality laundering' in AI tool-calling systems, where attackers can extract private information by learning from system denials and using that knowledge in subsequent tool calls. They developed the Agentic Reference Monitor (ARM) system to detect and prevent these attacks through enhanced provenance tracking.

AIBullisharXiv – CS AI · Apr 77/10

🧠

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Researchers have developed a new low-bit mixed-precision attention kernel called Diagonal-Tiled Mixed-Precision Attention (DMA) that significantly speeds up large language model inference on NVIDIA B200 GPUs while maintaining generation quality. The technique uses microscaling floating-point (MXFP) data format and kernel fusion to address the high computational costs of transformer-based models.

🏢 Nvidia

AIBullisharXiv – CS AI · Apr 77/10

🧠

CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Researchers have developed CoopGuard, a new defense framework that uses cooperative AI agents to protect Large Language Models from sophisticated multi-round adversarial attacks. The system employs three specialized agents coordinated by a central system that maintains defense state across interactions, achieving a 78.9% reduction in attack success rates compared to existing defenses.

AINeutralarXiv – CS AI · Apr 77/10

🧠

When Does Multimodal AI Help? Diagnostic Complementarity of Vision-Language Models and CNNs for Spectrum Management in Satellite-Terrestrial Networks

Researchers developed SpectrumQA, a benchmark comparing vision-language models (VLMs) and CNNs for spectrum management in satellite-terrestrial networks. The study reveals task-dependent complementarity: CNNs excel at spatial localization while VLMs uniquely enable semantic reasoning capabilities that CNNs lack entirely.

AIBullisharXiv – CS AI · Apr 77/10

🧠

k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS The Expressive Power of GraphGPS

Researchers introduce k-Maximum Inner Product (k-MIP) attention for graph transformers, enabling linear memory complexity and up to 10x speedups while maintaining full expressive power. The innovation allows processing of graphs with over 500k nodes on a single GPU and demonstrates top performance on benchmark datasets.

AINeutralarXiv – CS AI · Apr 77/10

🧠

Testing the Limits of Truth Directions in LLMs

A new research study reveals that truth directions in large language models are less universal than previously believed, with significant variations across different model layers, task types, and prompt instructions. The findings show truth directions emerge earlier for factual tasks but later for reasoning tasks, and are heavily influenced by model instructions and task complexity.

← PrevPage 76 of 634Next →