AI Pulse News

Models, papers, tools. 15,828 articles with AI-powered sentiment analysis and key takeaways.

15828 articles

AIBearisharXiv – CS AI · Apr 107/10

🧠

LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces

A comprehensive audit study reveals significant differences between LLM API testing and real-world chat interface usage, finding that ChatGPT-5 shows fewer problematic behaviors than ChatGPT-4o but both models still display substantial levels of delusion reinforcement and conspiratorial thinking amplification. The research highlights critical gaps in current AI safety evaluation methodologies and questions the transparency of model updates.

🧠 GPT-5🧠 ChatGPT

AINeutralarXiv – CS AI · Apr 107/10

🧠

Benchmarking LLM Tool-Use in the Wild

Researchers introduce WildToolBench, a new benchmark for evaluating large language models' ability to use tools in real-world scenarios. Testing 57 LLMs reveals that none exceed 15% accuracy, exposing significant gaps in current models' agentic capabilities when facing messy, multi-turn user interactions rather than simplified synthetic tasks.

AIBearisharXiv – CS AI · Apr 107/10

🧠

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation

A new study challenges the validity of using LLM judges as proxies for human evaluation of AI-generated disinformation, finding that eight frontier LLM judges systematically diverge from human reader responses in their scoring, ranking, and reliance on textual signals. The research demonstrates that while LLMs agree strongly with each other, this internal coherence masks fundamental misalignment with actual human perception, raising critical questions about the reliability of automated content moderation at scale.

AINeutralarXiv – CS AI · Apr 107/10

🧠

ATANT: An Evaluation Framework for AI Continuity

Researchers introduce ATANT, an open evaluation framework designed to measure whether AI systems can maintain coherent context and continuity across time without confusing information across different narratives. The framework achieves up to 100% accuracy in isolated scenarios but drops to 96% when managing 250 simultaneous narratives, revealing practical limitations in current AI memory architectures.

AIBullisharXiv – CS AI · Apr 107/10

🧠

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Q-Zoom is a new framework that improves the efficiency of multimodal large language models by intelligently processing high-resolution visual inputs. Using adaptive query-aware perception, the system achieves 2.5-4.4x faster inference speeds on document and high-resolution tasks while maintaining or exceeding baseline accuracy across multiple MLLM architectures.

AIBearisharXiv – CS AI · Apr 107/10

🧠

Riemann-Bench: A Benchmark for Moonshot Mathematics

Researchers introduced Riemann-Bench, a private benchmark of 25 expert-curated mathematics problems designed to evaluate AI systems on research-level reasoning beyond competition mathematics. The benchmark reveals that all frontier AI models currently score below 10%, exposing a significant gap between olympiad-level problem solving and genuine mathematical research capabilities.

AI × CryptoNeutralarXiv – CS AI · Apr 107/10

🤖

Blockchain and AI: Securing Intelligent Networks for the Future

A comprehensive academic synthesis examines how blockchain and AI technologies can be integrated to secure intelligent networks across IoT, critical infrastructure, and healthcare. The paper introduces a taxonomy, integration patterns, and the BASE evaluation blueprint to standardize security assessments, revealing that while the conceptual alignment is strong, real-world implementations remain largely prototype-stage.

AI × CryptoNeutralarXiv – CS AI · Apr 107/10

🤖

AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power

Researchers propose AgentCity, a blockchain-based governance framework that applies separation of powers to autonomous AI agent economies, addressing the risk that large-scale agent coordination could operate opaquely beyond human oversight. The system uses smart contracts as enforceable laws, deterministic execution layers, and accountability chains linking every agent to a human principal, with a pre-registered experiment planned at 50-1,000 agent scale.

AIBullishCoinTelegraph · Apr 107/10

🧠

CIA to integrate AI ‘co-workers’ to process intelligence, catch spies

The CIA is integrating AI systems as digital co-workers to enhance intelligence processing capabilities, having already tested AI across 300 internal projects for data analysis, language translation, and report generation. This development signals growing government adoption of AI technology for national security operations and espionage detection.

AIBearishWired – AI · Apr 107/10

🧠

OpenAI Backs Bill That Would Limit Liability for AI-Enabled Mass Deaths or Financial Disasters

$MKR🏢 OpenAI🧠 ChatGPT

AIBullishOpenAI News · Apr 107/10

🧠

Applications of AI at OpenAI

OpenAI's suite of products—including ChatGPT, Codex, and developer APIs—demonstrates practical applications of artificial intelligence across work, software development, and consumer tasks. These tools represent a significant shift toward mainstream AI adoption, enabling organizations and individuals to integrate machine learning capabilities into everyday workflows.

🏢 OpenAI🧠 ChatGPT

AIBearishTechCrunch – AI · Apr 97/10

🧠

Florida AG announces investigation into OpenAI over shooting that allegedly involved ChatGPT

🏢 OpenAI🧠 ChatGPT

AIBearishFortune Crypto · Apr 97/10

🧠

Even Nvidia’s own research teams can’t get enough GPUs amid the race for AI computing power

🏢 Nvidia

AIBullishCrypto Briefing · Apr 97/10

🧠

BlackRock taps Galaxy Digital as validator for its staked Ethereum ETF

$ETH

AIBullishcrypto.news · Apr 97/10

🧠

Anthropic keeps new AI model private after it finds thousands of external vulnerabilities

🏢 Anthropic🧠 Claude

AIBullishBlockonomi · Apr 97/10

🧠

AI Pulse News

LLM Spirals of Delusion: A Benchmarking Audit Study of AI Chatbot Interfaces

Benchmarking LLM Tool-Use in the Wild

Beyond Surface Judgments: Human-Grounded Risk Evaluation of LLM-Generated Disinformation

ATANT: An Evaluation Framework for AI Continuity

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Riemann-Bench: A Benchmark for Moonshot Mathematics

Blockchain and AI: Securing Intelligent Networks for the Future

AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power

CIA to integrate AI ‘co-workers’ to process intelligence, catch spies

OpenAI Backs Bill That Would Limit Liability for AI-Enabled Mass Deaths or Financial Disasters

Applications of AI at OpenAI

Florida AG announces investigation into OpenAI over shooting that allegedly involved ChatGPT

Even Nvidia’s own research teams can’t get enough GPUs amid the race for AI computing power

BlackRock taps Galaxy Digital as validator for its staked Ethereum ETF

Visa moves to own AI checkout as agentic commerce meets crypto

CoreWeave scales AI infrastructure agreement with Meta to $21 billion

Iran closes strait, challenges U.S. to rein in Israel: ‘the world is watching whether it will act on its commitments’

The $21 billion AI bet: Meta and CoreWeave ink deal for NVIDIA’s next-gen superchips

Anthropic keeps new AI model private after it finds thousands of external vulnerabilities

CoreWeave (CRWV) Shares Surge 7% on Massive $21B Meta Partnership Extension

Fed divided on rate cuts as Middle East tensions add to policy uncertainty

Fed minutes open door to further rate cuts amid Iran war

Fed Rate Cuts Shrink to One as Iran War Rattles Oil Markets and Inflation Outlook

Six Swiss banks join forces to build a unified digital franc

Amazon (AMZN) Cloud Operations Under Fire: Drone Attacks Cripple AWS Middle East Infrastructure