Real-time AI-curated news from 34,840+ articles across 50+ sources. Sentiment analysis, importance scoring, and key takeaways — updated every 15 minutes.
AI · Bearish · arXiv – CS AI · 18h ago · 6/10
🧠A new position paper argues that despite functioning as useful co-scientists, agentic AI systems are fundamentally not designed for truly autonomous scientific discovery due to challenges in problem selection bias, insufficient tacit knowledge in training data, compressed output diversity, and lack of real-world experimental feedback loops.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers prove that primacy effects, anchoring, and order-dependence are mathematically inevitable in autoregressive language models due to causal masking constraints. The findings are validated across 12 frontier LLMs and confirmed through human experiments, suggesting cognitive biases represent resource-rational responses to sequential processing rather than design flaws.
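The causal-masking argument is easy to see in a toy self-attention layer (a minimal numpy sketch, not the paper's construction): under a causal mask, each position's representation depends only on earlier tokens, so reordering later inputs cannot change it retroactively, and order-dependence follows from the architecture alone.

```python
import numpy as np

def causal_attention(x):
    """Single-head self-attention with a causal mask (queries = keys =
    values = x, no learned weights): a toy autoregressive layer."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores[future] = -np.inf            # block attention to future tokens
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = causal_attention(x)

# Swap the last two tokens: earlier positions are provably unaffected,
# while later positions change.
x_swapped = x.copy()
x_swapped[[2, 3]] = x_swapped[[3, 2]]
out_swapped = causal_attention(x_swapped)
print(np.allclose(out[:2], out_swapped[:2]))  # True: positions 0-1 never saw the swap
print(np.allclose(out[2], out_swapped[2]))    # False: position 2 did
```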
AI · Bullish · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce E-TCAV, an optimized version of TCAV that improves the efficiency and stability of neural network interpretability testing by leveraging penultimate layer representations. The method achieves linear speed-ups while maintaining accuracy, advancing practical tools for model debugging and real-time concept-guided training across vision and language tasks.
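For context, the TCAV idea the paper optimizes can be sketched in a few lines. This is a hypothetical toy: synthetic activations, a difference-of-means CAV standing in for the usual linear probe, and none of E-TCAV's specific optimizations.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # toy penultimate-layer width

# Synthetic penultimate activations: concept examples shift along a hidden direction.
concept_dir = rng.normal(size=d)
concept_dir /= np.linalg.norm(concept_dir)
concept_acts = rng.normal(size=(50, d)) + 2.0 * concept_dir
random_acts = rng.normal(size=(50, d))

# Concept activation vector (CAV): here, the normalised difference of class means.
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

# Head: logit = w . relu(h), so d(logit)/dh = w * (h > 0), varying per example.
w = concept_dir
test_acts = rng.normal(size=(200, d)) + concept_dir
grads = w * (test_acts > 0)

# TCAV score: fraction of examples whose logit increases along the concept direction.
tcav_score = float(np.mean(grads @ cav > 0))
print(tcav_score)
```

Working at the penultimate layer keeps the gradient computation cheap, which is the efficiency lever the summary describes.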
AI · Bullish · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce EAPO, an exploration-aware reinforcement learning framework that enables LLM agents to selectively explore uncertain scenarios before acting. The method uses fine-grained reward functions and adaptive exploration mechanisms to improve decision-making across text and GUI-based agent benchmarks.
🏢 Hugging Face
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers benchmarked LLM-based agents for multimodal clinical prediction tasks using real-world healthcare data, finding that single-agent systems outperform naive multi-agent frameworks in handling diverse data types like medical images, notes, and EHR records. The study reveals critical limitations in current multi-agent collaboration approaches and provides an open-source evaluation framework to advance clinical AI development.
AI · Neutral · arXiv – CS AI · 18h ago · 5/10
🧠Researchers propose WLDS, a Large Language Model-driven system for simulating and deducing emergency scenarios across multiple domains. The system addresses limitations of traditional simulation methods by using LLMs to generate diverse, realistic emergency instance variations, with calibration mechanisms to ensure factual accuracy and logical consistency.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce TRACE, a novel training method that improves AI model performance by selectively applying different optimization techniques to critical versus routine tokens in reasoning tasks. The approach addresses inefficiencies in standard self-distillation by concentrating training effort on important decision points, achieving a 2.76-percentage-point improvement over baseline methods while better preserving out-of-distribution generalization.
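The token-weighting idea can be sketched as a distillation loss that upweights high-uncertainty positions. Purely illustrative: criticality is approximated here by teacher-distribution entropy, which may not match TRACE's actual criterion, and the 0.1 routine-token weight is an invented default.

```python
import numpy as np

def token_entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def selective_distill_loss(student_logp, teacher_p, tau=0.75, routine_weight=0.1):
    """Per-token KL(teacher || student), upweighted on 'critical' tokens,
    here those whose teacher entropy is in the top (1 - tau) quantile."""
    kl = (teacher_p * (np.log(np.clip(teacher_p, 1e-12, 1.0)) - student_logp)).sum(-1)
    h = token_entropy(teacher_p)
    critical = h >= np.quantile(h, tau)
    weights = np.where(critical, 1.0, routine_weight)
    return float((weights * kl).sum() / weights.sum())

# Toy sequence: 8 token positions over a 5-symbol vocabulary.
rng = np.random.default_rng(2)
teacher = rng.dirichlet(np.ones(5) * 0.5, size=8)
student = rng.dirichlet(np.ones(5), size=8)
loss = selective_distill_loss(np.log(student), teacher)
print(loss)  # non-negative, dominated by high-entropy decision points
```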
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers have developed an automated algorithm for solving infinite-state polynomial reachability games, a class of two-player strategic games with applications in AI and reactive synthesis. The approach introduces ranking certificates as a formal proof mechanism and demonstrates the ability to solve previously intractable problems, including computing optimal strategies for the classical Cinderella-Stepmother game.
AI · Neutral · arXiv – CS AI · 18h ago · 5/10
🧠Researchers demonstrate how functional stable model semantics enhances Answer Set Programming Modulo Theories (ASPMT), enabling integration of intensional functions that derive values from other predicates rather than pre-defined sources. The framework allows tight ASPMT programs to translate into SMT instances, extending the theoretical foundations of logic programming.
AI · Bullish · arXiv – CS AI · 18h ago · 6/10
🧠Researchers propose the Dynamic Tiered AgentRunner, an enterprise-grade framework that adds governance controls to autonomous AI agents through risk-adaptive resource allocation, separation of powers between independent agents, and resilience mechanisms. The framework addresses critical gaps in current LLM agent deployments by preventing unauthorized high-risk operations and supporting enterprise compliance requirements.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduced PDEAgent-Bench, the first comprehensive benchmark for evaluating AI systems that generate numerical solvers from partial differential equations (PDEs). The benchmark contains 645 test cases across multiple PDE families and finite-element libraries, revealing that while current LLMs can produce runnable code, they substantially fail when accuracy and efficiency requirements are enforced.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce DiagnosticIQ, a benchmark dataset of 6,690 expert-validated questions testing whether large language models can recommend maintenance actions based on industrial sensor rules. Evaluation of 29 LLMs reveals that while frontier models perform well on standard tasks, they exhibit significant brittleness, losing 13–60% accuracy under minor perturbations and resorting to pattern-matching rather than reasoning when conditions are inverted.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers propose a novel emergent communication framework for 6G agentic AI networks that enables autonomous agents to learn their own communication protocols while accounting for physical networking constraints. The framework applies information-theoretic principles to quantify trade-offs between task-relevant information and computational complexity, with experimental validation showing improved generalization performance.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers from UTS achieved second place in a psychological defense mechanism classification competition using a multi-agent AI system that identifies defense patterns through absence-based reasoning rather than presence detection. The system combines Gemini 2.5 agents with fine-tuned Qwen models to achieve an F1 score of 0.406, addressing critical biases in minority class prediction through structured ensemble methods.
🧠 Gemini
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce CTQWformer, a novel machine learning framework that combines continuous-time quantum walks with transformer architectures for improved graph classification. The hybrid approach outperforms existing graph neural network and kernel-based methods by better capturing both global structural dependencies and dynamic information propagation in complex networks.
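The quantum-walk half of the hybrid is compact to write down (a minimal numpy sketch of the walk dynamics only; the transformer coupling is not reproduced). For an adjacency matrix A, the walker evolves as psi(t) = exp(-iAt) psi(0):

```python
import numpy as np

def ctqw_probabilities(adj, t, start=0):
    """Continuous-time quantum walk on a graph: apply exp(-i*A*t) to a basis
    state, then return per-node occupation probabilities |psi_j(t)|^2."""
    lam, vec = np.linalg.eigh(adj)      # adjacency is symmetric, so eigh applies
    U = vec @ np.diag(np.exp(-1j * lam * t)) @ vec.T
    psi0 = np.zeros(adj.shape[0], dtype=complex)
    psi0[start] = 1.0
    return np.abs(U @ psi0) ** 2

# 4-cycle graph; start the walker at node 0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
probs = ctqw_probabilities(A, t=0.7)
print(probs.sum())  # unitarity keeps the distribution normalised: ~1.0
```

Features built from such time-evolved amplitudes capture interference effects that classical random-walk kernels miss, which is the motivation the summary points to.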
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce FormalRewardBench, the first benchmark for evaluating reward models in formal theorem proving using Lean 4. The benchmark reveals that frontier LLMs like Claude Opus outperform specialized theorem provers at evaluating proof quality, suggesting that theorem proving ability does not transfer to proof evaluation tasks.
🧠 Claude Opus
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers demonstrate that unpredictability in language agents does not equate to effective control, finding that structured decision-making mechanisms significantly outperform stochastic sampling across 74,352 test cases. The study challenges assumptions about randomness and control in AI systems, with implications for agent reliability and interpretability.
AI · Bullish · arXiv – CS AI · 18h ago · 6/10
🧠Researchers demonstrate that language models can be enhanced with emotion-like markers that improve decision-making when combined with semantic knowledge, mirroring human neuroscience findings about emotional processing. By injecting emotion vectors into Gemma 3 during recall, the model achieved 80% good decision outcomes versus 52% with knowledge alone, validating that emotional context amplifies rather than replaces reasoning.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠CodeClinic introduces a benchmark for evaluating whether large language model agents can autonomously generate clinical skills rather than relying on pre-built tool libraries. The research demonstrates that an offline autoformalization pipeline converting clinical guidelines into Python libraries improves consistency and reduces token usage by 40% compared to zero-shot code generation.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers propose Constraint-Aware Residual Modulation (CARM), a neural module that improves how AI solvers handle complex vehicle routing problems by maintaining global observation during constraint-aware decision-making. The advancement demonstrates significant performance improvements across multiple routing problem variants, along with strong scaling capability.
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce the Developmental Sentence Completion Test (DSCT), a 20-item assessment tool that evaluates how large language models understand and reflect human developmental cognition based on Kegan's constructive-developmental theory. The study finds that frontier LLMs accurately identify developmental stages in simulated personas but show only fair agreement with real human responses, revealing that developmental signal is cleaner in synthetic data than human-generated text.
🏢 Meta
AI · Bullish · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce a novel active testing algorithm that reduces evaluation costs for large language models by intelligently sampling from evaluation pools using semantic entropy and approximate Neyman allocation. The method achieves up to 28% MSE reduction over uniform sampling while saving an average of 22.9% of evaluation budget across multiple benchmarks.
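Neyman allocation, one ingredient named above, has a simple closed form: sample each stratum in proportion to its size times its standard deviation, which minimises the variance of the stratified estimate. A minimal sketch (the semantic-entropy stratification itself is assumed, not implemented):

```python
import numpy as np

def neyman_allocation(strata_sizes, strata_stds, budget):
    """Split a fixed evaluation budget across strata in proportion to
    N_h * sigma_h, flooring and handing any remainder to the largest term."""
    w = np.asarray(strata_sizes, dtype=float) * np.asarray(strata_stds, dtype=float)
    alloc = np.floor(budget * w / w.sum()).astype(int)
    alloc[np.argmax(w)] += budget - alloc.sum()
    return alloc

# Hypothetical strata, e.g. prompts grouped by semantic entropy: the noisy,
# high-variance stratum soaks up most of the labelling budget.
alloc = neyman_allocation([500, 300, 200], [0.40, 0.10, 0.05], budget=100)
print(alloc)  # [84 12  4]
```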
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers introduce the Context-Contaminated Restart Model (CCRM) to formally analyze why LLM agents fail at higher rates when retrying tasks after errors, showing that failed attempts pollute the context window and increase subsequent error rates 7.1x. The model provides closed-form formulas for success probability, optimal pipeline depth allocation, and quantifies the exact benefit of clearing context before retry attempts.
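The qualitative effect is easy to reproduce with a toy geometric model (in the spirit of the summary only; the paper's closed-form expressions are not reproduced here): if every failed attempt inflates the next attempt's failure rate, retrying in a polluted context quickly loses to a clean restart.

```python
def success_within(k, p_first, contamination=1.0):
    """P(success within k attempts) when each failure multiplies the per-try
    failure probability by `contamination` (1.0 models clearing the context
    between retries; values > 1 model a polluted context window)."""
    p_all_fail = 1.0
    q = 1.0 - p_first
    for _ in range(k):
        p_all_fail *= q
        q = min(1.0, q * contamination)  # failure rate grows, capped at certainty
    return 1.0 - p_all_fail

clean = success_within(5, p_first=0.4, contamination=1.0)
dirty = success_within(5, p_first=0.4, contamination=1.5)
print(round(clean, 3), round(dirty, 3))  # clean restarts win decisively
```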
AI · Neutral · arXiv – CS AI · 18h ago · 6/10
🧠Researchers demonstrate that modified feedback alignment (FA) algorithms can train convolutional neural networks while maintaining biological plausibility, with internal representations converging to structures similar to backpropagation despite using fundamentally different weight update mechanisms. This finding suggests that successful learning algorithms may achieve comparable results through different computational paths, bridging biologically plausible alternatives with practical neural network training.
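The core trick of feedback alignment fits in a few lines: the backward pass uses a fixed random matrix B in place of the transpose of the forward weights. A minimal sketch on a toy dense network (architecture and hyperparameters are assumptions, not the paper's convolutional setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 1

W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))
B = rng.normal(scale=0.5, size=(n_hid, n_out))  # fixed random feedback, never trained

X = rng.normal(size=(64, n_in))
y = (X[:, :1] > 0).astype(float)                # simple separable target

losses = []
for _ in range(200):
    h = np.maximum(0.0, X @ W1.T)               # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-h @ W2.T))         # sigmoid output
    e = p - y                                   # gradient of BCE w.r.t. the logit
    delta_h = (e @ B.T) * (h > 0)               # feedback alignment: B, not W2.T
    W2 -= 0.1 * e.T @ h / len(X)
    W1 -= 0.1 * delta_h.T @ X / len(X)
    losses.append(float((e ** 2).mean()))

print(losses[0] > losses[-1])  # loss falls despite the random backward path
```

The observation the summary highlights, that hidden representations come to resemble those learned by backpropagation, can be probed in such a setup by comparing hidden-layer similarity across the two training rules.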