y0news

AI × Crypto News Feed

Real-time AI-curated news from 34,840+ articles across 50+ sources. Sentiment analysis, importance scoring, and key takeaways — updated every 15 minutes.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

Researchers introduce Absurd World, a benchmarking framework that tests large language models' logical reasoning by creating logically coherent but unrealistic scenarios derived from real-world problems. The framework reveals whether LLMs can reason independently of learned patterns by breaking down real-world models into symbols, actions, sequences, and events, then systematically altering them while preserving underlying logic.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs

Researchers introduce TESSERA, a neuro-symbolic framework that combines Large Language Models with Monte Carlo Tree Search to extract multi-step explanations from knowledge graphs, specifically for drug-disease mechanism discovery. The system uses LLMs for local judgments rather than autonomous generation, enforcing structural constraints through knowledge graphs while employing MCTS for principled credit assignment across extended reasoning chains.
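The pairing described above — an LLM making local plausibility judgments while MCTS handles search and credit assignment over graph paths — can be sketched in miniature. Everything below (the toy graph, the `llm_score` stub, the node names) is an illustrative stand-in, not TESSERA's actual data structures or protocol:

```python
import math
import random

# Toy knowledge graph: node -> list of (relation, neighbor) edges.
KG = {
    "drugX": [("inhibits", "proteinA"), ("binds", "proteinB")],
    "proteinA": [("regulates", "pathway1")],
    "proteinB": [("part_of", "pathway2")],
    "pathway1": [("implicated_in", "diseaseY")],
    "pathway2": [("implicated_in", "diseaseZ")],
}

def llm_score(path):
    """Stand-in for an LLM's local plausibility judgment of a partial path."""
    return 0.1 * len(path) + (1.0 if path[-1] == "diseaseY" else 0.0)

def rollout(node, target, depth=4):
    """Random walk from node; reward 1 if the target disease is reached."""
    path = [node]
    for _ in range(depth):
        edges = KG.get(path[-1], [])
        if not edges:
            break
        path.append(random.choice(edges)[1])
        if path[-1] == target:
            return 1.0
    return 0.0

def mcts_best_path(start, target, iters=200, c=1.4):
    """UCT search over KG paths, seeding unvisited nodes with the LLM judgment."""
    stats = {}  # path tuple -> [visits, total_value]
    for _ in range(iters):
        path = [start]
        while True:
            edges = KG.get(path[-1], [])
            if not edges or path[-1] == target:
                break
            parent_n = sum(stats.get(tuple(path + [n]), [1, 0])[0] for _, n in edges)
            def ucb(n):
                v, q = stats.get(tuple(path + [n]), [1, llm_score(path + [n])])
                return q / v + c * math.sqrt(math.log(parent_n + 1) / v)
            path.append(max((n for _, n in edges), key=ucb))
        reward = 1.0 if path[-1] == target else rollout(path[-1], target)
        for i in range(1, len(path) + 1):  # credit assignment along the full chain
            key = tuple(path[:i])
            v, q = stats.get(key, [0, 0.0])
            stats[key] = [v + 1, q + reward]
    # Extract the most-visited complete path that reaches the target.
    best = max((k for k in stats if k[-1] == target), key=lambda k: stats[k][0])
    return list(best)
```

The division of labor matches the summary: the LLM scores one step at a time, while the backup loop distributes the terminal reward across every prefix of the chain.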

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Relational Retrieval: Leveraging Known-Novel Interactions for Generalized Category Discovery

Researchers propose Relational Pattern Consistency (RPC), a machine learning framework for Generalized Category Discovery that bridges labeled and unlabeled data through bidirectional knowledge transfer. The method uses One-vs-All classifiers and relational pattern matching to simultaneously preserve known categories and discover novel ones, achieving state-of-the-art results on multiple benchmarks.
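The One-vs-All idea behind this setup can be shown with a minimal sketch: one binary scorer per known class, and any unlabeled point that no scorer claims becomes a novel-category candidate. The centroid scorers, sigmoid, and threshold below are illustrative assumptions, not RPC's actual model:

```python
import math

def fit_ova_centroids(labeled):
    """labeled: list of (feature_vector, class_name). One centroid per known class."""
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def ova_scores(x, centroids, scale=1.0):
    """Sigmoid of negative distance: a high score means the class claims x."""
    def dist(a, b):
        return math.sqrt(sum((u - w) ** 2 for u, w in zip(a, b)))
    return {y: 1.0 / (1.0 + math.exp(scale * (dist(x, c) - 1.0)))
            for y, c in centroids.items()}

def discover(unlabeled, centroids, threshold=0.5):
    """Assign each point to the best known class, or flag it as a novel candidate."""
    out = []
    for x in unlabeled:
        scores = ova_scores(x, centroids)
        best = max(scores, key=scores.get)
        out.append(best if scores[best] >= threshold else "novel?")
    return out
```

Points far from every known-class region fall below the threshold for all scorers, which is exactly the signal a discovery method can cluster into new categories.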

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

WindINR: Latent-State INR for Fast Local Wind Query and Correction in Complex Terrain

WindINR is a machine learning framework that enables fast, localized wind forecasting in complex terrain by using implicit neural representations to query wind conditions at specific user-defined locations rather than generating dense grid-based forecasts. The system achieves 2.6x speedup in corrections by updating only a compact latent state instead of retraining full networks, making it practical for real-time wind estimation applications.
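The latent-state trick the summary describes — correcting a forecast by updating a small latent vector rather than retraining the network — can be sketched with a toy coordinate model. The linear field, squared loss, and update rule below are illustrative stand-ins for WindINR, not its architecture:

```python
def predict(coord, W, z):
    """Wind speed at a query coordinate: fixed weights W, adjustable latent z."""
    x, y = coord
    feats = [x, y, 1.0]
    return sum(w * f for w, f in zip(W, feats)) + sum(z)

def correct_latent(observations, W, z, lr=0.1, steps=100):
    """Gradient descent on the latent z only, fitting (coord, observed_speed) pairs."""
    z = list(z)
    for _ in range(steps):
        for coord, obs in observations:
            err = predict(coord, W, z) - obs
            # d(pred)/d(z_i) = 1, so each latent entry shares the gradient equally.
            z = [zi - lr * 2.0 * err / len(z) for zi in z]
    return z
```

Because only `z` changes, a correction touches a handful of parameters instead of the full weight tensor, which is the source of the speedup the paper reports.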

🧠 AI · Bullish · arXiv – CS AI · 20h ago · 6/10

A Unified Pair-GRPO Family: From Implicit to Explicit Preference Constraints for Stable and General RL Alignment

Researchers propose Pair-GRPO, a unified theoretical framework for LLM alignment that addresses instability and interpretability issues in reinforcement learning from human preferences. The method introduces Soft-Pair-GRPO and Hard-Pair-GRPO variants with proven gradient equivalence, monotonic policy improvement, and superior performance on standard benchmarks.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Unpredictability dissociates from structured control in language agents

Researchers demonstrate that unpredictability in language agents does not equate to effective control, finding that structured decision-making mechanisms significantly outperform stochastic sampling across 74,352 test cases. The study challenges assumptions about randomness and control in AI systems, with implications for agent reliability and interpretability.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Outlier-Robust Diffusion Solvers for Inverse Problems

Researchers have developed an improved diffusion model-based approach for solving inverse problems that demonstrates robustness to outliers in real-world measurements. The method combines explicit noise estimation, Huber loss optimization, and conjugate gradient methods to outperform existing diffusion model techniques across linear and nonlinear tasks.
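The Huber loss mentioned here is quadratic near zero and linear in the tails, so a single gross outlier contributes O(|r|) to the data-fit term instead of O(r²). A minimal sketch of that effect (not the paper's solver):

```python
def huber(r, delta=1.0):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)

def data_fit(residuals, loss):
    """Total data-fit term under a given per-residual loss."""
    return sum(loss(r) for r in residuals)

# With one gross outlier, the squared loss is dominated by it; Huber is not.
residuals = [0.1, -0.2, 0.05, 50.0]
sq = data_fit(residuals, lambda r: 0.5 * r * r)
hu = data_fit(residuals, huber)
```

In a diffusion-guided inverse solver, swapping the squared data-fit term for a robust one like this keeps a few corrupted measurements from steering the whole reconstruction.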

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Researchers introduce DUDE, a framework that teaches AI web agents to resist deceptive interface elements through hybrid-reward learning and experience summarization. The accompanying RUC benchmark demonstrates the framework reduces susceptibility to deception by 53.8% while preserving task performance, addressing a critical vulnerability in autonomous GUI interaction systems.

🧠 AI · Bullish · arXiv – CS AI · 20h ago · 6/10

VulTriage: Triple-Path Context Augmentation for LLM-Based Vulnerability Detection

Researchers introduce VulTriage, an LLM-based framework that enhances vulnerability detection in source code through triple-path context augmentation combining control flow analysis, vulnerability knowledge retrieval, and semantic summarization. The approach achieves state-of-the-art results on benchmark datasets and demonstrates strong generalization to low-resource scenarios.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD

Researchers introduce BenchCAD, a comprehensive benchmark containing 17,900 execution-verified CAD programs across 106 industrial part families, designed to evaluate multimodal AI models on their ability to generate parametric CAD code from visual or textual inputs. Testing 10+ frontier models reveals that current systems can recover basic geometry but struggle with faithful parametric abstraction, fine 3D structure, and complex CAD operations, highlighting significant gaps between general-purpose AI capabilities and industrial CAD automation readiness.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

Researchers introduce MarsTSC, a novel framework combining Vision Language Models with agentic reasoning for few-shot multimodal time series classification. The system uses collaborative AI roles—Generator, Reflector, and Modifier—to iteratively refine knowledge and improve classification accuracy across 12 benchmarks while providing interpretable explanations.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay

Researchers introduce Neuro-Symbolic Experience Replay (NSER), a framework that enhances reinforcement learning by combining Large Language Models with symbolic logic to transform passive memory buffers into active knowledge construction systems. The approach grounds LLM-generated behavioral rules into differentiable logic representations, enabling more efficient policy optimization across multiple benchmark environments.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Strategic commitments shape collective cybersecurity under AI inequality

Researchers present a game-theoretic model showing that unequal access to AI-powered cybersecurity tools creates persistent vulnerabilities, with weak defenders unable to afford strong protection. They propose that targeted subsidies for committed defenders adopting advanced AI defenses significantly improve overall system resilience and suppress attacks more effectively than commitment alone.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 5/10

Perceptual Asymmetry Between Hue Categories: Evidence from Human Color Categorization

Researchers extend the COLIBRI fuzzy color model to reveal that human color categories exhibit significant perceptual asymmetry, with yellow forming a narrow, sharply-defined region while green spans a broader interval. This finding challenges computational models that assume uniformly distributed color representations and suggests color naming follows non-uniform geometric organization in perceptual space.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

The Generalized Turing Test: A Foundation for Comparing Intelligence

Researchers introduce the Generalized Turing Test (GTT), a formal framework for comparing AI agent capabilities through indistinguishability rather than fixed benchmarks. The framework defines a comparator where one agent is deemed superior if another agent cannot reliably distinguish between interactions with it versus interactions with itself, creating a dataset-agnostic evaluation method validated across modern AI models.
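The comparator can be made concrete as a distinguishing game: a judge sees a transcript and guesses whether it came from agent A or agent B; if no judge beats chance, B is indistinguishable from A. The toy agents, judge, and accuracy band below are illustrative assumptions, not the paper's protocol:

```python
import random

def transcript(agent, prompts):
    """Run an agent over a fixed prompt set and collect its responses."""
    return [agent(p) for p in prompts]

def distinguisher_accuracy(agent_a, agent_b, judge, prompts, trials=400, seed=0):
    """Fraction of trials in which the judge correctly names the transcript's source."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        if rng.random() < 0.5:
            t, label = transcript(agent_a, prompts), "A"
        else:
            t, label = transcript(agent_b, prompts), "B"
        correct += judge(t) == label
    return correct / trials

# Two identical echo agents: every judge is stuck at chance.
echo = lambda p: p.upper()
naive_judge = lambda t: "A" if len("".join(t)) % 2 == 0 else "B"
acc_same = distinguisher_accuracy(echo, echo, naive_judge, ["hi", "ok"])
```

When the agents are identical the transcripts are identical, so accuracy sits near 0.5; a judge that exploits a real behavioral difference pushes it toward 1.0, which is the asymmetry the GTT turns into a capability ordering.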

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

The First Drop of Ink: Nonlinear Impact of Misleading Information in Long-Context Reasoning

Researchers reveal that large language models suffer from a nonlinear performance degradation when exposed to misleading information in long-context scenarios, with the majority of decline occurring when hard distractors comprise just a small fraction of the total context. This finding, termed 'The First Drop of Ink' effect, demonstrates that attention mechanisms disproportionately focus on misleading content, suggesting that upstream retrieval quality is more critical than previously understood for RAG and agentic systems.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

MaD Physics: Evaluating information seeking under constraints in physical environments

Researchers introduce MaD Physics, a benchmark for evaluating AI agents' ability to conduct scientific discovery under realistic resource constraints. The benchmark tests agents' capacity to make informative measurements within budget limits and infer underlying physical laws, using altered physics environments to prevent reliance on training data.

🧠 Gemini
🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Researchers discover that neural networks across different modalities (vision, point clouds, language) converge toward shared representations, with non-language modalities systematically moving toward language's neighborhood structure rather than vice versa. Using directional analysis, they attribute this asymmetry to language representations occupying more compact feature space, proposing that language serves as the asymptotic attractor in multimodal representation learning.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Researchers introduce HOME-KGQA, a new benchmark dataset for evaluating knowledge graph question answering systems on household activities using multimodal data. The dataset reveals significant performance gaps in current LLM-based KGQA methods, highlighting critical challenges for real-world deployment of AI systems that combine language models with structured knowledge.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation

EduStory introduces a novel framework for generating pedagogically-consistent multi-shot STEM instructional videos, addressing the challenge of maintaining knowledge coherence across long-horizon video generation. The framework combines pedagogical state modeling, script-guided control, and specialized evaluation metrics, supported by a new benchmark (EduVideoBench) designed to advance reliable and trustworthy educational video synthesis.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning

Researchers introduce Probabilistic Logical Knowledge Tracing (PLKT), an interpretable AI framework that uses Beta-distributed probabilistic embeddings to model student knowledge states and predict learning performance. Unlike conventional deep learning approaches that rely on opaque deterministic embeddings, PLKT constructs transparent reasoning paths showing how past interactions influence predictions while maintaining superior accuracy compared to existing methods.
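The Beta-distributed state the summary describes can be illustrated with a plain Beta-Bernoulli counter per skill: the belief exposes both a mastery estimate and its uncertainty, and both update transparently with each observed attempt. This counting rule is an illustrative stand-in for PLKT's learned embeddings, not its actual model:

```python
class SkillBelief:
    """Beta(alpha, beta) belief over a student's mastery of one skill."""

    def __init__(self, alpha=1.0, beta=1.0):  # Beta(1, 1) = uniform prior
        self.alpha, self.beta = alpha, beta

    def observe(self, correct):
        """Conjugate Beta-Bernoulli update from one attempt."""
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mastery(self):
        """Point estimate: the Beta mean alpha / (alpha + beta)."""
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        """Uncertainty: the Beta variance, shrinking as evidence accumulates."""
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1.0))
```

Because every prediction decomposes into which attempts raised `alpha` or `beta`, the reasoning path from past interactions to the estimate stays inspectable, which is the interpretability property the paper emphasizes.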

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

From Controlled to the Wild: Evaluation of Pentesting Agents for the Real-World

Researchers present a new evaluation protocol for AI pentesting agents that moves beyond simplified benchmarks to assess real-world vulnerability discovery capabilities. The framework combines structured ground-truth validation with LLM-based semantic matching and includes efficiency metrics, addressing a critical gap in how offensive security AI systems are currently measured.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 5/10

Cplus2ASP: Computing Action Language C+ in Answer Set Programming

Cplus2ASP Version 2 is a new system that translates action language C+ into answer set programming, offering significant performance improvements over the Causal Calculator through modern ASP solving techniques. The tool supports incremental execution, external atoms via Lua integration, and extensible translations for other action languages, making it relevant for automated reasoning and planning applications.

🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

The Metacognitive Probe: Five Behavioural Calibration Diagnostics for LLMs

Researchers introduce the Metacognitive Probe, a diagnostic tool measuring five dimensions of LLM confidence behavior including calibration, epistemic vigilance, and reasoning validation. Testing on eight frontier models and 69 humans reveals significant within-model disparities—exemplified by Gemini 2.5 Flash scoring 88 on confidence calibration but only 41 on difficulty prediction—suggesting composite benchmarks mask pockets of overconfidence.

🧠 Gemini
🧠 AI · Neutral · arXiv – CS AI · 20h ago · 6/10

Beyond ESG Scores: Learning Dynamic Constraints for Sequential Portfolio Optimization

Researchers propose MACF-X, a machine learning framework that integrates ESG constraints into portfolio optimization without modifying financial models' core logic. The approach treats ESG as dynamic portfolio preferences rather than static scoring inputs, potentially improving risk management in sustainable investing.

Page 420 of 1394