AI × Crypto News Feed

Real-time AI-curated news from 64,567+ articles across 50+ sources. Sentiment analysis, importance scoring, and key takeaways — updated every 15 minutes.

64567 articles

AIBullisharXiv – CS AI · Mar 127/10

🧠

Hybrid Self-evolving Structured Memory for GUI Agents

Researchers developed HyMEM, a brain-inspired hybrid memory system that significantly improves GUI agents' ability to interact with computers. The system uses graph-based structured memory combining symbolic nodes with trajectory embeddings, enabling smaller 7B/8B models to match or exceed performance of larger closed-source models like GPT-4o.

🧠 GPT-4

AINeutralarXiv – CS AI · Mar 127/10

🧠

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

Researchers introduce TRACED, a framework that evaluates AI reasoning quality through geometric analysis rather than traditional scalar probabilities. The system identifies correct reasoning as high-progress stable trajectories, while AI hallucinations show low-progress unstable patterns with high curvature fluctuations.

AIBullisharXiv – CS AI · Mar 127/10

🧠

IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs

OpenAI researchers introduce IH-Challenge, a reinforcement learning dataset designed to improve instruction hierarchy in frontier LLMs. Fine-tuning GPT-5-Mini with this dataset improved robustness by 10% and significantly reduced unsafe behavior while maintaining helpfulness.

🏢 OpenAI🏢 Hugging Face🧠 GPT-5

AINeutralarXiv – CS AI · Mar 127/10

🧠

Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

A comprehensive study comparing reinforcement learning approaches for AI alignment finds that diversity-seeking algorithms don't outperform reward-maximizing methods in moral reasoning tasks. The research demonstrates that moral reasoning has more concentrated high-reward distributions than mathematical reasoning, making standard optimization methods equally effective without explicit diversity mechanisms.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Explainable LLM Unlearning Through Reasoning

Researchers introduce Targeted Reasoning Unlearning (TRU), a new method for removing specific knowledge from large language models while preserving general capabilities. The approach uses reasoning-based targets to guide the unlearning process, addressing issues with previous gradient ascent methods that caused unintended capability degradation.

AIBullisharXiv – CS AI · Mar 127/10

🧠

MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios

Researchers introduce MoE-SpAc, a new framework for efficient Mixture-of-Experts model inference on edge devices that achieves 42% improvement over existing baselines. The system uses speculative decoding as a memory management tool and demonstrates 4.04x average speedup across benchmarks.

AIBearisharXiv – CS AI · Mar 127/10

🧠

The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration

A new study reveals that large language models exhibit patterns similar to the Dunning-Kruger effect, where poorly performing AI models show severe overconfidence in their abilities. The research tested four major models across 24,000 trials, finding that Kimi K2 displayed the worst calibration with 72.6% overconfidence despite only 23.3% accuracy, while Claude Haiku 4.5 achieved the best performance with proper confidence calibration.

🧠 Claude🧠 Haiku🧠 Gemini

AIBearisharXiv – CS AI · Mar 127/10

🧠

Quantifying Hallucinations in Language Language Models on Medical Textbooks

Research study finds that LLaMA-70B-Instruct hallucinated in 19.7% of medical Q&A responses despite high plausibility scores, highlighting significant reliability issues in AI healthcare applications. The study shows that lower hallucination rates correlate with higher usefulness scores, emphasizing the need for better safeguards in medical AI systems.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Evaluating Adjective-Noun Compositionality in LLMs: Functional vs Representational Perspectives

A research study reveals that large language models develop strong internal compositional representations for adjective-noun combinations, but struggle to consistently translate these representations into successful task performance. The findings highlight a significant gap between what LLMs understand internally and their functional capabilities.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Measuring and Eliminating Refusals in Military Large Language Models

Researchers developed the first benchmark dataset to measure refusal rates in military Large Language Models, finding that current LLMs refuse up to 98.2% of legitimate military queries due to safety behaviors. The study tested 34 models and demonstrated techniques to reduce refusals while maintaining military task performance.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects

Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.

🧠 ChatGPT🧠 Claude🧠 Sonnet

AINeutralarXiv – CS AI · Mar 127/10

🧠

DeliberationBench: A Normative Benchmark for the Influence of Large Language Models on Users' Views

Researchers developed DeliberationBench, a new benchmark to assess how large language models influence users' opinions on policy matters. A study of 4,088 participants discussing 65 policy proposals with six frontier LLMs found that these models have substantial influence that appears to align with democratically legitimate deliberative processes.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

A comprehensive study analyzing 896 academic papers and 80+ regulatory documents reveals critical ambiguities in how 'AI models' and 'AI systems' are defined across regulations like the EU AI Act. The research proposes clear operational definitions to resolve regulatory boundary problems that complicate responsibility allocation across the AI value chain.

AIBullisharXiv – CS AI · Mar 127/10

🧠

RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators

RedFuser is a new automated framework that optimizes AI model deployment by fusing cascaded reduction operations into single loops, achieving 2-5x performance improvements. The system addresses limitations in existing AI compilers that struggle with complex multi-loop operations like those found in attention mechanisms.

AINeutralarXiv – CS AI · Mar 127/10

🧠

How to Count AIs: Individuation and Liability for AI Agents

A legal research paper proposes the 'Algorithmic Corporation' (A-corp) framework to address the challenge of identifying and assigning liability for AI agents' actions as millions of autonomous AIs proliferate across the economy. The A-corp structure would create legally recognizable entities owned by humans but operated by AIs, enabling both accountability and legal recourse when AI agents cause harm.

AIBullisharXiv – CS AI · Mar 127/10

🧠

The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths

Researchers have developed dmaplane, a Linux kernel module that provides buffer orchestration for AI workloads, addressing the gap between efficient data transport and proper buffer management. The system integrates RDMA, GPU memory management, and NUMA-aware allocation to optimize high-performance AI data paths at the kernel level.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Architecture-Aware LLM Inference Optimization on AMD Instinct GPUs: A Comprehensive Benchmark and Deployment Study

Researchers conducted comprehensive benchmarks of LLM inference on AMD Instinct MI325X GPUs, testing models from 235B to 1 trillion parameters. The study reveals that architecture-aware optimization is critical, with different model types requiring specific configurations for optimal performance on AMD hardware.

🧠 Llama

AIBearisharXiv – CS AI · Mar 127/10

🧠

Targeted Bit-Flip Attacks on LLM-Based Agents

Researchers have introduced Flip-Agent, the first targeted bit-flip attack framework specifically designed to exploit LLM-based agents by manipulating hardware faults. The attack can manipulate both final outputs and tool invocations in multi-stage AI agent pipelines, revealing critical security vulnerabilities in these systems.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

A large-scale study of 62,808 AI safety evaluations across six frontier models reveals that deployment scaffolding architectures can significantly impact measured safety, with map-reduce scaffolding degrading safety performance. The research found that evaluation format (multiple-choice vs open-ended) affects safety scores more than scaffold architecture itself, and safety rankings vary dramatically across different models and configurations.

AIBullisharXiv – CS AI · Mar 127/10

🧠

Training Language Models via Neural Cellular Automata

Researchers developed a method using neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural language tokens while being more computationally efficient and transferring well to reasoning benchmarks.

AI × CryptoNeutralarXiv – CS AI · Mar 127/10

🤖

Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Researchers propose NabaOS, a lightweight verification framework that detects AI agent hallucinations using HMAC-signed tool receipts instead of zero-knowledge proofs. The system achieves 94.2% detection accuracy with <15ms verification time, compared to cryptographic approaches that require 180+ seconds per query.

AINeutralarXiv – CS AI · Mar 127/10

🧠

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Researchers propose treating multi-agent AI memory as a computer architecture problem, introducing a three-layer memory hierarchy and identifying critical protocol gaps. The paper highlights multi-agent memory consistency as the most pressing challenge for building scalable collaborative AI systems.

AIBullisharXiv – CS AI · Mar 127/10

🧠

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Researchers have developed HTMuon, an improved optimization algorithm for training large language models that builds upon the existing Muon optimizer. HTMuon addresses limitations in Muon's weight spectra by incorporating heavy-tailed spectral corrections, showing up to 0.98 perplexity reduction in LLaMA pretraining experiments.

🏢 Perplexity

AINeutralarXiv – CS AI · Mar 127/10

🧠

Dissecting Chronos: Sparse Autoencoders Reveal Causal Feature Hierarchies in Time Series Foundation Models

Researchers applied sparse autoencoders to analyze Chronos-T5-Large, a 710M parameter time series foundation model, revealing how different layers process temporal data. The study found that mid-encoder layers contain the most causally important features for change detection, while early layers handle frequency patterns and final layers compress semantic concepts.

AIBearisharXiv – CS AI · Mar 127/10

🧠

Amnesia: Adversarial Semantic Layer Specific Activation Steering in Large Language Models

Researchers have developed 'Amnesia,' a lightweight adversarial attack that bypasses safety mechanisms in open-weight Large Language Models by manipulating internal transformer states. The attack enables generation of harmful content without requiring fine-tuning or additional training, highlighting vulnerabilities in current LLM safety measures.

← PrevPage 612 of 2583Next →