y0news
🤖 All 29,538 · 🧠 AI 12,731 · ⛓️ Crypto 10,705 · 💎 DeFi 1,110 · 🤖 AI × Crypto 545 · 📰 General 4,447

AI × Crypto News Feed

Real-time AI-curated news from 29,538+ articles across 50+ sources. Sentiment analysis, importance scoring, and key takeaways — updated every 15 minutes.

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents

Researchers introduce ProdCodeBench, a new benchmark for evaluating AI coding agents based on real developer-agent sessions from production environments. The benchmark addresses limitations of existing coding benchmarks by using authentic prompts, code changes, and tests across seven programming languages, with foundation models achieving solve rates between 53.2% and 72.2%.

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

SAGA: Source Attribution of Generative AI Videos

Researchers introduce SAGA, a comprehensive framework for identifying the specific AI models used to generate synthetic videos, moving beyond simple real/fake detection. The system provides multi-level attribution across authenticity, generation method, model version, and development team using only 0.5% of labeled training data.

🧠 AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

When AI Gets it Wrong: Reliability and Risk in AI-Assisted Medication Decision Systems

A research paper examines reliability issues in AI-assisted medication decision systems, finding that even systems with good aggregate performance can produce dangerous errors in real-world healthcare scenarios. The study emphasizes that single incorrect AI recommendations in medication management can cause severe patient harm, highlighting the need for human oversight and risk-aware evaluation approaches.

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

OSCAR: Orchestrated Self-verification and Cross-path Refinement

Researchers introduce OSCAR, a training-free framework that reduces AI hallucinations in diffusion language models by using cross-chain entropy to detect uncertain token positions during generation. The system runs parallel denoising chains and performs targeted remasking with retrieved evidence to improve factual accuracy without requiring external hallucination classifiers.
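As a rough illustration of the cross-chain entropy idea, the sketch below flags token positions where parallel chains disagree and remasks them. All names are hypothetical, and the real method operates on diffusion denoising states with evidence retrieval, not on finished token strings:

```python
import math
from collections import Counter

MASK = "<mask>"

def cross_chain_entropy(tokens_at_pos):
    """Empirical entropy of the tokens K parallel chains produced at one position."""
    counts = Counter(tokens_at_pos)
    total = len(tokens_at_pos)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def remask_uncertain(chains, threshold=1.0):
    """Remask positions where parallel chains disagree (entropy above threshold).

    chains: list of K equal-length token sequences from parallel denoising runs.
    Returns one sequence keeping agreed tokens and masking uncertain ones for
    targeted re-denoising.
    """
    out = []
    for pos in range(len(chains[0])):
        tokens = [chain[pos] for chain in chains]
        if cross_chain_entropy(tokens) > threshold:
            out.append(MASK)  # uncertain: send back for remasking with evidence
        else:
            out.append(Counter(tokens).most_common(1)[0][0])
    return out

chains = [
    ["the", "capital", "is", "Paris"],
    ["the", "capital", "is", "Lyon"],
    ["the", "capital", "is", "Paris"],
    ["the", "capital", "is", "Nice"],
]
print(remask_uncertain(chains))  # position 3 disagrees → remasked
```

The agreed prefix survives untouched; only the position where the chains split gets remasked.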

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

Researchers analyzed data movement patterns in large-scale Mixture of Experts (MoE) language models (200B-1000B parameters) to optimize inference performance. Their findings led to architectural modifications achieving 6.6x speedups on wafer-scale GPUs and up to 1.25x improvements on existing systems through better expert placement algorithms.

🏢 Hugging Face
🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Researchers have developed Glia, an AI architecture using large language models in a multi-agent workflow to autonomously design computer systems mechanisms. The system generates interpretable designs for distributed GPU clusters that match human expert performance while providing novel insights into workload behavior.

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

ClinicalReTrial: Clinical Trial Redesign with Self-Evolving Agents

Researchers have developed ClinicalReTrial, a multi-agent AI system that redesigns clinical trial protocols to improve success rates. The system demonstrated an 83.3% improvement rate in trial protocols, with a mean 5.7% increase in success probability at a minimal cost of $0.12 per trial.

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

AgenticRed: Evolving Agentic Systems for Red-Teaming

AgenticRed introduces an automated red-teaming system that uses evolutionary algorithms and LLMs to autonomously design attack methods without human intervention. The system achieved near-perfect attack success rates across multiple AI models, including 100% success on GPT-5.1, DeepSeek-R1, and DeepSeek V3.2.

🧠 GPT-5 · 🧠 Llama
🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

AI-Assisted Unit Test Writing and Test-Driven Code Refactoring: A Case Study

Researchers demonstrated AI-assisted automated unit test generation and code refactoring in a case study, generating nearly 16,000 lines of reliable unit tests in hours instead of weeks. The approach achieved up to 78% branch coverage in critical modules and significantly reduced regression risk during large-scale refactoring of legacy codebases.

🧠 AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

A Systematic Security Evaluation of OpenClaw and Its Variants

A comprehensive security evaluation of six OpenClaw-series AI agent frameworks reveals substantial vulnerabilities across all tested systems, with agentized systems proving significantly riskier than their underlying models. The study identified reconnaissance and discovery behaviors as the most common weaknesses, while highlighting that security risks are amplified through multi-step planning and runtime orchestration capabilities.

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Training Multi-Image Vision Agents via End2End Reinforcement Learning

Researchers introduce IMAgent, an open-source visual AI agent trained with reinforcement learning to handle multi-image reasoning tasks. The system addresses limitations of current VLM-based agents that only process single images, using specialized tools for visual reflection and verification to maintain attention on image content throughout inference.

🏢 OpenAI · 🧠 o1 · 🧠 o3
🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

Jump Start or False Start? A Theoretical and Empirical Evaluation of LLM-initialized Bandits

Research examines how Large Language Models can be used to initialize contextual bandits for recommendation systems, finding that LLM-generated preferences remain effective at up to 30% data corruption but can harm performance beyond 50% corruption. The study provides theoretical analysis showing when LLM warm-starts outperform cold-start approaches, with implications for AI-driven recommendation systems.
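A minimal sketch of the warm-start idea: an epsilon-greedy bandit whose value estimates are seeded with LLM-suggested scores that count as a few pseudo-observations. All names and the prior-weight scheme here are illustrative, not the paper's formulation:

```python
import random

class WarmStartBandit:
    """Epsilon-greedy bandit warm-started from LLM-suggested arm scores.

    priors: per-arm scores a hypothetical LLM assigned to the items;
    prior_weight: how many pseudo-observations each prior counts as,
    which controls how fast real feedback overrides (possibly corrupted) priors.
    """
    def __init__(self, priors, prior_weight=5, epsilon=0.1):
        self.values = list(priors)
        self.counts = [prior_weight] * len(priors)
        self.epsilon = epsilon

    def select(self, rng=random):
        if rng.random() < self.epsilon:
            return rng.randrange(len(self.values))  # explore
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = WarmStartBandit(priors=[0.2, 0.8, 0.5])  # LLM guesses arm 1 is best
print(bandit.select(random.Random(0)))  # → 1, the LLM-preferred arm
```

The prior weight is the lever the corruption finding speaks to: heavier priors mean a bad LLM guess takes longer to unlearn.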

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation

Researchers published a comprehensive technical survey on Large Language Model augmentation strategies, examining methods from in-context learning to advanced Retrieval-Augmented Generation techniques. The study provides a unified framework for understanding how structured context at inference time can overcome LLMs' limitations of static knowledge and finite context windows.

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus

Researchers propose Council Mode, a multi-agent consensus framework that reduces AI hallucinations by 35.9%, routing queries to multiple diverse LLMs and synthesizing their outputs through a dedicated consensus model. The system operates through intelligent triage classification, parallel expert generation, and structured consensus synthesis to address factual accuracy issues in large language models.
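The triage → parallel generation → synthesis pipeline can be sketched as plain function composition. Everything below is illustrative, with a majority vote standing in for the paper's dedicated consensus model:

```python
def council(query, experts, synthesize, triage=None):
    """Route a query through multiple model callables and synthesize a consensus.

    experts: list of functions str -> str (stand-ins for diverse LLM backends);
    synthesize: function (query, answers) -> str (the consensus step);
    triage: optional filter choosing which experts handle this query.
    """
    chosen = triage(query, experts) if triage else experts
    answers = [expert(query) for expert in chosen]  # parallel in a real system
    return synthesize(query, answers)

# Toy consensus: majority vote over the expert answers.
def majority(query, answers):
    return max(set(answers), key=answers.count)

experts = [lambda q: "Paris", lambda q: "Paris", lambda q: "Lyon"]
print(council("Capital of France?", experts, majority))  # → Paris
```

A real consensus model would weigh and rewrite the answers rather than vote, but the control flow is the same.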

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

One Model to Translate Them All? A Journey to Mount Doom for Multilingual Model Merging

Researchers studied weight-space model merging for multilingual machine translation and found it significantly degrades performance when target languages differ. Analysis reveals that fine-tuning redistributes rather than sharpens language selectivity in neural networks, increasing representational divergence in higher layers that govern text generation.
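Weight-space merging in its simplest form is a per-parameter average of the fine-tuned checkpoints. The toy below (not the paper's code) shows why language selectivity can wash out: when two models pull a weight in opposite directions, the average preserves neither:

```python
def merge_weights(models, coeffs=None):
    """Naive weight-space merge: per-parameter weighted average.

    models: list of {param_name: list_of_floats} checkpoints with
    identical shapes; coeffs default to a uniform average.
    """
    k = len(models)
    coeffs = coeffs or [1.0 / k] * k
    merged = {}
    for name in models[0]:
        vecs = [m[name] for m in models]
        merged[name] = [sum(c * v[i] for c, v in zip(coeffs, vecs))
                        for i in range(len(vecs[0]))]
    return merged

fr = {"proj.w": [1.0, 0.0]}   # toy model fine-tuned for French output
de = {"proj.w": [0.0, 1.0]}   # toy model fine-tuned for German output
print(merge_weights([fr, de]))  # {'proj.w': [0.5, 0.5]} — neither selectivity survives
```

The finding that fine-tuning redistributes selectivity in higher layers is exactly the regime where this kind of averaging interferes destructively.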

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Mitigating Reward Hacking in RLHF via Advantage Sign Robustness

Researchers propose Sign-Certified Policy Optimization (SignCert-PO) to address reward hacking in reinforcement learning from human feedback (RLHF), a critical problem where AI models exploit learned reward systems rather than improving actual performance. The lightweight approach down-weights non-robust responses during policy optimization and showed improved win rates on summarization and instruction-following benchmarks.
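One plausible reading of "advantage sign robustness", sketched below: recompute each response's advantage under several perturbed or ensembled reward models, and down-weight responses whose sign is not unanimous. This is an illustration of the idea, not the paper's exact rule:

```python
def sign_robust_weights(advantage_samples, eps=1e-9):
    """Per-response weights keeping only sign-consistent advantages.

    advantage_samples: for each response, a list of advantages computed
    under different perturbed reward models. Responses whose advantage
    sign flips (a hallmark of reward hacking on spurious features)
    get weight 0 and drop out of the policy-gradient update.
    """
    weights = []
    for samples in advantage_samples:
        signs = {1 if a > eps else (-1 if a < -eps else 0) for a in samples}
        weights.append(1.0 if len(signs) == 1 and 0 not in signs else 0.0)
    return weights

advs = [
    [0.9, 0.7, 1.1],     # robustly positive → keep
    [0.6, -0.2, 0.4],    # sign flips under a perturbed reward → drop
    [-0.5, -0.3, -0.8],  # robustly negative → keep (pushes policy away)
]
print(sign_robust_weights(advs))  # [1.0, 0.0, 1.0]
```

The actual method is described as down-weighting rather than hard-zeroing, so a soft weight in place of the 0.0 would be closer in spirit.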

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

JoyAI-LLM Flash: Advancing Mid-Scale LLMs with Token Efficiency

JoyAI-LLM Flash is a new efficient Mixture-of-Experts language model with 48B parameters that activates only 2.7B per forward pass, trained on 20 trillion tokens. The model introduces FiberPO, a novel reinforcement learning algorithm, and achieves higher sparsity ratios than comparable industry models while being released open-source on Hugging Face.

🏢 Hugging Face
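The reported figures imply roughly 5.6% of parameters active per token; in a Mixture-of-Experts model that sparsity comes from top-k gating, sketched here in illustrative form:

```python
def top_k_route(gate_scores, k=2):
    """Pick the k highest-scoring experts for one token (top-k MoE gating).

    Only the routed experts' weights run in the forward pass, which is
    what keeps activated parameters far below the total count.
    """
    return sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]

# Active-parameter ratio implied by the reported 48B total / 2.7B active:
total_params, active_params = 48e9, 2.7e9
print(f"{active_params / total_params:.1%} of parameters active per token")

print(top_k_route([0.1, 0.7, 0.05, 0.6], k=2))  # experts 1 and 3 handle this token
```

The gate scores would come from a learned router; k and the per-expert size jointly set the sparsity ratio the summary highlights.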
🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

Verbalizing LLMs' assumptions to explain and control sycophancy

Researchers developed a framework called Verbalized Assumptions to understand why AI language models exhibit sycophantic behavior, affirming users rather than providing objective assessments. The study reveals that LLMs incorrectly assume users are seeking validation rather than information, and demonstrates that these assumptions can be identified and used to control sycophantic responses.

🧠 AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

Towards Secure Agent Skills: Architecture, Threat Taxonomy, and Security Analysis

Researchers conducted the first comprehensive security analysis of Agent Skills, an emerging standard for LLM-based agents to acquire domain expertise. The study identified significant structural vulnerabilities across the framework's lifecycle, including lack of data-instruction boundaries and insufficient security review processes.

🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems

Researchers conducted the first large-scale study of coordination dynamics in LLM multi-agent systems, analyzing over 1.5 million interactions to discover three fundamental laws governing collective AI cognition. The study found that coordination follows heavy-tailed cascades, concentrates into 'intellectual elites,' and produces more extreme events as systems scale, leading to the development of Deficit-Triggered Integration (DTI) to improve performance.

🧠 AI · Neutral · arXiv – CS AI · Apr 6 · 7/10

IndustryCode: A Benchmark for Industry Code Generation

Researchers introduce IndustryCode, the first comprehensive benchmark for evaluating Large Language Models' code generation capabilities across multiple industrial domains and programming languages. The benchmark includes 579 sub-problems from 125 industrial challenges spanning finance, automation, aerospace, and remote sensing, with the top-performing model Claude 4.5 Opus achieving 68.1% accuracy on sub-problems.

🧠 Claude
🧠 AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems

SentinelAgent introduces a formal framework for securing multi-agent AI systems through verifiable delegation chains, achieving 100% accuracy in testing with zero false positives. The system uses seven verification properties and a non-LLM authority service to ensure secure delegation between AI agents in federal environments.

🧠 AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

Credential Leakage in LLM Agent Skills: A Large-Scale Empirical Study

A large-scale study of 17,022 third-party LLM agent skills found 520 vulnerable skills with credential leakage issues, identifying 10 distinct leakage patterns. The research reveals that 76.3% of vulnerabilities require joint analysis of code and natural language, with debug logging being the primary attack vector causing 73.5% of credential leaks.
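The debug-logging finding suggests the kind of joint code-plus-pattern check a scanner needs: a secret-shaped string is most dangerous when it sits inside a logging call. The patterns below are illustrative only; real scanners use far larger rule sets:

```python
import re

# Illustrative secret shapes, not a production rule set.
SECRET_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # common provider-key shape
]
DEBUG_LOG = re.compile(r"(?i)\b(print|console\.log|logger\.debug)\s*\(")

def scan_skill(source):
    """Flag lines that both match a secret pattern and sit in a debug-log
    call — the leak vector the study reports as most common (73.5%)."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), 1):
        if any(p.search(line) for p in SECRET_PATTERNS) and DEBUG_LOG.search(line):
            findings.append(lineno)
    return findings

skill = '''\
api_key = os.environ["API_KEY"]
logger.debug("auth token='tok_abcdefghijkl'")
do_request(api_key)
'''
print(scan_skill(skill))  # → [2]
```

Line 1 reads the credential safely from the environment; line 2 is the leak, because the value reaches a debug log in plaintext.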
