AI × Crypto News Feed

Real-time AI-curated news from 30,365+ articles across 50+ sources. Sentiment analysis, importance scoring, and key takeaways — updated every 15 minutes.

30365 articles

AIBullisharXiv – CS AI · Mar 177/10

🧠

APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution

Researchers introduce APEX-Searcher, a new framework that enhances large language models' search capabilities through a two-stage approach combining reinforcement learning for strategic planning and supervised fine-tuning for execution. The system addresses limitations in multi-hop question answering by decoupling retrieval processes into planning and execution phases, showing significant improvements across multiple benchmarks.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving

Researchers propose PaIR-Drive, a new parallel framework that combines imitation learning and reinforcement learning for autonomous driving, achieving 91.2 PDMS performance on NAVSIMv1 benchmark. The approach addresses limitations of sequential fine-tuning by running IL and RL in parallel branches, enabling better performance than existing methods.

AIBearisharXiv – CS AI · Mar 177/10

🧠

Large Language Models Reproduce Racial Stereotypes When Used for Text Annotation

A comprehensive study of 19 large language models reveals systematic racial bias in automated text annotation, with over 4 million judgments showing LLMs consistently reproduce harmful stereotypes based on names and dialect. The research demonstrates that AI models rate texts with Black-associated names as more aggressive and those written in African American Vernacular English as less professional and more toxic.

AIBullisharXiv – CS AI · Mar 177/10

🧠

AD-Copilot: A Vision-Language Assistant for Industrial Anomaly Detection via Visual In-context Comparison

Researchers developed AD-Copilot, a specialized multimodal AI assistant for industrial anomaly detection that outperforms existing models and even human experts. The system uses a novel visual comparison approach and achieved 82.3% accuracy on benchmarks, representing up to 3.35x improvement over baselines.

🏢 Microsoft

AIBullisharXiv – CS AI · Mar 177/10

🧠

UniVid: Pyramid Diffusion Model for High Quality Video Generation

Researchers have developed UniVid, a new pyramid diffusion model that unifies text-to-video and image-to-video generation into a single system. The model uses dual-stream cross-attention mechanisms to process both text prompts and reference images, achieving superior temporal coherence across different video generation tasks.

AIBearisharXiv – CS AI · Mar 177/10

🧠

Widespread Gender and Pronoun Bias in Moral Judgments Across LLMs

A comprehensive study of six major LLM families reveals systematic biases in moral judgments based on gender pronouns and grammatical markers. The research found that AI models consistently favor non-binary subjects while penalizing male subjects in fairness assessments, raising concerns about embedded biases in AI ethical decision-making.

🏢 Meta🧠 Grok

AIBearisharXiv – CS AI · Mar 177/10

🧠

$\tau$-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

Researchers introduce τ-voice, a new benchmark for evaluating full-duplex voice AI agents on complex real-world tasks. The study reveals significant performance gaps, with voice agents achieving only 30-45% of text-based AI capability under realistic conditions with noise and diverse accents.

🧠 GPT-5

AIBearisharXiv – CS AI · Mar 177/10

🧠

EvoClaw: Evaluating AI Agents on Continuous Software Evolution

Researchers introduce EvoClaw, a new benchmark that evaluates AI agents on continuous software evolution rather than isolated coding tasks. The study reveals a critical performance drop from >80% on isolated tasks to at most 38% in continuous settings across 12 frontier models, highlighting AI agents' struggle with long-term software maintenance.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Agent Privilege Separation in OpenClaw: A Structural Defense Against Prompt Injection

Researchers developed a two-agent defense system called OpenClaw that achieved 0% attack success rate against prompt injection attacks on LLM applications. The system uses agent isolation and JSON formatting to structurally prevent malicious prompts from reaching action-taking agents.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Purifying Generative LLMs from Backdoors without Prior Knowledge or Clean Reference

Researchers developed a new framework to remove backdoors from large language models without prior knowledge of triggers or clean reference models. The method uses an immunization-inspired approach that creates synthetic backdoored variants to identify and neutralize malicious components while preserving the model's generative capabilities.

AINeutralarXiv – CS AI · Mar 177/10

🧠

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Researchers developed Prefix-Shared KV Cache (PSKV), a new technique that accelerates jailbreak attacks on Large Language Models by 40% while reducing memory usage by 50%. The method optimizes the red-teaming process by sharing cached prefixes across multiple attack attempts, enabling more efficient parallel inference without compromising attack success rates.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Preventing Curriculum Collapse in Self-Evolving Reasoning Systems

Researchers introduce Prism, a new self-evolving AI reasoning system that prevents diversity collapse in problem generation by maintaining semantic coverage across mathematical problem spaces. The system achieved significant accuracy improvements over existing methods on mathematical reasoning benchmarks and generated 100k diverse mathematical questions.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Residual Stream Analysis of Overfitting And Structural Disruptions

Researchers identified that repetitive safety training data causes large language models to develop false refusals, where benign queries are incorrectly declined. They developed FlowLens, a PCA-based analysis tool, and proposed Variance Concentration Loss (VCL) as a regularization technique that reduces false refusals by over 35 percentage points while maintaining performance.

AIBearisharXiv – CS AI · Mar 177/10

🧠

VisualLeakBench: Auditing the Fragility of Large Vision-Language Models against PII Leakage and Social Engineering

Researchers introduced VisualLeakBench, a new evaluation suite that tests Large Vision-Language Models (LVLMs) for vulnerabilities to privacy attacks through visual inputs. The study found significant weaknesses in frontier AI systems like GPT-5.2, Claude-4, Gemini-3 Flash, and Grok-4, with Claude-4 showing the highest PII leakage rate at 74.4% despite having strong OCR attack resistance.

🧠 GPT-5🧠 Claude🧠 Gemini

AINeutralarXiv – CS AI · Mar 177/10

🧠

Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation

Researchers introduce Safety-Guided Flow (SGF), a unified probabilistic framework that combines control barrier functions with negative guidance approaches to improve safety in AI-generated content. The framework identifies a critical time window during the denoising process where strong negative guidance is most effective for preventing harmful outputs.

AINeutralarXiv – CS AI · Mar 177/10

🧠

Real-World AI Evaluation: How FRAME Generates Systematic Evidence to Resolve the Decision-Maker's Dilemma

FRAME (Forum for Real World AI Measurement and Evaluation) addresses the challenge organizational leaders face in governing AI systems without systematic evidence of real-world performance. The framework combines large-scale AI trials with structured observation of contextual use and outcomes, utilizing a Testing Sandbox and Metrics Hub to provide actionable insights.

$MKR

AINeutralarXiv – CS AI · Mar 177/10

🧠

How Meta-research Can Pave the Road Towards Trustworthy AI In Healthcare: Catalogue of Ideas and Roadmap for Future Research

Researchers convened a February 2025 workshop to explore how meta-research methodologies can enhance Trustworthy AI (TAI) implementation in healthcare. The study identifies key challenges including robustness, reproducibility, clinical integration, and transparency gaps, proposing a roadmap for interdisciplinary collaboration between TAI and meta-research fields.

AI × CryptoBullisharXiv – CS AI · Mar 177/10

🤖

TAS-GNN: A Status-Aware Signed Graph Neural Network for Anomaly Detection in Bitcoin Trust Systems

Researchers developed TAS-GNN, a novel Graph Neural Network framework specifically designed to detect fraudulent behavior in Bitcoin trust systems. The system addresses critical limitations in existing anomaly detection methods by using a dual-channel architecture that separately processes trust and distrust signals to better identify Sybil attacks and exit scams.

$BTC

AIBearisharXiv – CS AI · Mar 177/10

🧠

Brittlebench: Quantifying LLM robustness via prompt sensitivity

Researchers introduce Brittlebench, a new evaluation framework that reveals frontier AI models experience up to 12% performance degradation when faced with minor prompt variations like typos or rephrasing. The study shows that semantics-preserving input perturbations can account for up to half of a model's performance variance, highlighting significant robustness issues in current language models.

AIBullisharXiv – CS AI · Mar 177/10

🧠

RelayCaching: Accelerating LLM Collaboration via Decoding KV Cache Reuse

Researchers introduce RelayCaching, a training-free method that accelerates multi-agent LLM systems by reusing KV cache data from previous agents to eliminate redundant computation. The technique achieves over 80% cache reuse and reduces time-to-first-token by up to 4.7x while maintaining accuracy across mathematical reasoning, knowledge tasks, and code generation.

AINeutralarXiv – CS AI · Mar 177/10

🧠

The AI Transformation Gap Index (AITG): An Empirical Framework for Measuring AI Transformation Opportunity, Disruption Risk, and Value Creation at the Industry and Firm Level

Researchers introduce the AI Transformation Gap Index (AITG), the first empirical framework to measure firms' AI readiness relative to competitors and translate it into quantifiable financial outcomes. The framework analyzes 22 industries and shows that larger AI transformation gaps don't always create the highest value due to implementation challenges and timing issues.

AIBullisharXiv – CS AI · Mar 177/10

🧠

ICaRus: Identical Cache Reuse for Efficient Multi Model Inference

ICaRus introduces a novel architecture enabling multiple AI models to share identical Key-Value (KV) caches, addressing memory explosion issues in multi-model inference systems. The solution achieves up to 11.1x lower latency and 3.8x higher throughput by allowing cross-model cache reuse while maintaining comparable accuracy to task-specific fine-tuned models.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Training-Free Agentic AI: Probabilistic Control and Coordination in Multi-Agent LLM Systems

Researchers introduce REDEREF, a training-free controller that improves multi-agent LLM system efficiency by 28% token usage reduction and 17% fewer agent calls through probabilistic routing and belief-guided delegation. The system uses Thompson sampling and reflection-driven re-routing to optimize agent coordination without requiring model fine-tuning.

AIBullisharXiv – CS AI · Mar 177/10

🧠

Explain in Your Own Words: Improving Reasoning via Token-Selective Dual Knowledge Distillation

Researchers developed Token-Selective Dual Knowledge Distillation (TSD-KD), a new framework that improves AI reasoning by allowing smaller models to learn from larger ones more effectively. The method achieved up to 54.4% better accuracy than baseline models on reasoning benchmarks, with student models sometimes outperforming their teachers by up to 20.3%.

AIBearisharXiv – CS AI · Mar 177/10

🧠

The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations

A philosophical analysis critiques AI safety research for excessive anthropomorphism, arguing researchers inappropriately project human qualities like "intention" and "feelings" onto AI systems. The study examines Anthropic's research on language models and proposes that the real risk lies not in emergent agency but in structural incoherence combined with anthropomorphic projections.

🏢 Anthropic

← PrevPage 218 of 1215Next →