AI Pulse News

Models, papers, tools. 19,565 articles with AI-powered sentiment analysis and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Delta1 with LLM: symbolic and neural integration for credible and explainable reasoning

Researchers introduce Delta1, a framework that integrates automated theorem generation with large language models to produce explainable AI reasoning. The system pairs the rigor of formal logic with natural-language explanations and is demonstrated on applications in healthcare, compliance, and regulatory domains.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Human-in-the-Loop LLM Grading for Handwritten Mathematics Assessments

Researchers developed a human-in-the-loop LLM system for grading handwritten mathematics assessments that reduces grading time by 23% while maintaining accuracy comparable to manual grading. The system combines automated scanning, multi-pass LLM scoring, consistency checks, and mandatory human verification to handle pen-and-paper tests at scale.
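The multi-pass scoring and consistency check described above can be illustrated with a small sketch. It assumes each LLM scoring pass returns a numeric score for one answer. It uses a hypothetical `max_spread` tolerance to flag answers where the passes disagree. Because the described system requires human verification of every answer, the flag here only prioritizes review and does not replace it. The function name and threshold are illustrative and do not come from the paper.

```python
from statistics import mean

def aggregate_scores(pass_scores: list[float], max_spread: float = 1.0) -> dict:
    """Combine several independent LLM scoring passes for one answer.

    Hypothetical consistency check: if the passes differ by more than
    max_spread points, the answer is flagged for closer human review.
    """
    if not pass_scores:
        raise ValueError("need at least one scoring pass")
    spread = max(pass_scores) - min(pass_scores)
    return {
        "proposed_score": mean(pass_scores),  # starting point for the human grader
        "spread": spread,
        "flag_for_review": spread > max_spread,
    }
```

For example, passes of 4, 4 and 5 stay within the default tolerance. Passes of 2, 5 and 4 would be flagged.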

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Developing the PsyCogMetrics AI Lab to Evaluate Large Language Models and Advance Cognitive Science -- A Three-Cycle Action Design Science Study

Researchers have developed PsyCogMetrics AI Lab, a cloud-based platform that applies psychometric and cognitive science methodologies to evaluate Large Language Models. The platform was created through a three-cycle Action Design Science study and aims to advance AI evaluation methods at the intersection of psychology, cognitive science, and artificial intelligence.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

LLM Constitutional Multi-Agent Governance

Researchers introduce Constitutional Multi-Agent Governance (CMAG), a framework designed to prevent AI-driven manipulation in multi-agent systems while maintaining cooperation. The study shows that unconstrained AI optimization achieves high cooperation but erodes agent autonomy and fairness, whereas CMAG preserves ethical outcomes at the cost of only a modest reduction in cooperation.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Visual-ERM: Reward Modeling for Visual Equivalence

Researchers introduce Visual-ERM, a multimodal reward model that improves vision-to-code tasks by evaluating visual equivalence in rendered outputs rather than relying on text-based rules. The system achieves significant performance gains on chart-to-code tasks (+8.4) and shows consistent improvements across table and SVG parsing applications.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks

Researchers introduce CRAFT-GUI, a curriculum learning framework that uses reinforcement learning to improve AI agents' performance on graphical user interface tasks. The method accounts for variation in difficulty across GUI tasks and provides finer-grained feedback. It achieves improvements of 5.6% on the Android Control benchmark and 10.3% on internal benchmarks.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Do LLMs Share Human-Like Biases? Causal Reasoning Under Prior Knowledge, Irrelevant Context, and Varying Compute Budgets

A research study comparing causal reasoning abilities of 20+ large language models against human baselines found that LLMs exhibit more rule-like reasoning strategies than humans, who account for unmentioned factors. While LLMs don't mirror typical human cognitive biases in causal judgment, their rigid reasoning may fail when uncertainty is intrinsic, suggesting they can complement human decision-making in specific contexts.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Tiny Recursive Reasoning with Mamba-2 Attention Hybrid

Researchers developed a hybrid model combining Mamba-2 state space operators with Transformer blocks for recursive reasoning, achieving a 2% improvement in pass@2 performance on ARC-AGI-1 tasks with only 6.83M parameters. The study demonstrates that Mamba-2 operators can preserve reasoning capabilities while improving solution candidate coverage in tiny neural networks.
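The summary reports pass@2 but does not say how it is computed. One common convention, assumed here, is the unbiased pass@k estimator used for code-generation benchmarks. You draw n candidates, count the c that are correct, and estimate the probability that at least one of k randomly chosen candidates is correct. ARC-style evaluations may instead simply check whether either of two submitted answers is right. Treat this as background on the metric, not as the paper's procedure.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled candidates, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct candidate
    # 1 - P(all k chosen candidates are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `pass_at_k(10, 1, 2)` evaluates 1 - C(9,2)/C(10,2) = 1 - 36/45, or approximately 0.2.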

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

SkillsBench introduces a new benchmark for evaluating Agent Skills, which are structured packages of procedural knowledge that enhance LLM agents. Testing across 86 tasks and 11 domains shows that curated Skills improve performance by 16.2 percentage points on average, while self-generated Skills provide no benefit.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Causality Is Key to Understand and Balance Multiple Goals in Trustworthy ML and Foundation Models

Researchers propose integrating causal methods into machine learning systems to balance competing objectives like fairness, privacy, robustness, accuracy, and explainability. The paper argues that addressing these principles in isolation leads to conflicts and suboptimal solutions, while causal approaches can help navigate trade-offs in both trustworthy ML and foundation models.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

AdaBoN: Adaptive Best-of-N Alignment

Researchers propose AdaBoN, an adaptive Best-of-N alignment method that makes language model alignment more computationally efficient by allocating inference-time compute according to prompt difficulty. The two-stage algorithm outperforms uniform allocation strategies while using 20% less compute.
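The summary does not spell out AdaBoN's allocation rule. The sketch below illustrates only the general two-stage idea, under stated assumptions. In stage one, a few pilot samples per prompt are drawn and scored with a reward model. In stage two, the remaining samples are split across prompts, using the spread of pilot rewards as a difficulty proxy. The spread heuristic and the function name are hypothetical.

```python
from statistics import pstdev

def allocate_budget(pilot_rewards: dict[str, list[float]], extra_budget: int) -> dict[str, int]:
    """Split extra Best-of-N samples across prompts in proportion to pilot reward spread.

    Hypothetical heuristic: prompts whose pilot samples disagree more (higher
    reward standard deviation) are treated as harder and receive more samples.
    """
    spread = {p: pstdev(r) for p, r in pilot_rewards.items()}
    total = sum(spread.values())
    if total == 0:  # no difficulty signal: fall back to a uniform split
        weights = {p: 1.0 / len(spread) for p in spread}
    else:
        weights = {p: s / total for p, s in spread.items()}
    # Largest-remainder rounding so the integer allocations sum to extra_budget.
    raw = {p: w * extra_budget for p, w in weights.items()}
    alloc = {p: int(x) for p, x in raw.items()}
    leftover = extra_budget - sum(alloc.values())
    for p in sorted(raw, key=lambda q: raw[q] - alloc[q], reverse=True)[:leftover]:
        alloc[p] += 1
    return alloc
```

With pilot rewards of `[0.9, 0.9, 0.9]` for one prompt and `[0.1, 0.5, 0.9]` for another, the entire extra budget goes to the second prompt. The pilot samples of the first prompt already agree.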

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Do LLMs have a Gender (Entropy) Bias?

Researchers found that large language models exhibit gender bias at the level of individual questions. For a given question, they generate responses carrying different amounts of information for men than for women, even though they appear unbiased at the category level. The authors developed a new benchmark dataset, RealWorldQuestioning, and showed that a simple prompt-based debiasing approach improves response quality in 78% of cases.

🏢 Hugging Face · 🧠 ChatGPT
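The summary does not define the information measure. One simple proxy, assumed here purely for illustration, is the Shannon entropy of a response's word-frequency distribution. Comparing this entropy for gender-swapped versions of the same question gives a per-question gap. The paper's actual metric may differ, and both function names are hypothetical.

```python
from collections import Counter
from math import log2

def word_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the word-frequency distribution of a response."""
    words = text.lower().split()
    if not words:
        return 0.0
    n = len(words)
    return -sum((c / n) * log2(c / n) for c in Counter(words).values())

def entropy_gap(response_men: str, response_women: str) -> float:
    """Positive when the response to the question about men has higher word-level entropy."""
    return word_entropy(response_men) - word_entropy(response_women)
```

Averaged over many question pairs, a gap near zero can hide large per-question gaps of opposite sign. This matches the summary's point that bias can be invisible at the category level.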

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

UniPrompt-CL: Sustainable Continual Learning in Medical AI with Unified Prompt Pools

Researchers developed UniPrompt-CL, a new continual learning method specifically designed for medical AI that addresses the limitations of existing approaches when applied to medical data. The method uses a unified prompt pool design and regularization to achieve better performance while reducing computational costs, improving accuracy by 1-3 percentage points in domain-incremental learning settings.

AI · Bearish · arXiv – CS AI · Mar 16 · 6/10

The GPT-4o Shock: Emotional Attachment to AI Models and Its Impact on Regulatory Acceptance: A Cross-Cultural Analysis of the Immediate Transition from GPT-4o to GPT-5

A research study analyzing public reactions to OpenAI's transition from GPT-4o to GPT-5 in August 2025 found significant emotional attachment to AI models, with cultural differences between Japanese and English users. The findings suggest that strong emotional bonds with AI could complicate future regulatory efforts and policy implementation.

🧠 GPT-4 · 🧠 GPT-5

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Researchers have developed SAFE, a new framework for ensembling Large Language Models that selectively combines models at specific token positions rather than every token. The method improves both accuracy and efficiency in long-form text generation by considering tokenization mismatches and consensus in probability distributions.
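The core decision in selective ensembling is when to merge the models' predictions. This hedged sketch assumes every model exposes a next-token distribution over a shared vocabulary. SAFE itself handles tokenization mismatches, which this sketch ignores. The sketch also uses a hypothetical merge rule. If all models agree on the top token with at least `min_conf` probability, that token is taken directly. Otherwise, the distributions are averaged. The sketch illustrates the selection logic only. It still queries every model at every step, so it does not reproduce SAFE's efficiency gains.

```python
def next_token(dists: list[dict[str, float]], min_conf: float = 0.5) -> tuple[str, bool]:
    """Pick the next token, merging distributions only when the models disagree.

    Each dist maps token -> probability over a shared vocabulary (an assumption).
    Returns (token, ensembled) where ensembled says whether a merge happened.
    """
    tops = [max(d, key=d.get) for d in dists]
    agree = len(set(tops)) == 1 and all(d[t] >= min_conf for d, t in zip(dists, tops))
    if agree:
        return tops[0], False  # consensus: no need to merge distributions
    vocab = set().union(*dists)
    avg = {tok: sum(d.get(tok, 0.0) for d in dists) / len(dists) for tok in vocab}
    return max(avg, key=avg.get), True
```

For example, two models that both favor "cat" with high probability return ("cat", False). If one model favors "cat" at 0.6 and the other favors "dog" at 0.9, the averaged distribution picks "dog".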

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

A Tutorial on Cognitive Biases in Agentic AI-Driven 6G Autonomous Networks

Researchers published a tutorial on cognitive biases in AI-driven 6G autonomous networks, focusing on how LLM-powered agents can inherit human biases that distort network management decisions. The paper introduces mitigation strategies that demonstrated 5x lower latency and 40% higher energy savings in practical use cases.

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Multimodal Continual Learning with MLLMs from Multi-scenario Perspectives

Researchers developed UNIFIER, a continual learning framework that lets multimodal large language models (MLLMs) adapt to changing visual scenarios without catastrophic forgetting. The framework addresses visual discrepancies across environments such as high-altitude, underwater, low-altitude, and indoor scenarios, and shows significant improvements over existing methods.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Information-Consistent Language Model Recommendations through Group Relative Policy Optimization

Researchers developed a new reinforcement learning framework using Group Relative Policy Optimization (GRPO) to make Large Language Models provide consistent recommendations across semantically equivalent prompts. The method addresses a critical enterprise need for reliable AI systems in business domains like finance and customer support, where inconsistent responses undermine trust and compliance.
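GRPO samples a group of responses for each prompt. It scores each response relative to the rest of its group rather than against a learned value function. The sketch below shows that group-relative advantage step. The reward function is a hypothetical stand-in for the paper's consistency objective. It treats a group as responses to semantically equivalent prompts and rewards each response by the share of the others that give the same recommendation. The paper's real reward design may differ.

```python
from statistics import mean, pstdev

def consistency_rewards(recommendations: list[str]) -> list[float]:
    """Hypothetical reward: share of the other responses in the group that agree with this one."""
    n = len(recommendations)
    if n < 2:
        return [0.0] * n
    # count(r) - 1 excludes the response itself from its own agreement count
    return [(recommendations.count(r) - 1) / (n - 1) for r in recommendations]

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: normalize each reward by its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; implementations differ on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Chaining the two, `group_relative_advantages(consistency_rewards(["buy", "buy", "hold"]))` gives positive advantages to the two agreeing "buy" responses and a negative one to the outlier.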

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

DeCode: Decoupling Content and Delivery for Medical QA

Researchers introduce DeCode, a training-free framework that adapts large language models to provide better contextualized medical answers by decoupling content from delivery. The system significantly improves clinical question answering performance, boosting zero-shot results from 28.4% to 49.8% on medical benchmarks.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Asynchronous Verified Semantic Caching for Tiered LLM Architectures

Researchers introduce Krites, an asynchronous caching system for Large Language Models that uses LLM judges to verify cached responses, improving efficiency without changing serving decisions. The system increases the fraction of requests served with curated static answers by up to 3.9 times while leaving critical-path latency unchanged.
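Here is a rough sketch of the verified-caching idea, under several assumptions. Queries are embedded as vectors, and static answers are matched by cosine similarity. A hypothetical `judge(query, answer)` callable stands in for the LLM judge. Near misses are queued instead of served, so the request still goes to the dynamic tier exactly as it would without verification. A background step later asks the judge whether the cached answer fits. If it does, the query is added as a new alias. The class name and both thresholds are illustrative and are not Krites' API.

```python
import math
from collections import deque
from typing import Callable, Optional

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class VerifiedSemanticCache:
    """Static answer cache with off-path verification of near misses (hypothetical design)."""

    def __init__(self, serve_threshold: float = 0.95, candidate_threshold: float = 0.80):
        self.entries: list[tuple[list[float], str]] = []  # (embedding, curated answer)
        self.pending: deque = deque()  # near misses awaiting the judge
        self.serve_threshold = serve_threshold
        self.candidate_threshold = candidate_threshold

    def add(self, emb: list[float], answer: str) -> None:
        self.entries.append((emb, answer))

    def lookup(self, query: str, emb: list[float]) -> Optional[str]:
        """Critical path: return a static answer or None; never waits on the judge."""
        if not self.entries:
            return None
        sim, answer = max((cosine(emb, e), a) for e, a in self.entries)
        if sim >= self.serve_threshold:
            return answer
        if sim >= self.candidate_threshold:
            self.pending.append((query, emb, answer))  # verify later, off the critical path
        return None  # caller falls through to the dynamic tier, as before

    def verify_pending(self, judge: Callable[[str, str], bool]) -> int:
        """Background step: promote near misses the judge accepts; returns the count promoted."""
        promoted = 0
        while self.pending:
            query, emb, answer = self.pending.popleft()
            if judge(query, answer):
                self.add(emb, answer)  # similar future queries can now hit the static tier
                promoted += 1
        return promoted
```

In a real deployment, `verify_pending` would run in a background worker rather than inline, so judge latency never reaches the request path.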

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Narrative Weaver: Towards Controllable Long-Range Visual Consistency with Multi-Modal Conditioning

Researchers introduce 'Narrative Weaver', a new AI framework that generates consistent long-form visual content across extended sequences, addressing a key limitation in current generative AI models. The system combines multimodal language models with novel control mechanisms and includes the release of a 330K+ image dataset for e-commerce advertising.

General · Neutral · Decrypt · Mar 16 · 6/10

Traders Flip Senate Control Bet as Democrats Overtake Republicans on Kalshi, Polymarket

Prediction markets on Kalshi and Polymarket have seen a shift in Senate control betting odds, with Democrats overtaking Republicans in recent weeks. The swing in trader sentiment appears linked to escalating geopolitical tensions involving Iran.

AI × Crypto · Bullish · Decrypt – AI · Mar 15 · 6/10

You Can Control an AI Agent's Crypto Spending With Ledger Hardware Wallets and MoonPay

Ledger has integrated with MoonPay to enable users to control AI agent cryptocurrency transactions through hardware wallets. This integration allows users to approve AI-driven crypto spending while maintaining security by keeping private keys stored on the hardware device.

AI · Bullish · MarkTechPost · Mar 15 · 7/10

Meet OpenViking: An Open-Source Context Database that Brings Filesystem-Based Memory and Retrieval to AI Agent Systems like OpenClaw

OpenViking is an open-source context database from Volcengine that changes how AI agents manage context by organizing it as a filesystem rather than as flat text chunks. The system aims to make memory, resources, and skills manageable through a unified architecture for AI agent systems like OpenClaw.

General · Neutral · Fortune Crypto · Mar 15 · 7/10

Iran supertanker pushes through strait for China

Iranian vessels including a VLCC supertanker, LPG ship, and bulk carriers were observed transiting through the Strait of Hormuz bound for China on Sunday. This shipping activity represents continued Iran-China trade flows despite international sanctions.
