AI Pulse News

Models, papers, tools. 18,995 articles with AI-powered sentiment analysis and key takeaways.

18995 articles

AIBullisharXiv – CS AI · Apr 136/10

🧠

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

Researchers introduce WAND, a framework that reduces computational and memory costs of autoregressive text-to-speech models by replacing full self-attention with windowed attention combined with knowledge distillation. The approach achieves up to 66.2% KV cache memory reduction while maintaining speech quality, addressing a critical scalability bottleneck in modern AR-TTS systems.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

Researchers systematically evaluated how sampling temperature and prompting strategies affect extended reasoning performance in large language models, finding that zero-shot prompting peaks at moderate temperatures (T=0.4-0.7) while chain-of-thought performs better at extremes. The study reveals that extended reasoning benefits grow substantially with higher temperatures, suggesting that T=0 is suboptimal for reasoning tasks.

🧠 Grok

AINeutralarXiv – CS AI · Apr 136/10

🧠

Silhouette Loss: Differentiable Global Structure Learning for Deep Representations

Researchers introduce Soft Silhouette Loss, a novel machine learning objective that improves deep neural network representations by enforcing intra-class compactness and inter-class separation. The lightweight differentiable loss outperforms cross-entropy and supervised contrastive learning when combined, achieving 39.08% top-1 accuracy compared to 37.85% for existing methods while reducing computational overhead.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Structured Exploration and Exploitation of Label Functions for Automated Data Annotation

Researchers introduce EXPONA, an automated framework for generating label functions that improve weak label quality in machine learning datasets. The system balances exploration across surface, structural, and semantic levels with reliability filtering, achieving up to 98.9% label coverage and 46% downstream performance improvements across diverse classification tasks.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models

Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.

AIBullisharXiv – CS AI · Apr 136/10

🧠

Adaptive Rigor in AI System Evaluation using Temperature-Controlled Verdict Aggregation via Generalized Power Mean

Researchers introduce Temperature-Controlled Verdict Aggregation (TCVA), a novel evaluation method that adapts AI system assessment rigor based on application domain requirements. By combining verdict scoring with generalized power-mean aggregation and a tunable temperature parameter, TCVA achieves human-aligned evaluation comparable to existing benchmarks while offering computational efficiency.

AIBullisharXiv – CS AI · Apr 136/10

🧠

TiAb Review Plugin: A Browser-Based Tool for AI-Assisted Title and Abstract Screening

Researchers developed TiAb Review Plugin, an open-source Chrome extension that enables AI-assisted screening of academic titles and abstracts without requiring server subscriptions or coding skills. The tool combines Google Sheets for collaboration, Google's Gemini API for LLM-based screening, and an in-browser machine learning algorithm achieving 94-100% recall, demonstrating practical viability for systematic literature reviews.

🧠 Gemini

AINeutralarXiv – CS AI · Apr 136/10

🧠

Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach

Researchers present a forensic-focused multimodal framework for detecting hate speech and threats across images, documents, and text. The approach intelligently determines what evidence is present before applying appropriate AI models, improving accuracy and evidentiary traceability in digital investigations.

AINeutralarXiv – CS AI · Apr 136/10

🧠

From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity

Researchers propose FEAT, a federated learning method that improves continual learning by addressing class imbalance and representation collapse across distributed clients. The approach combines geometric alignment and energy-based correction to better utilize exemplar samples while maintaining performance under dynamic heterogeneity.

AINeutralarXiv – CS AI · Apr 136/10

🧠

StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning

StructRL is a new reinforcement learning framework that recovers dynamic programming structure from distributional learning dynamics without requiring explicit models. The research demonstrates that temporal patterns in return distribution evolution reveal inherent structure in how information propagates through state spaces, enabling more efficient and stable learning.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing

Researchers demonstrate that applying Bayesian inference to Spiking Neural Networks (SNNs) for speech processing smooths the irregular loss landscape caused by threshold-based spike generation. Testing on speech datasets shows improved performance metrics and more regular predictive landscapes compared to deterministic approaches.

AINeutralarXiv – CS AI · Apr 136/10

🧠

VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning

Researchers introduce VOLTA, a simplified deep learning approach for uncertainty quantification that outperforms ten established baselines including ensemble methods and MC Dropout. The method achieves superior calibration with expected calibration error of 0.010 and competitive accuracy across multiple datasets, suggesting that complex auxiliary losses may be unnecessary for reliable uncertainty estimation in safety-critical applications.

AINeutralarXiv – CS AI · Apr 136/10

🧠

3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding

Researchers introduce 3D-VCD, an inference-time framework that reduces hallucinations in 3D-LLM embodied agents by contrasting predictions against distorted scene graphs. The method addresses failures specific to 3D spatial reasoning without requiring model retraining, advancing reliability in embodied AI systems.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition

Researchers introduce MATU, a novel uncertainty quantification framework using tensor decomposition to address reliability challenges in Large Language Model-based Multi-Agent Systems. The method analyzes entire reasoning trajectories rather than single outputs, effectively measuring uncertainty across different agent structures and communication topologies.

AINeutralarXiv – CS AI · Apr 136/10

🧠

LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs

A new study comparing large language models against graph-based parsers for relation extraction demonstrates that smaller, specialized architectures significantly outperform LLMs when processing complex linguistic graphs with multiple relations. This finding challenges the prevailing assumption that larger language models are universally superior for natural language processing tasks.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

Researchers benchmarked five frontier LLMs against human players in Cards Against Humanity games, finding that while models exceed random baseline performance, their humor preferences align poorly with humans but strongly with each other. The findings suggest LLM humor judgment may reflect systematic biases and structural artifacts rather than genuine preference understanding.

AIBearisharXiv – CS AI · Apr 136/10

🧠

Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation

Researchers evaluated how well frontier LLMs like GPT-4o and Gemini interpret story morals across 14 language-culture pairs, finding that while models generate semantically similar outputs to humans, they lack cultural diversity and concentrate on universally shared values rather than culturally-specific moral interpretations.

🧠 GPT-4🧠 Gemini

AINeutralarXiv – CS AI · Apr 136/10

🧠

Building Better Environments for Autonomous Cyber Defence

Workshop participants from academia, industry, and government convened in November 2025 to establish best practices for designing reinforcement learning environments in autonomous cyber defence. The resulting framework and guidelines address a critical gap in documented knowledge about RL environment development for network security applications, including critical infrastructure protection.

AIBullisharXiv – CS AI · Apr 136/10

🧠

HiFloat4 Format for Language Model Pre-training on Ascend NPUs

Researchers demonstrate that HiFloat4, a 4-bit floating-point format, enables efficient large language model training on Huawei's Ascend NPUs with up to 4x improvements in compute throughput and memory efficiency. The study shows that specialized stabilization techniques can maintain accuracy within 1% of full-precision baselines while preserving computational gains across dense and mixture-of-experts architectures.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs

Researchers introduce Dictionary-Aligned Concept Control (DACO), a framework that uses a curated dictionary of 15,000 multimodal concepts and Sparse Autoencoders to improve safety in multimodal large language models by steering their activations at inference time. Testing across multiple models shows DACO significantly enhances safety performance while preserving general-purpose capabilities without requiring model retraining.

AINeutralarXiv – CS AI · Apr 136/10

🧠

AI-Induced Human Responsibility (AIHR) in AI-Human teams

A research study reveals that people assign significantly more responsibility to human decision-makers when they work alongside AI systems compared to human teammates, even in scenarios involving moral harm. This 'AI-Induced Human Responsibility' (AIHR) effect stems from perceiving AI as a constrained tool rather than an autonomous agent, raising important questions about accountability structures in AI-augmented organizations.

$MKR

AINeutralarXiv – CS AI · Apr 136/10

🧠

Beyond Relevance: Utility-Centric Retrieval in the LLM Era

A research paper proposes a fundamental shift in how retrieval systems are evaluated, moving from traditional relevance-based metrics toward utility-centric optimization for large language models. This framework argues that retrieval effectiveness should be measured by its contribution to LLM-generated answer quality rather than document ranking alone, reflecting the structural changes introduced by retrieval-augmented generation (RAG) systems.

AINeutralarXiv – CS AI · Apr 136/10

🧠

WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning

Researchers introduce WOMBET, a framework that improves reinforcement learning efficiency in robotics by generating synthetic training data from a world model in source tasks and selectively transferring it to target tasks. The approach combines offline-to-online learning with uncertainty-aware planning to reduce data collection costs while maintaining robustness.

AINeutralarXiv – CS AI · Apr 136/10

🧠

Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models

Researchers introduce Litmus (Re)Agent, an agentic system that predicts how multilingual AI models will perform on tasks lacking direct benchmark data. Using a controlled benchmark of 1,500 questions across six tasks, the system decomposes queries into hypotheses and synthesizes predictions through structured reasoning, outperforming competing approaches particularly when direct evidence is sparse.

AINeutralarXiv – CS AI · Apr 136/10

🧠

PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment

Researchers introduce PerMix-RLVR, a training method that enables large language models to maintain persona flexibility while preserving task robustness. The approach addresses a fundamental trade-off in reinforcement learning with verifiable rewards, where models become less responsive to persona prompts but gain improved performance on objective tasks.

← PrevPage 305 of 760Next →