Models, papers, tools. 17,711 articles with AI-powered sentiment analysis and key takeaways.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 9
🧠 ArchAgent, an AI-driven system built on AlphaEvolve, has achieved breakthrough results in automated computer architecture discovery by designing state-of-the-art cache replacement policies. The system delivered performance improvements of 5.3% in just 2 days and 0.9% in 18 days, working 3-5x faster than human-developed solutions.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠 Researchers propose a mathematical framework distinguishing agency from intelligence in AI systems, introducing 'bipredictability' as a measure of effective information sharing between observations, actions, and outcomes. Current AI systems achieve agency but lack true intelligence, which requires adaptive learning and self-monitoring capabilities.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 Researchers introduce SC-ARENA, a new natural language evaluation framework for testing large language models in single-cell biology research. The framework addresses limitations in existing benchmarks by incorporating biological knowledge and real-world task formats to better assess AI models' understanding of cellular biology.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠 Researchers propose Metacognitive Behavioral Tuning (MBT), a new framework that addresses structural fragility in Large Reasoning Models by injecting human-like self-regulatory control into AI thought processes. The approach reduces reasoning collapse and improves accuracy while consuming fewer computational tokens across multi-hop question-answering benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers have developed a new framework that uses large language models to guide symbolic regression in discovering interpretable physical laws from high-dimensional materials data. The method reduces the search space by approximately 10^5 times compared to traditional approaches and successfully identified novel formulas for key properties of perovskite materials.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers have conducted a comprehensive review of adversarial transferability in image classification, identifying gaps in standardized evaluation frameworks for transfer-based attacks. They propose a benchmark framework and categorize existing attacks into six distinct types to address biased assessments in current research.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 Researchers have released LLMServingSim 2.0, a unified simulator that models the complex interactions between heterogeneous hardware and disaggregated software in large language model serving infrastructures. The simulator achieves 0.97% average error compared to real deployments while maintaining 10-minute simulation times for complex configurations.
$NEAR
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 Researchers developed Residual Koopman Spectral Profiling (RKSP), a method that predicts transformer training instability from a single forward pass at initialization with 99.5% accuracy. The technique includes Koopman Spectral Shaping (KSS), which can prevent training divergence and enable 50-150% higher learning rates across various AI models including GPT-2 and LLaMA-2.
$NEAR
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers developed a new theoretical framework for accelerated risk-averse policy evaluation in partially observable Markov decision processes (POMDPs) using Conditional Value-at-Risk (CVaR) bounds. The method enables safe elimination of suboptimal actions while maintaining computational guarantees, achieving substantial speedups in autonomous agent decision-making under uncertainty.
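A minimal sketch of the quantity the bounds are built on: the empirical CVaR of a loss distribution, i.e. the mean of its worst α-fraction. The paper's accelerated bounds for POMDPs are not reproduced here; the pruning rule in the comments only illustrates how such bounds permit safe action elimination.

```python
import numpy as np

def cvar(losses, alpha=0.2):
    """Empirical CVaR_alpha: the mean of the worst alpha-fraction of losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst losses first
    k = max(1, int(np.ceil(alpha * len(losses))))            # size of the tail
    return float(losses[:k].mean())

# Pruning sketch: if one action's CVaR lower bound already exceeds another
# action's CVaR upper bound, the riskier action can be eliminated safely.
risky = cvar([1.0, 1.2, 0.9, 5.0, 1.1], alpha=0.2)   # heavy-tailed losses
steady = cvar([2.0, 2.1, 1.9, 2.2, 2.0], alpha=0.2)  # flat losses
print(risky, steady)
```

A risk-neutral mean would prefer the first action (mean ≈ 1.84 vs 2.04); the CVaR criterion flips the ranking because of the heavy tail.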
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers propose Affine-Scaled Attention, a new mechanism that improves Transformer model training stability by introducing flexible scaling and bias terms to attention weights. The approach shows consistent improvements in optimization behavior and downstream task performance compared to standard softmax attention across multiple language model sizes.
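Taking the summary literally, the attention weights receive an affine map a·softmax(·) + b, which relaxes the usual weights-sum-to-one constraint. The sketch below uses scalar a and b; how the paper actually parameterizes them (per head, per layer) is an assumption, not taken from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affine_scaled_attention(Q, K, V, a=1.0, b=0.0):
    """Scaled dot-product attention with an affine map on the weights.

    Vanilla softmax weights W are replaced by a*W + b, so the rows no
    longer have to sum to one. Scalar a, b is an illustrative choice;
    a=1, b=0 recovers standard attention exactly.
    """
    d = Q.shape[-1]
    W = softmax(Q @ K.T / np.sqrt(d))   # standard attention weights
    return (a * W + b) @ V              # affine rescaling (the new part)

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))    # 4 tokens, head dimension 8
out = affine_scaled_attention(Q, K, V, a=2.0, b=0.5)
```

With b = 0 the output is simply a linear rescaling of vanilla attention; the bias term b additionally mixes in an unweighted average of the values.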
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers identified a fundamental limitation in multimodal LLMs where decoders trained on text cannot effectively utilize non-text information like speaker identity or visual textures, despite this information being preserved through all model layers. The study demonstrates that this 'modality collapse' stems from decoder design rather than encoding failures, with experiments showing targeted training can improve specific modality accessibility.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠 Researchers propose FedWQ-CP, a new approach for uncertainty quantification in federated learning that addresses both data and model heterogeneity challenges. The method enables reliable uncertainty estimation across distributed agents while maintaining efficiency through single-round communication and weighted threshold aggregation.
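A minimal sketch of what single-round weighted threshold aggregation can look like for split conformal prediction: each client sends its local calibration (nonconformity) scores once, and the server returns a weighted quantile as the shared threshold. The per-client weighting used below is an illustrative choice, not necessarily FedWQ-CP's actual scheme.

```python
import numpy as np

def weighted_conformal_threshold(client_scores, client_weights, alpha=0.25):
    """One-shot server-side aggregation of per-client calibration scores.

    Returns the weighted (1 - alpha) quantile of the pooled scores, used
    as the conformal prediction threshold. Each client's scores carry
    total mass w (split evenly), an assumption made for this sketch.
    """
    scores, weights = [], []
    for s, w in zip(client_scores, client_weights):
        scores.extend(s)
        weights.extend([w / len(s)] * len(s))
    order = np.argsort(scores)
    scores = np.asarray(scores, dtype=float)[order]
    cum = np.cumsum(np.asarray(weights)[order])
    cum /= cum[-1]                                  # normalize total mass
    return float(scores[np.searchsorted(cum, 1 - alpha)])

# Two clients with heterogeneous score distributions, equal weight.
thr = weighted_conformal_threshold(
    [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]], [1.0, 1.0], alpha=0.25)
print(thr)
```

At prediction time every agent would include in its set all labels whose nonconformity score falls at or below this shared threshold.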
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 Researchers introduce GUIPruner, a training-free framework that addresses efficiency bottlenecks in high-resolution GUI agents by eliminating spatiotemporal redundancy. The system achieves a 3.4x reduction in computational operations and a 3.3x speedup while maintaining 94% of original performance, enabling real-time navigation with minimal resource consumption.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠 FlashOptim introduces memory optimization techniques that reduce AI training memory requirements by over 50% per parameter while maintaining model quality. The suite reduces AdamW memory usage from 16 bytes to 7 bytes per parameter through improved master weight splitting and 8-bit optimizer state quantization.
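To make the 8-bit optimizer-state part concrete, here is a common blockwise absmax recipe for quantizing a moment tensor to int8, cutting it from 4 bytes/value to about 1 byte/value plus a small scale overhead. This is a standard technique sketch; FlashOptim's exact quantization and weight-splitting schemes may differ.

```python
import numpy as np

def quantize_state_8bit(x, block=64):
    """Blockwise absmax int8 quantization of an optimizer-state tensor.

    Each block of `block` values shares one fp32 scale, so storage is
    ~1 byte/value plus 4/block bytes of scale overhead. Assumes the
    tensor size is divisible by `block` (true in this sketch).
    """
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0   # per-block scale
    scale[scale == 0] = 1.0                                # all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_state_8bit(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(0)
m = rng.normal(scale=1e-3, size=4096).astype(np.float32)   # e.g. Adam's first moment
q, s = quantize_state_8bit(m)
m_hat = dequantize_state_8bit(q, s)
bytes_per_value = (q.nbytes + s.nbytes) / m.size
rel_err = float(np.abs(m_hat - m).max() / np.abs(m).max())
print(bytes_per_value, rel_err)
```

The per-block scale is what keeps the worst-case relative error bounded (at most half a quantization step of the block's absmax) even when magnitudes vary widely across the tensor.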
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves 1.3-3.6x speedup on mixed-precision models while supporting clock frequencies of up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠 Researchers developed a new economic framework called 'cost-of-pass' to evaluate AI language models by combining accuracy with inference costs. The study found that lightweight models are most cost-effective for basic tasks while reasoning models excel at complex problems, with costs for complex quantitative tasks roughly halving every few months.
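The core metric divides the expected cost of one attempt by the probability the attempt is correct, giving the expected spend per correct answer. The prices and pass rates below are hypothetical, chosen only to show how a pricier reasoning model can still win on a hard task.

```python
def cost_of_pass(cost_per_attempt, pass_rate):
    """Expected cost per correct answer: a model that is cheap per call
    but rarely right can still lose to an expensive model that usually
    succeeds, because failed attempts must be paid for and retried."""
    return cost_per_attempt / pass_rate

# Hypothetical prices and pass rates on a hard quantitative task.
light = cost_of_pass(0.002, 0.01)   # cheap model, 1% pass rate
heavy = cost_of_pass(0.15, 0.90)    # reasoning model, 90% pass rate
print(light, heavy)                 # the pricier model wins per correct answer
```

On an easy task the comparison reverses: raise the cheap model's pass rate to 50% and its cost-of-pass drops to $0.004, far below the reasoning model's.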
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 LiveMCPBench introduces the first large-scale benchmark evaluating AI agents' ability to navigate real-world tasks using Model Context Protocol (MCP) tools across multiple servers. The benchmark reveals significant performance gaps, with the top model, Claude-Sonnet-4, achieving 78.95% success while most models reach only 30-50%, identifying tool retrieval as the primary bottleneck.
$OCEAN
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers introduce GraftLLM, a new method for transferring knowledge between large language models using a 'SkillPack' format that preserves capabilities while avoiding catastrophic forgetting. The approach enables efficient model fusion and continual learning for heterogeneous models through modular knowledge storage.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 Researchers developed Compositional-ARC, a dataset that tests AI models' ability to systematically generalize across abstract spatial reasoning tasks. A small 5.7M-parameter transformer model trained with meta-learning outperformed large language models like GPT-4o and Gemini 2.0 Flash on novel geometric transformation combinations.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 A qualitative study with 26 non-AI expert stakeholders reveals that everyday users assess AI fairness more comprehensively than AI experts, considering broader features beyond legally protected categories and setting stricter fairness thresholds. The research highlights the importance of incorporating stakeholder perspectives in AI governance and fairness assessment processes.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 A new academic paper proposes that machine consciousness requires simultaneous computation rather than sequential processing. The research introduces 'Stack Theory' with temporal semantics, arguing that conscious unity depends on objective co-instantiation of mental processes within specific time windows, potentially making software consciousness impossible on purely sequential computer architectures.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers have developed DAIL (Discovered Adversarial Imitation Learning), the first meta-learned AI algorithm that uses LLM-guided evolutionary methods to automatically discover reward assignment functions for training AI agents. This breakthrough addresses stability issues in adversarial imitation learning and demonstrates superior performance compared to human-designed approaches across different environments.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 3
🧠 Researchers introduce α-GFNs, an enhanced version of Generative Flow Networks that allows tunable control over exploration-exploitation dynamics through a parameter α. The method achieves up to 10× improvement in mode discovery across various benchmarks by addressing constraints in traditional GFlowNet objectives through Markov chain theory.
$LINK
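One way to picture the exploration-exploitation dial: GFlowNets sample objects with probability proportional to reward, and tempering that target by an exponent α flattens or sharpens it. How α actually enters the α-GFN objective is not stated in the summary, so this sketch is an illustrative assumption, not the paper's method.

```python
import numpy as np

def tempered_distribution(rewards, alpha):
    """Sampling target proportional to R(x)**alpha.

    alpha < 1 flattens the target (more exploration of low-reward modes);
    alpha > 1 sharpens it (more exploitation of the best modes). The role
    of alpha here is an assumption made for illustration only.
    """
    r = np.asarray(rewards, dtype=float) ** alpha
    return r / r.sum()

R = [1.0, 2.0, 10.0]                   # three modes, one dominant
print(tempered_distribution(R, 1.0))   # proportional to reward
print(tempered_distribution(R, 0.5))   # flatter: rare modes sampled more
```

The flattened target is what helps mode discovery: low-reward modes that a reward-proportional sampler would rarely visit get meaningfully more probability mass.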
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠 A research paper introduces the concept of 'vibe researching' where AI agents can autonomously execute entire research pipelines from idea to submission using specialized skills. The study analyzes how AI agents excel at speed and methodological tasks but struggle with theoretical originality and tacit knowledge, creating a cognitive rather than sequential delegation boundary in research workflows.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers introduce VALTEST, a framework that uses semantic entropy to automatically validate test cases generated by Large Language Models, addressing the problem of invalid or hallucinated tests that mislead AI programming agents. The system improves test validity by up to 29% and enhances code generation performance through better filtering of LLM-generated test cases.
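Semantic entropy is computed by sampling several outputs, clustering the semantically equivalent ones, and taking the entropy over cluster probabilities; low entropy signals a consistent, likely-valid test. The clustering function below is a deliberately trivial string normalizer, an assumption for the sketch, since VALTEST's actual equivalence check is richer.

```python
import math
from collections import Counter

def semantic_entropy(samples, canonical=lambda s: s.strip().lower()):
    """Entropy over clusters of semantically equivalent samples.

    `canonical` maps each sampled output to a cluster key (here just a
    whitespace/case normalizer -- a stand-in for a real semantic
    equivalence check). Zero entropy means all samples agree.
    """
    counts = Counter(canonical(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# All samples agree -> zero entropy (confident, likely-valid test case).
print(semantic_entropy(["assert f(2) == 4", "assert f(2) == 4 "]))
# Samples disagree -> positive entropy (candidate for filtering out).
print(semantic_entropy(["assert f(2) == 4", "assert f(2) == 5"]))
```

Clustering before taking the entropy is the key move: it distinguishes genuine disagreement about the expected behavior from harmless surface-level rephrasings of the same test.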