🧠

AI

21,049 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

21049 articles

AIBullisharXiv – CS AI · Apr 76/10

🧠

PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

Researchers introduce PRAISE, a new framework that improves training efficiency for AI agents performing complex search tasks like multi-hop question answering. The method addresses key limitations in current reinforcement learning approaches by reusing partial search trajectories and providing intermediate rewards rather than only final answer feedback.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

AIBearisharXiv – CS AI · Apr 76/10

🧠

Plausibility as Commonsense Reasoning: Humans Succeed, Large Language Models Do not

A new study reveals that large language models fail to integrate world knowledge with syntactic structure for ambiguity resolution in the same way humans do. Researchers tested Turkish language models on relative-clause attachment ambiguities and found that while humans reliably use plausibility to guide interpretation, LLMs show weak, unstable, or reversed responses to the same plausibility cues.

AINeutralarXiv – CS AI · Apr 76/10

🧠

LiveFact: A Dynamic, Time-Aware Benchmark for LLM-Driven Fake News Detection

Researchers have developed LiveFact, a new dynamic benchmark for evaluating Large Language Models' ability to detect fake news and misinformation in real-time conditions. The benchmark addresses limitations of static testing by using temporal evidence sets and finds that open-source models like Qwen3-235B-A22B now match proprietary systems in performance.

AIBullisharXiv – CS AI · Apr 76/10

🧠

Neuromorphic Computing for Low-Power Artificial Intelligence

Researchers outline how neuromorphic computing could overcome energy efficiency limits in classical CMOS technology for AI applications. The approach requires co-design across materials, circuits, and algorithms to achieve brain-inspired compute-in-memory architectures.

AIBearisharXiv – CS AI · Apr 76/10

🧠

Individual and Combined Effects of English as a Second Language and Typos on LLM Performance

Research reveals that Large Language Models (LLMs) experience greater performance degradation when facing English as a Second Language (ESL) inputs combined with typographical errors, compared to either factor alone. The study tested eight ESL variants with three levels of typos, finding that evaluations on clean English may overestimate real-world model performance.

AIBearisharXiv – CS AI · Apr 76/10

🧠

Metaphors We Compute By: A Computational Audit of Cultural Translation vs. Thinking in LLMs

New research reveals that Large Language Models (LLMs) exhibit cultural bias and Western defaultism when generating metaphors across different cultural contexts. The study found that LLMs act more as cultural translators using dominant Western frameworks rather than true culturally-aware reasoning systems, even when prompted with specific cultural identities.

AINeutralarXiv – CS AI · Apr 76/10

🧠

What Makes Good Multilingual Reasoning? Disentangling Reasoning Traces with Measurable Features

Researchers challenge the assumption that multilingual AI reasoning should simply mimic English patterns, finding that effective reasoning features vary significantly across languages. The study analyzed Large Reasoning Models across 10 languages and discovered that English-derived reasoning approaches may not translate effectively to other languages, suggesting need for adaptive, language-specific AI training methods.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Discovering Failure Modes in Vision-Language Models using RL

Researchers developed an AI framework using reinforcement learning to automatically discover failure modes in vision-language models without human intervention. The system trains a questioner agent that generates adaptive queries to expose weaknesses, successfully identifying 36 novel failure modes across various VLM combinations.

AIBullisharXiv – CS AI · Apr 76/10

🧠

MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Researchers propose MUXQ, a new quantization technique for large language models that addresses activation outliers through low-rank decomposition. The method enables efficient INT8 quantization while maintaining accuracy close to FP16, making it suitable for edge device deployment with NPU-based hardware.

🏢 Perplexity

AINeutralarXiv – CS AI · Apr 76/10

🧠

Multilingual Prompt Localization for Agent-as-a-Judge: Language and Backbone Sensitivity in Requirement-Level Evaluation

A research study reveals that AI model performance rankings change dramatically based on the evaluation language used, with GPT-4o performing best in English while Gemini leads in Arabic and Hindi. The study tested 55 development tasks across five languages and six AI models, showing no single model dominates across all languages.

🧠 GPT-4🧠 Gemini

AINeutralarXiv – CS AI · Apr 76/10

🧠

Reproducibility study on how to find Spurious Correlations, Shortcut Learning, Clever Hans or Group-Distributional non-robustness and how to fix them

A reproducibility study unifies research on spurious correlations in deep neural networks across different domains, comparing correction methods including XAI-based approaches. The research finds that Counterfactual Knowledge Distillation (CFKD) most effectively improves model generalization, though practical deployment remains challenging due to group labeling dependencies and data scarcity issues.

AINeutralarXiv – CS AI · Apr 75/10

🧠

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

🧠 Stable Diffusion

AIBullisharXiv – CS AI · Apr 76/10

🧠

DP-OPD: Differentially Private On-Policy Distillation for Language Models

Researchers have developed DP-OPD (Differentially Private On-Policy Distillation), a new framework for training privacy-preserving language models that significantly improves performance over existing methods. The approach simplifies the training pipeline by eliminating the need for DP teacher training and offline synthetic text generation while maintaining strong privacy guarantees.

🏢 Perplexity

AIBullisharXiv – CS AI · Apr 76/10

🧠

Conversational Control with Ontologies for Large Language Models: A Lightweight Framework for Constrained Generation

Researchers developed a lightweight framework that uses ontological definitions to provide modular and explainable control over Large Language Model outputs in conversational systems. The method fine-tunes LLMs to generate content according to specific constraints like English proficiency level and content polarity, consistently outperforming pre-trained baselines across seven state-of-the-art models.

AIBullisharXiv – CS AI · Apr 76/10

🧠

Training Transformers in Cosine Coefficient Space

Researchers developed a new method to train transformer neural networks using discrete cosine transform (DCT) coefficients, achieving the same performance while using only 52% of the parameters. The technique requires no architectural changes and simply replaces standard linear layers with spectral layers that store DCT coefficients instead of full weight matrices.

🏢 Perplexity

AIBullisharXiv – CS AI · Apr 76/10

🧠

GROUNDEDKG-RAG: Grounded Knowledge Graph Index for Long-document Question Answering

Researchers introduced GroundedKG-RAG, a new retrieval-augmented generation system that creates knowledge graphs directly grounded in source documents to improve long-document question answering. The system reduces resource consumption and hallucinations while maintaining accuracy comparable to state-of-the-art models at lower cost.

AIBullisharXiv – CS AI · Apr 76/10

🧠

HighFM: Towards a Foundation Model for Learning Representations from High-Frequency Earth Observation Data

Researchers have developed HighFM, a foundation model for analyzing high-frequency Earth observation data using over 2TB of satellite imagery to enable real-time disaster monitoring. The model adapts masked autoencoding frameworks with temporal encodings to capture short-term environmental changes and demonstrates superior performance in cloud masking and fire detection tasks.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Poisoned Identifiers Survive LLM Deobfuscation: A Case Study on Claude Opus 4.6

Research study reveals that when Claude Opus 4.6 deobfuscates JavaScript code, poisoned identifier names from the original string table consistently survive in the reconstructed code, even when the AI demonstrates correct understanding of the code's semantics. Changing the task framing from 'deobfuscate' to 'write fresh implementation' significantly reduced this persistence while maintaining algorithmic accuracy.

🧠 Claude🧠 Haiku🧠 Opus

AIBullisharXiv – CS AI · Apr 76/10

🧠

Context is All You Need

Researchers introduce CONTXT, a lightweight neural network adaptation method that improves AI model performance when deployed on data different from training data. The technique uses simple additive and multiplicative transforms to modulate internal representations, providing consistent gains across both discriminative and generative models including LLMs.

AIBullisharXiv – CS AI · Apr 76/10

🧠

APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs

Researchers propose APPA, a new framework for aligning large language models with diverse human preferences in federated learning environments. The method dynamically reweights group-level rewards to improve fairness, achieving up to 28% better alignment for underperforming groups while maintaining overall model performance.

🏢 Meta🧠 Llama

AIBearisharXiv – CS AI · Apr 76/10

🧠

Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models

A new research study reveals that major large language models exhibit systematic bias toward American English over British English across training data, tokenization, and outputs. The research introduces DiAlign, a method for measuring dialectal alignment, and finds evidence of linguistic homogenization that could impact global AI equity.

AINeutralarXiv – CS AI · Apr 76/10

🧠

ClawArena: Benchmarking AI Agents in Evolving Information Environments

Researchers introduce ClawArena, a new benchmark for evaluating AI agents' ability to maintain accurate beliefs in evolving information environments with conflicting sources. The benchmark tests 64 scenarios across 8 professional domains, revealing significant performance gaps between different AI models and frameworks in handling dynamic belief revision and multi-source reasoning.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks

Researchers introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite for evaluating AI models on professional graphic design tasks including layout, typography, and animation. Testing reveals current AI models struggle with spatial reasoning, vector code generation, and typographic precision despite showing promise in high-level semantic understanding.

AINeutralarXiv – CS AI · Apr 76/10

🧠

Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison

Researchers conducted the first comprehensive analysis of emotion representations in small language models (100M-10B parameters), finding that these models do possess internal emotion vectors similar to larger frontier models. The study evaluated 9 models across 5 architectural families and discovered that emotion representations localize at middle transformer layers, with generation-based extraction methods proving superior to comprehension-based approaches.

🏢 Perplexity🧠 Llama

← PrevPage 480 of 842Next →