12,738 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers propose ScalDPP, a new retrieval mechanism for RAG systems that uses Determinantal Point Processes to optimize both density and diversity in context selection. The approach addresses limitations in current RAG pipelines that ignore interactions between retrieved information chunks, leading to redundant contexts that reduce effectiveness.
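The summary doesn't detail the paper's kernel construction, but the core idea behind DPP-based selection can be sketched: greedily pick the subset of chunks whose kernel subdeterminant is largest, which rewards relevance while penalizing near-duplicates. The quality scores and embeddings below are hypothetical, not from the paper.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-12:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def greedy_dpp(L, k):
    """Greedily add the item that most increases the DPP subdeterminant."""
    selected = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for j in range(len(L)):
            if j in selected:
                continue
            idx = selected + [j]
            gain = det([[L[a][b] for b in idx] for a in idx])
            if gain > best_gain:
                best_gain, best = gain, j
        if best is None:
            break
        selected.append(best)
    return selected

# Toy example: chunks 0 and 1 are near-duplicates; chunk 2 is distinct.
quality = [1.0, 0.98, 0.8]                        # hypothetical relevance scores
emb = [[1.0, 0.0], [0.995, 0.0999], [0.0, 1.0]]   # unit-norm embeddings
sim = [[sum(a * b for a, b in zip(u, v)) for v in emb] for u in emb]
L = [[quality[i] * sim[i][j] * quality[j] for j in range(3)] for i in range(3)]

print(greedy_dpp(L, 2))  # → [0, 2]: the distinct chunk beats the near-duplicate
```

A plain top-k by quality would have returned the redundant pair [0, 1]; the determinant term is what trades a little relevance for diversity.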
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed a new method to train transformer neural networks using discrete cosine transform (DCT) coefficients, achieving the same performance while using only 52% of the parameters. The technique requires no architectural changes and simply replaces standard linear layers with spectral layers that store DCT coefficients instead of full weight matrices.
🏢 Perplexity
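The DCT spectral-layer idea above can be sketched in pure Python with a hypothetical weight row: keep only the leading orthonormal DCT-II coefficients and reconstruct the weights from them. Because the basis is orthonormal, Parseval's identity says the squared reconstruction error equals exactly the energy of the dropped coefficients.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(a * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                           for i in range(n)))
    return out

def idct2(coef, n):
    """Inverse of the orthonormal DCT-II; missing coefficients count as zero."""
    x = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coef):
            a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += a * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        x.append(s)
    return x

# A smooth hypothetical weight row: most energy sits in low DCT frequencies.
n = 16
w = [math.exp(-((i - 8) / 4.0) ** 2) for i in range(n)]
coef = dct2(w)

kept = coef[: n // 2]      # store ~50% of parameters, akin to the paper's ~52%
w_hat = idct2(kept, n)

err = sum((a - b) ** 2 for a, b in zip(w, w_hat))
dropped_energy = sum(c * c for c in coef[n // 2:])
print(err, dropped_energy)  # equal, by Parseval's identity
```

Real weight matrices are not this smooth, which is presumably why the paper trains the spectral coefficients directly rather than truncating a pretrained model.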
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers propose a new metric to assess consistency of AI model explanations across similar inputs, implementing it on BERT models for sentiment analysis. The framework uses cosine similarity of SHAP values to detect inconsistent reasoning patterns and biased feature reliance, providing more robust evaluation of model behavior.
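The framework itself isn't specified in this summary; a minimal sketch of the core metric follows, assuming the SHAP attribution vectors for paraphrased inputs are already aligned to a shared feature space. The numbers are made up, not computed from BERT.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def consistency_score(attributions):
    """Mean pairwise cosine similarity across attribution vectors for
    semantically similar inputs; higher means more consistent reasoning."""
    n = len(attributions)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(attributions[i], attributions[j])
               for i, j in pairs) / len(pairs)

# Hypothetical SHAP vectors for three paraphrases of one movie review,
# aligned to the same three sentiment-bearing feature slots.
shap = [
    [0.80, 0.10, -0.30],
    [0.70, 0.20, -0.20],
    [0.75, 0.15, -0.25],
]
print(consistency_score(shap))  # close to 1.0 → consistent explanations
```

A low score would flag that the model attributes its predictions to different features depending on surface wording, which is the "inconsistent reasoning" the metric is designed to catch.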
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers have developed Memory Intelligence Agent (MIA), a new AI framework that improves deep research agents through a Manager-Planner-Executor architecture with advanced memory systems. The framework enables continuous learning during inference and demonstrates superior performance across eleven benchmarks through enhanced cooperation between parametric and non-parametric memory systems.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers have released SuperLocalMemory V3.3, an open-source AI agent memory system that operates entirely locally without cloud LLMs, implementing biologically-inspired forgetting mechanisms and multi-channel retrieval. The system achieves 70.4% performance on LoCoMo benchmarks while running on CPU only, addressing the paradox of AI agents having vast knowledge but poor conversational memory.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Research reveals that multi-agent LLM committees suffer from 'representational collapse,' where agents produce highly similar outputs despite different role prompts, with a mean cosine similarity of 0.888. A new diversity-aware consensus protocol (DALC) improves accuracy to 87% while reducing token costs by 26% compared to traditional self-consistency methods.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce an LLM-powered multi-agent simulation framework for optimizing service operations by modeling human behavior through AI agents. The method uses prompts to embed design choices and extracts outcomes from LLM responses to create a controlled Markov chain model, showing superior performance in supply chain and contest design applications.
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers have developed a new automated pipeline that generates challenging math problems by first identifying specific mathematical concepts where LLMs struggle, then creating targeted problems to test these weaknesses. The method successfully reduced a leading LLM's accuracy from 77% to 45%, demonstrating its effectiveness at creating more rigorous benchmarks.
🧠 Llama
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers present a new approach to improve Large Language Model performance without updating model parameters by using 'decocted experience' — extracting and organizing key insights from previous interactions to guide better reasoning. The method shows effectiveness across reasoning tasks including math, web browsing, and software engineering by constructing better contextual inputs rather than simply scaling computational resources.
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed methods to implement 'surrogate goals' in LLM-based agents to reduce bargaining risks by deflecting threats away from what principals care about. The study tested several approaches, including prompting, fine-tuning, and scaffolding, and found that scaffolding and fine-tuning outperformed simple prompting for implementing the desired threat-response behaviors.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers discovered that multilingual MoE AI models exhibit 'Language Routing Isolation,' where high- and low-resource languages activate different expert sets. They developed RISE, a framework that exploits this isolation to improve low-resource language performance by up to 10.85% F1 score while preserving other language capabilities.
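RISE itself isn't reproduced in this summary, but routing isolation is straightforward to quantify: compare the expert sets each language activates, e.g. with Jaccard overlap. The routing data below is hypothetical.

```python
def expert_overlap(experts_a, experts_b):
    """Jaccard overlap between the expert sets two languages activate."""
    a, b = set(experts_a), set(experts_b)
    return len(a & b) / len(a | b)

# Hypothetical top-activated experts per language in an 8-expert MoE layer.
routing = {
    "en": [0, 1, 2, 3],
    "fr": [0, 1, 2, 4],
    "sw": [4, 5, 6, 7],   # low-resource language routed to a disjoint group
}
print(expert_overlap(routing["en"], routing["fr"]))  # → 0.6 (shared experts)
print(expert_overlap(routing["en"], routing["sw"]))  # → 0.0 (full isolation)
```

An overlap near zero for a low-resource language is the "isolation" signal; the paper's contribution is turning that isolation from a liability into something that can be exploited for targeted improvement.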
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed a new training approach that makes small language models more effective search agents by teaching them to consistently use search tools rather than relying on internal knowledge. The method achieved significant performance improvements of 17.3 points on Bamboogle and 15.3 points on HotpotQA, reaching large language model-level results while maintaining lower computational costs.
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers conducted the first comprehensive analysis of emotion representations in small language models (100M-10B parameters), finding that these models do possess internal emotion vectors similar to larger frontier models. The study evaluated 9 models across 5 architectural families and discovered that emotion representations localize at middle transformer layers, with generation-based extraction methods proving superior to comprehension-based approaches.
🏢 Perplexity · 🧠 Llama
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.
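REAM's grouping strategy isn't given in this summary; below is a minimal sketch of just the merging step, under the assumption that observed router activation mass supplies the merge weights. All numbers are hypothetical.

```python
def merge_experts(expert_weights, router_activations):
    """Collapse a group of experts into one by averaging their weight rows,
    weighted by how much router mass each expert receives."""
    total = sum(router_activations)
    gains = [a / total for a in router_activations]
    dim = len(expert_weights[0])
    return [sum(g * w[i] for g, w in zip(gains, expert_weights))
            for i in range(dim)]

# Hypothetical group of three similar experts with 4-dim weight rows.
experts = [
    [1.0, 0.0, 2.0, -1.0],
    [0.8, 0.2, 1.8, -0.8],
    [1.2, -0.2, 2.2, -1.2],
]
activations = [0.5, 0.3, 0.2]   # router mass measured on a calibration set
merged = merge_experts(experts, activations)
print([round(x, 2) for x in merged])  # → [0.98, 0.02, 1.98, -0.98]
```

Unlike pruning, which discards the low-activation experts entirely, this keeps a router-weighted trace of every expert in the group, which is one plausible reason merging preserves performance better.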
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce LangFIR, a method that enables better language control in multilingual AI models using only monolingual data instead of expensive parallel datasets. The technique identifies sparse language-specific features and achieves superior performance in controlling language output across multiple models including Gemma and Llama.
🧠 Llama
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 A randomized control trial reveals that incentive structures significantly influence how humans use generative AI in creative tasks. When participants were rewarded for originality rather than just quality, they produced more diverse collective output by using AI more selectively for brainstorming and editing rather than copying suggestions verbatim.
AI Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠 A research study using the JudgeGPT platform found that humans cannot reliably distinguish between AI-generated and human-written news articles across 2,318 judgments from 1,054 participants. The study tested six different LLMs and concluded that user-side detection is not viable, suggesting the need for cryptographic content provenance systems.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed AP-MAE, a vision transformer model that analyzes attention patterns in large language models at scale to improve interpretability. The system can predict code generation accuracy with 55-70% precision and enable targeted interventions that increase model accuracy by 13.6%.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed a new method to reduce hallucinations in Large Vision-Language Models (LVLMs) by identifying a three-phase attention structure in vision processing and selectively suppressing low-attention tokens during the focus phase. The training-free approach significantly reduces object hallucinations while maintaining caption quality with minimal inference latency impact.
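The paper's phase detection and thresholds aren't reproduced here; the sketch below shows only the suppression step in isolation, keeping the top fraction of vision tokens by attention mass during the hypothesized focus phase. Token names and scores are made up.

```python
def suppress_low_attention(tokens, attn, keep_ratio):
    """Keep only the top keep_ratio fraction of vision tokens by attention
    mass; the rest are dropped from the context (original order preserved)."""
    k = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: attn[i], reverse=True)
    kept = set(ranked[:k])
    return [t for i, t in enumerate(tokens) if i in kept]

# Hypothetical patch tokens and their aggregate attention in the focus phase.
tokens = ["patch_a", "patch_b", "patch_c", "patch_d", "patch_e"]
attn = [0.40, 0.05, 0.30, 0.02, 0.23]
print(suppress_low_attention(tokens, attn, 0.6))  # → ['patch_a', 'patch_c', 'patch_e']
```

The intuition from the summary is that low-attention patches contribute noise the decoder can hallucinate objects from, so removing them during the right phase cuts hallucinations without retraining.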
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed a secure-by-design AI framework combining PromptShield and CIAF to automate cloud security and forensic investigations while protecting against prompt injection attacks. The system achieved over 93% accuracy in classification tasks and enhanced ransomware detection in AWS and Azure environments.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed I-CALM, a prompt-based framework that reduces AI hallucinations by encouraging language models to abstain from answering when uncertain, rather than providing confident but incorrect responses. The method uses verbal confidence assessment and reward schemes to improve reliability without model retraining.
🧠 GPT-5
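I-CALM's prompts and reward schemes aren't specified in this summary; the sketch below shows one standard way such a scheme induces abstention. With reward +1 for a correct answer, −p for a wrong one, and 0 for abstaining, answering has positive expected reward only when confidence exceeds p/(1+p).

```python
def abstention_threshold(wrong_penalty):
    """Answering beats abstaining (reward 0) only when
    confidence * 1 - (1 - confidence) * wrong_penalty > 0,
    i.e. confidence > wrong_penalty / (1 + wrong_penalty)."""
    return wrong_penalty / (1.0 + wrong_penalty)

def decide(answer, confidence, wrong_penalty):
    """Return the model's answer, or abstain if verbalized confidence
    falls below the reward-derived threshold."""
    if confidence > abstention_threshold(wrong_penalty):
        return answer
    return "I don't know."

# Hypothetical verbalized confidences, with a penalty of 2 for wrong answers
# (threshold = 2/3).
print(decide("Paris", 0.9, 2.0))   # confident → answer
print(decide("Lyon", 0.4, 2.0))    # below threshold → abstain
```

Raising the wrong-answer penalty pushes the threshold toward 1, making the model abstain more often; this trade-off is what lets a prompt-only scheme reduce confident hallucinations without retraining.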
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce InferenceEvolve, an AI framework using large language models to automatically discover and refine causal inference methods. The system outperformed 58 human submissions in a recent competition and demonstrates how AI can optimize complex scientific programs through evolutionary approaches.
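InferenceEvolve's LLM-driven mutation step isn't reproduced here; the sketch below keeps the evolve-score-select loop but substitutes random numeric mutation for the LLM edit. The objective and parameters are hypothetical.

```python
import random

def evolve(score, init, generations=50, pop_size=8, seed=0):
    """Minimal evolutionary loop: mutate the current best candidate,
    score the offspring, and keep any improvement. In the paper's
    setting the mutation would be an LLM-proposed program edit."""
    rng = random.Random(seed)
    best, best_score = init, score(init)
    for _ in range(generations):
        pop = [[x + rng.gauss(0, 0.3) for x in best] for _ in range(pop_size)]
        for cand in pop:
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score

# Hypothetical objective: tune two estimator coefficients toward (1.5, -0.5).
def score(params):
    return -((params[0] - 1.5) ** 2 + (params[1] + 0.5) ** 2)

best, best_score = evolve(score, [0.0, 0.0])
print(best, best_score)  # hill-climbs toward the optimum at (1.5, -0.5)
```

The competition-winning system presumably scores candidates on held-out causal-inference benchmarks rather than a toy quadratic, but the select-and-refine loop is the same shape.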
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers developed DualJudge, a new framework for evaluating large language models that combines the structured Fuzzy Analytic Hierarchy Process (FAHP) with traditional direct scoring methods. The approach addresses inconsistent LLM evaluation by incorporating uncertainty-aware reasoning and achieved state-of-the-art performance on JudgeBench testing.
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce vocabulary dropout, a technique to prevent diversity collapse in co-evolutionary language model training where one model generates problems and another solves them. The method sustains proposer diversity and improves mathematical reasoning performance by +4.4 points on average in Qwen3 models.
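The paper's exact mechanism isn't given in this summary; one plausible reading, sketched below, is to mask a random fraction of the vocabulary each time the proposer samples, so it cannot keep reusing the same favorite tokens to generate near-identical problems.

```python
import random

def sample_with_vocab_dropout(logits, drop_rate, rng):
    """Pick a token after masking a random drop_rate fraction of the
    vocabulary; masked tokens are unavailable for this draw."""
    vocab = list(range(len(logits)))
    n_drop = int(len(vocab) * drop_rate)
    dropped = set(rng.sample(vocab, n_drop))
    kept = [t for t in vocab if t not in dropped]
    # Greedy over the surviving vocabulary (softmax sampling also works).
    return max(kept, key=lambda t: logits[t]), dropped

rng = random.Random(0)
logits = [2.0, 1.5, 0.5, 0.1, -1.0, 0.9]  # token 0 always wins without dropout
token, dropped = sample_with_vocab_dropout(logits, 0.5, rng)
print(token, sorted(dropped))
```

Without the mask, greedy decoding here would emit token 0 every time; with it, the proposer is periodically forced onto other tokens, which is the diversity-sustaining effect the paper targets.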
AI Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠 Researchers introduce PRAISE, a new framework that improves training efficiency for AI agents performing complex search tasks like multi-hop question answering. The method addresses key limitations in current reinforcement learning approaches by reusing partial search trajectories and providing intermediate rewards rather than only final answer feedback.
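PRAISE's reward design isn't detailed in this summary; the sketch below shows why intermediate rewards densify the learning signal, comparing return-to-go under final-answer-only versus per-step rewards. The step rewards (+0.5 whenever a gold supporting document is retrieved) are hypothetical.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return-to-go at each step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# A 4-hop search trajectory. With final-only reward, early search steps
# receive only a heavily discounted echo of the terminal answer reward;
# intermediate rewards give every step its own signal.
final_only = [0.0, 0.0, 0.0, 1.0]
intermediate = [0.5, 0.0, 0.5, 1.0]

print(discounted_returns(final_only))    # → [0.729, 0.81, 0.9, 1.0]
print(discounted_returns(intermediate))  # first step sees a much larger return
```

Denser returns at early steps mean less credit-assignment guesswork for the policy, which is consistent with the summary's claim that intermediate rewards improve training efficiency.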