12,877 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose a new "structure-faithful" framework for machine unlearning that preserves semantic relationships in AI models while removing specific data. The method uses semantic anchors to maintain knowledge structure, showing significant performance improvements of 19-33% across image classification, retrieval, and face recognition tasks.
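The abstract doesn't give the training objective, so the following is only a minimal PyTorch sketch of the semantic-anchor idea: ascend the loss on the forget set while pinning anchor representations to their pre-unlearning values. The `embed` method, the anchor set, and the weighting `lam` are all illustrative assumptions, not the paper's method.

```python
import torch.nn.functional as F

def structure_preserving_unlearn_loss(model, forget_x, forget_y,
                                      anchor_x, anchor_ref, lam=1.0):
    # Gradient ascent on the forget set erases the targeted behavior...
    forget_loss = -F.cross_entropy(model(forget_x), forget_y)
    # ...while an anchor term keeps the semantic structure of retained
    # knowledge fixed (anchor_ref = embeddings cached before unlearning).
    anchor_loss = F.mse_loss(model.embed(anchor_x), anchor_ref)
    return forget_loss + lam * anchor_loss
```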
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose Eye2Eye, a new framework that uses first-person perspective to improve human-AI collaboration by addressing communication and understanding gaps. The AR prototype integrates joint attention coordination, revisable memory, and reflective feedback, showing significant improvements in task completion time and user trust in studies.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠This comprehensive survey examines continual learning methodologies for large language models, focusing on three core training stages and methods to mitigate catastrophic forgetting. The research reveals that while current approaches show promise in specific domains, fundamental challenges remain in achieving seamless knowledge integration across diverse tasks and temporal scales.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose MetaKE, a new framework for knowledge editing in Large Language Models that addresses the 'Semantic-Execution Disconnect' through bi-level optimization. The method treats edit targets as learnable parameters and uses a Structural Gradient Proxy to align edits with the model's feasible manifold, showing significant improvements over existing approaches.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed an agentic AI framework using LLMs like Claude Opus 4.6 and GitHub Copilot to automate chemical process flowsheet modeling. The multi-agent system decomposes engineering tasks with one agent solving problems using domain knowledge and another implementing solutions in code for industrial simulations.
🏢 Anthropic · 🏢 Microsoft · 🧠 Claude
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce Cheers, a unified multimodal AI model that combines visual comprehension and generation by decoupling patch details from semantic representations. The model achieves 4x token compression and outperforms existing models like Tar-1.5B while using only 20% of the training cost.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose Swap-guided Preference Learning (SPL) to address posterior collapse issues in Variational Preference Learning for RLHF systems. SPL introduces three new components to better capture personalized user preferences and improve AI alignment with diverse human values.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Research reveals that large language models used as judges for scoring responses can look misleadingly strong under global correlation metrics compared with their performance on actual best-of-n selection tasks. A study using 5,000 prompts found that a judge with moderate global correlation (r=0.47) captured only 21% of the potential improvement, primarily because of poor within-prompt ranking despite decent overall agreement.
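A toy simulation makes the gap concrete: the synthetic judge below tracks prompt-level quality well, so its pooled correlation looks respectable, but it ranks candidates within a prompt weakly and therefore captures little of the best-of-n uplift. All distributions and coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prompts, n_cand = 1000, 8

# True quality: a large per-prompt offset plus within-prompt variation.
offset = rng.normal(0, 1.5, size=(n_prompts, 1))
true_q = offset + rng.normal(0, 1.0, size=(n_prompts, n_cand))

# Judge: tracks the offset well but ranks within a prompt weakly.
judge_q = offset + 0.4 * (true_q - offset) + rng.normal(0, 2.0, size=(n_prompts, n_cand))

r = np.corrcoef(true_q.ravel(), judge_q.ravel())[0, 1]   # pooled "global" correlation

baseline = true_q.mean()                                 # expected score of a random pick
oracle = true_q.max(axis=1).mean()                       # perfect-judge best-of-n
picked = true_q[np.arange(n_prompts), judge_q.argmax(axis=1)].mean()
captured = (picked - baseline) / (oracle - baseline)     # share of possible uplift

print(f"global r = {r:.2f}, best-of-{n_cand} uplift captured = {captured:.0%}")
```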
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers have developed Feynman, an AI agent that generates high-quality diagram-caption pairs at scale for training vision-language models. The system created a dataset of 100k+ well-aligned diagrams and introduced Diagramma, a benchmark for evaluating visual reasoning capabilities.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers have launched LLM BiasScope, an open-source web application that enables real-time bias analysis and side-by-side comparison of outputs from major language models including Google Gemini, DeepSeek, and Meta Llama. The platform uses a two-stage bias detection pipeline and provides interactive visualizations to help researchers and practitioners evaluate bias patterns across different AI models.
🏢 Hugging Face · 🧠 Gemini · 🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed TERMINATOR, an early-exit strategy for Large Reasoning Models that reduces Chain-of-Thought reasoning lengths by 14-55% without performance loss. The system identifies optimal stopping points during inference to prevent overthinking and excessive compute usage.
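The summary doesn't spell out TERMINATOR's stopping rule, so the sketch below shows only the generic shape of such a system: probe the model's current answer as reasoning unfolds and exit once it is stable and confident. The chunked decoding loop, `probe_answer` interface, and patience heuristic are all assumptions.

```python
from typing import Callable

def early_exit_decode(
    generate_chunk: Callable[[str], str],              # next reasoning chunk
    probe_answer: Callable[[str], tuple[str, float]],  # (answer, confidence)
    prompt: str,
    conf_threshold: float = 0.9,
    patience: int = 2,
    max_chunks: int = 64,
) -> str:
    """Stop chain-of-thought early once the probed answer is stable:
    same answer, above-threshold confidence, `patience` chunks in a row."""
    trace, last_answer, streak = prompt, None, 0
    for _ in range(max_chunks):
        trace += generate_chunk(trace)
        answer, conf = probe_answer(trace)
        streak = streak + 1 if (answer == last_answer and conf >= conf_threshold) else 0
        last_answer = answer
        if streak >= patience:
            return answer      # early exit: skip the remaining reasoning
    return last_answer         # budget exhausted: use the final probe
```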
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠A research study with 16 industry experts found that AI-assisted API design outperformed human-authored specifications in 10 of 11 usability dimensions while reducing authoring time by 87%. However, experts identified a 'Perfection Paradox' where AI-generated designs appeared unsettlingly perfect due to hyper-consistency, suggesting humans should shift from drafting to curating AI-generated patterns.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose Naïve PAINE, a lightweight system that improves text-to-image generation quality by predicting which initial noise inputs will produce better results before running the full diffusion model. The approach reduces the need for multiple generation cycles to get satisfactory images by pre-selecting higher-quality noise patterns.
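A minimal sketch of the pre-selection loop, assuming a lightweight `score_noise` predictor standing in for the paper's model (the latent shape and candidate count are also illustrative): sample several candidate latents, score them cheaply, and run the expensive diffusion sampler only on the winner.

```python
import torch

def pick_best_noise(score_noise, n_candidates=16, shape=(4, 64, 64)):
    """Keep the candidate initial noise the predictor scores highest,
    before paying for a full diffusion run."""
    noises = torch.randn(n_candidates, *shape)
    scores = torch.stack([score_noise(z) for z in noises])  # cheap scoring passes
    return noises[scores.argmax()]

# Usage sketch (pipe is a hypothetical diffusion pipeline):
#   z = pick_best_noise(my_predictor)
#   image = pipe(latents=z[None])
```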
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers improved agentic Retrieval-Augmented Generation (RAG) systems by introducing contextualization and de-duplication modules to address inefficiencies in complex question-answering. The enhanced Search-R1 pipeline achieved 5.6% better accuracy and 10.5% fewer retrieval turns using GPT-4.1-mini.
🧠 GPT-4
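A rough sketch of what such modules can look like around a retrieval loop; the exact-match de-duplication and the prompt template below are simplifications for illustration, not the Search-R1 implementation.

```python
def deduplicate(retrieved: list[str], seen: set[str]) -> list[str]:
    """Drop passages already surfaced in earlier retrieval turns
    (exact-match stand-in for the paper's de-duplication module)."""
    fresh = [p for p in retrieved if p not in seen]
    seen.update(fresh)
    return fresh

def contextualize(question: str, findings: list[str], passage: str) -> str:
    """Prefix each passage with the running question context so the
    reader model sees why it was retrieved (simplified)."""
    prior = " | ".join(findings) or "none"
    return f"Question: {question}\nEarlier findings: {prior}\nPassage: {passage}"
```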
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed Q-DIG, a red-teaming method that uses Quality Diversity techniques to identify diverse language instruction failures in Vision-Language-Action models for robotics. The approach generates adversarial prompts that expose vulnerabilities in robot behavior and improves task success rates when used for fine-tuning.
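Quality Diversity search of this kind is often a MAP-Elites-style loop: keep the strongest adversarial prompt in each behavior cell so the discovered failures stay diverse as well as effective. The sketch below assumes caller-supplied `mutate`, `evaluate`, and `descriptor` functions; Q-DIG's actual operators and descriptors are not given in this summary.

```python
import random

def qd_red_team(mutate, evaluate, descriptor, seeds, iters=1000):
    """MAP-Elites over adversarial prompts: one elite per behavior cell."""
    archive = {}
    for prompt in seeds:
        archive[descriptor(prompt)] = (evaluate(prompt), prompt)
    for _ in range(iters):
        _, parent = random.choice(list(archive.values()))  # pick a random elite
        child = mutate(parent)                             # perturb the instruction
        score, cell = evaluate(child), descriptor(child)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, child)                 # new elite for this cell
    return archive
```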
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduced D-Negation, a new dataset and learning framework that improves vision-language AI models' ability to understand negative semantics and complex expressions. The approach achieved up to 5.7 mAP improvement on negative semantic evaluations while fine-tuning less than 10% of model parameters.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose Global Evolutionary Refined Steering (GER-steer), a new training-free framework for controlling Large Language Models without fine-tuning costs. The method addresses issues with existing activation engineering approaches by using geometric stability to improve steering vector accuracy and reduce noise.
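For context, the activation steering that GER-steer refines typically amounts to adding a vector to one layer's residual stream at inference time. A generic PyTorch forward-hook sketch follows; the evolutionary refinement of the vector itself, which is the paper's contribution, is not reproduced here.

```python
import torch

def make_steering_hook(v: torch.Tensor, alpha: float = 4.0):
    """Forward hook that adds a normalized steering vector to a layer's
    output activations (generic activation engineering, not GER-steer)."""
    v = v / v.norm()
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v.to(hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage sketch (layer index and model layout are assumptions):
#   handle = model.layers[12].register_forward_hook(make_steering_hook(v))
```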
AI · Bearish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers have identified 'role confusion' as the fundamental mechanism behind prompt injection attacks on language models, where models assign authority based on how text is written rather than where it comes from. The attacks achieved 60-61% success rates across multiple models, and internal role-confusion signals strongly predicted attack success before generation began.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce FastDSAC, a new framework that successfully applies Maximum Entropy Reinforcement Learning to high-dimensional humanoid control tasks. The system uses Dimension-wise Entropy Modulation and continuous distributional critics to achieve 180% and 400% performance gains on challenging control tasks compared to deterministic methods.
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce DART, a new framework for early-exit deep neural networks that achieves up to 3.3x speedup and 5.1x lower energy consumption while maintaining accuracy. The system uses input difficulty estimation and adaptive thresholds to optimize AI inference for resource-constrained edge devices.
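The generic pattern DART builds on looks like the sketch below: intermediate classifier heads after each backbone block, with a per-exit confidence threshold. DART's input-difficulty estimator and adaptive thresholding are not reproduced; the thresholds here are fixed placeholders.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Backbone with intermediate heads; at inference, return the first
    exit whose softmax confidence clears its threshold (assumes batch=1)."""

    def __init__(self, blocks: nn.ModuleList, heads: nn.ModuleList,
                 thresholds: list[float]):
        super().__init__()
        self.blocks, self.heads, self.thresholds = blocks, heads, thresholds

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block, head, tau in zip(self.blocks, self.heads, self.thresholds):
            x = block(x)
            logits = head(x)
            if logits.softmax(-1).max() >= tau:  # confident enough: exit early
                return logits
        return logits  # final exit is unconditional
```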
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed a structured distillation method that compresses AI agent conversation history by 11x (from 371 to 38 tokens per exchange) while maintaining 96% of retrieval quality. The technique enables thousands of exchanges to fit within a single prompt at 1/11th the context cost, addressing the expensive verbatim storage problem for long AI conversations.
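A minimal sketch of the idea, assuming any prompt-to-text `llm` callable; the JSON schema below is invented for illustration, since the paper's actual distillation format isn't given in this summary.

```python
import json

SCHEMA_PROMPT = ("Summarize the exchange as JSON with keys: "
                 "intent, entities, decision, open_questions. Be terse.")

def distill_exchange(llm, user_msg: str, assistant_msg: str) -> dict:
    """Replace a verbatim exchange with a small structured record."""
    raw = llm(f"{SCHEMA_PROMPT}\nUser: {user_msg}\nAssistant: {assistant_msg}")
    return json.loads(raw)

def render_history(records: list[dict]) -> str:
    """Pack distilled records into the prompt; thousands fit where only
    dozens of verbatim exchanges would."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records)
```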
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce a new knowledge distillation framework that improves training of smaller AI models by using intermediate representations from large language models rather than their final outputs. The method shows consistent improvements across reasoning benchmarks, particularly when training data is limited, by providing cleaner supervision signals.
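The core term in representation-level distillation is typically a match between projected student hidden states and teacher hidden states, added to the usual task loss. A hedged sketch, with the projection layer and the student-teacher layer pairing left as open design choices:

```python
import torch
import torch.nn.functional as F

def hidden_state_distill_loss(student_h: torch.Tensor,
                              teacher_h: torch.Tensor,
                              proj: torch.nn.Module) -> torch.Tensor:
    """MSE between a projected student layer and a frozen teacher layer,
    instead of distilling only the teacher's final outputs."""
    return F.mse_loss(proj(student_h), teacher_h.detach())
```

In training this term would be weighted and summed with the student's ordinary cross-entropy loss.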
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.
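The summary doesn't give the BSDS formula, but the budget-constrained evaluation it formalizes can be illustrated simply: rank candidates by model score, spend the assay budget on the top of the list, and count the true hits recovered. A simplified stand-in, not the verified framework itself:

```python
import numpy as np

def discovery_at_budget(scores: np.ndarray, labels: np.ndarray, budget: int) -> float:
    """Fraction of true hits found when only `budget` candidates can be
    tested: assay the top-`budget` by model score."""
    picked = np.argsort(scores)[::-1][:budget]   # highest-scored candidates first
    return labels[picked].sum() / max(labels.sum(), 1)
```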
AI · Bullish · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers propose AMRO-S, a new routing framework for multi-agent LLM systems that uses ant colony optimization to improve efficiency and reduce costs. The system addresses key deployment challenges like high inference costs and latency while maintaining performance quality through semantic-aware routing and interpretable decision-making.
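Ant colony optimization applied to routing reduces to a pheromone table over candidate models: choose probabilistically, reinforce routes that answered well per unit cost, and let old pheromone evaporate. A generic sketch of that mechanism, not the AMRO-S algorithm itself:

```python
import random

class PheromoneRouter:
    """Route queries across LLMs with ant-colony-style updates."""

    def __init__(self, models: list[str], evaporation: float = 0.1):
        self.pheromone = {m: 1.0 for m in models}
        self.evaporation = evaporation

    def choose(self) -> str:
        # Sample a model in proportion to its pheromone level.
        weights = list(self.pheromone.values())
        return random.choices(list(self.pheromone), weights=weights)[0]

    def update(self, model: str, quality: float, cost: float) -> None:
        for m in self.pheromone:                 # global evaporation
            self.pheromone[m] *= 1 - self.evaporation
        # Reinforce routes that delivered quality cheaply.
        self.pheromone[model] += quality / max(cost, 1e-6)
```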
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers developed a new method for evaluating AI ethical reasoning using literary narratives from science fiction, testing 13 AI systems across 24 conditions. The study found that current AI systems produce surface-level ethical responses rather than genuine moral reasoning, with more sophisticated systems showing more complex failure modes.
🏢 Anthropic · 🏢 Microsoft · 🧠 Claude