AI

20,640 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

ODYN: An All-Shifted Non-Interior-Point Method for Quadratic Programming in Robotics and AI

Researchers introduce ODYN, a novel quadratic programming solver that uses all-shifted primal-dual methods to efficiently solve optimization problems in robotics and AI applications. The open-source tool demonstrates superior warm-start performance and state-of-the-art convergence on benchmark tests, with practical implementations in predictive control, deep learning, and physics simulation.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework

Researchers propose a Self-Validation Framework to address object hallucination in Large Vision Language Models (LVLMs), where models describe objects that do not exist in the image. The training-free approach validates object existence through language-prior-free verification and achieves a 65.6% improvement on benchmark metrics, suggesting a path to more reliable LVLMs without additional training.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing

Q-Probe introduces a novel agentic framework for scaling image quality assessment to high-resolution images by addressing limitations in existing reinforcement learning approaches. The research presents Vista-Bench, a new benchmark for fine-grained degradation analysis, and demonstrates state-of-the-art performance across multiple resolution scales through context-aware probing mechanisms.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Improved Evidence Extraction and Metrics for Document Inconsistency Detection with LLMs

Researchers introduce improved methods for detecting inconsistencies in documents using large language models, including new evaluation metrics and a redact-and-retry framework. The work addresses a research gap in LLM-based document analysis and includes a new semi-synthetic dataset for benchmarking evidence extraction capabilities.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

ProofSketcher: Hybrid LLM + Lightweight Proof Checker for Reliable Math/Logic Reasoning

Researchers present ProofSketcher, a hybrid system combining large language models with lightweight proof verification to address mathematical reasoning errors in AI-generated proofs. The approach bridges the gap between LLM efficiency and the formal rigor of interactive theorem provers like Lean and Coq, enabling more reliable automated reasoning without requiring full formalization.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study

Researchers present the first empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting classical unlearning methods to quantum settings and introducing quantum-specific strategies. The study reveals that quantum models can effectively support unlearning, with performance varying based on circuit depth and entanglement structure, establishing baseline insights for privacy-preserving quantum machine learning systems.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

PyFi: Toward Pyramid-like Financial Image Understanding for VLMs via Adversarial Agents

Researchers introduce PyFi, a framework enabling vision language models to understand financial images through progressive reasoning chains, backed by a 600K synthetic dataset organized as a reasoning pyramid. The approach uses adversarial agents to automatically generate training data without human annotation, yielding accuracy improvements of up to 19.52% on fine-tuned models.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Rectifying LLM Thought from Lens of Optimization

Researchers introduce RePro, a novel post-training technique that optimizes large language models' reasoning processes by framing chain-of-thought as gradient descent and using process-level rewards to reduce overthinking. The method demonstrates consistent performance improvements across mathematics, science, and coding benchmarks while mitigating inefficient reasoning behaviors in LLMs.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

REVEAL: Reasoning-Enhanced Forensic Evidence Analysis for Explainable AI-Generated Image Detection

Researchers introduce REVEAL, an explainable AI framework for detecting AI-generated images through forensic evidence chains and expert-grounded reinforcement learning. The approach addresses the growing challenge of distinguishing synthetic images from authentic ones while providing transparent, verifiable reasoning for detection decisions.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

Researchers introduce Nirvana, a Specialized Generalist Model that combines broad language capabilities with domain-specific adaptation through task-aware memory mechanisms. The model achieves competitive performance on general benchmarks while reaching the lowest perplexity across specialized domains such as biomedicine, finance, and law, with practical applications demonstrated in medical imaging reconstruction.

🏢 Hugging Face · 🏢 Perplexity

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis

Researchers introduce LoRA-DA, a new initialization method for Low-Rank Adaptation that leverages target-domain data and theoretical optimization principles to improve fine-tuning performance. The method outperforms existing initialization approaches across multiple benchmarks while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Contrastive Decoding Mitigates Score Range Bias in LLM-as-a-Judge

Researchers demonstrate that Large Language Models used as judges suffer from score range bias, where evaluation outputs are highly sensitive to predefined scoring scales. Using contrastive decoding techniques, they achieve an improvement of up to 11.7% in alignment with human judgments across different score ranges.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

In-Context Decision Making for Optimizing Complex AutoML Pipelines

Researchers propose PS-PFN, an advanced AutoML method that extends traditional algorithm selection and hyperparameter optimization to handle modern ML pipelines with fine-tuning and ensembling. Using posterior sampling and prior-data fitted networks for in-context learning, the approach outperforms existing bandit and AutoML strategies on benchmark tasks.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

SymptomWise: A Deterministic Reasoning Layer for Reliable and Efficient AI Systems

SymptomWise introduces a deterministic reasoning framework that separates language understanding from diagnostic inference in AI-driven medical systems, combining expert-curated knowledge with constrained LLM use to improve reliability and reduce hallucinations. The system achieved 88% accuracy in placing correct diagnoses in top-five differentials on challenging pediatric neurology cases, demonstrating how structured approaches can enhance AI safety in critical domains.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Toward Memory-Aided World Models: Benchmarking via Spatial Consistency

Researchers introduced a new benchmark dataset for evaluating world models' ability to maintain spatial consistency across long sequences, addressing a critical gap in AI evaluation. The dataset, collected from Minecraft environments with 20 million frames across 150 locations, enables development of memory-augmented models that can reliably simulate physical spaces for downstream tasks like planning and simulation.

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

A Study of LLMs' Preferences for Libraries and Programming Languages

A new empirical study reveals that eight major LLMs exhibit systematic biases in code generation, overusing popular libraries like NumPy in 45% of cases and defaulting to Python even when unsuitable, prioritizing familiarity over task-specific optimality. The findings highlight gaps in current LLM evaluation methodologies and underscore the need for targeted improvements in training data diversity and benchmarking standards.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

Researchers have developed a comprehensive evaluation framework for Large Language Models applied to outpatient referral systems in healthcare, revealing that LLMs offer limited advantages over simpler BERT-like models in static referral tasks but demonstrate potential in interactive dialogue scenarios. The study addresses the absence of standardized evaluation criteria for assessing LLM effectiveness in dynamic healthcare settings.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

AdaProb: Efficient Machine Unlearning via Adaptive Probability

Researchers propose AdaProb, a machine unlearning method that enables trained AI models to efficiently forget specific data while preserving privacy and complying with regulations like GDPR. The approach uses adaptive probability distributions and demonstrates a 20% improvement in forgetting effectiveness with 50% less computational overhead compared to existing methods.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration

Researchers introduce OneLife, a framework for learning symbolic world models from minimal unguided exploration in complex, stochastic environments. The approach uses conditionally-activated programmatic laws within a probabilistic framework and demonstrates superior performance on 16 of 23 test scenarios, advancing autonomous construction of world models for unknown environments.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment

Researchers demonstrate that large language models exhibit critical control failures in causal reasoning, where they produce sound logical arguments but abandon them under social pressure or authority hints. The study introduces CAUSALT3, a benchmark revealing three reproducible pathologies, and proposes Regulated Causal Anchoring (RCA), an inference-time mitigation technique that validates reasoning consistency without retraining.

AI · Bullish · arXiv – CS AI · Apr 10 · 6/10

Synthetic Homes: A Multimodal Generative AI Pipeline for Residential Building Data Generation under Data Scarcity

Researchers developed a multimodal generative AI pipeline that creates synthetic residential building datasets from publicly available county records and images, addressing critical data scarcity challenges in building energy modeling. The system achieves over 65% overlap with national reference data, enabling scalable energy research and urban simulations without relying on expensive or privacy-restricted datasets.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Commander-GPT: Dividing and Routing for Multimodal Sarcasm Detection

Researchers introduce Commander-GPT, a modular framework that orchestrates multiple specialized AI agents for multimodal sarcasm detection rather than relying on a single LLM. The system achieves 4.4-11.7% F1 score improvements over existing baselines on standard benchmarks, demonstrating that task decomposition and intelligent routing can overcome LLM limitations in understanding sarcasm.

🧠 GPT-4 · 🧠 Gemini

AI · Bearish · arXiv – CS AI · Apr 10 · 6/10

Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?

Researchers found that large language models experience accuracy drops of 0.3% to 5.9% when math problems are presented in unfamiliar cultural contexts, even when the underlying mathematical logic remains identical. Testing 14 models across culturally adapted variants of the GSM8K benchmark reveals that LLM mathematical reasoning is not culturally neutral, with errors stemming from both reasoning failures and calculation mistakes.

🏢 OpenAI · 🏢 Anthropic · 🧠 Claude

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Designing Safe and Accountable GenAI as a Learning Companion with Women Banned from Formal Education

Researchers conducted a participatory design study with 20 Afghan women excluded from formal education to understand how generative AI can safely support their learning and career development. The study reveals that women view GenAI as a compensatory peer and mentor rather than an information source, while identifying critical needs around privacy protection, cultural safety, and pedagogically sound guidance.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Mixed-Initiative Context: Structuring and Managing Context for Human-AI Collaboration

Researchers propose Mixed-Initiative Context, a framework that reconceptualizes how multi-turn AI interactions are managed by treating context as an explicit, structured, and dynamically adjustable object rather than a fixed chronological sequence. The approach enables both humans and AI to actively participate in context construction, addressing current limitations where irrelevant exchanges clutter context windows and users lack direct control mechanisms.