y0news
🧠 AI

12,705 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Modeling Co-Pilots for Text-to-Model Translation

Researchers introduce Text2Model and Text2Zinc, frameworks that use large language models to translate natural language descriptions into formal optimization and constraint satisfaction models. The work is the first unified approach to cover both problem types with a solver-agnostic architecture, though experiments show that LLMs, while competitive, remain imperfect at the task.
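
A minimal sketch of the translation wrapper, assuming a caller-supplied `generate` stub for the LLM call (the paper's actual prompting and validation pipeline is not shown in the abstract); solver-agnosticism here reduces to parameterizing the target modeling language:

```python
def nl_to_model(description: str, generate, target: str = "MiniZinc") -> str:
    """Hypothetical Text2Model-style wrapper: ask an LLM (the caller-supplied
    `generate` stub) to emit a formal model in a chosen target language.
    Keeping `target` a parameter is what makes the sketch solver-agnostic."""
    prompt = (
        f"Translate this problem into a complete {target} model. "
        f"Return only code.\n\nProblem: {description}"
    )
    return generate(prompt)
```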

🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training

Researchers propose Cycle-Consistent Search (CCS), a novel framework for training search agents using reinforcement learning without requiring gold-standard labeled data. The method leverages question reconstructability as a reward signal, using information bottlenecks to ensure agents learn from genuine search quality rather than surface-level linguistic patterns.
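
A minimal sketch of the reward idea, assuming a caller-supplied `generate` stub for both the search agent and the reconstructor, and token-overlap F1 as an illustrative similarity measure (the paper's exact reward and bottleneck design will differ):

```python
def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two strings (illustrative similarity proxy)."""
    ta, tb = a.lower().split(), b.lower().split()
    common = sum(min(ta.count(t), tb.count(t)) for t in set(ta))
    if not common:
        return 0.0
    prec, rec = common / len(ta), common / len(tb)
    return 2 * prec * rec / (prec + rec)


def cycle_consistency_reward(question: str, generate) -> float:
    """Proxy reward: how well the original question can be reconstructed
    from the evidence the search agent gathered (no gold labels needed)."""
    evidence = generate(f"Search for evidence to answer: {question}")
    # Information bottleneck: the reconstructor sees only the evidence,
    # never the original question, so surface-pattern leakage is cut off.
    reconstructed = generate(f"What question does this evidence answer?\n{evidence}")
    return token_f1(question, reconstructed)
```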

🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

PAL: Personal Adaptive Learner

Researchers introduce PAL (Personal Adaptive Learner), an AI platform that transforms lecture videos into interactive learning experiences by dynamically adjusting question difficulty and providing personalized feedback in real time. The system addresses limitations in current educational AI by moving beyond static adaptation to context-aware, individualized support that evolves with learner understanding.
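
The abstract does not specify the adaptation loop; for intuition, here is one classic way such a controller can work, a simple staircase rule (purely illustrative, not PAL's algorithm):

```python
def adjust_difficulty(level: int, correct: bool, streak: int,
                      lo: int = 1, hi: int = 5) -> tuple[int, int]:
    """Illustrative real-time difficulty controller: step up after two
    consecutive correct answers, step down immediately after a miss.
    Returns the new (level, streak) pair."""
    if correct:
        streak += 1
        if streak >= 2:
            return min(level + 1, hi), 0
        return level, streak
    return max(level - 1, lo), 0
```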

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

Researchers propose GRACE, a dynamic coreset selection framework that reduces LLM training costs by intelligently selecting representative dataset subsets. The method combines representation diversity with gradient-based metrics and uses k-NN graph propagation to adapt to evolving training dynamics, demonstrating improved efficiency across multiple benchmarks.
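
A toy NumPy sketch of the scoring idea: combine a gradient-magnitude term with a diversity term, then smooth scores over a k-NN graph. The constants, similarity measure, and propagation rule are illustrative assumptions, not GRACE's actual formulation:

```python
import numpy as np

def knn_propagated_scores(embeddings, grad_norms, k=5, alpha=0.5, iters=10):
    """Illustrative coreset scoring: blend per-example gradient magnitude
    with representation diversity, then smooth over a k-NN graph."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = x @ x.T                               # cosine similarity
    np.fill_diagonal(sim, -np.inf)              # exclude self-neighbors
    nn_idx = np.argsort(sim, axis=1)[:, -k:]    # top-k neighbors per example
    # Diversity term: examples far from their neighbors score higher.
    diversity = 1.0 - np.take_along_axis(sim, nn_idx, axis=1).mean(axis=1)
    scores = grad_norms / grad_norms.max() + diversity
    # Propagate scores over the k-NN graph so selection tracks neighborhoods.
    for _ in range(iters):
        scores = alpha * scores + (1 - alpha) * scores[nn_idx].mean(axis=1)
    return scores

# Usage: rescore periodically during training as gradients evolve,
# keeping the top-m examples as the current coreset.
rng = np.random.default_rng(0)
emb, g = rng.normal(size=(100, 16)), rng.uniform(size=100)
coreset = np.argsort(knn_propagated_scores(emb, g))[-20:]
```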

🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

M★: Every Task Deserves Its Own Memory Harness

Researchers introduce M★, a method that automatically evolves task-specific memory systems for large language model agents by treating memory architecture as executable Python code. The approach beats fixed memory designs across conversation, planning, and reasoning benchmarks, suggesting that specialized memory mechanisms significantly outperform one-size-fits-all solutions.
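
To make "memory architecture as executable Python code" concrete, here is a hypothetical shape such a harness might take; M★ would evolve the bodies of `write` and `read` per task, whereas the policies below are just placeholders:

```python
class MemoryHarness:
    """Minimal interface a per-task memory module might expose; an M★-style
    search would rewrite the method bodies as evolved Python code."""

    def __init__(self, capacity: int = 50):
        self.entries: list[str] = []
        self.capacity = capacity

    def write(self, observation: str) -> None:
        # One candidate policy: FIFO with a hard capacity cap.
        self.entries.append(observation)
        if len(self.entries) > self.capacity:
            self.entries.pop(0)

    def read(self, query: str, k: int = 3) -> list[str]:
        # One candidate policy: rank stored entries by word overlap.
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: -len(q & set(e.lower().split())))
        return ranked[:k]
```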

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

A Layer-wise Analysis of Supervised Fine-Tuning

Researchers present a layer-wise analysis of Supervised Fine-Tuning (SFT) in large language models, revealing that middle layers remain stable during training while final layers exhibit high sensitivity. They introduce Mid-Block Efficient Tuning, a targeted approach that selectively updates intermediate layers and achieves up to 10.2% performance gains over standard LoRA on benchmarks with significantly reduced parameter overhead.
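
A minimal PyTorch sketch of the targeting idea: freeze everything, then re-enable gradients only for a middle span of blocks. The span fractions and the location of the block list (e.g. `model.transformer.h` in GPT-2-style models) are assumptions, and the paper's method may combine this with adapter-style updates rather than full-parameter tuning:

```python
import torch.nn as nn

def tune_middle_blocks(model: nn.Module, blocks: nn.ModuleList,
                       start_frac: float = 0.25, end_frac: float = 0.75) -> int:
    """Illustrative mid-block tuning: freeze all parameters, then re-enable
    gradients only for the middle span of transformer blocks."""
    for p in model.parameters():
        p.requires_grad = False
    lo, hi = int(len(blocks) * start_frac), int(len(blocks) * end_frac)
    for block in list(blocks)[lo:hi]:
        for p in block.parameters():
            p.requires_grad = True
    # Pass only the still-trainable parameters to the optimizer.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```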

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents

Researchers introduce Aethelgard, an adaptive governance framework that addresses the capability overprovisioning problem in autonomous AI agents by dynamically restricting tool access based on task requirements. The system uses reinforcement learning to enforce least-privilege principles, reducing security exposure while maintaining operational efficiency.
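
A minimal sketch of a least-privilege tool gate; in the paper an RL policy decides which capabilities a task actually needs, which is stubbed out here as a caller-supplied set:

```python
class CapabilityGate:
    """Illustrative least-privilege wrapper: the agent may only call tools
    whitelisted for the current task; a learned policy would update grants."""

    def __init__(self, tools: dict):
        self.tools = tools            # name -> callable
        self.granted: set[str] = set()

    def grant_for_task(self, required: set[str]) -> None:
        # In Aethelgard's framing a learned policy would infer `required`
        # from the task; here the caller supplies it directly.
        self.granted = required & set(self.tools)

    def call(self, name: str, *args, **kwargs):
        if name not in self.granted:
            raise PermissionError(f"tool '{name}' not granted for this task")
        return self.tools[name](*args, **kwargs)
```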

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Researchers propose Polynomial Expansion Rank Adaptation (PERA), a novel fine-tuning method that enhances Low-Rank Adaptation (LoRA) by incorporating high-order polynomial interactions into low-rank factors. PERA improves the expressive capacity of LLM fine-tuning without increasing computational costs, demonstrating consistent performance gains across benchmarks while maintaining the efficiency benefits of rank-constrained adaptation.
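
A hypothetical reading of the mechanism in PyTorch: apply an elementwise polynomial to the rank-r projection before mapping back up, so higher-order interactions appear without increasing the rank (the paper's exact parameterization of the polynomial is not given in the abstract):

```python
import torch

def pera_delta(x, A, B, coeffs=(1.0, 0.1)):
    """Illustrative PERA-style update: apply a polynomial elementwise to the
    low-rank projection A·x before mapping back up with B.

    x: (batch, d_in), A: (r, d_in), B: (d_out, r)
    """
    z = x @ A.T                                   # project down to rank r
    poly = sum(c * z.pow(i + 1) for i, c in enumerate(coeffs))
    return poly @ B.T                             # map back up to d_out

# Plain LoRA is recovered with coeffs=(1.0,); extra terms add curvature
# without changing the shapes (or parameter count) of A and B.
x = torch.randn(4, 32)
A, B = torch.randn(8, 32) * 0.02, torch.zeros(64, 8)
delta = pera_delta(x, A, B)   # added to the frozen layer's output
```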

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Disposition Distillation at Small Scale: A Three-Arc Negative Result

Researchers attempted to train behavioral dispositions into small language models through distillation, but found that initial positive results were artifacts of measurement error. After rigorous validation across five different small models, they found no reliable way to instill self-verification and uncertainty acknowledgment without degrading performance or producing superficial stylistic mimicry.

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces

Researchers propose Filtered Reasoning Score (FRS), an evaluation metric that assesses the quality of reasoning in large language models beyond simple accuracy. FRS scores dimensions such as faithfulness and coherence on the model's most-confident reasoning traces, and it reveals significant differences between models that appear identical under traditional accuracy benchmarks.
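
A minimal sketch of the aggregation step, assuming per-trace confidence and per-dimension quality scores are already available (how those scores are produced is the substance of the paper and is not modeled here):

```python
def filtered_reasoning_score(traces, confidence_frac=0.2):
    """Illustrative FRS-style aggregation: score reasoning quality only on
    the model's most-confident traces rather than averaging over all of them.

    traces: list of dicts with a 'confidence' key plus per-dimension quality
    scores in [0, 1], e.g. {"confidence": .9, "faithfulness": .8, "coherence": .7}.
    """
    ranked = sorted(traces, key=lambda t: t["confidence"], reverse=True)
    top = ranked[: max(1, int(len(ranked) * confidence_frac))]
    dims = ("faithfulness", "coherence")
    return {d: sum(t[d] for t in top) / len(top) for d in dims}
```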

🧠 AI · Bearish · arXiv – CS AI · Apr 15 · 6/10

LLMs Struggle with Abstract Meaning Comprehension More Than Expected

Research shows that large language models like GPT-4o struggle significantly with abstract meaning comprehension across zero-shot, one-shot, and few-shot settings, while fine-tuned models like BERT and RoBERTa perform better. A bidirectional attention classifier inspired by human cognitive strategies improved accuracy by 3-4% on abstract reasoning tasks, revealing a critical gap in how modern LLMs handle non-concrete, high-level semantics.

🧠 GPT-4
🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs

Researchers introduce wSSAS, a deterministic framework that enhances Large Language Model text categorization by combining hierarchical classification with signal-to-noise filtering to improve accuracy and reproducibility. Testing across Google Business, Amazon Product, and Goodreads reviews demonstrates significant improvements in clustering integrity and reduced categorization entropy.

🧠 Gemini
🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Robust Explanations for User Trust in Enterprise NLP Systems

Researchers propose a black-box robustness evaluation framework for NLP explanations, revealing that decoder-based LLMs produce 73% more stable explanations than encoder models like BERT. The study establishes practical cost-robustness tradeoffs that help organizations select models for compliance-sensitive applications before deployment.

🧠 Llama
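
A minimal sketch of a black-box stability probe in this spirit, assuming a caller-supplied `explain` function mapping text to token-importance scores; the single-word-drop perturbation is an illustrative stand-in for the paper's perturbation suite:

```python
import random

def explanation_stability(explain, text: str, n_perturb: int = 20,
                          k: int = 5, seed: int = 0) -> float:
    """Illustrative black-box robustness probe: apply light input
    perturbations and measure overlap of the top-k attributed tokens.
    `explain(text)` returns {token: importance}; any attribution
    method can slot in."""
    rng = random.Random(seed)
    base = explain(text)
    base_top = set(sorted(base, key=base.get, reverse=True)[:k])
    words = text.split()
    overlaps = []
    for _ in range(n_perturb):
        # Perturbation: drop one random word; paraphrase or synonym
        # swaps would model realistic input drift more closely.
        i = rng.randrange(len(words))
        scores = explain(" ".join(words[:i] + words[i + 1:]))
        top = set(sorted(scores, key=scores.get, reverse=True)[:k])
        overlaps.append(len(base_top & top) / k)
    return sum(overlaps) / len(overlaps)   # 1.0 = perfectly stable
```
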
🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving

Researchers propose Sequential Navigation Guidance (SNG), a framework addressing a critical flaw in end-to-end autonomous driving systems that over-rely on local scene understanding while underutilizing global navigation information. The SNG framework combines navigation paths and turn-by-turn instructions with a new VQA dataset and efficient model to improve autonomous vehicle planning and navigation-following in complex scenarios.

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads

Researchers present a systematic study of seven tactics for reducing cloud LLM token consumption in coding-agent workloads, demonstrating that local routing combined with prompt compression can achieve 45-79% token savings on certain tasks. The open-source implementation reveals that optimal cost-reduction strategies vary significantly by workload type, offering practical guidance for developers deploying AI coding agents at scale.

🏢 OpenAI
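
A toy sketch combining two of the studied tactics, length-based local routing and naive prompt compression; the threshold and the compression heuristic are illustrative, and `local_llm`/`cloud_llm` are caller-supplied stubs:

```python
def route_and_compress(task: str, local_llm, cloud_llm,
                       max_local_len: int = 400) -> str:
    """Illustrative cost tactic: route short tasks to a local model and
    compress prompts sent to the cloud."""
    if len(task) <= max_local_len:
        return local_llm(task)            # zero cloud tokens consumed
    # Naive prompt compression: drop blank lines and comment-only lines.
    kept = [ln for ln in task.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    return cloud_llm("\n".join(kept))
```
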
🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations

Researchers propose cooperative paging, a method for managing long LLM conversations by replacing evicted context with compact keyword bookmarks and providing a recall tool for on-demand retrieval. The technique outperforms existing solutions on the LoCoMo benchmark across multiple models, though bookmark discrimination remains a critical limitation.

🧠 GPT-4 · 🧠 Claude
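
A minimal sketch of the paging mechanics: evicted turns are replaced by compact keyword bookmarks kept in-window, and a recall tool pages the full text back in on demand (the bookmark heuristic below is an illustrative placeholder):

```python
class BookmarkPager:
    """Illustrative cooperative paging for long conversations."""

    def __init__(self, window: int = 8):
        self.window = window
        self.turns: list[str] = []        # recent turns kept verbatim
        self.bookmarks: list[str] = []    # stand-ins for evicted turns
        self.store: dict[int, str] = {}   # full text of evicted turns

    def add_turn(self, text: str) -> None:
        self.turns.append(text)
        if len(self.turns) > self.window:
            evicted = self.turns.pop(0)
            bid = len(self.store)
            self.store[bid] = evicted
            # Keyword bookmark: a few salient (here: longest) words.
            words = sorted(set(evicted.lower().split()), key=len)[-3:]
            self.bookmarks.append(f"[bookmark {bid}: {' '.join(words)}]")

    def recall(self, bid: int) -> str:
        """The recall tool the model invokes to page a turn back in."""
        return self.store[bid]

    def context(self) -> str:
        return "\n".join(self.bookmarks + self.turns)
```
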
🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks

Researchers introduce CodeRQ-Bench, the first benchmark for evaluating LLM reasoning quality across coding tasks including generation, summarization, and classification. They propose VERA, a two-stage evaluator combining evidence-grounded verification with ambiguity-aware score correction, achieving significant performance improvements over existing methods.

🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning

Researchers introduce KG-Reasoner, an end-to-end framework that uses reinforcement learning to train large language models to perform multi-hop reasoning over knowledge graphs without decomposing tasks into isolated pipeline steps. The approach demonstrates competitive or superior performance across eight reasoning benchmarks by enabling LLMs to dynamically explore reasoning paths and backtrack when necessary.
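
For intuition, here is what multi-hop exploration with backtracking looks like as a hand-written depth-first search over a toy knowledge graph; KG-Reasoner's contribution is replacing such a fixed procedure with an RL-trained LLM policy that decides where to expand and when to backtrack:

```python
def multihop_paths(kg, start, target, max_hops=3):
    """Exhaustive DFS over a toy KG (entity -> list of (relation, entity)),
    collecting every relation path from start to target within max_hops."""
    found, stack = [], [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == target and path:
            found.append(path)
            continue
        if len(path) >= max_hops:
            continue                      # hop budget exhausted: backtrack
        for rel, nxt in kg.get(node, []):
            stack.append((nxt, path + [(node, rel, nxt)]))
    return found

kg = {"Marie Curie": [("spouse", "Pierre Curie")],
      "Pierre Curie": [("field", "physics")]}
print(multihop_paths(kg, "Marie Curie", "physics"))
```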

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Deepfakes at Face Value: Image and Authority

A philosophical paper argues that deepfakes violate a fundamental right to authority over one's own image and identity, distinct from harm-based objections. The work establishes that algorithmic simulation of biometric features constitutes wrongful 'identity conscription' that warrants legal and ethical protection, separating this from permissible artistic depictions.

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting

Researchers propose a graph-based soft prompting framework that enables LLMs to reason over incomplete knowledge graphs by processing subgraph structures rather than explicit node paths, achieving state-of-the-art results on multi-hop question-answering benchmarks while reducing computational costs through a two-stage inference approach.

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA

Researchers propose an SVD-based orthogonal subspace projection method for continual machine unlearning that prevents interference between sequential deletion tasks in neural networks. The approach maintains model performance on retained data while effectively removing influence of unlearned data, addressing a critical limitation of naive LoRA fusion methods.
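
A NumPy sketch of the interference guard, under the assumption that each deletion task produces a parameter update and new updates are projected off the dominant subspace of earlier ones (the paper applies this within LoRA factors; flattened updates are used here for brevity):

```python
import numpy as np

def orthogonal_update(new_u: np.ndarray, prior: list[np.ndarray],
                      energy: float = 0.99) -> np.ndarray:
    """Illustrative interference guard: remove from a new deletion update
    any component lying in the dominant subspace of earlier updates."""
    if not prior:
        return new_u
    M = np.stack([u.ravel() for u in prior], axis=1)   # columns = past updates
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    # Keep enough singular directions to cover `energy` of the spectrum.
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), energy)) + 1
    basis = U[:, :k]                                   # past-update subspace
    v = new_u.ravel()
    v = v - basis @ (basis.T @ v)                      # orthogonal projection
    return v.reshape(new_u.shape)
```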

🧠 AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models

Researchers introduce MODIX, a training-free framework that dynamically optimizes how Vision-Language Models allocate attention across multimodal inputs by adjusting positional encoding based on information density rather than uniform token assignment. The approach improves reasoning performance without modifying model parameters, suggesting positional encoding should be treated as an adaptive resource in multimodal transformer architectures.
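
One hypothetical way to realize density-driven positional allocation: advance the positional index in proportion to each token's information content instead of by a uniform step, then feed the fractional indices to a RoPE-style encoding. The density measure and mapping below are illustrative guesses, not MODIX's actual rule:

```python
import numpy as np

def density_scaled_positions(token_info: np.ndarray,
                             total_span: float) -> np.ndarray:
    """Illustrative positional re-indexing: the position index advances in
    proportion to each token's information density, so information-dense
    regions receive more of the available positional resolution."""
    density = token_info / token_info.sum()
    # Cumulative density mapped onto the available positional span.
    pos = np.concatenate([[0.0], np.cumsum(density)[:-1]]) * total_span
    return pos   # fractional indices for a RoPE-style encoding

# Usage: an 8-token input where tokens 3-4 carry most of the information.
info = np.array([1, 1, 1, 6, 6, 1, 1, 1], dtype=float)
print(density_scaled_positions(info, total_span=8.0))
```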

🧠 AI · Bullish · arXiv – CS AI · Apr 15 · 6/10

TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting

TimeSAF introduces a hierarchical asynchronous fusion framework that improves how large language models guide time series forecasting by decoupling semantic understanding from numerical dynamics. This addresses a fundamental architectural limitation in existing methods and demonstrates superior performance on standard benchmarks with strong generalization capabilities.
