🧠 AI

12,714 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

PoTable: Towards Systematic Thinking via Plan-then-Execute Stage Reasoning on Tables

Researchers introduce PoTable, a novel AI framework that enhances Large Language Models' ability to reason about tabular data through systematic, stage-oriented planning before execution. The approach mimics professional data analyst workflows by breaking complex table reasoning into distinct analytical stages with clear objectives, demonstrating improved accuracy and explainability across benchmark datasets.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

WebLLM: A High-Performance In-Browser LLM Inference Engine

WebLLM is an open-source JavaScript framework enabling high-performance large language model inference directly in web browsers without cloud servers. Using WebGPU and WebAssembly technologies, it achieves up to 80% of native GPU performance while preserving user privacy through on-device processing.

🏢 OpenAI
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

Researchers introduced HumanVBench, a comprehensive benchmark for evaluating how well multimodal AI models understand human-centric video content across 16 tasks including emotion recognition and speech-visual alignment. The study evaluated 30 leading MLLMs and found significant performance gaps, even among top proprietary models, while introducing automated synthesis pipelines to enable scalable benchmark creation with minimal human effort.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Influencing Humans to Conform to Preference Models for RLHF

Researchers demonstrate that human preferences can be influenced to better align with the mathematical models used in RLHF algorithms, without changing underlying reward functions. Through three interventions—revealing model parameters, training humans on preference models, and modifying elicitation questions—the study shows significant improvements in preference data quality and AI alignment outcomes.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

Researchers introduce LIFESTATE-BENCH, a benchmark for evaluating lifelong learning capabilities in large language models through multi-turn interactions using narrative datasets like Hamlet. Testing shows nonparametric approaches significantly outperform parametric methods, but all models struggle with catastrophic forgetting over extended interactions, revealing fundamental limitations in LLM memory and consistency.

🧠 GPT-4 · 🧠 Llama
AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights

Researchers demonstrate that quantization and local inference techniques can reduce LLM energy consumption and carbon emissions by up to 45% without sacrificing performance. The findings address growing sustainability concerns surrounding generative AI deployment, offering practical optimization strategies for resource-constrained environments.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

Researchers introduce PODS (Policy Optimization with Down-Sampling), a technique that accelerates reinforcement learning training for large language models by selectively training on high-variance rollouts rather than all generated data. The method achieves equivalent performance to standard approaches at 1.7x faster speeds, addressing computational bottlenecks in LLM reasoning optimization.
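The selection idea can be sketched in a few lines. This is an illustrative reading, not the paper's exact rule: for scalar rewards, the subset that maximizes empirical reward variance consists of the most extreme values, so the sketch keeps roughly half from each end of the sorted order.

```python
def downsample_rollouts(rewards, k):
    """Pick indices of k rollouts whose rewards maximize empirical variance.

    For scalar rewards the max-variance subset is the most extreme values,
    so take about half from each end of the sorted order.
    (Illustrative sketch; PODS's exact selection rule may differ.)
    """
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    lo = k // 2
    hi = k - lo
    return sorted(order[:lo] + order[-hi:])

# 8 rollouts for one prompt; keep the 4 with the most spread-out rewards.
rewards = [0.1, 0.9, 0.5, 0.95, 0.05, 0.5, 0.4, 0.6]
keep = downsample_rollouts(rewards, 4)  # indices of rewards 0.05, 0.1, 0.9, 0.95
```

Training only on the retained rollouts is what yields the claimed speedup: gradient updates are computed on a fraction of the generated data.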

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

Researchers propose TokUR, a framework that enables large language models to estimate uncertainty at the token level during reasoning tasks, allowing LLMs to self-assess response quality and improve performance on mathematical problems. The approach uses low-rank random weight perturbation to generate predictive distributions, demonstrating strong correlation with answer correctness and potential for enhancing LLM reliability.
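A minimal sketch of the token-level scoring step, under simplifying assumptions: TokUR derives its predictive distributions from low-rank random weight perturbations; here the M perturbed models' probabilities for each emitted token are taken as given, and uncertainty is reduced to the binary entropy of "emitted token vs. everything else".

```python
import math

def token_uncertainty(dists):
    """Per-token uncertainty over M perturbed forward passes.

    dists[m][t] is the probability the m-th perturbed model assigns to the
    token actually emitted at position t. High entropy of the mean probability
    flags uncertain tokens. (Simplified stand-in for TokUR's estimator.)
    """
    M, T = len(dists), len(dists[0])
    scores = []
    for t in range(T):
        p = sum(dists[m][t] for m in range(M)) / M  # mean prob of emitted token
        # binary entropy: emitted token vs. the rest of the vocabulary
        h = 0.0 if p in (0.0, 1.0) else -(p * math.log(p) + (1 - p) * math.log(1 - p))
        scores.append(h)
    return scores

# Two perturbed passes, two tokens: the models agree on token 0 (p=0.9 both)
# but disagree on token 1 (0.5 vs. 0.1), so token 1 scores as more uncertain.
scores = token_uncertainty([[0.9, 0.5], [0.9, 0.1]])
```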

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Tuning Language Models for Robust Prediction of Diverse User Behaviors

Researchers introduce BehaviorLM, a progressive fine-tuning approach that enables large language models to predict both common and rare user behaviors more effectively. The method uses a two-stage process that balances learning frequent anchor behaviors with improving predictions for uncommon tail behaviors, demonstrating improved performance on real-world datasets.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Learning World Models for Interactive Video Generation

Researchers propose Video Retrieval Augmented Generation (VRAG) to address fundamental challenges in interactive world models for long-form video generation, specifically tackling compounding errors and spatiotemporal incoherence. The work establishes that autoregressive video generation inherently struggles with error accumulation, while explicit global state conditioning significantly improves long-term consistency and interactive planning capabilities.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Towards Reasonable Concept Bottleneck Models

Researchers introduce CREAM (Concept Reasoning Models), an advanced framework for Concept Bottleneck Models that allows explicit encoding of concept relationships and concept-to-task mappings. The model maintains interpretability while achieving competitive performance even with incomplete concept sets through an optional side-channel, addressing a key limitation in explainable AI systems.
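The bottleneck-plus-side-channel structure can be sketched with a toy linear model. All weights and shapes below are illustrative, not CREAM's: the task head sees only the concept activations, plus an optional raw side channel carrying whatever the concept set misses.

```python
import math

def cbm_predict(x, Wc, Wt, Ws=None):
    """Concept-bottleneck prediction with an optional side channel.

    x -> concept activations c = sigmoid(Wc @ x); the task head Wt sees only
    c, plus (optionally) a raw side channel s = Ws @ x for information the
    concepts miss. Toy linear sketch, not CREAM's architecture.
    """
    sigmoid = lambda z: 1 / (1 + math.exp(-z))
    c = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in Wc]
    feats = list(c)
    if Ws is not None:
        feats += [sum(w * xi for w, xi in zip(row, x)) for row in Ws]
    return [sum(w * f for w, f in zip(row, feats)) for row in Wt]

x = [1.0, -1.0]
Wc = [[2.0, 0.0], [0.0, 2.0]]            # two concepts, one per input
y_plain = cbm_predict(x, Wc, [[1.0, 1.0]])                       # concepts only
y_side = cbm_predict(x, Wc, [[1.0, 1.0, 0.5]], Ws=[[1.0, 0.0]])  # + side channel
```

Interpretability comes from the fact that predictions route through named concepts; the side channel is the escape hatch for incomplete concept sets.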

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Large Language Model as An Operator: An Experience-Driven Solution for Distribution Network Voltage Control

Researchers propose an LLM-based system for autonomous voltage control in electrical distribution networks, using experience-driven decision-making to optimize day-ahead dispatch strategies. The framework combines historical operational data retrieval with AI-generated solutions, demonstrating how large language models can address complex power system management under incomplete information.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

Researchers present Data Mixing Agent, an AI framework that uses reinforcement learning to automatically optimize how large language models balance training data from source and target domains during continual pre-training. The approach outperforms manual reweighting strategies while generalizing across different models, domains, and fields without requiring retraining.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

Researchers introduce Modular Delta Merging with Orthogonal Constraints (MDM-OC), a machine learning framework that enables multiple fine-tuned models to be merged, updated, and selectively removed without performance degradation or task interference. The approach uses orthogonal projections to prevent model conflicts and supports compliance requirements like GDPR-mandated data deletion.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation

Researchers propose a novel framework for improving symbolic distillation of neural networks by regularizing teacher models for functional smoothness using Jacobian and Lipschitz penalties. This approach addresses the core challenge that standard neural networks learn complex, irregular functions while symbolic regression models prioritize simplicity, resulting in poor knowledge transfer. Results across 20 datasets demonstrate statistically significant improvements in predictive accuracy for distilled symbolic models.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

StyleBench: Evaluating thinking styles in Large Language Models

StyleBench is a new benchmark that evaluates how different reasoning structures (Chain-of-Thought, Tree-of-Thought, etc.) affect LLM performance across various tasks and model sizes. The research reveals that structural complexity only improves accuracy in specific scenarios, with simpler approaches often proving more efficient, and that learning adaptive reasoning strategies is itself a complex problem requiring advanced training methods.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Detecting Invariant Manifolds in ReLU-Based RNNs

Researchers have developed a novel algorithm for detecting invariant manifolds in ReLU-based recurrent neural networks (RNNs), enabling analysis of dynamical system behavior through topological and geometrical properties. The method identifies basin boundaries, multistability, and chaotic dynamics, with applications to scientific computing and explainable AI.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

Researchers introduce HiPRAG, a training methodology that improves agentic RAG systems by using fine-grained process rewards to optimize search decisions. The approach reduces inefficient search behaviors while achieving 65-67% accuracy across QA benchmarks, demonstrating that optimizing reasoning processes yields better performance than outcome-only training.

🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

A Survey of Inductive Reasoning for Large Language Models

Researchers present the first comprehensive survey of inductive reasoning in large language models, categorizing improvement methods into post-training, test-time scaling, and data augmentation approaches. The survey establishes unified benchmarks and evaluation metrics for assessing how LLMs perform particular-to-general reasoning tasks that better align with human cognition.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Domain-Specific Data Generation Framework for RAG Adaptation

RAGen is a new framework for generating domain-specific training data to improve Retrieval-Augmented Generation (RAG) systems. The system creates question-answer-context triples using semantic chunking, concept extraction, and Bloom's Taxonomy principles, enabling faster adaptation of LLMs to specialized domains like scientific research and enterprise knowledge bases.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

Researchers introduce SimBench, a standardized benchmark for evaluating how faithfully large language models simulate human behavior across 20 diverse datasets. The study reveals current LLMs achieve only modest simulation fidelity (40.80/100) and uncovers critical limitations including an alignment-simulation tradeoff and struggles with demographic-specific behavior replication.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

Researchers identify that reasoning language models exhibit worse performance in low-resource languages due to failures in language understanding rather than reasoning capability itself. The study proposes Selective Translation, which strategically adds English translations only when understanding failures are detected, achieving near full-translation performance while translating just 20% of inputs.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

GroupRank: A Groupwise Paradigm for Effective and Efficient Passage Reranking with LLMs

Researchers introduce GroupRank, a novel LLM-based passage reranking paradigm that balances efficiency and accuracy by combining pointwise and listwise ranking approaches. The method achieves state-of-the-art performance with 65.2 NDCG@10 on BRIGHT benchmark while delivering 6.4x faster inference than existing approaches.
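The groupwise compromise between pointwise and listwise ranking can be sketched as follows. The `score_group` callable stands in for the LLM scoring call and is a hypothetical interface, not GroupRank's API: pointwise scoring ignores cross-passage context, listwise scoring of the full list is slow, so fixed-size groups are scored jointly and the results merged.

```python
def group_rerank(query, passages, score_group, group_size=4):
    """Groupwise reranking: score passages a group at a time, then merge.

    score_group(query, group) -> list[float] returns one score per passage in
    the group (in a real system, an LLM call that sees the group jointly).
    """
    scored = []
    for i in range(0, len(passages), group_size):
        group = passages[i:i + group_size]
        for p, s in zip(group, score_group(query, group)):
            scored.append((s, p))
    scored.sort(key=lambda t: -t[0])          # merge groups by score
    return [p for _, p in scored]

def toy_scorer(query, group):
    # Stand-in scorer: count of query-token overlaps per passage.
    qs = set(query.split())
    return [len(qs & set(p.split())) for p in group]

docs = ["cats sit", "dogs run fast", "cats chase dogs", "the weather today"]
ranked = group_rerank("cats and dogs", docs, toy_scorer, group_size=2)
```

Efficiency comes from the number of joint-scoring calls scaling with the number of groups rather than with every pairwise or full-list comparison.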

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

A Unified Theory of Sparse Dictionary Learning in Mechanistic Interpretability: Piecewise Biconvexity and Spurious Minima

Researchers develop the first unified theoretical framework for sparse dictionary learning (SDL) methods used in AI interpretability, proving these optimization problems are piecewise biconvex and characterizing why they produce flawed features. The work explains long-standing practical failures in sparse autoencoders and proposes feature anchoring as a solution to improve feature disentanglement in neural networks.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention

Researchers introduce VPR-AttLLM, a framework that enhances geographic localization of crowdsourced flood imagery by integrating Large Language Models with Visual Place Recognition systems. The approach improves location accuracy by 1-3% across standard benchmarks and up to 8% on real flood images without requiring model retraining.
