y0news

AI

12,706 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Apr 15 · importance 6/10

PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning

Researchers introduce PromptEcho, a novel reward construction method for improving text-to-image model training that requires no human annotation or model fine-tuning. By leveraging frozen vision-language models to compute token-level alignment scores, the approach achieves significant performance gains on multiple benchmarks while remaining computationally efficient.
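
The summary describes the core mechanism concretely enough to sketch: a frozen vision-language model scores how well each prompt token aligns with the generated image, and the aggregated score serves as the reward. The hedged sketch below assumes CLIP as the frozen VLM and mean token-image cosine similarity as the scoring rule; neither choice is confirmed by the paper.

```python
# Hypothetical sketch of an annotation-free alignment reward using a frozen
# CLIP model as the vision-language scorer. The paper's exact VLM, projection,
# and aggregation rule are assumptions, not its published method.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def token_level_reward(prompt: str, image: Image.Image) -> float:
    inputs = processor(text=[prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
        # Per-token text states, projected into the shared embedding space
        # (applying text_projection token-wise is an illustrative choice).
        tokens = model.text_projection(out.text_model_output.last_hidden_state)
        tokens = tokens / tokens.norm(dim=-1, keepdim=True)
        img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
        # Cosine similarity of each prompt token to the image, masked and averaged.
        sims = (tokens @ img.T).squeeze(-1)
        mask = inputs["attention_mask"].bool()
        return sims[mask].mean().item()

reward = token_level_reward("a red bicycle on a beach", Image.new("RGB", (224, 224)))
```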

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees

Researchers introduce GF-Score, a framework that evaluates neural network robustness across individual classes while measuring fairness disparities, eliminating the need for expensive adversarial attacks through self-calibration. Testing across 22 models reveals consistent vulnerability patterns and shows that more robust models paradoxically exhibit greater class-level fairness disparities.
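
The exact GF-Score formula isn't given in the summary; the toy calculation below only illustrates how per-class robustness scores can hide large class-level disparities behind a healthy average, which is the phenomenon the framework is built to surface. All numbers are fabricated.

```python
# Toy illustration (not the paper's formula): given per-class robustness
# estimates, quantify class-level disparity as the worst-vs-best gap and
# the standard deviation across classes.
import numpy as np

per_class_robustness = np.array([0.82, 0.79, 0.41, 0.77, 0.55])  # hypothetical

mean_robustness = per_class_robustness.mean()
disparity_gap = per_class_robustness.max() - per_class_robustness.min()
disparity_std = per_class_robustness.std()

print(f"mean={mean_robustness:.2f} gap={disparity_gap:.2f} std={disparity_std:.2f}")
# The model looks solid on average (mean ~0.67) while one class (0.41) is far
# more vulnerable -- exactly the kind of gap a class-conditional metric exposes.
```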

AI · Bullish · arXiv – CS AI · Apr 15 · importance 6/10

CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models

Researchers introduce CLASP, a token reduction framework that optimizes Multimodal Large Language Models by intelligently pruning visual tokens through class-adaptive layer fusion and dual-stage pruning. The approach addresses computational inefficiency in MLLMs while maintaining performance across diverse benchmarks and architectures.
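
CLASP's class-adaptive fusion and dual-stage schedule are the paper's own contributions, but the underlying operation, dropping low-saliency visual tokens before they reach the language model, is straightforward to sketch. The saliency scores below are random stand-ins for text-conditioned attention.

```python
# Minimal sketch of attention-based visual token pruning, the general family
# CLASP belongs to (the class-adaptive and dual-stage details are not
# reproduced here).
import torch

def prune_visual_tokens(visual_tokens: torch.Tensor,
                        saliency: torch.Tensor,
                        keep_ratio: float = 0.25) -> torch.Tensor:
    """visual_tokens: [num_tokens, dim]; saliency: [num_tokens] importance."""
    k = max(1, int(keep_ratio * visual_tokens.shape[0]))
    keep = saliency.topk(k).indices.sort().values  # preserve spatial order
    return visual_tokens[keep]

tokens = torch.randn(576, 1024)   # e.g. a 24x24 grid of ViT patch tokens
saliency = torch.rand(576)        # stand-in for text-conditioned attention
pruned = prune_visual_tokens(tokens, saliency)
print(pruned.shape)               # torch.Size([144, 1024])
```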

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference

Researchers propose CoDe-R, a two-stage framework that uses Large Language Models to improve binary decompilation by reducing logical errors and semantic misalignment. A 1.3B model using this approach achieves state-of-the-art performance on the HumanEval-Decompile benchmark, becoming the first lightweight model to exceed a 50% re-executability rate.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Researchers investigate on-policy distillation (OPD) dynamics in large language model training, identifying two critical success conditions: compatible thinking patterns between student and teacher models, and genuine new capabilities from the teacher. The study reveals that successful OPD relies on token-level alignment and proposes recovery strategies for failing distillation scenarios.
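
The study's emphasis on token-level alignment points at the standard on-policy distillation objective: a per-token divergence between student and teacher distributions, computed on the student's own rollouts. A minimal sketch using reverse KL, one common choice (the paper's exact objective isn't specified in the summary):

```python
# Sketch of the token-level ingredient of on-policy distillation: per-token
# reverse KL between student and teacher over student-generated tokens.
# Shapes are illustrative; this is not the paper's training code.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student_logits: torch.Tensor,
                           teacher_logits: torch.Tensor) -> torch.Tensor:
    """Both logits: [batch, seq_len, vocab], computed on student rollouts."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL(student || teacher), summed over vocab, averaged over tokens.
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)
    return kl.mean()

loss = on_policy_distill_loss(torch.randn(2, 8, 100), torch.randn(2, 8, 100))
```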

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

Researchers propose League of LLMs (LOL), a benchmark-free evaluation framework that uses mutual peer assessment among multiple LLMs to overcome data contamination and evaluation bias issues. Testing on eight mainstream models reveals 70.7% ranking consistency while uncovering model-specific behaviors like memorization patterns and family-based scoring bias in OpenAI models.

🏢 OpenAI
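
The mutual-evaluation mechanics can be pictured with a toy score matrix: each model grades every other model's answers, self-scores are excluded, and the average received score induces the ranking, which can then be checked against an external reference with a rank correlation. All numbers below are fabricated.

```python
# Toy mutual peer assessment: scores[i, j] = score model i assigns to model j's
# answers; the diagonal (self-assessment) is excluded via NaN.
import numpy as np
from scipy.stats import spearmanr

models = ["A", "B", "C", "D"]
scores = np.array([
    [np.nan, 7.5, 6.0, 8.0],
    [8.0, np.nan, 5.5, 7.0],
    [7.0, 6.5, np.nan, 7.5],
    [8.5, 7.0, 6.0, np.nan],
])
peer_means = np.nanmean(scores, axis=0)   # average score each model receives
ranking = np.argsort(-peer_means)
print([models[i] for i in ranking])

# Consistency against a hypothetical external reference (e.g. a held-out benchmark):
reference = [9.0, 7.0, 5.0, 8.0]
rho, _ = spearmanr(peer_means, reference)
print(f"Spearman rho = {rho:.2f}")
```
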
AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

Researchers introduce ECHO, a reinforcement learning framework that co-evolves policy and critic models to address the problem of stale feedback in LLM agent training. The system uses cascaded rollouts and saturation-aware gain shaping to maintain synchronized, relevant critique as the agent's behavior improves over time, demonstrating enhanced stability and success rates in complex environments.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?

Researchers introduce PrivacyReasoner, an LLM-based agent architecture that reconstructs individual privacy perspectives from online comment history to predict how specific people would perceive data practices. The system outperforms baseline models in predicting privacy concerns across AI, e-commerce, and healthcare domains by contextually activating relevant privacy beliefs.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries

Researchers propose LatentRefusal, a safety mechanism for LLM-based text-to-SQL systems that detects unanswerable queries by analyzing intermediate hidden activations rather than relying on output-level instruction following. The approach achieves 88.5% F1 score across four benchmarks while adding minimal computational overhead, addressing a critical deployment challenge in AI systems that generate executable code.
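
The underlying idea, reading a refusal signal out of intermediate activations rather than generated text, amounts to a lightweight probe on hidden states. A hedged sketch with random stand-in activations (the paper's probe architecture and layer choice aren't specified in the summary):

```python
# Sketch: train a lightweight probe on mid-layer activations to flag
# unanswerable queries before any SQL is emitted. Features are synthetic
# stand-ins for a real model's hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
hidden_dim = 256
# Hypothetical activations for answerable (1) vs unanswerable (0) queries.
X_ans = rng.normal(0.2, 1.0, (500, hidden_dim))
X_unans = rng.normal(-0.2, 1.0, (500, hidden_dim))
X = np.vstack([X_ans, X_unans])
y = np.array([1] * 500 + [0] * 500)

probe = LogisticRegression(max_iter=1000).fit(X, y)

def should_refuse(activation: np.ndarray, threshold: float = 0.5) -> bool:
    # Refuse (abstain from generating SQL) when P(answerable) is low.
    return probe.predict_proba(activation.reshape(1, -1))[0, 1] < threshold

print(should_refuse(rng.normal(-0.2, 1.0, hidden_dim)))
```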

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Prompt Evolution for Generative AI: A Classifier-Guided Approach

Researchers propose a prompt evolution framework that uses classifier-guided evolutionary algorithms to improve generative AI outputs. Rather than enhancing prompts before generation, the method applies selection pressure during the generative process to produce images better aligned with user preferences while maintaining diversity.
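
The classifier-guided loop follows the standard evolutionary pattern: score candidates with a preference classifier, keep the fittest, and mutate them. The candidate representation and fitness function below are placeholders, not the paper's.

```python
# Generic classifier-guided evolutionary loop of the kind the paper builds on.
# Candidates here are abstract vectors; in the paper they would correspond to
# generation variables, with a preference classifier supplying fitness.
import numpy as np

rng = np.random.default_rng(42)

def classifier_fitness(candidate: np.ndarray) -> float:
    # Hypothetical preference classifier; here, closeness to a target vector.
    target = np.ones_like(candidate)
    return -np.linalg.norm(candidate - target)

population = [rng.normal(size=16) for _ in range(20)]
for generation in range(50):
    scored = sorted(population, key=classifier_fitness, reverse=True)
    parents = scored[:5]                          # selection pressure
    population = parents + [
        p + rng.normal(scale=0.1, size=16)        # Gaussian mutation
        for p in parents for _ in range(3)
    ]

best = max(population, key=classifier_fitness)
print(f"best fitness: {classifier_fitness(best):.3f}")
```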

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data

Researchers demonstrate that fine-tuning Large Language Models for report summarization is feasible on limited on-premise hardware (1-2 A100 GPUs), addressing practical constraints in sensitive government and intelligence applications. The study compares supervised and unsupervised approaches, finding that fine-tuning improves summary quality and reduces invalid outputs, even without ground-truth training data.
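
The summary doesn't state the authors' training recipe; a common way to make LLM fine-tuning fit on one or two A100s is parameter-efficient LoRA. A sketch with the Hugging Face peft library, using an example model and illustrative hyperparameters rather than the paper's setup:

```python
# Illustrative LoRA configuration (not the authors' recipe): only small
# low-rank adapter matrices are trained, which keeps memory within the
# budget of 1-2 A100 GPUs for a mid-size model.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # example model
lora_config = LoraConfig(
    r=16,                                  # low-rank update dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically <1% of weights train
```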

AI · Bullish · arXiv – CS AI · Apr 15 · importance 6/10

Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

Researchers propose Joint Flashback Adaptation, a novel method to address catastrophic forgetting in large language models during incremental task learning. The approach uses limited prompts from previous tasks combined with latent task interpolation, demonstrating improved performance across 1000+ instruction-following and reasoning tasks without requiring full replay data.

AI · Bullish · arXiv – CS AI · Apr 15 · importance 6/10

Fast AI Model Partition for Split Learning over Edge Networks

Researchers propose an optimal model partitioning algorithm for split learning that reduces training delays by up to 38.95% by representing AI models as directed acyclic graphs and solving the problem via maximum-flow methods. The approach includes a low-complexity block-wise algorithm that achieves 13x faster computation on edge computing hardware, advancing the feasibility of distributed AI inference on mobile and edge devices.

🏢 Nvidia
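
The max-flow framing is the classical one: layers become DAG nodes, activation-transfer costs become edge capacities, and per-layer device/server compute costs become source and sink arcs, so a minimum s-t cut is an optimal partition. A toy instance with made-up costs:

```python
# Min-cut model partition between a device and a server. If a layer lands on
# the device side of the cut, we pay its device cost (layer -> server arc is
# cut); on the server side, we pay its server cost (device -> layer arc is
# cut); edges between layers on different sides pay the communication cost.
import networkx as nx

G = nx.DiGraph()
layers = ["conv1", "conv2", "fc1", "fc2"]
device_cost = {"conv1": 1, "conv2": 2, "fc1": 6, "fc2": 8}
server_cost = {"conv1": 5, "conv2": 4, "fc1": 2, "fc2": 1}
comm_cost = {("conv1", "conv2"): 3, ("conv2", "fc1"): 2, ("fc1", "fc2"): 4}

for layer in layers:
    G.add_edge("device", layer, capacity=server_cost[layer])
    G.add_edge(layer, "server", capacity=device_cost[layer])
for (u, v), c in comm_cost.items():
    G.add_edge(u, v, capacity=c)
    G.add_edge(v, u, capacity=c)   # cut in either direction costs the transfer

cut_value, (device_side, server_side) = nx.minimum_cut(G, "device", "server")
print(cut_value, sorted(device_side - {"device"}), sorted(server_side - {"server"}))
```
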
AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Variation in Verification: Understanding Verification Dynamics in Large Language Models

Researchers analyze how LLM verifiers assess solution correctness in test-time scaling scenarios, revealing that verification effectiveness varies significantly with problem difficulty, generator strength, and verifier capability. The study demonstrates that weak generators can nearly match stronger ones after verification, and that scaling verifiers alone cannot solve fundamental verification challenges.

🧠 GPT-4
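
The setup being analyzed is essentially best-of-n sampling with a verifier reranking the candidates. A skeletal version with placeholder generate/verify functions (both are stand-ins, not any real model API):

```python
# Minimal best-of-n with a verifier, the test-time scaling pattern the study
# analyzes. generate() and verify() are placeholders for a generator LLM and
# a verifier model.
import random

random.seed(0)

def generate(problem: str) -> str:
    return f"candidate solution {random.randint(0, 999)} for: {problem}"

def verify(problem: str, solution: str) -> float:
    return random.random()   # stand-in for a verifier's correctness score

def best_of_n(problem: str, n: int = 8) -> str:
    candidates = [generate(problem) for _ in range(n)]
    # A weak generator plus a good verifier can approach a stronger generator,
    # but verifier quality itself becomes the bottleneck as n grows.
    return max(candidates, key=lambda s: verify(problem, s))

print(best_of_n("integrate x * exp(x)"))
```
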
AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework

Researchers introduce Safe-SAIL, a framework that uses sparse autoencoders to interpret safety features in large language models across four domains (pornography, politics, violence, terror). The work reduces interpretation costs by 55% and identifies 1,758 safety-related features with human-readable explanations, advancing mechanistic understanding of AI safety.
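
The interpretability backbone here is a sparse autoencoder trained on model activations: an overcomplete ReLU encoder with an L1 penalty, so individual latent units fire sparsely and can be labeled (for example, as safety-related features). A generic sketch, not the paper's configuration:

```python
# Standard sparse autoencoder over residual-stream activations. Dimensions
# and the L1 coefficient are illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 4 * 768):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))   # sparse feature activations
        return self.decoder(features), features

sae = SparseAutoencoder()
acts = torch.randn(32, 768)           # stand-in activations from an LLM layer
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + L1
```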

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

Researchers have developed SeedPrints, a novel fingerprinting method that identifies Large Language Models based on their random initialization seed rather than post-training characteristics. This approach enables model attribution and provenance verification from inception through full pretraining, addressing limitations of existing methods that only work reliably after fine-tuning.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability

Researchers propose a novel framework treating Large Language Models as attention-informed Neural Topic Models, enabling interpretable topic extraction from documents. The approach combines white-box interpretability analysis with black-box long-context LLM capabilities, demonstrating competitive performance on topic modeling tasks while maintaining semantic clarity.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback

StableSketcher is a novel AI framework that enhances diffusion models for generating pixel-based hand-drawn sketches with improved prompt fidelity. The approach combines fine-tuned variational autoencoders with a reinforcement learning reward function based on visual question answering, alongside a new SketchDUO dataset of instance-level sketches paired with captions and Q&A pairs.

🧠 Stable Diffusion
AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Why Did Apple Fall: Evaluating Curiosity in Large Language Models

Researchers have developed a comprehensive evaluation framework based on human curiosity scales to assess whether large language models exhibit curiosity-driven learning. The study finds that LLMs demonstrate stronger knowledge-seeking than humans but remain conservative in uncertain situations, with curiosity correlating positively to improved reasoning and active learning capabilities.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

FaCT: Faithful Concept Traces for Explaining Neural Network Decisions

Researchers introduce FaCT, a new approach for explaining neural network decisions through faithful concept-based explanations that don't rely on restrictive assumptions about how models learn. The method includes a new evaluation metric (C²-Score) and demonstrates improved interpretability while maintaining competitive performance on ImageNet.

AI · Neutral · arXiv – CS AI · Apr 15 · importance 6/10

Reasoning about Intent for Ambiguous Requests

Researchers propose a method for large language models to handle ambiguous user requests by generating structured responses that enumerate multiple valid interpretations with corresponding answers, trained via reinforcement learning with dual reward objectives for coverage and precision.
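
The dual reward can be pictured as a coverage/precision trade-off over the enumerated interpretations. The toy function below uses exact-match sets and equal weights; both are simplifying assumptions, not the paper's formulation.

```python
# Toy dual reward: coverage (fraction of valid interpretations the response
# enumerates) and precision (fraction of enumerated interpretations that are
# valid), combined with illustrative equal weights.
def dual_reward(predicted: set[str], valid: set[str],
                w_cov: float = 0.5, w_prec: float = 0.5) -> float:
    if not predicted:
        return 0.0
    coverage = len(predicted & valid) / len(valid)
    precision = len(predicted & valid) / len(predicted)
    return w_cov * coverage + w_prec * precision

valid = {"bank: financial institution", "bank: river edge"}
predicted = {"bank: financial institution", "bank: blood bank"}
print(dual_reward(predicted, valid))   # 0.5 (coverage 1/2, precision 1/2)
```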

AI · Bullish · arXiv – CS AI · Apr 15 · importance 6/10

INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT

Researchers propose INFORM-CT, an AI framework combining large language models and vision-language models to automate detection and reporting of incidental findings in abdominal CT scans. The system uses a planner-executor approach that outperforms traditional manual inspection and existing pure vision-based models in accuracy and efficiency.

AI · Bearish · The Register – AI · Apr 15 · importance 6/10

AI-powered mainframe exits are a bubble set to pop: Gartner

Gartner warns that AI-powered mainframe modernization solutions are experiencing speculative hype and face a significant market correction. The research firm suggests current valuations and growth expectations for this sector are unsustainable, positioning it as a bubble vulnerable to collapse.

AI · Neutral · TechCrunch – AI · Apr 15 · importance 6/10

Anthropic’s rise is giving some OpenAI investors second thoughts

OpenAI's recent funding round is priced such that investors must underwrite an eventual IPO valuation of $1.2 trillion, while Anthropic's current $380 billion valuation offers a more conservative entry point. This disparity is prompting some investors who hold stakes in both companies to reconsider how they allocate between the two leading AI labs.

🏢 OpenAI · 🏢 Anthropic