AI Pulse News

Models, papers, tools. 39,827 articles with AI-powered sentiment analysis and key takeaways.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

Researchers introduce Evaluation Cards, a standardized reporting framework that addresses fragmented AI evaluation practices across leaderboards and model cards. The system consolidates benchmark metadata, evaluation data, and model information into unified records with interpretive signals for reproducibility and comparability, deployed across 5,816 models and 635 benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

XAInomaly: Explainable and Interpretable Deep Contractive Autoencoder for O-RAN Traffic Anomaly Detection

Researchers introduce XAInomaly, an explainable AI framework using a Semi-supervised Deep Contractive Autoencoder for detecting anomalies in Open RAN (O-RAN) networks. The system addresses the critical need for interpretable machine learning in complex wireless infrastructure by combining generative modeling with explainability techniques to identify network traffic deviations while maintaining transparency in decision-making.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Bidirectional Small-Granularity Search between Code and Text

Researchers introduce a bidirectional search task that links code snippets to text descriptions and vice versa, addressing the gap between scientific publications and their implementations. They present a large dataset with automatically generated training data and manually annotated test sets, along with a modular encoder-based approach that achieves strong in-domain results with promising out-of-domain generalization.

🧠 GPT-4

AI · Bearish · arXiv – CS AI · Jun 9 · 6/10

Evaluating Hallucinations in Domain-Adapted Large Language Models

Researchers investigating hallucinations in fine-tuned Large Language Models found that domain adaptation via fine-tuning alone is insufficient to prevent inaccurate outputs. Testing Llama-2 with domain-specific data revealed the model struggles with novel reasoning tasks and tends to over-generate information, highlighting fundamental limitations in current LLM adaptation techniques.

🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Retrieval Augmented Generation Framework for the Nepali Legal Domain Question Answering

Researchers have successfully developed the first Retrieval Augmented Generation (RAG) system for legal question answering in Nepali, addressing a critical gap in AI applications for low-resource languages. The system achieved 91% precision using BM25 retrieval and demonstrated 84% human-evaluated truthfulness, establishing a viable foundation for AI-assisted legal services in non-English speaking jurisdictions.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

ABLE: Representing and Mapping LLMs via Attribution-Based Large-model Embedding

Researchers introduce ABLE, a framework that represents and compares large language models through gradient-based feature attributions rather than parameter analysis or output comparison. The training-free method achieves competitive performance on model comparison tasks across 239 open-source LLMs while providing theoretical stability guarantees.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Implicit Causal Graph Construction in Text via Chain Discovery

Researchers develop a novel method for constructing implicit causal graphs from text by using large language models to infer intermediate causal events between observed cause-effect pairs. The study compares multiple approaches including chain discovery and iterative search processes, validated against a curated database of 1,560 scientifically verified causal relationships.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

GraphLoRA: Structure-Aware Low-Rank Adaptation for Large Language Model Recommendation

GraphLoRA introduces a novel framework that integrates graph neural networks with low-rank adaptation to improve Large Language Model-based recommendation systems. By embedding trainable graph message-passing within the LoRA pathway, the method enables collaborative signals to directly guide parameter updates, achieving superior performance while maintaining computational efficiency compared to existing LLM recommendation approaches.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Post-training is (Massive) Supervised Learning

A new arXiv paper argues that current LLM post-training methods (SFT and RL) function primarily as distribution-fitting mechanisms rather than developing general capabilities, reverting to pre-BERT era approaches. The authors demonstrate that randomly initialized models achieve non-trivial performance when fine-tuned on modern benchmarks, suggesting the field should shift toward training systems that learn how to learn rather than optimizing for specific tasks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

BEACON: Behavioral Entropy Aggregation for Cross-Model Hallucination Detection in Large Language Models

Researchers introduce BEACON, a black-box hallucination detection framework for large language models that achieves 81.23% accuracy by analyzing model outputs without requiring internal access. The method combines multiple uncertainty signals including semantic entropy and consistency checks, outperforming existing baselines and offering practical deployment options across commercial LLM APIs.
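For readers unfamiliar with the semantic entropy signal mentioned above, here is a minimal sketch of the general idea, not BEACON's implementation. It assumes several answers to the same prompt have already been sampled and grouped into meaning clusters by some external step; the function name `cluster_entropy` and the cluster labels are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids):
    """Shannon entropy (in nats) of the empirical distribution over meaning clusters.

    cluster_ids holds one label per sampled answer; answers judged to mean
    the same thing share a label. Higher entropy means the samples disagree more.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical cluster labels for five sampled answers to one prompt.
consistent = ["a", "a", "a", "a", "a"]  # every sample agrees
scattered = ["a", "b", "c", "d", "e"]   # every sample differs

print(cluster_entropy(consistent))
print(cluster_entropy(scattered))
```

Under this sketch, fully consistent samples score zero and fully scattered samples score log(5), so a threshold on the score could flag likely hallucinations. How BEACON combines this with its other signals is not described in the summary.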

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

CAPruner: Conceptual-Adjacent Scene Graph Pruner for Enhancing 3D Spatial Reasoning of Large Language Models

Researchers propose CAPruner, a scene graph pruning method that enhances how large language models perform 3D spatial reasoning by preserving task-relevant relations rather than relying solely on spatial proximity. The approach combines fuzzy semantic relevance with spatial proximity to identify critical relations, addressing computational inefficiencies in 3D vision-language tasks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models

Researchers introduce mllm-shap, an open-source framework that extends Shapley Value explainability techniques to multimodal large language models processing text and audio inputs simultaneously. The platform addresses three technical challenges unique to multimodal systems and implements five estimation strategies, with a novel phonetic alignment technique reducing computational complexity by 10-50x.
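As background on the Shapley Value technique named above, the sketch below computes exact Shapley values by enumerating every coalition. It is a textbook illustration, not the mllm-shap API: the function names and the toy value function over a "text" and an "audio" input are assumptions. Exact enumeration grows exponentially with the number of inputs, which is presumably why a tool like this relies on estimation strategies.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


def toy_value(coalition):
    """Hypothetical model score given which inputs are present."""
    score = 0.0
    if "text" in coalition:
        score += 1.0
    if "audio" in coalition:
        score += 2.0
    if {"text", "audio"} <= coalition:
        score += 1.0  # interaction bonus when both modalities are present
    return score


print(shapley_values(["text", "audio"], toy_value))
```

In this toy case the interaction bonus is split evenly, giving 1.5 to "text" and 2.5 to "audio", and the two values sum to the score of the full input.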

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

Researchers present Principled Agent Debate (PAD), a multi-agent architecture that reduces sycophancy in large language models by having two models with opposing dispositions argue positions while a blind arbitrator evaluates them. Testing on 200 questions shows PAD variants achieve 48.5-53% accuracy compared to 18.5% for single models, significantly improving truthfulness over agreement bias.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Bridging Traditional Explainability Methods and Multimodal Multilingual Models: An XAI-Based Analysis

Researchers have developed a novel framework extending Shapley Values—a traditional explainability method—to multimodal large language models that process both text and audio. The work introduces computational optimizations and a preprocessing technique called Spectrogram-Guided Phonetic Alignment to make the analysis feasible, alongside an open-source tool for visualization, revealing that input modality significantly affects model attribution patterns.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Bidirectional Semantic Complementary Tool Retrieval for Remote Sensing Agents

Researchers propose a bidirectional semantic complementary tool retrieval (BSCTR) method to improve how LLM-based agents select appropriate tools for remote sensing tasks. The approach addresses a fundamental mismatch between high-level user queries and detailed tool documentation by enhancing queries with decomposed subtasks and enriching tool descriptions with contextual dependencies, demonstrating improved performance on specialized remote sensing benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Multimodal Large Language Models as Synthetic Participants in Video-Based Studies: An Evaluation

Researchers evaluated whether multimodal large language models (MLLMs) like Gemini 3 Flash and Qwen 3 Omni can replicate human subjective responses in video perception tasks using the Perceived Message Sensation Value framework. The study found significant limitations: MLLMs demonstrated systematic biases including downward mean-shift, central-tendency bias, and inconsistent sensitivity to participant profiles, suggesting current models remain unreliable as synthetic human participants for subjective research.

🧠 Gemini

AI · Bearish · arXiv – CS AI · Jun 9 · 6/10

Concerns and Strategic Responses of Older Workers Navigating Generative AI in Bridge Employment

A research study examines how older workers navigating bridge employment experience disruptions from generative AI adoption and develop resilience strategies to adapt. The findings reveal that older workers face temporal and structural challenges throughout their re-entry into the workforce, responding through task reconfiguration and boundary work while requiring organizational and collective support to prevent burnout.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

AI-Integrated Learning Management System for Middle School: A Longitudinal Study of Learning Outcomes Through High School and Beyond

Researchers propose an AI-integrated Learning Management System designed for middle school students that combines formative feedback, adaptive practice, and teacher dashboards while prioritizing privacy through data minimization and auditable logs. A longitudinal study will track whether sustained AI support improves academic outcomes from middle school through post-secondary pathways, addressing the traditional bottleneck where students practice through confusion before receiving corrective feedback.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Evaluating Advanced Prompting on Gemini Flash for Multi-Hop Biomedical QA

Researchers evaluated Google's Gemini Flash models on the MedHopQA biomedical reasoning challenge, demonstrating that advanced prompt engineering significantly improves LLM performance in complex multi-hop question answering. A sophisticated prompt combining role-playing and chain-of-thought examples achieved a 0.720 score versus 0.565 baseline, with Gemini 2.0 Flash matching newer 2.5 Flash performance.

🧠 Gemini

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

Researchers introduce RL4F, an open-source benchmark for applying offline reinforcement learning to plasma control in nuclear fusion reactors. Using historical data from the DIII-D tokamak, the framework enables safe algorithm development without costly real-device experimentation, with model-based RL methods showing superior performance across multiple plasma control objectives.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Symbolic Reasoning Frameworks Modulate LLM Risk Aversion in Multi-Agent Strategic Settings

Researchers demonstrate that symbolic reasoning frameworks (I-Ching, Tarot), injected as prompts into language models deployed as strategic agents, significantly reshape multi-agent game outcomes by modulating risk aversion. In a 7-player diplomacy simulation, each framework produced its own distribution of winners, even though the agents did not follow the frameworks' literal content.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

MedicalRec: Medical recommender system for image classification without retraining

Researchers have developed MedicalRec, a transformer-based recommender system that identifies optimal deep learning models for medical image classification tasks without requiring retraining. The system leverages a new dataset (MedicalRec-Bench) containing over 5,000 model performance records across five medical imaging domains, achieving a 75.5% HitRate@100 and addressing the computational waste inherent in trial-and-error model selection.
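For context on the HitRate@100 figure, the sketch below shows one common definition of HitRate@k: the fraction of queries whose target item appears among the top k recommendations. The paper's exact definition may differ, and the function name, model names, and data here are hypothetical.

```python
def hit_rate_at_k(ranked_lists, targets, k):
    """Fraction of queries whose target appears in the top-k ranked items."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(ranked_lists)


# Hypothetical rankings of candidate models for three classification tasks.
rankings = [
    ["resnet", "vit", "regnet"],
    ["vit", "regnet", "resnet"],
    ["regnet", "resnet", "vit"],
]
best_models = ["vit", "resnet", "regnet"]  # assumed ground-truth best model per task

print(hit_rate_at_k(rankings, best_models, k=2))
```

With k=2, the first and third tasks count as hits and the second does not, so the example yields two hits out of three.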

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

Selecting New Measurement Locations to Diversify Traffic-Pattern Coverage: A Real-World Evaluation for Total Traffic Volume Estimation

Researchers propose an algorithm for strategically placing additional traffic counters in cities by identifying locations with underrepresented traffic patterns, rather than using spatial distribution alone. A real-world evaluation demonstrated that this pattern-diversity approach improves city-wide traffic volume estimation accuracy compared to conventional counter placement methods.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

Researchers developed an automated image classification system using fine-tuned deep learning models to categorize scanned historical documents by content type (text, tables, graphics), achieving 99.16% accuracy on Czech archaeological archives. The system successfully processed over 649,000 unlabeled pages, with RegNetY-16GF emerging as the most reliable model for production deployment due to consistent inter-model agreement.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Phantom transitions in language model fine-tuning

Researchers discovered that language models fail silently when fine-tuned on contexts with near-synonym competitors, exhibiting apparent phase transitions that are actually artifacts of the softmax readout rather than genuine geometric changes. The study identifies two failure modes and demonstrates that apparent discontinuities persist even under LoRA fine-tuning where embedding matrices remain frozen, revealing the phenomenon occurs entirely in the output layer.
