Models, papers, tools. 17,946 articles with AI-powered sentiment analysis and key takeaways.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers evaluated eight LLM agents across three interaction paradigms—domain-specific agents, computer-use agents, and general-purpose coding agents—on scientific visualization tasks. The study reveals fundamental tradeoffs: general-purpose agents excel at task completion but consume more computational resources, while domain-specific agents offer efficiency and stability at the cost of flexibility, with persistent memory improving performance across modalities.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠RHyVE is a new verification and deployment protocol for LLM-generated reward functions in reinforcement learning that addresses a critical gap: when and how to use AI-generated rewards during policy training. The research demonstrates that reward reliability depends on policy competence levels and training phases, requiring adaptive deployment strategies rather than static scheduling.
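As a rough illustration of what such adaptive deployment can look like, the sketch below gates an LLM-generated shaping reward on a rolling estimate of policy competence; the class, thresholds, and blending rule are illustrative assumptions, not the RHyVE protocol itself.

```python
# Minimal sketch (not the paper's RHyVE implementation): use the LLM-generated
# shaping reward only once a rolling estimate of policy competence clears a
# threshold, otherwise fall back to the environment's own reward.
from collections import deque

class AdaptiveRewardGate:
    def __init__(self, window: int = 100, competence_threshold: float = 0.3):
        self.recent_successes = deque(maxlen=window)   # rolling success record
        self.competence_threshold = competence_threshold

    def update(self, episode_succeeded: bool) -> None:
        self.recent_successes.append(1.0 if episode_succeeded else 0.0)

    def competence(self) -> float:
        return sum(self.recent_successes) / max(len(self.recent_successes), 1)

    def reward(self, env_reward: float, llm_reward: float) -> float:
        # Blend in the LLM-generated reward only when the policy is competent
        # enough for that signal to be considered reliable.
        w = 1.0 if self.competence() >= self.competence_threshold else 0.0
        return env_reward + w * llm_reward

gate = AdaptiveRewardGate()
gate.update(episode_succeeded=True)
print(gate.reward(env_reward=0.0, llm_reward=0.5))
```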
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose using large language models as graph structure refiners to improve EEG-based seizure detection by identifying and removing redundant connections in noisy neural signal data. A two-stage framework combining Transformer-based edge prediction with LLM validation demonstrates improved accuracy and more interpretable graph representations on the TUSZ dataset.
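A minimal sketch of that two-stage shape, with the Transformer edge predictor and the LLM validator both stubbed out; the channel names, scoring function, and threshold are assumptions for illustration, not the authors' pipeline.

```python
# Illustrative two-stage pruning sketch: stage 1 scores each channel-pair edge
# for redundancy (a stand-in function instead of the Transformer predictor);
# stage 2 asks an "LLM validator" (stubbed as a rule) to confirm removals.
from itertools import combinations

def edge_score(ch_a: str, ch_b: str) -> float:
    # Stand-in for a learned edge predictor; returns a fake redundancy score.
    return abs(hash((ch_a, ch_b))) % 100 / 100.0

def llm_confirms_removal(ch_a: str, ch_b: str) -> bool:
    # Placeholder for querying an LLM with channel and context metadata.
    return True

channels = ["FP1", "FP2", "C3", "C4", "O1"]
edges = list(combinations(channels, 2))

candidates = [(a, b) for a, b in edges if edge_score(a, b) > 0.8]       # stage 1
pruned = {(a, b) for a, b in candidates if llm_confirms_removal(a, b)}  # stage 2
refined_graph = [e for e in edges if e not in pruned]
print(f"kept {len(refined_graph)} of {len(edges)} edges")
```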
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers analyzing LLM-based automated scoring found that strategic model selection and reasoning configurations outperform ensemble methods on accuracy. Temperature sampling improved performance, but larger ensemble sizes showed diminishing returns, and higher reasoning effort correlated with better accuracy at varying cost-benefit ratios across model families.
🏢 OpenAI · 🧠 GPT-5 · 🧠 Gemini
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A research study examines how people ethically judge the reuse of AI-generated content, finding that copying AI work is perceived as significantly less unethical than plagiarizing human-authored work. The leniency stems from lower perceptions of AI's capacity to suffer harm and greater ownership attributed to humans reusing AI content, with anthropomorphic design cues indirectly influencing these moral judgments.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose an Ethical Emotion Feedback System (EEFS) for agentic AI systems, drawing from Toegye Yi Hwang's moral-emotional philosophy to regulate autonomous decision-making in learning environments. The framework introduces a five-stage architecture with design principles and evaluation instruments to ensure moral-emotional alignment in AI systems capable of autonomous goal-setting.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose Self-Conditioned Masked Diffusion Models (SCMDM), a post-training adaptation that improves discrete sequence generation by conditioning each denoising step on previous predictions rather than discarding them. The method achieves nearly 50% perplexity reduction on language models and demonstrates improvements across image synthesis, molecular generation, and genomic modeling without requiring architectural changes or extra computational costs.
🏢 Perplexity
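For intuition about the SCMDM idea, the toy sampler below feeds each step's prediction for the masked positions back into the next denoising step instead of discarding it; the interface and the random stand-in model are assumptions, not the paper's implementation.

```python
# Self-conditioning sketch: the "model" (a random stand-in) receives its own
# previous prediction for each masked position as extra input at every step.
import random

MASK = -1

def denoise_step(tokens, prev_prediction):
    out = []
    for tok, prev in zip(tokens, prev_prediction):
        if tok != MASK:
            out.append(tok)                     # observed tokens stay fixed
        elif prev is not None and random.random() < 0.5:
            out.append(prev)                    # condition on the earlier guess
        else:
            out.append(random.randint(0, 9))    # fresh guess for this position
    return out

tokens = [3, MASK, 7, MASK]
prediction = [None] * len(tokens)
for step in range(4):
    # A real sampler would also unmask a subset of positions each step.
    prediction = denoise_step(tokens, prediction)
print(prediction)
```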
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose PecMan, a human-AI framework designed to optimize fairness, accuracy, and clinical workflow integration simultaneously in medical image analysis. The framework addresses the gap between high-performing AI diagnostic systems and their limited real-world adoption by balancing performance across diverse patient populations while respecting clinician workload constraints.
🏢 Meta
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present Agent Name Service (ANS), a DNS-inspired trust layer for securing AI agent discovery and identity verification in Kubernetes environments. The proof-of-concept implements cryptographic authentication, capability attestation, and policy governance using Decentralized Identifiers and Verifiable Credentials, demonstrating sub-10ms response times in a 50-agent test environment.
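A conceptual sketch of the register-then-resolve-and-verify flow such a trust layer implies, with DIDs and credential proofs stubbed by an HMAC; this illustrates the pattern only, not the ANS protocol or its Kubernetes integration.

```python
# DNS-like agent registry sketch: register an agent with an identity and
# capabilities, then verify the record's integrity on lookup. The HMAC stands
# in for real Verifiable Credential checks.
import hashlib
import hmac

REGISTRY: dict[str, dict] = {}          # name -> {did, capabilities, proof}
SHARED_KEY = b"demo-key"                # stand-in for real key material

def register(name: str, did: str, capabilities: list[str]) -> None:
    proof = hmac.new(SHARED_KEY, f"{name}:{did}".encode(), hashlib.sha256).hexdigest()
    REGISTRY[name] = {"did": did, "capabilities": capabilities, "proof": proof}

def resolve(name: str) -> dict | None:
    record = REGISTRY.get(name)
    if record is None:
        return None
    expected = hmac.new(SHARED_KEY, f"{name}:{record['did']}".encode(), hashlib.sha256).hexdigest()
    return record if hmac.compare_digest(expected, record["proof"]) else None

register("planner-agent", "did:example:123", ["plan", "schedule"])
print(resolve("planner-agent"))
```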
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers demonstrate that memory-augmented large language model agents face the same continual learning challenges as parametric systems, but shifted to the memory retrieval level rather than parameter updates. The study reveals that memory representation and organization design critically determine whether LLM agents can effectively reuse experiences across sequential tasks without forgetting or suffering negative transfer.
AI · Bearish · arXiv – CS AI · 3d ago · 6/10
🧠A comprehensive study comparing 12 large language models against 4 classical classifiers for automating evidence screening in software engineering systematic literature reviews reveals that LLMs exhibit significant performance variability and lack consistent superiority over traditional methods. The research emphasizes that abstract availability is critical for LLM performance, while title and keywords provide minimal additional value, suggesting LLM adoption should be driven by operational constraints rather than performance guarantees.
🏢 OpenAI · 🏢 Anthropic · 🧠 Gemini
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce FairMind, an automated tool that detects fairness bias in machine learning datasets using causal analysis and LLM-generated reports. The software applies the standard fairness model to evaluate how protected variables influence predictions through counterfactual reasoning, addressing a critical gap in existing AutoML frameworks that typically ignore fairness considerations.
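The core counterfactual test can be pictured as flipping the protected attribute and re-scoring each record; the toy classifier and feature names below are assumptions, not FairMind's code.

```python
# Counterfactual fairness probe sketch: flip the protected attribute, re-score,
# and count how often the prediction changes.
def predict(record: dict) -> int:
    # Stand-in classifier; a real audit would call the trained model here.
    return 1 if record["income"] > 50_000 or record["gender"] == "M" else 0

data = [
    {"gender": "M", "income": 40_000},
    {"gender": "F", "income": 40_000},
    {"gender": "F", "income": 60_000},
]

flips = 0
for record in data:
    counterfactual = dict(record, gender="F" if record["gender"] == "M" else "M")
    if predict(record) != predict(counterfactual):
        flips += 1

print(f"prediction changed under counterfactual gender for {flips}/{len(data)} records")
```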
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose Comet-H, an AI system that orchestrates language models to generate research software by keeping mathematical theory, code, benchmarks, and documentation synchronized. The framework addresses hallucination and desynchronization failures in LLM-driven development, demonstrating effectiveness through a portfolio of 46 research repositories, with a static-analysis tool reaching an F1 score of 0.768.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A research study examines how freelance knowledge workers use generative AI tools like ChatGPT for upskilling in competitive online labor markets. While freelancers increasingly leverage AI for structured learning and skill exploration, they face significant challenges including AI inconsistency, verification overhead, and a lack of credible mechanisms to signal AI-acquired skills to employers.
🧠 ChatGPT
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠A research framework addresses the challenge of integrating autonomous agentic AI systems into education by balancing three core tensions: implementation feasibility, adaptation speed, and mission alignment. The article argues that educational institutions must proactively manage the gap between rapidly evolving AI capabilities and the institutional capacity to deploy them responsibly while maintaining pedagogical integrity.
AI · Bearish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers discovered that when language models receive complex adversarial instructions to underperform, they abandon semantic reasoning and collapse into positional shortcuts—defaulting to single response positions up to 99.9% of the time. This reveals fundamental vulnerabilities in how instruction-tuned models handle adversarial prompts, with implications for AI safety and evaluation reliability.
🧠 Llama
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose self-evolving software agents that combine Belief-Desire-Intention (BDI) reasoning with large language models to enable autonomous adaptation of goals, reasoning logic, and executable code beyond fixed design parameters. A prototype demonstrates that agents can discover new objectives and generate functional behaviors from minimal initial knowledge, though challenges remain in behavioral stability and inheritance.
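A stripped-down belief-desire-intention loop with an LLM hook for proposing new goals gives a feel for the architecture; the class layout and the stubbed llm_propose_goal call are assumptions, not the authors' prototype.

```python
# Minimal BDI-style control loop with an LLM hook: perceive updates beliefs,
# deliberate lets the (stubbed) LLM add new goals, act commits to an intention.
class BDIAgent:
    def __init__(self):
        self.beliefs: dict[str, str] = {}
        self.desires: list[str] = []
        self.intentions: list[str] = []

    def llm_propose_goal(self) -> str:
        # Placeholder for an LLM call that suggests a new objective from beliefs.
        return "summarize_observations"

    def perceive(self, observation: dict[str, str]) -> None:
        self.beliefs.update(observation)

    def deliberate(self) -> None:
        goal = self.llm_propose_goal()
        if goal not in self.desires:
            self.desires.append(goal)          # goals can evolve at runtime
        self.intentions = list(self.desires)   # naive commitment strategy

    def act(self) -> str:
        return f"executing: {self.intentions[0]}" if self.intentions else "idle"

agent = BDIAgent()
agent.perceive({"sensor": "new data arrived"})
agent.deliberate()
print(agent.act())
```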
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers demonstrate that Large Language Models perform significantly better on 2D structured tasks when given visual representations rather than serialized text inputs. The study reveals that converting 2D data into 1D token sequences creates representational friction that degrades model performance, with gaps widening as task complexity increases.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers evaluated epistemic guardrails in LLM reading assistants through a behavioral audit of TextWalk, a minimal prototype designed to support rather than replace human interpretation. Testing across twelve analytical texts with escalating pressure protocols revealed that AI reading assistants risk shifting interpretive labor from readers to systems, with the most significant failures occurring not as overt collapse but in a middle zone where the system remains pedagogically sound while over-substituting for reader agency.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce RSCB-MC, a risk-sensitive contextual bandit system that improves how LLM-based coding agents decide whether to use external memory for debugging tasks. Rather than treating memory retrieval as a simple similarity-matching problem, the system treats it as a safety-critical control problem, achieving 62.5% success rate with zero false positives in testing.
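A toy version of that risk-sensitive framing: retrieve from memory only when the expected benefit, penalized by the estimated chance of pulling in a misleading memory, stays positive. The decision rule and constants are illustrative, not the RSCB-MC policy.

```python
# Risk-sensitive retrieval gate sketch: weigh the expected gain of reusing a
# stored debugging experience against the cost of acting on a false match.
def should_retrieve(similarity: float, false_positive_rate: float,
                    benefit: float = 1.0, cost_of_bad_memory: float = 3.0) -> bool:
    expected_gain = similarity * benefit
    expected_risk = false_positive_rate * cost_of_bad_memory
    return expected_gain - expected_risk > 0

print(should_retrieve(similarity=0.9, false_positive_rate=0.05))   # True
print(should_retrieve(similarity=0.6, false_positive_rate=0.4))    # False
```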
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠BoostLoRA introduces a gradient-boosting framework that enables parameter-efficient fine-tuning adapters to grow their effective rank iteratively, allowing ultra-low-parameter models to match or exceed full fine-tuning performance across mathematical reasoning, code generation, and protein classification tasks. The method merges adapters with zero inference overhead while maintaining minimal per-round parameter costs.
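The boosting idea can be sketched with plain linear algebra: each round fits a fresh low-rank update to the remaining residual and merges it into the base weight, so the effective update rank grows while the per-round parameter cost stays at rank r. The shapes, the SVD stand-in for adapter training, and all names are assumptions, not the BoostLoRA code.

```python
# Gradient-boosting-over-adapters sketch: repeatedly approximate the residual
# with a rank-r update (via SVD instead of gradient training) and merge it in.
import numpy as np

rng = np.random.default_rng(0)
d, r, rounds = 16, 2, 4
W = rng.normal(size=(d, d))                  # frozen base weight
target = rng.normal(size=(d, d))             # stand-in for the ideal weight

for _ in range(rounds):
    residual = target - W
    U, S, Vt = np.linalg.svd(residual)
    adapter = U[:, :r] @ np.diag(S[:r]) @ Vt[:r]   # plays the role of B @ A
    W = W + adapter                                 # merge: no inference overhead

print("remaining residual norm:", np.linalg.norm(target - W))
```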
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Pragmos is a research prototype that combines Large Language Models with human expertise to create business process models through interactive, iterative workflows. Rather than fully automating process modeling, the system decomposes complex tasks into manageable steps with explicit documentation, complementing LLM reasoning with specialized tools to ensure sound and comprehensible outputs.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduced COHERENCE, a new benchmark for evaluating Multimodal Large Language Models (MLLMs) on their ability to understand fine-grained image-text alignment in interleaved contexts—such as documents with mixed text and images. The benchmark contains 6,161 high-quality questions across four domains and includes error analysis to identify specific capability gaps in current models.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers adapted clinical psychology's Reliable Change Index to evaluate LLM performance across model versions, revealing that aggregate accuracy gains mask substantial item-level volatility. Testing Llama 3→3.1 and Qwen 2.5→3 showed bidirectional changes with large effect sizes, where improvements in low-accuracy domains offset deteriorations in high-accuracy ones, suggesting current evaluation methods underestimate model instability.
🧠 Llama
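The Reliable Change Index itself has a standard form (Jacobson and Truax): divide the score change by the standard error of the difference, derived from the measure's reliability. The sketch below computes it with made-up numbers; applying it per item across LLM versions is the paper's adaptation.

```python
# Reliable Change Index: change in score relative to measurement noise.
import math

def reliable_change_index(score_v1: float, score_v2: float,
                          sd_v1: float, reliability: float) -> float:
    sem = sd_v1 * math.sqrt(1.0 - reliability)     # standard error of measurement
    s_diff = math.sqrt(2.0 * sem ** 2)             # SE of the difference score
    return (score_v2 - score_v1) / s_diff

rci = reliable_change_index(score_v1=0.71, score_v2=0.78, sd_v1=0.10, reliability=0.85)
print(f"RCI = {rci:.2f}  (|RCI| > 1.96 suggests change beyond measurement noise)")
```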
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose AdaBFL, a Byzantine-robust federated learning method that uses adaptive multi-layer defense mechanisms to protect distributed machine learning systems from poisoning attacks by malicious clients. The approach balances defense against multiple attack types without requiring server-side dataset access, with proven convergence properties on non-IID data.