AI Pulse News

Models, papers, tools. 40,082 articles with AI-powered sentiment analysis and key takeaways.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Beyond Skeletons: Learning Animation Directly from Driving Videos with Same2X Training Strategy

DirectAnimator is a new AI framework that generates human animations from static images by learning directly from driving videos, eliminating reliance on potentially error-prone pose estimators. The system introduces a Same2X training strategy that improves cross-identity animation while maintaining computational efficiency and robustness to occlusions.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

EASE-TTT: Evidence-Aligned Selective Test-Time Training for Long-Context Question Answering

Researchers present EASE-TTT, a novel framework combining within-context retrieval with test-time adaptation to improve long-context question answering in smaller language models. The method identifies evidence chunks and converts them into soft attention supervision targets, allowing models to focus on relevant information while processing the full context, outperforming existing retrieval-only and generic adaptation baselines.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

SpectCount: Spectrotemporal Counting via Synthetic Signals Improves Large Audio Language Models

Researchers propose SpectCount, a synthetic data fine-tuning method that improves large audio language models (LALMs) by generating on-the-fly audio signals to address spectrotemporal perceptual weaknesses. The approach bypasses the bottleneck of scarce annotated audio data and demonstrates performance gains across diverse auditory benchmarks without requiring real-world audio or pretrained generative models.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

The Fine-Tuning Trap: Evaluating Negative Transfer and the Role of PEFT in Sub-1B Mathematical Reasoning

Researchers benchmarked five sub-1B language models and discovered that Full Fine-Tuning actively degrades performance on models under 300M parameters, causing accuracy to drop below zero-shot baselines. Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA and DoRA prove necessary for stability, with task-specific strengths that outperform full fine-tuning and sometimes even match in-context learning on the smallest architectures.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Didact: A Cross-Domain Capability Discovery System for Defence

Didact is a prototype system that integrates Australian defence reports, policy documents, and research publications into a unified knowledge graph to help policymakers discover defence capabilities faster. The system uses retrieval-augmented generation (RAG) and natural language conversations to surface fragmented information across heterogeneous sources, with an interactive Evidence Rail for visualizing source relationships.

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

Researchers introduce SS-TPT, a new defense mechanism that improves the adversarial robustness of vision-language models like CLIP through intelligent test-time prompt tuning. The method uses stability and suitability scores to filter reliable augmented views, achieving better robustness while maintaining practical inference speeds without the computational slowdown of previous approaches.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

When is 3D Worth It? A Resource-Performance Frontier for CNNs and Transformers in Lung CT

Researchers studying lung CT imaging found that 2.5D CNNs provide the best balance of performance, stability, and computational efficiency for cancer screening compared to full 3D models or pure 2D approaches. The study challenges the assumption that 3D models are universally superior for volumetric medical imaging, revealing that 3D CNNs suffer from threshold instability while transformers produce unreliable degenerate predictions.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

A Geometric View for Understanding Concept Learning and Neuron Interpretation in Sparse Autoencoders

Researchers propose a mathematical framework for understanding how sparse autoencoders learn and represent concepts, formalizing concept learning as a set-alignment problem and establishing geometric conditions for neuron-level concept representation. The work connects concept learning to formal concept analysis, revealing that neuron interpretation involves complex many-to-many mappings rather than simple one-to-one relationships.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

Researchers introduce UniSinger, an AI framework that unifies song generation with singing voice conversion by enabling zero-shot speaker cloning and accompaniment co-generation. The system uses a multimodal diffusion transformer with curriculum learning to simultaneously handle vocal timbre control and musical accompaniment, advancing generative music production capabilities.

AI · Neutral · arXiv – CS AI · Jun 8 · 5/10

Phonetic Error Analysis of Raw Waveform Acoustic Models

Researchers achieved state-of-the-art performance on raw waveform acoustic models for phone recognition using CNN-LSTM architectures, with error rates of 13.9%/15.3% on TIMIT benchmarks. Analysis reveals that different phonetic classes benefit differently from model components, and transfer learning from WSJ data improves consonant recognition significantly more than vowels.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Never Seen Before: Benchmarking Genuine Zero-Shot Composed Image Retrieval with Consistent Video-Sourced Datasets

Researchers introduce ZeroSight, a new benchmark for Zero-Shot Composed Image Retrieval (ZS-CIR) that addresses critical flaws in existing datasets by using video-sourced data published after CLIP's training cutoff. They also propose SC4CIR, a training-free method whose results indicate that current ZS-CIR performance metrics significantly overestimate actual model capabilities.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

TRACE: Trajectory Reasoning through Adaptive Cross-Step Evidence Aggregation for LLM Agents

Researchers introduce TRACE, a monitoring framework designed to detect malicious behavior in autonomous LLM agents by tracking evidence across long sequences of seemingly benign actions. The system achieves 0.713 F1 score and 0.844 recall on benchmark tests, addressing a critical security gap where agents can pursue hidden objectives through temporally distributed steps.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

On the Geometry of On-Policy Distillation

Researchers characterize the training dynamics of on-policy distillation (OPD), a technique used to improve large language model reasoning, revealing it operates in a distinct geometric regime compared to supervised fine-tuning and reinforcement learning. The study shows OPD exhibits 'subspace locking,' where cumulative updates rapidly converge to a narrow low-dimensional channel that is functionally sufficient for performance, suggesting OPD has unique training dynamics rather than existing as a simple intermediate between other training approaches.

AI · Neutral · arXiv – CS AI · Jun 8 · 5/10

MetaConfigurator: AI-Assisted RDF Authoring from JSON Data

MetaConfigurator introduces an AI-assisted RDF Authoring View that enables researchers to convert structured JSON, YAML, and CSV data into semantic RDF format through an integrated web interface. The tool bridges conventional data management with Semantic Web technologies, demonstrated using laboratory synthesis experiment data, and includes features like ontology-aware IRI auto-completion and AI-generated SPARQL queries.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection

Researchers introduce GP-Adapter, a training-free framework combining CLIP with Gaussian Process uncertainty modeling to improve few-shot classification and out-of-distribution detection. The approach maintains CLIP's frozen backbone while adding probabilistic inference capabilities, requiring minimal computational overhead and achieving competitive performance on multiple benchmarks.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

DIFFRACT: Neuralized Utility Maximization for Wireless Networks by Differentiable Programming

DIFFRACT is a new neuralized framework that combines deep learning with wireless network optimization through differentiable programming, enabling distributed resource management across satellite and terrestrial networks. The approach maps interference management algorithms into neural network architectures, allowing real-time adaptation to dynamic network conditions with scalable utility maximization.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

Researchers introduce REMEDI, a benchmark for evaluating machine unlearning methods in clinical disease inference using real patient data from MIMIC-III. The study reveals fundamental trade-offs between model utility and data removal effectiveness, with existing unlearning techniques proving poorly suited for multi-label medical classification tasks.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding

Researchers introduced UrduMMLU, a 26,431-question benchmark for evaluating large language models on Urdu language understanding across 26 subjects. The evaluation of 30 LLMs revealed significant performance gaps: Gemini-3.5-Flash achieved 90% accuracy, while most models struggled with Urdu-specific and humanities content, highlighting persistent multilingual AI capability disparities.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Textual Supervision Enhances Geospatial Representations in Vision-Language Models

Researchers demonstrate that textual supervision significantly improves how vision-language models understand geospatial information, with language serving as a complementary modality to visual data. The study analyzes geospatial representations across vision-only, vision-language, and multimodal foundation models, revealing systematic gaps in spatial accuracy that can be addressed through improved multimodal learning approaches.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

RETROSPECT introduces a modular retrosynthesis system combining a Transformer-based proposal model with LambdaMART reranking to improve chemical synthesis prediction. The system achieves 55% top-1 accuracy on USPTO-50K benchmarks, demonstrating that decomposing retrosynthesis into proposal generation and learned selection improves both ranking quality and candidate diversity.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

An Abstract Architecture for Explainable Autonomy in Hazardous Environments

Researchers present an abstract architecture for building autonomous robotic systems that can explain their decision-making processes to human operators and regulators. The framework addresses the critical need for explainability in autonomous systems deployed in hazardous environments, with a practical application example in nuclear industry operations where trust and regulatory compliance are essential.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

DualGate-Net: A Prior-Gated Dual-Encoder Framework for Histopathology Cell Detection

DualGate-Net introduces a prior-gated dual-encoder framework for detecting cells in histopathology images by combining local and global tissue context through an adaptive fusion mechanism. The method achieves improved performance on the OCELOT benchmark, demonstrating that intelligent integration of contextual priors enhances cell detection accuracy in medical imaging applications.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

Researchers introduce DEFINED, a computational framework for assessing creativity in debate using a hierarchical eight-dimensional metric system. The approach combines pre-trained language models with human expert annotations to overcome data scarcity challenges, achieving more accurate scoring than standard LLM evaluators.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Beyond Waypoints: A Trajectory-Centric Waypointing Paradigm for Vision-Language Navigation

Researchers propose a novel Vision-Language Navigation approach that grounds waypoints in executable trajectories rather than predicting isolated navigation points. By using a TSDF-guided diffusion policy, the method ensures predicted waypoints are reachable and maintains consistency between high-level planning and low-level control, demonstrating superior performance on VLN-CE benchmarks.

AI · Neutral · arXiv – CS AI · Jun 8 · 5/10

Acoustic Cue Alignment in Audio Language Models for Speech Emotion Recognition

Researchers demonstrate that instruction-following audio language models can effectively utilize explicit acoustic cues for speech emotion recognition, with aligned acoustic tokens improving performance on standard benchmarks while remaining grounded in the underlying audio signal.
