AI Pulse News

Models, papers, tools. 39,848 articles with AI-powered sentiment analysis and key takeaways.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Sample-Efficient Post-Training for LEGO Spatial-Physics Reasoning

Researchers propose PVPO, a sample-efficient reinforcement learning method that improves LLM-based LEGO assembly generation by addressing PhysHack, a failure mode where structures satisfy physical constraints but lack semantic or geometric coherence. The approach uses selective data training and couples physical feasibility with geometric rewards, achieving better structural alignment while reducing reliance on rejection sampling.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

MetaEvo: A Meta-Optimization Framework for Experience-Driven Agent Evolution

MetaEvo is a new framework that enables large language model-based agents to continuously improve through task experience by focusing on learning mechanisms rather than just memory storage. The two-stage approach combines preference-based optimization with modular architecture to help AI agents develop abstract principles and enhance reasoning capabilities over time.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

Researchers introduce Contribution Weights, a new metric for analyzing transformer attention that accounts for value vector geometry alongside attention weights. The approach more accurately identifies semantically critical tokens than traditional attention-based metrics and reveals that attention sinks actively suppress information rather than passively storing excess attention.
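
As a rough illustration of why value vector geometry matters, the sketch below rescales each attention weight by the L2 norm of the corresponding value vector. This is an assumed simplification, not the paper's definition of Contribution Weights; the function names and toy numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def norm_weighted_contributions(attn_weights, values):
    """Scale each attention weight by its value vector's L2 norm, then renormalize.

    Assumption: at least one value vector is nonzero.
    """
    raw = [a * math.sqrt(sum(x * x for x in v)) for a, v in zip(attn_weights, values)]
    total = sum(raw)
    return [r / total for r in raw]

# Token 0 behaves like an attention sink: a large score but a tiny value vector.
attn = softmax([4.0, 1.0, 1.0])
values = [[0.01, 0.0], [3.0, 4.0], [0.6, 0.8]]
contrib = norm_weighted_contributions(attn, values)

print("attention:", [round(a, 3) for a in attn])
print("norm-weighted:", [round(c, 3) for c in contrib])
```

In this toy case the token with the largest attention weight ends up with the smallest norm-weighted share, which is the kind of gap an attention-only metric would miss.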

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

SRT: Super-Resolution for Time Series via Disentangled Rectified Flow

Researchers introduce SRT (Super-Resolution for Time Series), a novel AI framework using disentangled rectified flow to reconstruct high-resolution temporal data from low-resolution inputs. The method decomposes time series into trend and seasonal components, employs implicit neural representations, and includes a cross-resolution attention mechanism, with a scaled pre-trained version (SRT-large) demonstrating strong zero-shot capabilities across multiple datasets.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Subtitle-Aligned Fine-Tuning of Whisper for Swiss German ASR: Benchmark Contamination, Convention Mismatch, and an Honest Baseline at 25.6% WER (13.8% cWER)

Researchers present a rigorous study of fine-tuning OpenAI's Whisper model for Swiss German speech recognition, achieving 25.6% WER with honest evaluation on disjoint test data. The work exposes significant benchmark contamination in published Swiss German ASR results, revealing that previous state-of-the-art claims were inflated by models memorizing test sets rather than genuinely understanding dialect.
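
For readers unfamiliar with the headline metric, WER is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal version of that standard definition; it does not reproduce the paper's text normalization or its cWER variant, and the function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("a b c d", "a x c d"))
```

A score like this is only meaningful when the test utterances are disjoint from the training data, which is the contamination issue the study raises.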

🏢 OpenAI · 🏢 Nvidia

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training

LEAF (Low-rank Exploration with Adaptive Forking) introduces a novel tree-based reinforcement learning method for training speech-aware large language models that improves credit assignment by identifying shared response prefixes and assigning rewards at the span level rather than uniformly across tokens. The approach achieves superior performance compared to existing GRPO-style methods without requiring additional computational overhead, enabling smaller models to match or exceed larger baselines.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

MIRAGE: Metadata-Integrated Repository Analysis and Guided Enhancement for MSR Datasets

MIRAGE is a metadata-enriched framework for analyzing Mining Software Repositories (MSR) datasets from 2013-2024, incorporating FAIRness assessments and topic modeling to improve dataset discoverability and reusability. The research demonstrates that repository hosting sites and data formats significantly influence citation patterns and dataset utility in software engineering research.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Structured Neuron Pruning in Deep Neural Networks Using Multi-Armed Bandits

Researchers present a novel structured pruning framework that uses multi-armed bandit algorithms to remove redundant neurons from deep neural networks. The approach treats each neuron as a bandit arm, testing its importance through temporary masking and loss measurement, then applies various MAB policies (UCB1, Thompson Sampling, etc.) to identify which neurons to prune. Experiments across tabular and deep learning tasks show MAB-based pruning significantly outperforms traditional magnitude-based and greedy pruning methods.
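
The neuron-as-arm idea can be sketched with UCB1 as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the reward is assumed to be one minus the loss increase observed when a neuron is temporarily masked, and `toy_loss_increase`, `TRUE_IMPORTANCE`, and `ucb1_select_prunable` are hypothetical stand-ins for a real network and evaluation loop.

```python
import math
import random

def ucb1_select_prunable(loss_increase_when_masked, n_neurons, n_prune, rounds=200):
    """Return the n_prune neurons whose masking hurts the loss least, chosen via UCB1.

    Assumes rounds >= n_neurons so that every arm is pulled at least once.
    """
    counts = [0] * n_neurons
    means = [0.0] * n_neurons
    for t in range(1, rounds + 1):
        if t <= n_neurons:
            arm = t - 1  # initial pass: pull every arm once
        else:
            # UCB1 index: empirical mean plus an exploration bonus.
            arm = max(
                range(n_neurons),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        # Assumed reward: a small loss increase means the neuron looks redundant.
        reward = 1.0 - loss_increase_when_masked(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    ranked = sorted(range(n_neurons), key=lambda i: means[i], reverse=True)
    return ranked[:n_prune]

# Toy stand-in for "mask the neuron and measure the loss on a mini-batch".
rng = random.Random(0)
TRUE_IMPORTANCE = [0.9, 0.05, 0.6, 0.02, 0.8]

def toy_loss_increase(neuron):
    return TRUE_IMPORTANCE[neuron] + rng.uniform(-0.01, 0.01)

pruned = ucb1_select_prunable(toy_loss_increase, n_neurons=5, n_prune=2)
print(sorted(pruned))
```

Because UCB1 concentrates its pulls on arms that look redundant, most of the evaluation budget goes to confirming pruning candidates rather than re-measuring neurons that are clearly important.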

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

Query Lens extends the Logit Lens technique to improve the interpretability of sparse autoencoders by analyzing both encoder key features and decoder value features, while accounting for indirect downstream effects. The research introduces the Subspace Channel Hypothesis, suggesting that neural modules process features through layer-specific subspaces, advancing understanding of how AI models process and manipulate information.

AI · Neutral · arXiv – CS AI · Jun 9 · 5/10

HASA: Subnet Allocation for Compute-Constrained Model-Heterogeneous Federated Learning

Researchers propose HASA, a subnet allocation algorithm for federated learning that assigns model sizes to edge devices based on data heterogeneity rather than just compute constraints. The method improves prediction accuracy across distributed clients while maintaining fixed computational budgets, with implications for efficient on-device AI deployment.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Eyes All Around: Design and Analysis of 360-Degree LiDAR Perception Using Equivariant Feature Learning in Unstructured Traffic

Researchers present a 360-degree LiDAR perception system for autonomous driving that uses rotation equivariant feature learning to handle dense, unstructured urban traffic. Tested on a custom dataset from Indian urban environments, the system achieves strong performance on larger vehicles but struggles with smaller, more variable road users like pedestrians and motorcyclists.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences

A position paper argues that large language models should optimize for individual user preferences rather than aggregated 'average user' preferences, because aggregation masks critical information about the diversity of preferences and values. The authors propose bounded personalization frameworks that balance individual autonomy with universal safety constraints, while addressing scalability and manipulation risks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Active Learning with Foundation Model Priors: Efficient Learning under Class Imbalance

Researchers propose an active learning framework that combines foundation model priors with smaller models to address class imbalance and label noise in real-world datasets. The method achieves over 50% annotation savings compared to existing active learning baselines while maintaining model performance across image and text domains.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning

Researchers have developed a method to detect emergent misalignment in large language models during finetuning by monitoring internal representational shifts rather than relying solely on behavioral evaluation. The technique identifies dangerous model behavior through a low-dimensional geometric signature in activation space, achieving high detection accuracy with minimal computational overhead.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

AMN: An Adaptive Multi-Scale Fusion Network with Boundary and Uncertainty Modeling for Nuclei Segmentation

Researchers introduce AMN, an advanced nuclei segmentation network combining Swin Transformer and ResNet-50 encoders for improved histopathology image analysis. The model achieves state-of-the-art performance on the CoNIC benchmark, outperforming eight existing architectures while demonstrating strong cross-dataset generalization capabilities.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

NeuroAlign: Hierarchical Multimodal Fusion of Dynamic and Structural Neuroimaging for MCI Analysis

NeuroAlign presents a hierarchical machine learning framework that fuses functional MRI and diffusion tensor imaging data to improve detection of mild cognitive impairment. The system introduces novel alignment and interaction mechanisms between multimodal neuroimaging datasets, with a new attribution method for interpretability, demonstrating competitive results across multiple medical imaging datasets.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Anchor-Conditioned Compositional Control for Landscape Image Generation

Researchers present a new framework for improving compositional control in AI-generated landscape images by anchoring diffusion models with four-dimensional compositional vectors extracted from training data. The approach achieves superior performance in horizon detection and rule-of-thirds alignment, demonstrating that compositional precision improves when training on homogeneous scene categories rather than mixed datasets.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

MOSS-Video-Preview: Toward Real-Time Video Understanding via Cross-Attention

Researchers introduce MOSS-Video-Preview, a cross-attention architecture enabling real-time video understanding where models process frames continuously and revise answers as new information arrives. The approach achieves 5x speedup in time-to-first-token and 2.7x higher decoding throughput compared to decoder-only models, while maintaining competitive offline performance.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

No Free Lunch for Synthetic Images under Data Scarcity Conditions

Researchers evaluated trade-offs between fidelity, privacy, and utility in synthetic image generation across VAE, GAN, and DDPM models under data scarcity conditions. The study reveals that GANs and DDPMs maintain performance better than VAEs when differential privacy mechanisms are applied, suggesting no single generative model excels across all three dimensions simultaneously.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

Researchers introduce AVI-Bench, a comprehensive benchmark for evaluating audio-visual intelligence in multimodal large language models across perception, understanding, and reasoning tasks. The study reveals significant limitations in current models and proposes a taxonomy to guide development of more robust audio-visual AI systems.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

DOME: Learning Transferable Domain Variables from Sparse Supervision for Test-Time Adaptation

Researchers introduce DOME, a domain encoder that improves test-time adaptation by explicitly modeling sample-specific domain shifts rather than inferring a single global distribution. The method leverages vision-language pretraining and sparse domain banks to achieve state-of-the-art performance on multiple benchmarks, suggesting that structured domain representation outweighs algorithmic complexity.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

AQIFormer: A Transformer-Based Multi-View Architecture for Cross-City Air Quality Classification

Researchers have developed AQIFormer, a transformer-based AI system that estimates air quality from traffic camera imagery combined with weather data. The model achieves 89.96% accuracy on training data and maintains strong cross-city generalization with 81.67% accuracy on independent Indian datasets, significantly outperforming existing methods.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

ViMax: Agentic Video Generation

ViMax introduces a multi-agent framework for long-form video generation that maintains narrative coherence and visual consistency across extended scenes. The system uses hierarchical narrative planning, retrieval-augmented generation, and VLM-guided agents to coordinate specialized components that negotiate storytelling decisions while tracking character and environmental states.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

A Dataset for Dynamic Human Preferences for Vision Language Models

Researchers introduce a new benchmark dataset for evaluating how Vision Language Models adapt to dynamic, user-specific preferences provided at inference time rather than learned from training data. The work addresses a gap in VLM evaluation by testing real-time preference adaptation across multiple users, moving beyond static capability assessments.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

MM-Matryoshka: Towards Budget-Elastic Visual Document Retrieval via a 2D Multimodal Matryoshka Training Framework

Researchers introduce MM-Matryoshka, a training framework that enables visual document retrievers to dynamically adjust computational and storage costs without requiring multiple models. The approach allows Vision-Language Models to optimize along two dimensions—vector width and encoder depth—while maintaining retrieval quality, addressing a key efficiency challenge in multimodal AI systems.
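
The vector-width half of this idea can be illustrated with the general Matryoshka-style trick of keeping only the leading dimensions of an embedding and renormalizing before scoring. The sketch below assumes embeddings trained so that prefixes remain meaningful; the vectors and function names are made up for illustration, and the encoder-depth dimension is not shown.

```python
import math

def truncate_embedding(vec, width):
    """Keep the first `width` dimensions and rescale to unit length."""
    head = vec[:width]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 4-dimensional embeddings for one query and two documents.
query = [0.9, 0.1, 0.3, -0.2]
docs = {"invoice": [0.8, 0.2, 0.1, 0.4], "memo": [-0.1, 0.9, 0.2, 0.1]}

def rank(width):
    """Rank documents by cosine similarity using only the first `width` dimensions."""
    q = truncate_embedding(query, width)
    scores = {name: cosine(q, truncate_embedding(d, width)) for name, d in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank(4), rank(2))
```

Storing only the truncated vectors halves the index size in this toy case, and a single model serves both budgets.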
