2514 articles tagged with #machine-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv – CS AI · Mar 27/1016
🧠 Researchers introduced TradeFM, a 524M-parameter generative AI model that learns from billions of trade events across 9,000+ equities to understand market microstructure. The model can generate synthetic market data and generalizes across different markets without asset-specific calibration, potentially enabling new applications in trading and market simulation.
$COMP
AI Neutral · arXiv – CS AI · Mar 27/1010
🧠 Researchers propose a dynamic agent-centric benchmarking system for evaluating large language models that replaces static datasets with autonomous agents that generate, validate, and solve problems iteratively. The protocol uses teacher, orchestrator, and student agents to create progressively challenging text anomaly detection tasks that expose reasoning errors missed by conventional benchmarks.
AI Bullish · arXiv – CS AI · Mar 26/1012
🧠 Researchers developed TRIZ-RAGNER, a retrieval-augmented large language model framework that improves patent analysis and systematic innovation by extracting technical contradictions from patent documents. The system achieved 84.2% F1-score accuracy, outperforming existing methods by 7.3 percentage points through better integration of domain-specific knowledge.
AI Bullish · arXiv – CS AI · Mar 27/1012
🧠 Researchers propose FedNSAM, a new federated learning algorithm that improves global model performance by addressing the inconsistency between local and global flatness in distributed training environments. The algorithm uses global Nesterov momentum to harmonize local and global optimization, showing superior performance compared to existing FedSAM approaches.
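The blurb names the two ingredients but not their interaction. Below is a minimal NumPy sketch of one plausible reading: clients take sharpness-aware (SAM-style) local steps, and the server applies a Nesterov momentum update to the aggregated pseudo-gradient. The function names, step sizes, and toy quadratic objective are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def sam_local_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: perturb weights along the ascent direction,
    then descend using the gradient at the perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return w - lr * grad_fn(w + eps)

def server_nesterov_update(w_global, client_deltas, m, beta=0.5, lr=0.3):
    """Aggregate client deltas into a pseudo-gradient and apply a
    Nesterov-style momentum step on the server."""
    pseudo_grad = -np.mean(client_deltas, axis=0)
    m = beta * m + pseudo_grad
    # Nesterov look-ahead: step along momentum plus the fresh pseudo-gradient
    return w_global - lr * (beta * m + pseudo_grad), m

# Toy federation: client i locally minimizes 0.5 * ||w - c_i||^2.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w, m = np.array([5.0, 5.0]), np.zeros(2)
for _ in range(60):
    deltas = []
    for c in targets:
        w_local = w.copy()
        for _ in range(3):  # a few local SAM steps per round
            w_local = sam_local_step(w_local, lambda x, c=c: x - c)
        deltas.append(w_local - w)
    w, m = server_nesterov_update(w, deltas, m)
print(np.round(w, 2))  # converges toward the average target [0.5, 0.5]
```

The point of the server-side momentum is that client drift averages out across rounds instead of being baked into each aggregation step.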
AI Neutral · arXiv – CS AI · Mar 26/1012
🧠 Researchers introduce DLEBench, the first benchmark specifically designed to evaluate instruction-based image editing models' ability to edit small-scale objects that occupy only 1–10% of the image area. Testing on 10 models revealed significant performance gaps in small object editing, highlighting a critical limitation in current AI image editing capabilities.
AI Bullish · arXiv – CS AI · Mar 26/1013
🧠 Researchers developed MedMAP, a Medical Modality-Aware Pretraining framework that enhances vision-language models for 3D MRI multi-organ abnormality detection. The framework addresses challenges in modality-specific alignment and cross-modal feature fusion, demonstrating superior performance on a curated dataset of 7,392 3D MRI volume-report pairs.
AI Bullish · arXiv – CS AI · Mar 26/1013
🧠 Researchers propose FedRot-LoRA, a new framework that solves rotational misalignment issues in federated learning for large language models. The solution uses orthogonal transformations to align client updates before aggregation, improving training stability and performance without increasing communication costs.
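Rotational ambiguity in low-rank factors is a known pitfall: two clients can hold the same LoRA update B·A factored as (B·R)(Rᵀ·A), so averaging factors naively distorts the product. A hedged sketch of one plausible reading of "orthogonal transformations to align client updates", using classical orthogonal Procrustes to rotate each client's factors into a common frame before averaging; the paper's exact alignment procedure may differ.

```python
import numpy as np

def procrustes_align(A_ref, A_i):
    """Orthogonal R minimizing ||R @ A_i - A_ref||_F
    (classical orthogonal Procrustes, via SVD of A_ref @ A_i.T)."""
    U, _, Vt = np.linalg.svd(A_ref @ A_i.T)
    return U @ Vt

def aggregate_lora(clients):
    """Rotate each client's LoRA factors (B_i, A_i) into the first
    client's frame, then average the aligned factors."""
    _, A_ref = clients[0]
    Bs, As = [], []
    for B_i, A_i in clients:
        R = procrustes_align(A_ref, A_i)
        Bs.append(B_i @ R.T)  # (B_i R^T)(R A_i) == B_i A_i: product unchanged
        As.append(R @ A_i)
    return np.mean(Bs, axis=0), np.mean(As, axis=0)

# Two clients holding the *same* rank-2 update in rotated frames:
rng = np.random.default_rng(0)
B, A = rng.normal(size=(6, 2)), rng.normal(size=(2, 4))
t = 0.8
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
clients = [(B, A), (B @ R, R.T @ A)]

naive_B, naive_A = (B + B @ R) / 2, (A + R.T @ A) / 2
B_agg, A_agg = aggregate_lora(clients)
print(np.linalg.norm(naive_B @ naive_A - B @ A))  # factor-averaging distorts the product
print(np.linalg.norm(B_agg @ A_agg - B @ A))      # aligned averaging preserves it (~0)
```

Because each rotation is orthogonal, alignment changes only the factorization, never the update each client actually computed, which is consistent with the "no extra communication cost" claim.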
AI Bullish · arXiv – CS AI · Mar 26/109
🧠 Researchers propose ProtoDCS, a new framework for robust test-time adaptation of Vision-Language Models in open-set scenarios. The method uses Gaussian Mixture Model verification and uncertainty-aware learning to better handle distribution shifts while maintaining computational efficiency.
AI Neutral · arXiv – CS AI · Mar 26/1017
🧠 Researchers conducted a systematic benchmark study on multimodal fusion between Electronic Health Records (EHR) and chest X-rays for clinical decision support, revealing when and how combining data modalities improves healthcare AI performance. The study found that multimodal fusion helps when data is complete but that its benefits degrade under realistic missing-data scenarios; the authors also released an open-source benchmarking toolkit for reproducible evaluation.
AI Neutral · arXiv – CS AI · Mar 26/1015
🧠 Researchers released LFQA-HP-1M, a dataset with 1.3 million human preference annotations for evaluating long-form question answering systems. The study introduces nine quality rubrics and shows that simple linear models can match advanced LLM evaluators while exposing vulnerabilities in current evaluation methods.
AI Neutral · arXiv – CS AI · Mar 27/1017
🧠 Researchers conducted a benchmark study on IoT botnet intrusion detection systems, finding that models trained on one network domain suffer significant performance degradation when applied to different environments. The study evaluated three feature sets across four IoT datasets and provided guidelines for improving cross-domain robustness through better feature engineering and algorithm selection.
AI Bullish · arXiv – CS AI · Mar 26/1013
🧠 Researchers propose a new training method called pseudo contrastive learning to improve diagram comprehension in multimodal AI models like CLIP. The approach uses synthetic diagram samples to help models better understand fine-grained structural differences in diagrams, showing significant improvements in flowchart understanding tasks.
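A sketch of what contrastive training with synthetic samples can look like, assuming a CLIP-style InfoNCE objective in which embeddings of structurally perturbed diagrams serve as extra hard negatives for the matching text; the actual "pseudo contrastive" loss in the paper may be defined differently.

```python
import numpy as np

def info_nce(img, txt, neg_imgs, tau=0.07):
    """CLIP-style loss for one image-text pair, with embeddings of
    synthetically perturbed diagrams supplied as extra hard negatives."""
    unit = lambda v: v / np.linalg.norm(v)
    img, txt = unit(img), unit(txt)
    negs = np.stack([unit(n) for n in neg_imgs])
    logits = np.concatenate(([img @ txt], negs @ txt)) / tau
    logits = logits - logits.max()                    # numerical stability
    return -logits[0] + np.log(np.exp(logits).sum())  # cross-entropy, true pair at index 0

rng = np.random.default_rng(1)
txt = rng.normal(size=16)                    # text embedding
img = txt + 0.1 * rng.normal(size=16)        # matching diagram embedding
hard = [img + 0.5 * rng.normal(size=16) for _ in range(4)]  # structural perturbations
easy = [rng.normal(size=16) for _ in range(4)]              # unrelated diagrams
l_hard, l_easy = info_nce(img, txt, hard), info_nce(img, txt, easy)
print(l_hard > l_easy)  # near-duplicate negatives yield a larger loss, i.e. a stronger signal
```

The intuition: random negatives are trivially separable, so the loss (and gradient) vanishes; synthetic near-duplicates force the encoder to attend to fine-grained structure.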
AI Bullish · arXiv – CS AI · Mar 27/1012
🧠 Researchers introduced Rudder, a software module that uses Large Language Models (LLMs) to optimize data prefetching in distributed Graph Neural Network training. The system shows up to 91% performance improvement over baseline training and 82% over static prefetching by autonomously adapting to dynamic conditions.
AI Bearish · arXiv – CS AI · Mar 26/1013
🧠 Researchers created ProbCOPA, a dataset testing probabilistic reasoning in humans versus AI models, finding that state-of-the-art LLMs consistently fail to match human judgment patterns. The study reveals fundamental differences in how humans and AI systems process non-deterministic inferences, highlighting limitations in current AI reasoning capabilities.
AI Neutral · arXiv – CS AI · Mar 27/1017
🧠 Researchers propose a unified theory explaining why AI models trained on human feedback exhibit persistent error floors that cannot be eliminated through scaling alone. The study demonstrates that human supervision acts as an information bottleneck due to annotation noise, subjective preferences, and language limitations, requiring auxiliary non-human signals to overcome these structural limitations.
AI Bullish · arXiv – CS AI · Mar 26/1015
🧠 Researchers developed HMKGN, a hierarchical multi-scale graph network for cancer survival prediction using whole-slide images. The AI model outperformed existing methods by 10.85% in concordance indices across four cancer datasets, demonstrating improved accuracy in predicting patient survival outcomes.
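The reported metric, the concordance index, is standard in survival analysis and easy to state precisely: the fraction of comparable patient pairs in which the model assigns higher risk to the patient whose event occurred earlier. A self-contained sketch of Harrell's version, which handles right-censored follow-up (the toy data below is invented for illustration):

```python
def concordance_index(times, events, risk):
    """Harrell's c-index: fraction of comparable pairs where the
    higher-risk patient failed earlier. events[i] == 1 means the
    event was observed; 0 means the patient was censored."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable only if i's observed event precedes j's time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties get half credit
    return concordant / comparable

times  = [2, 4, 6, 8]           # follow-up in months
events = [1, 1, 0, 1]           # patient 3 is censored
risk   = [0.9, 0.7, 0.4, 0.2]   # predicted risk scores
print(concordance_index(times, events, risk))  # 1.0: perfectly ranked
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why survival papers report improvements in this metric rather than plain accuracy.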
AI Bullish · arXiv – CS AI · Mar 26/1015
🧠 Researchers introduce DesignSense-10k, a dataset of 10,235 human-annotated preference pairs for evaluating graphic layout generation, along with DesignSense, a specialized AI model that outperforms existing models by 54.6% in layout quality assessment. The framework addresses the gap between AI-generated layouts and human aesthetic preferences, showing practical improvements in layout generation through reinforcement learning.
AI Bullish · arXiv – CS AI · Mar 27/1013
🧠 Researchers have developed Brain-OF, the first omnifunctional brain foundation model that can process fMRI, EEG, and MEG data simultaneously within a unified framework. The model introduces novel techniques like Any-Resolution Neural Signal Sampler and Masked Temporal-Frequency Modeling, trained on 40 datasets to achieve superior performance across diverse neuroscience tasks.
AI Neutral · arXiv – CS AI · Mar 27/1022
🧠 Researchers developed an offline-to-online reinforcement learning framework that improves robot control robustness through adversarial fine-tuning. The method trains policies on clean datasets then applies action perturbations during fine-tuning to build resilience against actuator faults and environmental uncertainties.
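The summary does not say how the perturbations are chosen, so the sketch below is one common pattern, not the paper's method: a zeroth-order search within an ε-ball for the action perturbation that most lowers the critic's value estimate, mimicking a worst-case actuator fault during fine-tuning. The critic `q_fn` and all constants are illustrative.

```python
import numpy as np

def worst_case_perturbation(action, q_fn, eps=0.2, n_candidates=16, rng=None):
    """Sample candidate perturbations within an eps-ball and keep the one
    that most reduces the critic's value estimate (a zeroth-order
    approximation of an adversarial actuator fault)."""
    rng = rng or np.random.default_rng()
    best, best_q = action, q_fn(action)
    for _ in range(n_candidates):
        cand = action + rng.uniform(-eps, eps, size=action.shape)
        q = q_fn(cand)
        if q < best_q:
            best, best_q = cand, q
    return best

# Toy critic: values are highest for actions near 0.5 on each axis.
q_fn = lambda a: -np.sum((a - 0.5) ** 2)
a = np.array([0.5, 0.5])
a_adv = worst_case_perturbation(a, q_fn, rng=np.random.default_rng(0))
print(q_fn(a_adv) <= q_fn(a))  # the perturbed action is never better than the original
```

During online fine-tuning, the policy would then be updated on the perturbed action's outcome, so it learns to recover from degraded actuation rather than only from clean rollouts.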
AI Bullish · arXiv – CS AI · Mar 26/1011
🧠 Researchers introduce Evidential Neural Radiance Fields, a new probabilistic approach that enables uncertainty quantification in 3D scene modeling while maintaining rendering quality. The method addresses critical limitations in existing NeRF technology by capturing both aleatoric and epistemic uncertainty from a single forward pass, making neural radiance fields more suitable for safety-critical applications.
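The "single forward pass" claim matches the standard deep-evidential-regression recipe, in which the network predicts Normal-Inverse-Gamma parameters (γ, ν, α, β) and both uncertainty types fall out in closed form; whether this paper uses exactly that parameterization is an assumption, and the numbers below are illustrative.

```python
def evidential_uncertainties(gamma, nu, alpha, beta):
    """Map predicted Normal-Inverse-Gamma parameters to a point
    prediction plus aleatoric and epistemic variance estimates."""
    prediction = gamma                        # E[mu]
    aleatoric = beta / (alpha - 1)            # E[sigma^2]: noise inherent in the data
    epistemic = beta / (nu * (alpha - 1))     # Var[mu]: uncertainty about the model itself
    return prediction, aleatoric, epistemic

# More "virtual evidence" (larger nu, alpha) shrinks epistemic uncertainty,
# while the aleatoric estimate is governed by beta / (alpha - 1):
p, al, ep = evidential_uncertainties(gamma=0.8, nu=50.0, alpha=20.0, beta=1.0)
print(p, round(al, 4), round(ep, 6))
```

This is why a single forward pass suffices: no sampling or ensembling is needed once the network's head outputs the four evidential parameters per ray or pixel.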
AI Bullish · arXiv – CS AI · Mar 26/1018
🧠 Researchers introduce TTE-v2, a new multimodal retrieval framework that achieves state-of-the-art performance by incorporating reasoning steps during retrieval and reranking. The approach demonstrates that scaling based on reasoning tokens rather than model size can significantly improve performance, with TTE-v2-7B reaching 75.7% accuracy on the MMEB-V2 benchmark.
AI Bullish · arXiv – CS AI · Mar 26/1010
🧠 Researchers introduce UMPIRE, a new training-free framework for quantifying uncertainty in Multimodal Large Language Models (MLLMs) across various input and output modalities. The system measures incoherence-adjusted semantic volume of model responses to better detect errors and improve reliability without requiring external tools or additional computational overhead.
AI Neutral · arXiv – CS AI · Mar 26/1013
🧠 Researchers introduce DARE-bench, a new benchmark with 6,300 Kaggle-derived tasks for evaluating Large Language Models' performance on data science and machine learning tasks. The benchmark reveals that even advanced models like GPT-4-mini struggle with ML modeling tasks, while fine-tuning on DARE-bench data can improve model accuracy by up to 8x.
AI Bullish · arXiv – CS AI · Mar 26/1016
🧠 Researchers propose a minimal baseline architecture for AI-based theorem proving that achieves competitive performance with state-of-the-art systems while using significantly simpler design. The open-source implementation demonstrates that iterative proof refinement approaches are more sample-efficient and cost-effective than single-shot generation methods.
AI Bullish · arXiv – CS AI · Mar 27/1011
🧠 Researchers developed a deep reinforcement learning approach using heterogeneous graph networks to solve Flexible Job Shop Scheduling Problems with limited buffers and material kitting constraints. The method outperforms traditional heuristics by improving buffer utilization and decision quality through better modeling of complex dependencies in production scheduling.