y0news · #research · 75 articles
AI · Bullish · arXiv – CS AI · 4h ago

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors in eight live competitions, establishing new state-of-the-art performance.

AI × Crypto · Bullish · arXiv – CS AI · 4h ago

Blockchain-Enabled Routing for Zero-Trust Low-Altitude Intelligent Networks

Researchers propose a blockchain-enabled zero-trust architecture for secure routing in low-altitude intelligent networks using unmanned aerial vehicles. The framework combines blockchain technology with AI-based routing algorithms to improve security and performance in UAV networks.

AI · Neutral · arXiv – CS AI · 4h ago

Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook

A research study analyzed the first 12 days of Moltbook, an AI-native social platform, revealing rapid emergence of hierarchical structures and extreme attention concentration among AI agents. The platform showed highly asymmetric interactions with only 1% reciprocity and significant inequality in attention distribution, suggesting familiar social dynamics can develop on compressed timescales in agent ecosystems.

AI · Neutral · arXiv – CS AI · 4h ago

DARE-bench: Evaluating Modeling and Instruction Fidelity of LLMs in Data Science

Researchers introduce DARE-bench, a new benchmark with 6,300 Kaggle-derived tasks for evaluating Large Language Models' performance on data science and machine learning tasks. The benchmark reveals that even advanced models like GPT-4-mini struggle with ML modeling tasks, while fine-tuning on DARE-bench data can improve model accuracy by up to 8x.

AI · Bullish · arXiv – CS AI · 4h ago

FedNSAM: Consistency of Local and Global Flatness for Federated Learning

Researchers propose FedNSAM, a new federated learning algorithm that improves global model performance by addressing the inconsistency between local and global flatness in distributed training environments. The algorithm uses global Nesterov momentum to harmonize local and global optimization, showing superior performance compared to existing FedSAM approaches.
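
The server-side idea described above, applying a global Nesterov-style momentum step when aggregating client updates, can be sketched in a simple FedAvg-like setup. This is an illustrative simplification, not the paper's algorithm: the function name, hyperparameters, and aggregation details are assumptions.

```python
import numpy as np

def server_nesterov_aggregate(global_w, client_deltas, momentum_buf,
                              lr=1.0, beta=0.9):
    """One server round: average client updates, then take a Nesterov-style
    step so the global update anticipates the momentum direction."""
    avg_delta = np.mean(client_deltas, axis=0)      # FedAvg-style mean update
    momentum_buf = beta * momentum_buf + avg_delta  # accumulate global momentum
    # Nesterov look-ahead: step along momentum plus the fresh averaged update
    new_w = global_w + lr * (beta * momentum_buf + avg_delta)
    return new_w, momentum_buf
```

In this sketch the momentum buffer lives on the server, so clients run their usual local (e.g. SAM-style) training and only the aggregation rule changes.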

AI · Bullish · arXiv – CS AI · 4h ago

Pseudo Contrastive Learning for Diagram Comprehension in Multimodal Models

Researchers propose a new training method called pseudo contrastive learning to improve diagram comprehension in multimodal AI models like CLIP. The approach uses synthetic diagram samples to help models better understand fine-grained structural differences in diagrams, showing significant improvements in flowchart understanding tasks.

AI · Bullish · arXiv – CS AI · 4h ago

RF-Agent: Automated Reward Function Design via Language Agent Tree Search

Researchers introduce RF-Agent, a framework that uses Large Language Models as agents to automatically design reward functions for control tasks through Monte Carlo Tree Search. The method improves upon existing approaches by better utilizing historical feedback and enhancing search efficiency across 17 diverse low-level control tasks.

AI · Bullish · arXiv – CS AI · 4h ago

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Researchers introduce CowPilot, a framework that combines autonomous AI agents with human collaboration for web navigation tasks. The system achieved 95% success rate while requiring humans to perform only 15.2% of total steps, demonstrating effective human-AI cooperation for complex web tasks.

AI · Bullish · arXiv – CS AI · 4h ago

UPath: Universal Planner Across Topological Heterogeneity For Grid-Based Pathfinding

Researchers developed UPath, a universal AI-powered pathfinding algorithm that improves A* search performance by up to 2.2x across diverse grid environments. The deep learning model generalizes across different map types without retraining, achieving near-optimal solutions within 3% of optimal cost on unseen tasks.

AI · Bullish · arXiv – CS AI · 4h ago

BiKA: Kolmogorov-Arnold-Network-inspired Ultra Lightweight Neural Network Hardware Accelerator

Researchers propose BiKA, a new ultra-lightweight neural network accelerator inspired by Kolmogorov-Arnold Networks that uses binary thresholds instead of complex computations. The FPGA prototype demonstrates 27-51% reduction in hardware resource usage compared to existing binarized and quantized neural network accelerators while maintaining competitive accuracy.

AI · Bullish · arXiv – CS AI · 4h ago

An Efficient Unsupervised Federated Learning Approach for Anomaly Detection in Heterogeneous IoT Networks

Researchers propose an efficient unsupervised federated learning framework for anomaly detection in heterogeneous IoT networks that preserves privacy while leveraging shared features from multiple datasets. The approach uses explainable AI techniques like SHAP for transparency and demonstrates superior performance compared to conventional federated learning methods on real-world IoT datasets.

AI · Neutral · arXiv – CS AI · 4h ago

When Does Multimodal Learning Help in Healthcare? A Benchmark on EHR and Chest X-Ray Fusion

Researchers conducted a systematic benchmark study on multimodal fusion between Electronic Health Records (EHR) and chest X-rays for clinical decision support, revealing when and how combining data modalities improves healthcare AI performance. The study found that multimodal fusion helps when data is complete but benefits degrade under realistic missing data scenarios, and released an open-source benchmarking toolkit for reproducible evaluation.

AI · Neutral · arXiv – CS AI · 4h ago

LFQA-HP-1M: A Large-Scale Human Preference Dataset for Long-Form Question Answering

Researchers released LFQA-HP-1M, a dataset with 1.3 million human preference annotations for evaluating long-form question answering systems. The study introduces nine quality rubrics and shows that simple linear models can match advanced LLM evaluators while exposing vulnerabilities in current evaluation methods.

AI · Bullish · arXiv – CS AI · 4h ago

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Researchers introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that improves AI reasoning efficiency by helping large reasoning models know when to stop thinking. The approach addresses the problem of redundant, lengthy reasoning chains that don't improve accuracy while reducing computational costs and response times.
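
The core intuition, stopping generation once further reasoning no longer changes the provisional answer, can be sketched with a generic answer-stability heuristic. This is an illustrative stand-in, not the SAGE algorithm itself; the checkpointing scheme and `patience` parameter are assumptions.

```python
def stable_early_stop(provisional_answers, patience=2):
    """Given provisional answers extracted at successive reasoning checkpoints,
    return the index at which the answer has been unchanged for `patience`
    consecutive checkpoints (a signal to stop thinking), or None."""
    streak = 0
    for i in range(1, len(provisional_answers)):
        streak = streak + 1 if provisional_answers[i] == provisional_answers[i - 1] else 0
        if streak >= patience:
            return i
    return None
```

A caller would periodically probe the model for its current best answer and truncate the reasoning chain once this returns an index.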

AI · Bullish · arXiv – CS AI · 4h ago

FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA

Researchers propose FedRot-LoRA, a new framework that solves rotational misalignment issues in federated learning for large language models. The solution uses orthogonal transformations to align client updates before aggregation, improving training stability and performance without increasing communication costs.
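
The alignment step can be illustrated with an orthogonal Procrustes rotation: rotate each client's LoRA factors into a shared basis before averaging, without changing the adapter's effective update. The use of Procrustes specifically is an assumption for illustration; function and variable names are hypothetical.

```python
import numpy as np

def procrustes_align(B_c, A_c, B_ref):
    """Align one client's LoRA factors (delta = B_c @ A_c) to a reference
    basis B_ref via orthogonal Procrustes. The orthogonal R cancels in the
    product, so the client's effective update is unchanged."""
    # R = argmin over orthogonal R of ||B_c @ R - B_ref||_F
    U, _, Vt = np.linalg.svd(B_c.T @ B_ref)
    R = U @ Vt
    return B_c @ R, R.T @ A_c   # (B_c R)(R^T A_c) = B_c A_c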

AI · Neutral · arXiv – CS AI · 4h ago

Memory Caching: RNNs with Growing Memory

Researchers introduce Memory Caching (MC), a technique that enhances recurrent neural networks by allowing their memory capacity to grow with sequence length, bridging the gap between fixed-memory RNNs and growing-memory Transformers. The approach offers four variants and shows competitive performance with Transformers on language modeling and long-context tasks while maintaining better computational efficiency.

AI · Bullish · arXiv – CS AI · 4h ago

TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining

Researchers developed TRIZ-RAGNER, a retrieval-augmented large language model framework that improves patent analysis and systematic innovation by extracting technical contradictions from patent documents. The system achieved an F1-score of 84.2%, outperforming existing methods by 7.3 percentage points through better integration of domain-specific knowledge.

AI · Neutral · arXiv – CS AI · 4h ago

Human or Machine? A Preliminary Turing Test for Speech-to-Speech Interaction

Researchers conducted the first Turing test for speech-to-speech AI systems, analyzing 2,968 human judgments across 9 state-of-the-art systems. No current S2S system passed the test, with failures primarily stemming from paralinguistic features and emotional expressivity rather than semantic understanding.

AI · Bullish · arXiv – CS AI · 4h ago

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Researchers introduce Sea² (See, Act, Adapt), a novel approach that improves AI perception models in new environments by using an intelligent pose-control agent rather than retraining the models themselves. The method keeps perception modules frozen and uses a vision-language model as a controller, achieving significant performance improvements of 13-27% across visual tasks without requiring additional training data.

AI · Neutral · arXiv – CS AI · 4h ago

HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance

Researchers have released HumanMCP, the first large-scale dataset designed to evaluate tool retrieval performance in Model Context Protocol (MCP) servers. The dataset addresses a critical gap by providing realistic, human-like queries paired with 2,800 tools across 308 MCP servers, improving upon existing benchmarks that lack authentic user interaction patterns.

AI · Neutral · arXiv – CS AI · 4h ago

Ask don't tell: Reducing sycophancy in large language models

Research identifies sycophancy as a key alignment failure in large language models, where AI systems favor user-affirming responses over critical engagement. The study demonstrates that converting user statements into questions before answering significantly reduces sycophantic behavior, offering a practical mitigation strategy for AI developers and users.

AI · Bullish · arXiv – CS AI · 4h ago

Reasoning-Driven Multimodal LLM for Domain Generalization

Researchers developed RD-MLDG, a new framework that uses multimodal large language models with reasoning chains to improve domain generalization in deep learning. The approach addresses challenges in cross-domain visual recognition by leveraging reasoning capabilities rather than just visual feature invariance, achieving state-of-the-art performance on standard benchmarks.

Page 1 of 3