🧠

AI

11,675 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

11675 articles

AIBullisharXiv – CS AI · Mar 56/10

🧠

LiteVLA-Edge: Quantized On-Device Multimodal Control for Embedded Robotics

Researchers developed LiteVLA-Edge, a deployment-oriented Vision-Language-Action model pipeline that enables fully on-device inference on embedded robotics hardware like Jetson Orin. The system achieves 150.5ms latency (6.6Hz) through FP32 fine-tuning combined with 4-bit quantization and GPU-accelerated inference, operating entirely offline within a ROS 2 framework.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI

A study reveals that 74% of healthcare AI research papers still use private datasets or don't share code, creating reproducibility issues that undermine trust in medical AI applications. Papers that embrace open practices by sharing both public datasets and code receive 110% more citations on average, demonstrating clear benefits for scientific impact.

AIBullisharXiv – CS AI · Mar 57/10

🧠

Not All Candidates are Created Equal: A Heterogeneity-Aware Approach to Pre-ranking in Recommender Systems

Researchers developed HAP (Heterogeneity-Aware Adaptive Pre-ranking), a new framework for recommender systems that addresses gradient conflicts in training by separating easy and hard samples. The system has been deployed in Toutiao's production environment for 9 months, achieving 0.4% improvement in user engagement without additional computational costs.

AIBearisharXiv – CS AI · Mar 57/10

🧠

Image-based Prompt Injection: Hijacking Multimodal LLMs through Visually Embedded Adversarial Instructions

Researchers have developed Image-based Prompt Injection (IPI), a black-box attack that embeds adversarial instructions into natural images to manipulate multimodal AI models. Testing on GPT-4-turbo achieved up to 64% attack success rate, demonstrating a significant security vulnerability in vision-language AI systems.

🧠 GPT-4

AIBullisharXiv – CS AI · Mar 57/10

🧠

Perfect score on IPhO 2025 theory by Gemini agent

Google's Gemini 3.1 Pro Preview achieved a perfect score on IPhO 2025 theory problems across five runs, surpassing previous AI performance that fell behind top human contestants. However, the researchers acknowledge potential data contamination since the model was released after the competition.

🧠 Gemini

AIBullisharXiv – CS AI · Mar 56/10

🧠

Non-Invasive Reconstruction of Intracranial EEG Across the Deep Temporal Lobe from Scalp EEG based on Conditional Normalizing Flow

Researchers developed NeuroFlowNet, a novel AI framework using Conditional Normalizing Flow to reconstruct deep brain EEG signals from non-invasive scalp measurements. This breakthrough enables analysis of deep temporal lobe brain activity without requiring invasive electrode implantation, potentially transforming neuroscience research and clinical diagnosis.

AIBullisharXiv – CS AI · Mar 56/10

🧠

CubeComposer: Spatio-Temporal Autoregressive 4K 360{\deg} Video Generation from Perspective Video

CubeComposer is a new AI model that generates high-quality 4K 360-degree panoramic videos from regular perspective videos using a novel spatio-temporal autoregressive diffusion approach. The technology addresses computational limitations of existing methods by decomposing videos into cubemap representations, enabling native 4K resolution output for VR applications.

AINeutralarXiv – CS AI · Mar 57/10

🧠

When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

Researchers analyzed 9,705 AI incident reports to create an expanded taxonomy of real-world AI risk mitigation strategies, identifying four new categories of responses including corrective actions, legal enforcement, financial controls, and avoidance tactics. The study expands existing mitigation frameworks by 67% and provides structured guidance for preventing cascading AI system failures in high-stakes deployments.

AINeutralarXiv – CS AI · Mar 57/10

🧠

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Researchers introduce SWE-CI, a new benchmark that evaluates AI agents' ability to maintain codebases over time through continuous integration processes. Unlike existing static bug-fixing benchmarks, SWE-CI tests agents across 100 long-term tasks spanning an average of 233 days and 71 commits each.

AIBearisharXiv – CS AI · Mar 57/10

🧠

When Shallow Wins: Silent Failures and the Depth-Accuracy Paradox in Latent Reasoning

Research reveals that state-of-the-art AI mathematical reasoning models like Qwen2.5-Math-7B achieve 61% accuracy primarily through unreliable computational pathways, with only 18.4% using stable reasoning. The study exposes that 81.6% of correct predictions come from inconsistent methods and 8.8% are confident but incorrect outputs.

AIBullisharXiv – CS AI · Mar 56/10

🧠

Ethical and Explainable AI in Reusable MLOps Pipelines

Researchers developed a unified MLOps framework that integrates ethical AI principles, reducing demographic bias from 0.31 to 0.04 while maintaining predictive accuracy. The system automatically blocks deployments and triggers retraining based on fairness metrics, demonstrating practical implementation of ethical AI in production environments.

AIBullisharXiv – CS AI · Mar 57/10

🧠

AutoHarness: improving LLM agents by automatically synthesizing a code harness

Researchers developed AutoHarness, a technique where smaller LLMs like Gemini-2.5-Flash can automatically generate code harnesses to prevent illegal moves in games, outperforming larger models like Gemini-2.5-Pro and GPT-5.2-High. The method eliminates 78% of failures attributed to illegal moves in chess competitions and demonstrates superior performance across 145 different games.

🧠 Gemini

AIBullisharXiv – CS AI · Mar 56/10

🧠

Controllable and explainable personality sliders for LLMs at inference time

Researchers propose Sequential Adaptive Steering (SAS), a new framework for controlling Large Language Model personalities at inference time without retraining. The method uses orthogonalized steering vectors to enable precise, multi-dimensional personality control by adjusting coefficients, validated on Big Five personality traits.

AIBullisharXiv – CS AI · Mar 57/10

🧠

Cross-Modal Mapping and Dual-Branch Reconstruction for 2D-3D Multimodal Industrial Anomaly Detection

Researchers have developed CMDR-IAD, a new AI framework for industrial anomaly detection that combines 2D and 3D data analysis without requiring memory banks. The system achieves state-of-the-art performance with 97.3% accuracy on standard benchmarks and demonstrates robust performance in real-world industrial applications.

AINeutralarXiv – CS AI · Mar 56/10

🧠

Order Is Not Layout: Order-to-Space Bias in Image Generation

Researchers have identified Order-to-Space Bias (OTS) in modern image generation models, where the order entities are mentioned in text prompts incorrectly determines spatial layout and role assignments. The study introduces OTS-Bench to measure this bias and demonstrates that targeted fine-tuning and early-stage interventions can reduce the problem while maintaining generation quality.

AIBullisharXiv – CS AI · Mar 57/10

🧠

Controlling Chat Style in Language Models via Single-Direction Editing

Researchers developed a training-free method to control stylistic attributes in large language models by identifying that different styles are encoded as linear directions in the model's activation space. The approach enables precise style control while preserving core capabilities and supports linear style composition across over a dozen tested models.

AIBullisharXiv – CS AI · Mar 56/10

🧠

DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following

Researchers introduce DIALEVAL, a new automated framework that uses dual LLM agents to evaluate how well AI models follow instructions. The system achieves 90.38% accuracy by breaking down instructions into verifiable components and applying type-specific evaluation criteria, showing 26.45% error reduction over existing methods.

AIBullisharXiv – CS AI · Mar 56/10

🧠

PulseLM: A Foundation Dataset and Benchmark for PPG-Text Learning

Researchers introduced PulseLM, a large-scale dataset combining PPG cardiovascular sensor data with natural language processing for multimodal AI models. The dataset contains 1.31 million PPG segments with 3.15 million question-answer pairs, designed to enable language-based physiological reasoning in healthcare AI applications.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Certainty robustness: Evaluating LLM stability under self-challenging prompts

Researchers introduce the Certainty Robustness Benchmark, a new evaluation framework that tests how large language models handle challenges to their responses in interactive settings. The study reveals significant differences in how AI models balance confidence and adaptability when faced with prompts like "Are you sure?" or "You are wrong!", identifying a critical new dimension for AI evaluation.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery

Researchers have developed DBench-Bio, a dynamic benchmark system that automatically evaluates AI's ability to discover new biological knowledge using a three-stage pipeline of data acquisition, question-answer extraction, and quality filtering. The benchmark addresses the critical problem of data contamination in static datasets and provides monthly updates across 12 biomedical domains, revealing current limitations in state-of-the-art AI models' knowledge discovery capabilities.

AIBullisharXiv – CS AI · Mar 56/10

🧠

Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

Researchers successfully developed Bielik-Q2-Sharp, the first systematic evaluation of extreme 2-bit quantization for Polish language models, achieving near-baseline performance while significantly reducing model size. The study compared six quantization methods on an 11B parameter model, with the best variant maintaining 71.92% benchmark performance versus 72.07% baseline at just 3.26 GB.

AINeutralarXiv – CS AI · Mar 57/10

🧠

Why Do Unlearnable Examples Work: A Novel Perspective of Mutual Information

Researchers propose a new method called Mutual Information Unlearnable Examples (MI-UE) to protect data privacy by preventing unauthorized AI models from learning from scraped data. The approach uses mutual information theory to create more effective data poisoning techniques that impede deep learning model generalization.

AIBullisharXiv – CS AI · Mar 57/10

🧠

Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement

Researchers introduce DCR (Discernment via Contrastive Refinement), a new method to reduce over-refusal in safety-aligned large language models. The approach helps LLMs better distinguish between genuinely toxic and seemingly toxic prompts, maintaining safety while improving helpfulness without degrading general capabilities.

AIBullisharXiv – CS AI · Mar 57/10

🧠

Quantum-Inspired Self-Attention in a Large Language Model

Researchers developed a quantum-inspired self-attention (QISA) mechanism and integrated it into GPT-1's language modeling pipeline, marking the first such integration in autoregressive language models. The QISA mechanism demonstrated significant performance improvements over standard self-attention, achieving 15.5x better character error rate and 13x better cross-entropy loss with only 2.6x longer inference time.

AIBullisharXiv – CS AI · Mar 56/10

🧠

Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO

Researchers propose CoIPO (Contrastive Learning-based Inverse Direct Preference Optimization), a new method to improve Large Language Model robustness against noisy or imperfect user prompts. The approach enhances LLMs' intrinsic ability to handle prompt variations without relying on external preprocessing tools, showing significant accuracy improvements on benchmark tests.

← PrevPage 62 of 467Next →