y0news
🧠 AI

12,986 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization

Researchers introduced EraseAnything++, a framework for removing unwanted concepts from advanced AI image and video generation models such as Stable Diffusion v3 and Flux. The method uses multi-objective optimization to balance concept removal against preservation of overall generative quality, outperforming existing approaches.
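The trade-off the summary describes can be sketched as a scalarized two-term objective (names, weighting, and losses here are an illustrative simplification, not the paper's actual formulation):

```python
def erasure_objective(pred_unwanted, pred_clean, target_clean, lam=0.5):
    """Scalarize two competing goals:
    - drive the model's score for the unwanted concept toward zero,
    - keep its output on clean prompts close to the original.
    `lam` trades erasure strength against preservation."""
    erase_term = pred_unwanted ** 2                   # push concept score -> 0
    preserve_term = (pred_clean - target_clean) ** 2  # keep clean behaviour
    return lam * erase_term + (1 - lam) * preserve_term

# Toy example: a model that still scores the unwanted concept at 0.8
loss = erasure_objective(pred_unwanted=0.8, pred_clean=0.55, target_clean=0.5)
```

A real multi-objective method would tune `lam` (or use Pareto-style updates) rather than fix it at 0.5.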

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8

VidDoS: Universal Denial-of-Service Attack on Video-based Large Language Models

Researchers introduced VidDoS, a universal attack framework that can severely degrade video-based large language models by causing extreme computational resource exhaustion. The attack increases token generation by over 205x and inference latency by more than 15x, posing critical safety risks in real-world applications such as autonomous driving.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Researchers propose ClinCoT, a framework for medical AI that improves vision-language models by grounding reasoning in specific image regions rather than text alone. The approach reduces factual hallucinations in medical AI systems by applying visual chain-of-thought reasoning over clinically relevant image regions.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

SimAB: Simulating A/B Tests with Persona-Conditioned AI Agents for Rapid Design Evaluation

SimAB is a new system that uses persona-conditioned AI agents to simulate A/B tests for rapid design evaluation without requiring real user traffic. The system achieved 67% overall accuracy against 47 historical A/B tests, rising to 83% for high-confidence cases, potentially transforming how companies validate design decisions.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

Researchers developed a new mathematical framework called Curvature-Weighted Capacity Allocation that optimizes large language model performance by identifying which layers contribute most to loss reduction. The method uses the Minimum Description Length principle to make principled decisions about layer pruning and capacity allocation under hardware constraints.
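The allocation idea — give more capacity to layers whose curvature says they matter most for loss reduction — can be sketched as a proportional split (a hypothetical simplification; the paper's MDL-based rule is more principled than this):

```python
def allocate_capacity(curvatures, total_budget):
    """Split a parameter budget across layers in proportion to a
    per-layer curvature estimate. Layers where the loss surface is
    sharper (higher curvature) receive more capacity; flat layers
    become pruning candidates."""
    total = sum(curvatures)
    return [total_budget * c / total for c in curvatures]

# Three layers; the first is most loss-sensitive
budget = allocate_capacity([4.0, 1.0, 3.0], total_budget=80)
# budget == [40.0, 10.0, 30.0]
```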

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 9

Prompt Sensitivity and Answer Consistency of Small Open-Source Large Language Models on Clinical Question Answering: Implications for Low-Resource Healthcare Deployment

Researchers evaluated five small open-source language models on clinical question answering and found that high consistency doesn't guarantee accuracy: models can be reliably wrong. Llama 3.2 showed the best balance of accuracy and reliability, while roleplay prompts consistently reduced performance across all models.
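The "reliably wrong" distinction comes from measuring consistency and accuracy separately over repeated runs; a minimal sketch (my own illustration, not the paper's evaluation code):

```python
from collections import Counter

def consistency_and_accuracy(answers, gold):
    """Consistency: fraction of runs agreeing with the modal answer.
    Accuracy: fraction of runs matching the gold answer.
    High consistency with low accuracy = 'reliably wrong'."""
    modal, modal_count = Counter(answers).most_common(1)[0]
    consistency = modal_count / len(answers)
    accuracy = sum(a == gold for a in answers) / len(answers)
    return consistency, accuracy

# Five repeated runs of the same clinical question:
c, a = consistency_and_accuracy(["B", "B", "B", "B", "A"], gold="A")
# c == 0.8 (highly consistent) but a == 0.2 (mostly wrong)
```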

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

Research reveals that leading foundation models (LLMs) perform poorly on real-world educational tasks despite excelling on AI benchmarks. The study found that 50% of misalignment errors are shared across models due to common pretraining approaches, with model ensembles actually worsening performance on learning outcomes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards

Researchers introduced ARC (Adaptive Rewarding by self-Confidence), a new framework for improving text-to-image generation models through self-confidence signals rather than external rewards. The method uses internal self-denoising probes to evaluate model accuracy and converts this into scalar rewards for unsupervised optimization, showing improvements in compositional generation and text-image alignment.
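Converting internal confidence into a scalar reward might look like the following (a hypothetical aggregation; ARC's actual self-denoising probe and estimator are more elaborate):

```python
def confidence_reward(probe_probs):
    """Aggregate per-probe self-confidence probabilities into a single
    scalar reward on [-1, 1] for unsupervised optimization: confident
    generations get positive reward, uncertain ones negative."""
    avg = sum(probe_probs) / len(probe_probs)
    return 2 * avg - 1  # map [0, 1] confidence to [-1, 1] reward

confidence_reward([0.9, 0.8, 0.7])  # ~0.6: confident, positive reward
confidence_reward([0.2, 0.3, 0.1])  # negative: penalize uncertain output
```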

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10 · 7

Artificial Superintelligence May be Useless: Equilibria in the Economy of Multiple AI Agents

A new research paper analyzes economic equilibria between AI and human agents in trading scenarios, finding that unless agents can at least double their marginal utility from purchases, no trading will occur. The study reveals that more powerful AI agents may contribute zero utility to less capable agents in certain equilibria.
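The headline condition can be stated in one line of code (a toy encoding of the summary's claim; the paper derives it within a specific equilibrium model with assumptions not captured here):

```python
def trade_occurs(marginal_utility_gain):
    """Trade happens only when a purchase at least doubles the buyer's
    marginal utility; below that threshold, no trading equilibrium
    with exchange exists in the paper's setting."""
    return marginal_utility_gain >= 2.0

trade_occurs(1.9)  # False: gain below the 2x threshold, no trade
trade_occurs(2.5)  # True
```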

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning

Researchers introduce CHIMERA, a compact 9K-sample synthetic dataset that enables smaller AI models to achieve reasoning performance comparable to much larger models. The dataset addresses key challenges in training reasoning-capable LLMs through automated generation and cross-validation across 8 scientific disciplines.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 5

AMDS: Attack-Aware Multi-Stage Defense System for Network Intrusion Detection with Two-Stage Adaptive Weight Learning

Researchers developed AMDS, an attack-aware multi-stage defense system for network intrusion detection that uses adaptive weight learning to counter adversarial attacks. The system achieved 94.2% AUC and improved classification accuracy by 4.5 percentage points over existing adversarially trained ensembles by learning attack-specific detection strategies.
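The attack-aware weighting boils down to combining detector scores with weights selected per attack family; a minimal sketch (names and values are illustrative, and AMDS learns its weights in two stages rather than taking them as given):

```python
def ensemble_score(detector_scores, attack_weights):
    """Weighted combination of per-detector anomaly scores, using
    weights tuned for the suspected attack family. Flag an intrusion
    when the combined score exceeds a threshold."""
    return sum(w * s for w, s in zip(attack_weights, detector_scores))

# Two detectors; weights adapted for an evasion-style attack where
# the first detector is known to be more reliable
score = ensemble_score([0.9, 0.2], attack_weights=[0.7, 0.3])
# score ~= 0.69 -> flag as intrusion if above, say, 0.5
```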

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

A Comprehensive Evaluation of LLM Unlearning Robustness under Multi-Turn Interaction

Researchers found that machine unlearning in large language models, which aims to remove specific training data influence, is less effective in interactive settings than previously thought. Knowledge that appears forgotten in static tests can often be recovered through multi-turn conversations and self-correction interactions.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations

Researchers propose a new gauge-theoretic framework for understanding superposition in large language models, replacing traditional single-dictionary approaches with local semantic charts. The method introduces three measurable obstructions to interpretability and demonstrates results on Llama 3.2 3B model with various datasets.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

Constitutional Black-Box Monitoring for Scheming in LLM Agents

Researchers developed constitutional black-box monitors to detect scheming behavior in LLM agents using only observable inputs and outputs. The study found that monitors trained on synthetic data can generalize to realistic environments, but performance improvements plateau quickly with simple optimization techniques outperforming complex methods.
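A black-box monitor of this kind scores only observable transcripts against a list of constitutional rules; the sketch below uses keyword predicates purely for illustration (the paper's monitors are trained models, not hand-written rules):

```python
def monitor(transcript, constitution):
    """Check an agent transcript against constitutional rules, each a
    (description, predicate) pair over the observable text only --
    no access to the agent's internals."""
    return [desc for desc, pred in constitution if pred(transcript)]

# One toy rule: claiming completion without evidence of output
rules = [
    ("claims task done without evidence",
     lambda t: "task complete" in t and "output saved" not in t),
]

monitor("task complete.", rules)               # flags the rule
monitor("task complete. output saved", rules)  # clean transcript
```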

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

QANTIS: A Hardware-Validated Quantum Platform for POMDP Planning and Multi-Target Data Association

QANTIS is a hardware-validated quantum computing platform that demonstrates quadratic improvements in autonomous navigation planning problems and multi-target data association tasks. The research shows successful implementation on IBM quantum hardware, achieving 5.1x amplification of rare observation probabilities while maintaining Bayesian posterior accuracy.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift' where AI agents deviate from project guidelines, creating automated compliance checks across static code analysis, runtime commands, and architectural validation.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6

Identifying and Characterising Response in Clinical Trials: Development and Validation of a Machine Learning Approach in Colorectal Cancer

Researchers developed a machine learning approach combining Virtual Twins method with survLIME to identify patient subgroups who respond differently to treatments in clinical trials. The method achieved 0.77 AUC for identifying treatment responders in colorectal cancer trials, finding genetic mutations, metastasis sites, and ethnicity as key response factors.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

PARCER as an Operational Contract to Reduce Variance, Cost, and Risk in LLM Systems

Researchers propose PARCER, a new framework that acts as an operational contract to address major governance challenges in Large Language Model systems. The framework uses structured YAML configurations to reduce variance, improve cost control, and enhance predictability in LLM operations through seven operational phases and decision hygiene practices.
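Since the contract is expressed as structured YAML, a hypothetical fragment might look like this (all field names are invented for illustration; the paper defines its own schema, seven phases, and decision-hygiene rules):

```yaml
# Hypothetical PARCER-style operational contract fragment
contract:
  model: gpt-class-llm
  variance_budget:
    max_output_divergence: 0.15   # tolerated run-to-run variation
  cost_controls:
    max_tokens_per_call: 2048
    monthly_budget_usd: 500
  phases:                          # the paper specifies seven phases
    - intake
    - validation
    - execution
```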

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

AWE: Adaptive Agents for Dynamic Web Penetration Testing

Researchers introduced AWE, a memory-augmented multi-agent framework for autonomous web penetration testing that outperforms existing tools on injection vulnerabilities. AWE achieved 87% XSS success and 66.7% blind SQL injection success on benchmark tests, demonstrating superior accuracy and efficiency compared to general-purpose AI penetration testing tools.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video Agents

Researchers have developed MM-Mem, a new pyramidal multimodal memory architecture that enables AI systems to better understand long-horizon videos by mimicking human cognitive memory processes. The system addresses current limitations in multimodal large language models by creating a hierarchical memory structure that progressively distills detailed visual information into high-level semantic understanding.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

General Proximal Flow Networks

Researchers introduce General Proximal Flow Networks (GPFNs), a generalization of Bayesian Flow Networks that allows for arbitrary divergence functions instead of fixed Kullback-Leibler divergence. The framework enables iterative generative modeling with improved generation quality when divergence functions are adapted to underlying data geometry.
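The key generalization — the divergence becomes a plug-in choice instead of hard-coded KL — can be shown with scalar Bernoulli parameters (a toy illustration, not the GPFN update itself):

```python
import math

def kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def squared(p, q):
    """An alternative divergence, better suited to some geometries."""
    return (p - q) ** 2

def flow_step_loss(p, q, divergence):
    """GPFN-style flexibility: the training loss takes the divergence
    as an argument rather than fixing KL, so it can be adapted to the
    data's geometry."""
    return divergence(p, q)

flow_step_loss(0.7, 0.5, kl)       # KL-driven loss (BFN special case)
flow_step_loss(0.7, 0.5, squared)  # same step under a different geometry
```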

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

Enhancing Molecular Property Predictions by Learning from Bond Modelling and Interactions

Researchers introduce DeMol, a new dual-graph framework for molecular property prediction that explicitly models both atoms and chemical bonds to achieve superior accuracy. The approach addresses limitations of conventional atom-centric models by incorporating bond-level phenomena like resonance and stereoselectivity, establishing new state-of-the-art results across multiple benchmarks.
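The dual-graph idea — model bonds as first-class nodes, not just edges between atoms — can be sketched as follows (a toy construction; DeMol's actual featurisation of resonance and stereochemistry is far richer):

```python
def bond_graph(bonds):
    """Build the dual view of a molecule: each chemical bond becomes a
    node, and two bond-nodes are linked when the bonds share an atom.
    Message passing on this graph reasons about bond-bond interactions
    that atom-centric models miss."""
    edges = []
    for i in range(len(bonds)):
        for j in range(i + 1, len(bonds)):
            if set(bonds[i]) & set(bonds[j]):  # bonds share an atom
                edges.append((i, j))
    return edges

# Water (H-O-H): atoms 0=O, 1=H, 2=H; the two O-H bonds share atom 0
bond_graph([(0, 1), (0, 2)])  # -> [(0, 1)]
```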
