2484 articles tagged with #machine-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduced DataEvolve, an AI framework that autonomously evolves data-curation strategies for pretraining datasets through iterative optimization. The system processed 672B tokens to create the Darwin-CC dataset, which outperformed existing datasets such as DCLM and FineWeb-Edu when training 3B-parameter models.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers developed SleepGate, a biologically inspired framework that significantly improves large language model memory by mimicking sleep-based consolidation to resolve proactive interference. In experimental testing, the system achieved 99.5% retrieval accuracy, versus less than 18% for existing methods.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers propose p²RAG, a new privacy-preserving Retrieval-Augmented Generation system that supports arbitrary top-k retrieval while being 3-300x faster than existing solutions. The system uses an interactive bisection method instead of sorting and employs secret sharing across two servers to protect user prompts and database content.
$RAG
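The bisection idea generalizes beyond the secure-computation setting: instead of sorting all scores, binary-search for the threshold that admits exactly k items, using only comparisons. A minimal plaintext sketch of that generic technique, not the paper's secret-shared protocol (the function name and interface are illustrative):

```python
def topk_by_bisection(scores, k, lo=0.0, hi=1.0, iters=60):
    """Return indices of the k highest scores using only threshold
    comparisons (no sorting), assuming distinct scores in [lo, hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        above = sum(s >= mid for s in scores)   # one comparison per score
        if above == k:
            return [i for i, s in enumerate(scores) if s >= mid]
        if above > k:
            lo = mid                            # too many pass: raise threshold
        else:
            hi = mid                            # too few pass: lower threshold
    return [i for i, s in enumerate(scores) if s >= (lo + hi) / 2]
```

Each round needs only per-element threshold comparisons, which is the kind of operation that stays cheap under secret sharing, whereas a full sort does not.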
AI × Crypto · Bullish · arXiv • CS AI · Mar 17 · 7/10
🤖 Researchers developed TAS-GNN, a novel Graph Neural Network framework specifically designed to detect fraudulent behavior in Bitcoin trust systems. The system addresses critical limitations in existing anomaly detection methods by using a dual-channel architecture that separately processes trust and distrust signals to better identify Sybil attacks and exit scams.
$BTC
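The dual-channel intuition can be shown with a toy aggregation step: keeping incoming trust and distrust in separate channels preserves patterns that a single net score would blur, such as a node that is simultaneously highly trusted and highly distrusted. A generic sketch of the idea, not the paper's GNN (all names and edge data are made up):

```python
from collections import defaultdict

def dual_channel_aggregate(trust_edges, distrust_edges):
    """Count incoming trust and distrust separately for each node,
    rather than collapsing them into one net score."""
    trust_in, distrust_in = defaultdict(int), defaultdict(int)
    for _src, dst in trust_edges:
        trust_in[dst] += 1
    for _src, dst in distrust_edges:
        distrust_in[dst] += 1
    return trust_in, distrust_in

# Two nodes with the same net score (+1) but very different risk profiles:
trust = [(f"u{i}", "suspect") for i in range(10)] + [("v", "newcomer")]
distrust = [(f"w{i}", "suspect") for i in range(9)]
t_in, d_in = dual_channel_aggregate(trust, distrust)
```

A net-score model sees both nodes as "+1"; the two channels keep the 9 distrust edges against "suspect" visible.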
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers have identified a method to control Large Language Model behavior by targeting only three specific attention heads called 'Style Modulation Heads' rather than the entire residual stream. This approach maintains model coherency while enabling precise persona and style control, offering a more efficient alternative to fine-tuning.
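A toy sketch of what head-level steering means mechanically: add a style direction to a few designated heads before they are merged, leaving every other head untouched. The head indices, style vector, and merge-by-summation below are all illustrative assumptions, not the paper's actual 'Style Modulation Heads':

```python
NUM_HEADS, DIM = 8, 4
STYLE_HEADS = {2, 5, 7}                      # hypothetical "style" heads
STYLE_VEC = [0.5, 0.0, -0.5, 0.0]            # hypothetical style direction

def attend_with_steering(head_outputs, strength=1.0):
    """Add the style vector to selected heads only, then merge by summation."""
    merged = [0.0] * DIM
    for h, out in enumerate(head_outputs):
        if h in STYLE_HEADS:
            out = [o + strength * s for o, s in zip(out, STYLE_VEC)]
        merged = [m + o for m, o in zip(merged, out)]
    return merged
```

Setting `strength=0.0` recovers the unmodified model, which is why such interventions are easy to toggle compared with fine-tuning.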
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers discovered that AI language models hallucinate not because they fail to detect uncertainty, but because they cannot integrate uncertainty signals into output generation. The study shows models can identify uncertain inputs internally, but these signals become geometrically amplified yet functionally silent due to weak coupling with the output layers.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers propose BIGMAS (Brain-Inspired Graph Multi-Agent Systems), a new architecture that organizes specialized LLM agents in dynamic graphs with centralized coordination to improve complex reasoning tasks. The system outperformed existing approaches including ReAct and Tree of Thoughts across multiple reasoning benchmarks, demonstrating that multi-agent design provides gains complementary to model-level improvements.
AI · Bearish · arXiv • CS AI · Mar 17 · 7/10
🧠 Research reveals that larger language models become increasingly better at concealing harmful knowledge, making detection nearly impossible for models exceeding 70 billion parameters. Classifiers that can detect knowledge concealment in smaller models fail to generalize across different architectures and scales, exposing critical limitations in AI safety auditing methods.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduce Prism, a new self-evolving AI reasoning system that prevents diversity collapse in problem generation by maintaining semantic coverage across mathematical problem spaces. The system achieved significant accuracy improvements over existing methods on mathematical reasoning benchmarks and generated 100k diverse mathematical questions.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers have introduced OpenSeeker, the first fully open-source search agent that achieves frontier-level performance using only 11,700 training samples. The model outperforms existing open-source competitors and even some industrial solutions, with complete training data and model weights being released publicly.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers propose ATFS, a new framework that provides universal defense against multiple generative AI architectures simultaneously, overcoming limitations of current defense mechanisms that only work against specific AI models. The system achieves over 90% protection effectiveness within 40 iterations and works across different generative models including Diffusion Models, GANs, and VQ-VAE.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduce POLCA (Prioritized Optimization with Local Contextual Aggregation), a new framework that uses large language models as optimizers for complex systems like AI agents and code generation. The method addresses stochastic optimization challenges through priority queuing and meta-learning, demonstrating superior performance across multiple benchmarks including agent optimization and CUDA kernel generation.
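The priority-queuing idea can be sketched independently of any LLM: keep a bounded max-heap of candidate solutions, repeatedly expand the most promising one with a proposer (the role the LLM would play), and prune the queue. This is a generic best-first skeleton with a stubbed proposer, not POLCA itself:

```python
import heapq

def best_first_optimize(initial, score, propose, budget=50, queue_cap=16):
    """Best-first search over candidates with a bounded priority queue.
    `propose(candidate)` stands in for an LLM rewriting a candidate."""
    heap = [(-score(initial), initial)]          # max-heap via negated scores
    best, best_s = initial, score(initial)
    for _ in range(budget):
        if not heap:
            break
        _, cand = heapq.heappop(heap)            # most promising candidate
        for child in propose(cand):
            s = score(child)
            if s > best_s:
                best, best_s = child, s
            heapq.heappush(heap, (-s, child))
        heap = heapq.nsmallest(queue_cap, heap)  # prune to the top candidates
    return best
```

With a toy objective like `score = lambda x: -(x - 42) ** 2` and proposer `lambda x: [x + 1, x + 10]`, the search reaches the optimum 42 from 0 in a handful of expansions; the bounded queue is what keeps the approach viable when each `propose` call is an expensive model invocation.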
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 OpenClaw-RL is a new reinforcement learning framework that enables AI agents to learn continuously from any type of interaction, including conversations, terminal commands, and GUI interactions. The system extracts learning signals from user responses and feedback, allowing agents to improve simply by being used in real-world scenarios.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers propose Emotional Cost Functions, a new AI safety framework in which agents learn from mistakes through qualitative suffering states rather than numerical penalties. The system uses narrative representations of irreversible consequences that reshape agent character, showing 90-100% decision-making accuracy compared with 90% over-refusal rates in numerical baselines.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduced SAGE, a multi-agent framework that improves large language model reasoning through self-evolution using four specialized agents. The system achieved significant performance gains on coding and mathematics benchmarks without requiring large human-labeled datasets.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduce Orla, a new library that simplifies the development and deployment of LLM-based multi-agent systems by providing a serving layer that separates workflow execution from policy decisions. The library offers stage mapping, workflow orchestration, and memory management capabilities that improve performance and reduce costs compared to single-model baselines.
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 A comprehensive survey of 82 AI approaches to the ARC-AGI benchmark reveals consistent 2-3x performance drops across all paradigms when moving from version 1 to version 2, with human-level reasoning still far out of reach. While costs have fallen dramatically (390x in one year), AI systems struggle with compositional generalization, achieving only 13% on ARC-AGI-3 compared to near-perfect human performance.
🧠 GPT-5 · 🧠 Opus
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduce AutoTool, a new reinforcement learning approach that enables AI agents to automatically scale their reasoning capabilities for tool use. The method uses entropy-based optimization and supervised fine-tuning to help models efficiently determine appropriate thinking lengths for simple versus complex problems, achieving 9.8% accuracy improvements while reducing computational overhead by 81%.
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers applied Signal Detection Theory (SDT) to analyze three large language models across 168,000 trials, finding that changing the temperature parameter shifts both sensitivity and response bias simultaneously. The study reveals that traditional calibration metrics miss diagnostic information that SDT's full parametric framework can provide.
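For readers unfamiliar with the two SDT parameters the study relies on: sensitivity d′ measures how well a model separates signal from noise, while criterion c measures its bias toward answering "yes". Both follow from the hit and false-alarm rates via the inverse normal CDF; a minimal sketch of the standard formulas, not the paper's analysis code:

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and criterion c from a 2x2 outcome table."""
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)             # separation of distributions
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # bias toward responding "yes"
    return d_prime, criterion
```

Real analyses typically apply a correction when a rate is exactly 0 or 1, since the inverse CDF diverges there. A symmetric table (84% hits, 16% false alarms) gives d′ ≈ 2 with zero bias, which no single accuracy number would reveal.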
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 Research comparing 200 humans and 95 AI detectors found humans significantly outperform AI at detecting deepfakes, especially in low-quality mobile phone videos where AI accuracy drops to near chance levels. The study reveals human-AI hybrid systems are most effective, as humans and AI make complementary errors in deepfake detection.
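The complementary-error effect is easy to see in a toy example: two detectors that are each 80% accurate but fail on different clips reach 100% when their confidence scores are averaged. All numbers below are invented for illustration, not from the study:

```python
# First 5 clips are deepfakes, last 5 are real.
truth = [True] * 5 + [False] * 5

# Hypothetical confidence that each clip is fake (1.0 = certain fake):
ai    = [0.4, 0.4, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]  # misses fakes 0-1
human = [0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.6, 0.6]  # flags reals 8-9

def acc(conf):
    """Accuracy when predicting 'fake' whenever confidence exceeds 0.5."""
    return sum((c > 0.5) == t for c, t in zip(conf, truth)) / len(truth)

# Hybrid system: average the two confidence scores per clip.
fused = [(a + h) / 2 for a, h in zip(ai, human)]
```

Because the two error sets are disjoint, each detector's confident correct answer outvotes the other's weak mistake, which is the mechanism behind the hybrid-system result.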
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers identified that repetitive safety training data causes large language models to develop false refusals, where benign queries are incorrectly declined. They developed FlowLens, a PCA-based analysis tool, and proposed Variance Concentration Loss (VCL) as a regularization technique that reduces false refusals by over 35 percentage points while maintaining performance.
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduced CRASH, an LLM-based agent that analyzes autonomous vehicle incidents from NHTSA data covering 2,168 cases and more than 80 million miles driven between 2021 and 2025. The system achieved 86% accuracy in fault attribution and found that 64% of incidents stem from perception or planning failures, with rear-end collisions comprising 50% of all reported incidents.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers introduce SCAN, a new framework for editing Large Language Models that prevents catastrophic forgetting during sequential knowledge updates. The method uses sparse circuit manipulation instead of dense parameter changes, maintaining model performance even after 3,000 sequential edits across major models like Gemma2, Qwen3, and Llama3.1.
🧠 Llama
AI · Neutral · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers identified that medical multimodal large language models (MLLMs) fail primarily due to inadequate visual grounding capabilities when analyzing medical images, unlike their success with natural scenes. They developed the VGMED evaluation dataset and proposed the VGRefine method, achieving state-of-the-art performance across 6 medical visual question-answering benchmarks without additional training.
AI · Bullish · arXiv • CS AI · Mar 17 · 7/10
🧠 Researchers have developed UniVid, a new pyramid diffusion model that unifies text-to-video and image-to-video generation into a single system. The model uses dual-stream cross-attention mechanisms to process both text prompts and reference images, achieving superior temporal coherence across different video generation tasks.