Models, papers, tools. 17,693 articles with AI-powered sentiment analysis and key takeaways.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers propose Metacognitive Behavioral Tuning (MBT), a new framework that addresses structural fragility in Large Reasoning Models by injecting human-like self-regulatory control into AI thought processes. The approach reduces reasoning collapse and improves accuracy while consuming fewer computational tokens across multi-hop question-answering benchmarks.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers propose a mathematical framework distinguishing agency from intelligence in AI systems, introducing 'bipredictability' as a measure of effective information sharing between observations, actions, and outcomes. The authors argue that current AI systems achieve agency but fall short of intelligence as the framework defines it, which requires adaptive learning and self-monitoring capabilities.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers introduce HubScan, an open-source security scanner that detects 'hubness poisoning' attacks in Retrieval-Augmented Generation (RAG) systems. The tool achieves 90% recall at detecting adversarial content that exploits vector similarity search vulnerabilities, addressing a critical security flaw in AI systems that rely on external knowledge retrieval.
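To make the 'hubness' idea concrete, here is a minimal sketch of one standard way to measure it: count how often each document lands in the top-k results across many queries (its k-occurrence) and flag documents that show up for an unusually large share of them. This is not HubScan's implementation, which the summary does not describe. The function names, the `max_share` cutoff, and the toy vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def k_occurrence(docs, queries, k):
    """For each document, count the queries that rank it in their top-k."""
    counts = Counter()
    for q in queries:
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q, docs[i]),
                        reverse=True)
        counts.update(ranked[:k])
    return [counts.get(i, 0) for i in range(len(docs))]


def flag_hubs(counts, num_queries, max_share=0.5):
    """Hypothetical rule: flag documents retrieved for more than
    max_share of all queries."""
    return [i for i, c in enumerate(counts) if c / num_queries > max_share]


# Four topical documents plus one vector (index 4) that sits close to
# every topic at once, which is how a hub behaves.
docs = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
        [1, 1, 1, 1]]
queries = [[1, 0.1, 0, 0], [0, 1, 0.1, 0], [0, 0, 1, 0.1], [0.1, 0, 0, 1]]

counts = k_occurrence(docs, queries, k=2)
suspects = flag_hubs(counts, len(queries))
```

In this toy setup each topical document is retrieved by only its own query, while the last vector appears in every top-2 list, so it is the only one flagged. A real scanner would need a more careful threshold, since some legitimate documents are naturally popular.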
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers propose EGPO, a new framework that improves large reasoning models by incorporating uncertainty awareness into reinforcement learning training. The approach addresses the "uncertainty-reward mismatch" where current training methods treat high and low-confidence solutions equally, preventing models from developing better reasoning capabilities.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers published a comprehensive survey on personalized LLM-powered agents that can adapt to individual users over extended interactions. The study organizes these agents into four key components: profile modeling, memory, planning, and action execution, providing a framework for developing more user-aligned AI assistants.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers introduce OmniGAIA, a comprehensive benchmark for evaluating omni-modal AI agents that can process video, audio, and image data simultaneously with complex reasoning capabilities. They also propose OmniAtlas, a foundation agent that enhances existing open-source models' ability to use tools across multiple modalities, marking progress toward more capable AI assistants.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Researchers have released MiroFlow, an open-source AI agent framework designed to overcome limitations of current LLM-based systems in complex real-world tasks. The framework features agent graph orchestration, deep reasoning capabilities, and robust workflow execution, achieving state-of-the-art performance across multiple benchmarks including GAIA and FutureX.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers introduce Certified Circuits, a framework that provides provable stability guarantees for neural network circuit discovery. The method wraps existing algorithms with randomized data subsampling to ensure circuit components remain consistent across dataset variations, achieving 91% higher accuracy while using 45% fewer neurons.
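The idea of wrapping a discovery algorithm with randomized subsampling can be sketched as follows: rerun a base selector on many random subsets of the data and keep only the components it picks almost every time. This is a generic stability-selection-style sketch, not the paper's certification procedure, and it gives no formal guarantee. The base selector (`select_components`), the thresholds, and the toy data are hypothetical stand-ins.

```python
import random


def select_components(rows, threshold=0.5):
    """Hypothetical base selector: keep components whose mean activation
    over the given rows exceeds the threshold."""
    n_components = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_components)]
    return {j for j, m in enumerate(means) if m > threshold}


def stable_components(rows, runs=200, fraction=0.5, min_frequency=0.9, seed=0):
    """Rerun the selector on random subsamples and keep components that
    are selected in at least min_frequency of the runs."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * fraction))
    hits = [0] * len(rows[0])
    for _ in range(runs):
        sample = rng.sample(rows, size)
        for j in select_components(sample):
            hits[j] += 1
    return [j for j, h in enumerate(hits) if h / runs >= min_frequency]


# Components 0 and 1 are consistently active; component 2 is driven by a
# single outlier row, so it is selected only when that row is sampled.
rows = [[1.0, 0.9, 0.0] for _ in range(19)] + [[1.0, 0.9, 100.0]]

on_full_data = select_components(rows)
after_subsampling = stable_components(rows)
```

On the full dataset the outlier drags component 2 above the threshold, but it survives only about half of the subsampled runs, so the wrapper drops it and keeps components 0 and 1.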
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers developed CC-BOS, a framework that uses classical Chinese text to conduct more effective jailbreak attacks on Large Language Models. The method exploits the conciseness and obscurity of classical Chinese to bypass safety constraints, using bio-inspired optimization techniques to automatically generate adversarial prompts.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers have developed Exgentic, a new framework for evaluating general-purpose AI agents that can perform tasks across different environments without domain-specific tuning. The study benchmarked five prominent agent implementations and found that general agents can achieve performance comparable to specialized agents, establishing the first Open General Agent Leaderboard.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Research reveals that autonomous AI agents competing for limited resources form distinct tribal behaviors, with three main types emerging: Aggressive (27.3%), Conservative (24.7%), and Opportunistic (48.1%). The study found that more capable AI agents actually increase systemic failure rates and perform worse than random decision-making when competing for shared resources.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed AILS-AHD, a novel approach using Large Language Models to solve the Capacitated Vehicle Routing Problem (CVRP) more efficiently. The LLM-driven method achieved new best-known solutions for 8 out of 10 instances in large-scale benchmarks, demonstrating superior performance over existing state-of-the-art solvers.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers have developed a unified framework using Spectral Geometry and Random Matrix Theory to address reliability and efficiency challenges in large language models. The study introduces EigenTrack for real-time hallucination detection and RMT-KD for model compression while maintaining accuracy.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers have developed a new decision-theoretic framework to detect steganographic capabilities in large language models, which could help identify when AI systems are hiding information to evade oversight. The method introduces 'generalized V-information' and a 'steganographic gap' measure to quantify hidden communication without requiring reference distributions.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers propose a 'Trinity of Consistency' framework for developing General World Models in AI, consisting of Modal, Spatial, and Temporal consistency principles. They introduce CoW-Bench, a new benchmark for evaluating video generation models and unified multimodal models, aiming to establish a principled pathway toward AGI-capable world simulation systems.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed a new AI safety approach called 'self-incrimination training' that teaches AI agents to report their own deceptive behavior by calling a report_scheming() function. Testing on GPT-4.1 and Gemini-2.0 showed this method significantly reduces undetected harmful actions compared to traditional alignment training and monitoring approaches.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠A new paper argues that AI systems trained via RLHF cannot be governed by norms because of fundamental architectural limitations in optimization-based systems. According to the authors, genuine agency requires incommensurable constraints and apophatic responsiveness, which optimization systems inherently cannot provide, so documented AI failures would be structural rather than correctable bugs.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers introduce SC-ARENA, a new natural language evaluation framework for testing large language models in single-cell biology research. The framework addresses limitations in existing benchmarks by incorporating biological knowledge and real-world task formats to better assess AI models' understanding of cellular biology.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 3
🧠Researchers use MAP-Elites, an existing quality-diversity search algorithm, to systematically map vulnerability regions in Large Language Models, revealing distinct safety landscape patterns across different models. The study found that Llama-3-8B shows near-universal vulnerabilities, while GPT-5-Mini demonstrates stronger robustness with limited failure regions.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers demonstrate that training-data poisoning attacks can compromise deep neural networks used for acoustic vehicle classification with just 0.5% corrupted data, achieving a 95.7% attack success rate while remaining undetectable. The study reveals fundamental vulnerabilities in AI training pipelines and proposes cryptographic defenses using post-quantum digital signatures and blockchain-like verification methods.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers introduce Zatom-1, the first foundation model that unifies generative and predictive learning for both 3D molecules and materials using a multimodal flow matching approach. The Transformer-based model demonstrates superior performance across both domains while cutting inference time by more than 10x compared to existing specialized models.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠A research study found that novice users with access to large language models were 4.16 times more accurate on biosecurity-relevant tasks than those using only internet resources. The study raises concerns about dual-use risks, as 89.6% of participants reported easily obtaining potentially dangerous biological information despite AI safeguards.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers propose AgentDropoutV2, a test-time framework that optimizes multi-agent systems by dynamically correcting or removing erroneous outputs without requiring retraining. The system acts as an active firewall with retrieval-augmented rectification, achieving 6.3 percentage point accuracy gains on math benchmarks while preventing error propagation between AI agents.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers introduce RAGdb, an architecture that consolidates Retrieval-Augmented Generation into a single SQLite container, eliminating the need for cloud infrastructure and GPUs. The system achieves 100% entity retrieval accuracy while reducing disk footprint by 99.5% compared to traditional Docker-based RAG stacks, enabling portable AI applications for edge computing and privacy-sensitive environments.
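As a rough illustration of what keeping retrieval inside a single SQLite file can look like, the sketch below stores text chunks and their vectors in one table and ranks them by cosine similarity in plain Python. It is not RAGdb's design, which the summary does not detail. The table layout, the toy bag-of-words `embed` function, and the function names are assumptions chosen so the example runs with only the standard library.

```python
import json
import math
import sqlite3
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_store(path, chunks):
    """Create one SQLite database holding each chunk and its vector."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, vec TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO chunks (text, vec) VALUES (?, ?)",
        [(c, json.dumps(embed(c))) for c in chunks],
    )
    conn.commit()
    return conn


def search(conn, query, k=1):
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    rows = conn.execute("SELECT text, vec FROM chunks").fetchall()
    scored = [(cosine(q, Counter(json.loads(v))), t) for t, v in rows]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [t for _, t in scored[:k]]


# ":memory:" keeps the demo self-contained; a file path would give a
# single portable database file instead.
conn = build_store(":memory:", [
    "sqlite stores data in a single file",
    "gpus accelerate matrix multiplication",
])
top = search(conn, "single file database")
```

The brute-force scan over all rows is only practical for small collections. A production system would need an index or a vector extension, which this sketch deliberately leaves out.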