12,178 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed TT-SEAL, a selective encryption framework for compressed AI models using Tensor-Train Decomposition that maintains security while encrypting only 4.89-15.92% of parameters. The system achieves the same robustness as full encryption while reducing AES decryption overhead in end-to-end latency from 58% to as low as 2.76%.
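The core idea above — encrypting only a small fraction of a compressed model's parameters — can be sketched in a few lines. This is a toy illustration, not TT-SEAL's algorithm: the core-selection rule (smallest cores first) and the SHA-256 keystream standing in for AES are both assumptions made here for a self-contained example.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy XOR keystream derived from SHA-256 (a stand-in for the AES
    # encryption the paper uses, to keep this sketch stdlib-only).
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return bytes(out[:n])

def selective_encrypt(cores: dict, fraction: float, key: bytes):
    # Encrypt whole TT cores, smallest first, until ~`fraction` of all
    # parameter bytes are covered. Which cores count as security-critical
    # is a modelling choice; smallest-first is only an illustration.
    total = sum(len(v) for v in cores.values())
    covered, out = 0, {}
    for name, data in sorted(cores.items(), key=lambda kv: len(kv[1])):
        if covered + len(data) <= fraction * total:
            ks = keystream(key + name.encode(), len(data))
            out[name] = bytes(a ^ b for a, b in zip(data, ks))
            covered += len(data)
        else:
            out[name] = data  # left in plaintext
    return out, covered / total
```

Because XOR with the same keystream is its own inverse, running `selective_encrypt` on the output with the same key recovers the original cores — and only the small encrypted fraction ever touches the (slow) cipher at load time, which is where the latency saving comes from.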
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠New research demonstrates that AI systems trained via RLHF cannot be governed by norms due to fundamental architectural limitations in optimization-based systems. The paper argues that genuine agency requires incommensurable constraints and apophatic responsiveness, which optimization systems inherently cannot provide, making documented AI failures structural rather than correctable bugs.
AI · Bearish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers developed CC-BOS, a framework that uses classical Chinese text to conduct more effective jailbreak attacks on Large Language Models. The method exploits the conciseness and obscurity of classical Chinese to bypass safety constraints, using bio-inspired optimization techniques to automatically generate adversarial prompts.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 4
🧠Researchers have released MiroFlow, an open-source AI agent framework designed to overcome limitations of current LLM-based systems in complex real-world tasks. The framework features agent graph orchestration, deep reasoning capabilities, and robust workflow execution, achieving state-of-the-art performance across multiple benchmarks including GAIA and FutureX.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers introduce MM-NeuroOnco, a large-scale multimodal dataset containing 24,726 MRI slices and 200,000 instructions for training AI models in brain tumor diagnosis. The benchmark reveals significant challenges in medical AI, with even advanced models like Gemini 3 Flash achieving only 41.88% accuracy on diagnostic questions.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers have developed Exgentic, a new framework for evaluating general-purpose AI agents that can perform tasks across different environments without domain-specific tuning. The study benchmarked five prominent agent implementations and found that general agents can achieve performance comparable to specialized agents, establishing the first Open General Agent Leaderboard.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers introduce OmniGAIA, a comprehensive benchmark for evaluating omni-modal AI agents that can process video, audio, and image data simultaneously with complex reasoning capabilities. They also propose OmniAtlas, a foundation agent that enhances existing open-source models' ability to use tools across multiple modalities, marking progress toward more capable AI assistants.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed ViT-Linearizer, a distillation framework that transfers Vision Transformer knowledge into linear-time models, addressing quadratic complexity issues for high-resolution inputs. The method achieves 84.3% ImageNet accuracy while providing significant speedups, bridging the gap between efficient RNN-based architectures and transformer performance.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers introduce VALTEST, a framework that uses semantic entropy to automatically validate test cases generated by Large Language Models, addressing the problem of invalid or hallucinated tests that mislead AI programming agents. The system improves test validity by up to 29% and enhances code generation performance through better filtering of LLM-generated test cases.
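Semantic entropy, the signal VALTEST builds on, is just Shannon entropy computed over clusters of semantically equivalent samples rather than raw strings. The sketch below uses whitespace/case normalisation as a stand-in equivalence relation; a real system would cluster by test behaviour or a trained model instead.

```python
import math
from collections import Counter

def semantic_entropy(samples, equiv=lambda s: " ".join(s.split()).lower()):
    # Shannon entropy over semantic-equivalence clusters of sampled
    # generations. `equiv` here only normalises whitespace and case --
    # an intentionally crude placeholder for semantic clustering.
    clusters = Counter(equiv(s) for s in samples)
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in clusters.values())
```

Low entropy means repeated samples of the LLM agree on the test case (a sign it is valid); high entropy flags likely hallucinated tests to filter out before they reach a coding agent.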
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠A new academic paper proposes that machine consciousness requires simultaneous computation rather than sequential processing. The research introduces 'Stack Theory' with temporal semantics, arguing that conscious unity depends on objective co-instantiation of mental processes within specific time windows, potentially making software consciousness impossible on purely sequential computer architectures.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers introduce Zatom-1, the first foundation model that unifies generative and predictive learning for both 3D molecules and materials using a multimodal flow matching approach. The Transformer-based model demonstrates superior performance across both domains while significantly reducing inference time by over 10x compared to existing specialized models.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers introduce Certified Circuits, a framework that provides provable stability guarantees for neural network circuit discovery. The method wraps existing algorithms with randomized data subsampling to ensure circuit components remain consistent across dataset variations, achieving 91% higher accuracy while using 45% fewer neurons.
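The wrapping idea — rerun any circuit-discovery routine on random data subsamples and keep only the components that recur — is generic enough to sketch. This is a simplified reading of the approach, not the paper's algorithm, and the parameter names are ours.

```python
import random
from collections import Counter

def certified_components(discover, dataset, runs=50, keep=0.8,
                         stability=0.9, seed=0):
    # Generic stability wrapper: call `discover` (any circuit-discovery
    # routine returning a set of components) on `runs` random subsamples
    # of the data, then keep components found in >= `stability` of runs.
    rng = random.Random(seed)
    k = max(1, int(keep * len(dataset)))
    hits = Counter()
    for _ in range(runs):
        hits.update(discover(rng.sample(dataset, k)))
    return {comp for comp, n in hits.items() if n / runs >= stability}
```

Components that a discovery algorithm only finds on particular data draws fail the stability threshold and are dropped, which is how subsampling turns an unstable heuristic into a reproducible circuit.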
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers introduced VeRO (Versioning, Rewards, and Observations), a new evaluation framework for testing AI coding agents that can optimize other AI agents through iterative improvement cycles. The system provides reproducible benchmarks and structured execution traces to systematically measure how well coding agents can improve target agents' performance.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers introduce CourtGuard, a new framework for AI safety that uses retrieval-augmented multi-agent debate to evaluate LLM outputs without requiring expensive retraining. The system achieves state-of-the-art performance across 7 safety benchmarks and demonstrates zero-shot adaptability to new policy requirements, offering a more flexible approach to AI governance.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers propose Metacognitive Behavioral Tuning (MBT), a new framework that addresses structural fragility in Large Reasoning Models by injecting human-like self-regulatory control into AI thought processes. The approach reduces reasoning collapse and improves accuracy while consuming fewer computational tokens across multi-hop question-answering benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers introduce GraftLLM, a new method for transferring knowledge between large language models using 'SkillPack' format that preserves capabilities while avoiding catastrophic forgetting. The approach enables efficient model fusion and continual learning for heterogeneous models through modular knowledge storage.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers developed a new economic framework called 'cost-of-pass' to evaluate AI language models by combining accuracy with inference costs. The study found that lightweight models are most cost-effective for basic tasks while reasoning models excel at complex problems, with costs for complex quantitative tasks roughly halving every few months.
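Read as "expected spend per correct answer," the metric is a one-liner: if each attempt costs c and succeeds with probability p, you expect to pay c/p per pass. The function below is an illustrative reading of that definition, with made-up example prices.

```python
def cost_of_pass(cost_per_attempt: float, success_rate: float) -> float:
    # Expected spend per *correct* answer: you pay for attempts until
    # one passes, so the expected cost is cost / accuracy.
    if success_rate <= 0:
        return float("inf")
    return cost_per_attempt / success_rate
```

This captures the study's headline trade-off: a lightweight model at $0.002/attempt with 40% accuracy costs $0.005 per pass, beating a reasoning model at $0.060/attempt with 95% accuracy (~$0.063 per pass) on tasks where the cheap model's accuracy holds up.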
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠Researchers propose a mathematical framework distinguishing agency from intelligence in AI systems, introducing 'bipredictability' as a measure of effective information sharing between observations, actions, and outcomes. Current AI systems achieve agency but lack true intelligence, which requires adaptive learning and self-monitoring capabilities.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers developed a runtime-reconfigurable bitwise systolic array architecture for multi-precision quantized neural networks on FPGA hardware accelerators. The system achieves 1.3-3.6x speedup on mixed-precision models while supporting clock frequencies of up to 250 MHz, addressing the trade-off between hardware efficiency and inference accuracy.
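The reason one array of 1-bit processing elements can serve any precision at runtime is that a W-bit × A-bit product decomposes into W·A shifted 1-bit partial products; changing precision just changes the loop bounds. The sketch below is a software model of that decomposition (for unsigned operands), not the paper's RTL.

```python
def bitwise_product(w: int, a: int, w_bits: int, a_bits: int) -> int:
    # Express w*a as a sum of shifted 1-bit partial products, the way a
    # bitwise systolic array would: each (i, j) pair is one 1-bit PE's
    # AND gate, and precision is set purely by the loop bounds.
    acc = 0
    for i in range(w_bits):
        for j in range(a_bits):
            acc += (((w >> i) & 1) & ((a >> j) & 1)) << (i + j)
    return acc
```

An 8x8-bit MAC uses 64 partial products while a 4x4-bit one uses 16 on the same hardware, which is where the speedup on mixed-precision models comes from.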
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠Researchers propose FedWQ-CP, a new approach for uncertainty quantification in federated learning that addresses both data and model heterogeneity challenges. The method enables reliable uncertainty estimation across distributed agents while maintaining efficiency through single-round communication and weighted threshold aggregation.
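The single-round, weighted-aggregation shape of the method can be sketched with split conformal prediction: each client computes a local quantile of its nonconformity scores and sends only that quantile plus its sample count; the server averages, weighted by sample size. This is a simplified stand-in for FedWQ-CP's aggregation rule, not the paper's exact estimator.

```python
import math

def federated_threshold(client_scores, alpha=0.1):
    # One communication round: each client reports (local split-conformal
    # quantile, n); the server returns a sample-size-weighted average.
    stats = []
    for scores in client_scores:
        s = sorted(scores)
        n = len(s)
        # standard split-conformal index: ceil((n+1)(1-alpha)), 0-based
        k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)
        stats.append((s[k], n))
    total = sum(n for _, n in stats)
    return sum(q * n for q, n in stats) / total
```

Because only a scalar and a count leave each client, the scheme needs no raw-data sharing and no iterative rounds, matching the efficiency claim in the summary.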
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 8
🧠FlashOptim introduces memory optimization techniques that reduce AI training memory requirements by over 50% per parameter while maintaining model quality. The suite reduces AdamW memory usage from 16 bytes to 7 bytes per parameter through improved master weight splitting and 8-bit optimizer state quantization.
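The 16 bytes/parameter baseline is the standard mixed-precision AdamW accounting, tallied below; the closing comment about how the savings land is an inference from the summary, not FlashOptim's documented layout.

```python
# Standard mixed-precision AdamW accounting, bytes per parameter.
baseline = {
    "bf16 weight": 2,             # used in forward/backward
    "bf16 gradient": 2,
    "fp32 master weight": 4,      # target of "master weight splitting"
    "fp32 Adam first moment": 4,  # target of 8-bit state quantization
    "fp32 Adam second moment": 4,
}
assert sum(baseline.values()) == 16

# Splitting the master weight (its high bits duplicate the bf16 copy, so
# only a 16-bit residual need be stored) and holding both Adam moments in
# 8 bits removes most of the bottom three rows -- which is how a
# ~7 B/param footprint becomes plausible from this 16 B baseline.
```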
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠Researchers published a comprehensive survey on personalized LLM-powered agents that can adapt to individual users over extended interactions. The study organizes these agents into four key components: profile modeling, memory, planning, and action execution, providing a framework for developing more user-aligned AI assistants.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠A qualitative study with 26 non-AI expert stakeholders reveals that everyday users assess AI fairness more comprehensively than AI experts, considering broader features beyond legally protected categories and setting stricter fairness thresholds. The research highlights the importance of incorporating stakeholder perspectives in AI governance and fairness assessment processes.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠LiveMCPBench introduces the first large-scale benchmark evaluating AI agents' ability to navigate real-world tasks using Model Context Protocol (MCP) tools across multiple servers. The benchmark reveals significant performance gaps, with top model Claude-Sonnet-4 achieving 78.95% success while most models only reach 30-50%, identifying tool retrieval as the primary bottleneck.
AI · Neutral · arXiv – CS AI · Feb 27 · 7/10 · 7
🧠Researchers developed Compositional-ARC, a dataset to test AI models' ability to systematically generalize abstract spatial reasoning tasks. A small 5.7M parameter transformer model trained with meta-learning outperformed large language models like GPT-4o and Gemini 2.0 Flash on novel geometric transformation combinations.