11,655 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed KernelSkill, a multi-agent framework that optimizes GPU kernel performance using expert knowledge rather than trial-and-error approaches. The system achieved 100% success rates and significant speedups (1.92x to 5.44x) over existing methods, addressing a critical bottleneck in AI system efficiency.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers introduce Super Neurons (SNs), a new method that probes raw activations in Vision Language Models to improve classification performance while achieving up to 5.10x speedup. Unlike Sparse Attention Vectors, SNs can identify discriminative neurons in shallow layers, enabling extreme early exiting from the first layer at the first generated token.
AIBearisharXiv – CS AI · Mar 127/10
🧠Researchers have developed 'Amnesia,' a lightweight adversarial attack that bypasses safety mechanisms in open-weight Large Language Models by manipulating internal transformer states. The attack enables generation of harmful content without requiring fine-tuning or additional training, highlighting vulnerabilities in current LLM safety measures.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers have developed HTMuon, an improved optimization algorithm for training large language models that builds upon the existing Muon optimizer. HTMuon addresses limitations in Muon's weight spectra by incorporating heavy-tailed spectral corrections, showing up to 0.98 perplexity reduction in LLaMA pretraining experiments.
🏢 Perplexity
AIBearisharXiv – CS AI · Mar 127/10
🧠Researchers developed a new framework for evaluating AI security risks specifically in banking and financial services, introducing the Risk-Adjusted Harm Score (RAHS) to measure severity of AI model failures. The study found that AI models become more vulnerable to security exploits during extended interactions, exposing critical weaknesses in current AI safety assessments for financial institutions.
AINeutralarXiv – CS AI · Mar 127/10
🧠Researchers applied sparse autoencoders to analyze Chronos-T5-Large, a 710M parameter time series foundation model, revealing how different layers process temporal data. The study found that mid-encoder layers contain the most causally important features for change detection, while early layers handle frequency patterns and final layers compress semantic concepts.
AINeutralarXiv – CS AI · Mar 127/10
🧠Researchers propose treating multi-agent AI memory as a computer architecture problem, introducing a three-layer memory hierarchy and identifying critical protocol gaps. The paper highlights multi-agent memory consistency as the most pressing challenge for building scalable collaborative AI systems.
AIBearisharXiv – CS AI · Mar 127/10
🧠Researchers have introduced Flip-Agent, the first targeted bit-flip attack framework specifically designed to exploit LLM-based agents by manipulating hardware faults. The attack can manipulate both final outputs and tool invocations in multi-stage AI agent pipelines, revealing critical security vulnerabilities in these systems.
AINeutralarXiv – CS AI · Mar 127/10
🧠Researchers conducted comprehensive benchmarks of LLM inference on AMD Instinct MI325X GPUs, testing models from 235B to 1 trillion parameters. The study reveals that architecture-aware optimization is critical, with different model types requiring specific configurations for optimal performance on AMD hardware.
🧠 Llama
AIBearisharXiv – CS AI · Mar 127/10
🧠A large-scale study of 62,808 AI safety evaluations across six frontier models reveals that deployment scaffolding architectures can significantly impact measured safety, with map-reduce scaffolding degrading safety performance. The research found that evaluation format (multiple-choice vs open-ended) affects safety scores more than scaffold architecture itself, and safety rankings vary dramatically across different models and configurations.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers have developed dmaplane, a Linux kernel module that provides buffer orchestration for AI workloads, addressing the gap between efficient data transport and proper buffer management. The system integrates RDMA, GPU memory management, and NUMA-aware allocation to optimize high-performance AI data paths at the kernel level.
AIBullisharXiv – CS AI · Mar 127/10
🧠RedFuser is a new automated framework that optimizes AI model deployment by fusing cascaded reduction operations into single loops, achieving 2-5x performance improvements. The system addresses limitations in existing AI compilers that struggle with complex multi-loop operations like those found in attention mechanisms.
AIBullisharXiv – CS AI · Mar 127/10
🧠OpenAI researchers introduce IH-Challenge, a reinforcement learning dataset designed to improve instruction hierarchy in frontier LLMs. Fine-tuning GPT-5-Mini with this dataset improved robustness by 10% and significantly reduced unsafe behavior while maintaining helpfulness.
🏢 OpenAI🏢 Hugging Face🧠 GPT-5
AINeutralarXiv – CS AI · Mar 127/10
🧠Researchers introduce TRACED, a framework that evaluates AI reasoning quality through geometric analysis rather than traditional scalar probabilities. The system identifies correct reasoning as high-progress stable trajectories, while AI hallucinations show low-progress unstable patterns with high curvature fluctuations.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed HyMEM, a brain-inspired hybrid memory system that significantly improves GUI agents' ability to interact with computers. The system uses graph-based structured memory combining symbolic nodes with trajectory embeddings, enabling smaller 7B/8B models to match or exceed performance of larger closed-source models like GPT-4o.
🧠 GPT-4
AINeutralarXiv – CS AI · Mar 127/10
🧠A comprehensive study analyzing 896 academic papers and 80+ regulatory documents reveals critical ambiguities in how 'AI models' and 'AI systems' are defined across regulations like the EU AI Act. The research proposes clear operational definitions to resolve regulatory boundary problems that complicate responsibility allocation across the AI value chain.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed a method using neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural language tokens while being more computationally efficient and transferring well to reasoning benchmarks.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers developed ES-dLLM, a training-free inference acceleration framework that speeds up diffusion large language models by selectively skipping tokens in early layers based on importance scoring. The method achieves 5.6x to 16.8x speedup over vanilla implementations while maintaining generation quality, offering a promising alternative to autoregressive models.
🏢 Nvidia
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers propose ROVA, a new training framework that improves vision-language models' robustness in real-world conditions by up to 24% accuracy gains. The framework addresses performance degradation from weather, occlusion, and camera motion that can cause up to 35% accuracy drops in current models.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers propose a novel lightweight architecture for verifiable aggregation in federated learning that uses backdoor injection as intrinsic proofs instead of expensive cryptographic methods. The approach achieves over 1000x speedup compared to traditional cryptographic baselines while maintaining high detection rates against malicious servers.
AIBullisharXiv – CS AI · Mar 127/10
🧠Researchers have developed a new method to detect and eliminate backdoor triggers in neural networks using active path analysis. The approach shows promising results in experiments with machine learning models used for intrusion detection, addressing a critical cybersecurity vulnerability.
AINeutralarXiv – CS AI · Mar 127/10
🧠Researchers developed the first benchmark dataset to measure refusal rates in military Large Language Models, finding that current LLMs refuse up to 98.2% of legitimate military queries due to safety behaviors. The study tested 34 models and demonstrated techniques to reduce refusals while maintaining military task performance.
AINeutralarXiv – CS AI · Mar 127/10
🧠A research study reveals that large language models develop strong internal compositional representations for adjective-noun combinations, but struggle to consistently translate these representations into successful task performance. The findings highlight a significant gap between what LLMs understand internally and their functional capabilities.
AIBearisharXiv – CS AI · Mar 127/10
🧠Research study finds that LLaMA-70B-Instruct hallucinated in 19.7% of medical Q&A responses despite high plausibility scores, highlighting significant reliability issues in AI healthcare applications. The study shows that lower hallucination rates correlate with higher usefulness scores, emphasizing the need for better safeguards in medical AI systems.
AINeutralarXiv – CS AI · Mar 127/10
🧠Research examining five major LLMs found they exhibit human-like cognitive biases when evaluating judicial scenarios, showing stronger virtuous victim effects but reduced credential-based halo effects compared to humans. The study suggests LLMs may offer modest improvements over human decision-making in judicial contexts, though variability across models limits current practical application.
🧠 ChatGPT🧠 Claude🧠 Sonnet