AI · Bullish · arXiv – CS AI · May 29 · 7/10
🧠Researchers introduce MEMENTO, a framework that treats web exploration as a learning signal for AI agents operating in data-scarce domains. By combining iterative web search with dual-channel memory systems, MEMENTO achieves 25-36% performance improvements over baseline models in professional applications like sales automation and legal research without requiring additional model training.
AI · Bullish · arXiv – CS AI · May 28 · 7/10
🧠AIBuildAI-2 introduces a knowledge-enhanced AI agent that automatically builds machine learning models by combining large language models with an external, evolving knowledge system. The system achieves state-of-the-art performance, ranking first on MLE-Bench and placing in the top 6.6% of human teams in a prediction competition, and is positioned as a way to democratize AI model development for non-specialists.
AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 5
🧠Researchers propose a framework for developing trustworthy AI agents that function as epistemic entities, capable of pursuing knowledge goals and shaping information environments. The paper argues that as AI models increasingly replace traditional search methods and provide specialized advice, their calibration to human epistemic norms becomes critical to prevent cognitive deskilling and epistemic drift.
AI · Bearish · arXiv – CS AI · May 27 · 6/10
🧠A new research paper examines how generative AI systems in higher education perpetuate the marginalization of non-Western epistemologies and disability perspectives because of Western-centric training data. The study argues that AI's claim to neutrality masks its active role in reinforcing epistemic coloniality, with persons with disabilities facing particular exclusion from both AI design processes and knowledge validation systems.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers present a neuro-symbolic framework that challenges the conventional belief that temporal reasoning failures in LLMs stem from inherent deficits in logical deduction. By decoupling text-to-event representation from symbolic reasoning using a Probabilistic Inconsistency Signal, the framework achieves perfect accuracy on structured temporal tasks and identifies representation quality, not reasoning capability, as the true bottleneck.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce KG-Reasoner, an end-to-end framework that uses reinforcement learning to train large language models to perform multi-hop reasoning over knowledge graphs without decomposing tasks into isolated pipeline steps. The approach demonstrates competitive or superior performance across eight reasoning benchmarks by enabling LLMs to dynamically explore reasoning paths and backtrack when necessary.
AI · Bearish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers conducted a large-scale computational analysis comparing 17,790 articles from Grokipedia, Elon Musk's AI-generated encyclopedia, with their Wikipedia counterparts. The study found that Grokipedia articles are longer but contain fewer citations, and that some entries show a systematic rightward political bias in the media sources they cite, particularly in the history, religion, and arts sections.
🏢 xAI · 🧠 Grok
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers have developed QA-Dragon, a new Query-Aware Dynamic RAG System that significantly improves knowledge-intensive Visual Question Answering by combining text and image retrieval strategies. The system achieved substantial performance improvements of 5-6% across different tasks in the Meta CRAG-MM Challenge at KDD Cup 2025.
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠A new research study analyzes how large language models are affecting Wikipedia's content and structure, estimating LLM influence at approximately 1% in certain categories. The researchers warn of potential risks to AI benchmarks and natural language processing tasks if Wikipedia becomes contaminated by LLM-generated content.
AI · Bullish · arXiv – CS AI · Mar 11 · 5/10
🧠Researchers developed ELERAG, an enhanced Retrieval-Augmented Generation architecture that integrates Entity Linking with Wikidata to improve factual accuracy in educational AI systems. The system shows significant performance improvements in domain-specific contexts compared to standard RAG approaches, particularly for Italian educational question-answering applications.