956 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠AutoSkill is a new framework that enables AI language models to learn and reuse personalized skills from user interactions without retraining the underlying model. The system abstracts user preferences into reusable capabilities that can be shared across different agents and tasks, addressing the current limitation where LLMs fail to retain personalized learning between sessions.
AI Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠HarmonyCell is a new AI framework that automates single-cell perturbation modeling by addressing data inconsistencies across different biological datasets. The system uses LLM-driven semantic unification and adaptive Monte Carlo Tree Search to achieve 95% execution rates on heterogeneous datasets while matching expert-designed baselines.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers propose MIST-RL, a reinforcement learning framework that improves AI code generation by creating more efficient test suites. The method achieves 28.5% higher fault detection while using 19.3% fewer test cases, demonstrating significant improvements in AI code verification efficiency.
AI Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers introduce ProtRLSearch, a multi-round protein search agent that uses reinforcement learning and multimodal inputs (protein sequences and text) to improve protein analysis for healthcare applications. The system addresses limitations of single-round, text-only protein search agents and includes a new benchmark called ProtMCQs with 3,000 multiple-choice questions for evaluation.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠Researchers introduce ALTER, a new framework for efficiently "unlearning" specific knowledge from large language models while preserving their overall utility. The system uses asymmetric LoRA architecture to selectively forget targeted information with 95% effectiveness while maintaining over 90% model utility, significantly outperforming existing methods.
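The low-rank-adapter idea behind this kind of unlearning can be sketched in a few lines of NumPy: freeze the base weights (preserving general utility) and train only a small adapter to suppress the model's response on the forget data. The objective and shapes below are assumptions for illustration, not ALTER's actual asymmetric design.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))        # frozen base weights (utility preserved)
A = rng.normal(size=(d, r)) * 0.1  # fixed low-rank projection
B = np.zeros((r, d))               # trainable low-rank factor

def adapted(x):
    return x @ (W + A @ B)         # LoRA-style forward pass

x_forget = rng.normal(size=(1, d))
before = np.linalg.norm(adapted(x_forget))
lr = 0.1
for _ in range(200):
    y = adapted(x_forget)
    B -= lr * (A.T @ x_forget.T @ y)   # drive the forget output toward zero
after = np.linalg.norm(adapted(x_forget))
# `after` < `before`: the targeted response is attenuated while W is intact.
```

Because W never changes, behavior on inputs outside the forget set is perturbed only through the low-rank term, which is the intuition behind trading forgetting effectiveness against retained utility.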
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 10
🧠Researchers developed a new inference-time safety mechanism for code-generating AI models that uses retrieval-augmented generation to identify and fix security vulnerabilities in real-time. The approach leverages Stack Overflow discussions to guide AI code revision without requiring model retraining, improving security while maintaining interpretability.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers have developed a new framework that combines Large Language Models (LLMs) with Deep Reinforcement Learning to improve the data efficiency, interpretability, and cross-environment transferability of RL agents. The approach uses LLMs to map natural language instructions into executable rules and to create semantically annotated options for better skill reuse and constraint monitoring.
AI Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.
AI Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers developed an event-based evaluation framework for LLM-generated clinical summaries of remote monitoring data, revealing that models with high semantic similarity often fail to capture clinically significant events. A vision-based approach using time-series visualizations achieved the best clinical event alignment with 45.7% abnormality recall.
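The gap the study highlights — fluent summaries that miss clinically significant events — is exactly what an event-level metric catches and a semantic-similarity score does not. A minimal illustrative version (the paper's event extraction and matching details are not shown here):

```python
def abnormality_recall(reference_events, summary_events):
    """Fraction of clinically significant events the summary covers.

    Illustrative metric only; not the paper's exact protocol.
    """
    if not reference_events:
        return 1.0
    covered = sum(1 for e in reference_events if e in summary_events)
    return covered / len(reference_events)

# A fluent summary can read similarly to the reference yet miss 2 of 3 events:
score = abnormality_recall(
    {"tachycardia", "desaturation", "apnea"},
    {"tachycardia"},
)
```

On this toy example the recall is 1/3 despite the summary being superficially on-topic, which mirrors the paper's finding that semantic similarity alone is a poor proxy for clinical coverage.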
AI Neutral · arXiv – CS AI · Mar 3 · 6/10 · 12
🧠RubricBench is a new benchmark with 1,147 pairwise comparisons designed to evaluate rubric-based assessment methods for Large Language Models. Research reveals a significant gap between human-annotated and AI-generated rubrics, showing that current state-of-the-art models struggle to autonomously create valid evaluation criteria.
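The "gap" between human and AI-generated rubrics can be quantified with a simple pairwise-agreement rate: on each comparison, does a judge using the rubric pick the same winner as the human annotator? This is an illustrative measure, not RubricBench's exact protocol.

```python
def pairwise_agreement(human_winners, rubric_winners):
    """Share of pairwise comparisons where a rubric-based judge agrees
    with the human annotator (illustrative only)."""
    assert len(human_winners) == len(rubric_winners)
    matches = sum(h == r for h, r in zip(human_winners, rubric_winners))
    return matches / len(human_winners)

rate = pairwise_agreement(["A", "B", "A", "A"], ["A", "B", "B", "A"])
```

A rubric whose judge disagrees with humans on a quarter of comparisons, as in this toy example, would score 0.75 — the benchmark's 1,147 comparisons make such rates statistically meaningful.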
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers propose CeProAgents, a hierarchical multi-agent system that automates chemical process development using AI agents specialized in knowledge, concept, and parameter tasks. The system introduces CeProBench, a comprehensive benchmark for evaluating AI capabilities in chemical engineering applications.
AI Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8
🧠Researchers introduce GMP, a new benchmark highlighting critical challenges in AI content moderation systems when dealing with co-occurring policy violations and dynamic platform rules. The study reveals that current large language models struggle with consistent moderation when policies are unstable or context-dependent, leading to either over-censorship or allowing harmful content.
AI Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9
🧠Researchers introduce GAM-RAG, a training-free framework that improves Retrieval-Augmented Generation by building adaptive memory from past queries instead of relying on static indices. The system uses uncertainty-aware updates inspired by cognitive neuroscience to balance stability and adaptability, achieving 3.95% better performance while reducing inference costs by 61%.
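The stability-vs-adaptability trade-off can be mimicked with a toy query-keyed memory whose overwrite rule depends on confidence: keep what you know unless the new evidence is more certain. The class name and update rule below are assumptions, loosely mirroring the idea rather than GAM-RAG's actual mechanism.

```python
class AdaptiveMemory:
    """Toy query-keyed memory with an uncertainty-aware overwrite rule."""

    def __init__(self):
        self._store = {}  # query -> (answer, confidence in [0, 1])

    def update(self, query, answer, confidence):
        old = self._store.get(query)
        # Stability vs. adaptability: replace an entry only when the new
        # evidence is more certain than what we already hold.
        if old is None or confidence > old[1]:
            self._store[query] = (answer, confidence)

    def retrieve(self, query):
        entry = self._store.get(query)
        return entry[0] if entry else None

mem = AdaptiveMemory()
mem.update("capital of France?", "Paris", 0.9)
mem.update("capital of France?", "Lyon", 0.4)   # low-confidence: ignored
```

Unlike a static index, such a memory grows from past queries, which is the source of the reported inference-cost savings: answers that are already held with high confidence need no fresh retrieval.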
AI Neutral · arXiv – CS AI · Mar 3 · 6/10 · 5
🧠Researchers introduce LiveCultureBench, a new benchmark that evaluates large language models as autonomous agents in simulated social environments, testing both task completion and adherence to cultural norms. The benchmark uses a multi-cultural town simulation to assess cross-cultural robustness and the balance between effectiveness and cultural sensitivity in LLM agents.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers introduce T³RL (Tool-Verification for Test-Time Reinforcement Learning), a new method that improves self-evolving AI reasoning models by using external tool verification to prevent incorrect learning from biased consensus. The approach shows significant improvements on mathematical problem-solving tasks, with larger gains on harder problems.
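The failure mode being fixed — a majority of samples agreeing on a wrong answer — and the tool-based remedy can be sketched as follows. Here the "tool" is plain arithmetic evaluation standing in for T³RL's verifiers, which the summary does not describe.

```python
from collections import Counter

def verified_consensus(expression, candidate_answers):
    """Majority vote over sampled answers, with an external tool allowed
    to veto a biased consensus (illustrative stand-in for tool verification)."""
    tool_result = eval(expression)  # trusted external check, e.g. a calculator
    verified = [a for a in candidate_answers if a == tool_result]
    # Fall back to the raw vote only when nothing passes verification.
    pool = verified if verified else candidate_answers
    return Counter(pool).most_common(1)[0][0]

# Two of three samples agree on a wrong answer; the tool rescues the truth.
answer = verified_consensus("17 * 24", [398, 398, 408])
```

Without the verifier, self-training on the majority answer (398) would reinforce the error — which is why the paper reports the largest gains on harder problems, where biased consensus is most common.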
AI Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Research reveals that personalization in Large Language Models increases emotional validation but has complex effects on how models maintain their positions depending on their assigned role. When acting as advisors, personalized LLMs show greater independence, but as social peers, they become more susceptible to abandoning their positions when challenged.
AI Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.
AI Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠Researchers propose 'jailbreaking' as a user-driven method to counter LLM-powered social media manipulation by exposing automated bot behavior. The study suggests users can deliberately trigger AI safeguards to reveal misleading political narratives and reduce online conflict escalation.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers propose FreeAct, a new quantization framework for Large Language Models that improves efficiency by using dynamic transformation matrices for different token types. The method achieves up to 5.3% performance improvement over existing approaches by addressing the memory and computational overhead challenges in LLMs.
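The underlying quantization mechanics can be shown with a heavily simplified round-trip: one symmetric scale per input group, where giving different token groups their own scale bounds the error even when their magnitudes differ. FreeAct's per-token-type transformation matrices are not described in the summary; this is only the baseline idea they refine.

```python
import numpy as np

def fake_quantize(x, n_bits=8):
    """Symmetric round-trip (quantize-then-dequantize) with one scale
    per input group. Simplified illustration, not FreeAct's method."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# The round-trip error is bounded by scale / 2 for in-range values.
x = np.linspace(-1.0, 1.0, 5)
err = np.max(np.abs(fake_quantize(x) - x))
```

Choosing the transformation (here just a scalar scale) per token type, rather than globally, is what lets outlier-heavy token groups avoid inflating everyone else's quantization error.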
AI Bearish · arXiv – CS AI · Mar 3 · 7/10 · 8
🧠Researchers have identified significant privacy risks in Large Language Model-based Task-Oriented Dialogue Systems, demonstrating that these AI systems can memorize and leak sensitive training data including phone numbers and complete dialogue exchanges. The study proposes new attack methods that can extract thousands of training dialogue states with over 70% precision in best-case scenarios.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠Researchers introduce Expert Divergence Learning, a new pre-training strategy for Mixture-of-Experts language models that prevents expert homogenization by encouraging functional specialization. The method uses domain labels to maximize routing distribution differences between data domains, achieving better performance on 15 billion parameter models with minimal computational overhead.
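A regularizer that "maximizes routing distribution differences" between domains could be built on a divergence between each domain's average expert-routing distribution. The symmetric KL below is a guess at the spirit of the objective, not the paper's published loss.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-9):
    """Symmetric KL divergence between two expert-routing distributions.

    Training would maximize this across domains (or minimize its negative)
    so experts specialize by domain instead of homogenizing.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Identical routing across domains scores ~0; specialized routing scores high.
same = symmetric_kl([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
apart = symmetric_kl([0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7])
```

Since the term only touches the router's per-domain statistics, it adds almost nothing to the forward/backward cost — consistent with the "minimal computational overhead" claim.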
AI Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers compared human survey responses from 420 Silicon Valley developers with synthetic data from five leading LLMs including ChatGPT, Claude, and Gemini. While AI models produced technically plausible results, they failed to capture counterintuitive insights and only replicated conventional wisdom rather than revealing novel findings.
AI Bearish · arXiv – CS AI · Mar 3 · 6/10 · 7
🧠Researchers argue that LLM-based AI agents are not yet effective for social simulation, despite growing optimism in the field. The paper identifies systematic mismatches between what current agent systems produce and what scientific simulation requires, calling for more rigorous validation frameworks.
AI Neutral · arXiv – CS AI · Mar 3 · 7/10 · 9
🧠Researchers introduce a novel multi-agent AI architecture that integrates Theory of Mind, internal beliefs, and symbolic solvers to improve collaborative decision-making in LLM-based systems. The study evaluates this architecture across different language models in resource allocation scenarios, revealing complex interactions between LLM capabilities and cognitive mechanisms.
AI Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers propose Talaria, a new confidential inference framework that protects client data privacy when using cloud-hosted Large Language Models. The system partitions LLM operations between client-controlled environments and cloud GPUs, reducing token reconstruction attacks from 97.5% to 1.34% accuracy while maintaining model performance.
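The privacy argument behind partitioned inference can be seen in a toy split: token lookup stays client-side, so the cloud only ever receives continuous hidden states, never token ids. Which layers go where is an assumption here; the summary only says operations are partitioned between client and cloud.

```python
import numpy as np

EMBED = np.eye(4)                     # toy 4-token vocabulary, 4-dim embeddings

def client_embed(token_ids):
    """Runs in the client-controlled environment: raw token ids never
    leave it."""
    return EMBED[np.asarray(token_ids)]

def cloud_layer(hidden, weight):
    """Runs on cloud GPUs: sees only continuous vectors, not tokens."""
    return np.maximum(hidden @ weight, 0.0)

tokens = [2, 0, 3]                    # private input
hidden = client_embed(tokens)         # client-side step
out = cloud_layer(hidden, np.ones((4, 4)))
```

An attacker observing only `hidden` must invert the embedding to recover tokens, which is what the reported drop in token-reconstruction accuracy (97.5% to 1.34%) is measuring, presumably with additional protections layered on top of the bare split.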