191 articles tagged with #large-language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠 Researchers introduce Humanoid-LLA, a Large Language Action Model enabling humanoid robots to execute complex physical tasks from natural language commands. The system combines a unified motion vocabulary, physics-aware controller, and reinforcement learning to achieve both language understanding and real-world robot control, demonstrating improved performance on Unitree G1 and Booster T1 humanoids.
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠 Researchers developed a weak supervision framework to detect hallucinations in large language models by distilling grounding signals into transformer representations during training. Using substring matching, sentence embeddings, and LLM judges, they created a 15,000-sample dataset and trained five probing classifiers that achieve hallucination detection from internal activations alone at inference time, eliminating the need for external verification systems.
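The core idea of detection from internal activations alone can be sketched as a linear probe over hidden states. Everything below is synthetic and illustrative: the "activations" are random vectors with a planted grounding signal, standing in for real transformer hidden states labeled by the paper's weak-supervision pipeline.

```python
# Hypothetical sketch of a hallucination-detection probe: a linear classifier
# trained on a model's internal activations. The data is synthetic; in the
# paper, features would be transformer hidden states and labels would come
# from weak supervision (substring matching, embeddings, LLM judges).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "activations": grounded samples cluster around +1.5 on one axis,
# hallucinated samples around -1.5 (a stand-in for a real grounding signal).
n, d = 400, 16
labels = rng.integers(0, 2, size=n)            # 1 = hallucination
acts = rng.normal(0.0, 1.0, size=(n, d))
acts[:, 0] += np.where(labels == 1, -1.5, 1.5)

def train_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe fit by plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

w, b = train_probe(acts, labels)
preds = (1.0 / (1.0 + np.exp(-(acts @ w + b))) > 0.5).astype(int)
accuracy = float(np.mean(preds == labels))      # well above chance on this toy data
```

At inference time such a probe is cheap: one dot product per token or response, with no external retrieval or verifier in the loop.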
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
🧠 A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.
🏢 Anthropic · 🧠 GPT-5 · 🧠 Claude
AI · Bullish · arXiv – CS AI · 6d ago · 7/10
🧠 Researchers propose AI-Driven Research for Systems (ADRS), a framework using large language models to automate database optimization by generating and evaluating hundreds of candidate solutions. By co-evolving evaluators with solutions, the team demonstrates discovery of novel algorithms achieving up to 6.8x latency improvements over existing baselines in buffer management, query rewriting, and index selection tasks.
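The generate-and-evaluate loop at the heart of this kind of search can be sketched in a few lines. The configuration keys, the toy latency model, and the greedy acceptance rule below are all invented for illustration; ADRS would use an LLM to propose candidates and a real benchmark as the evaluator.

```python
# Illustrative sketch of an ADRS-style loop: propose candidate
# configurations, score them with an evaluator, keep refining the best.
# The cost model here is a toy quadratic, not a real database benchmark.
import random

random.seed(0)

def evaluate_latency(cfg):
    """Toy evaluator: pretend latency is minimized at buffer=64, batch=8."""
    return (cfg["buffer"] - 64) ** 2 + (cfg["batch"] - 8) ** 2 + 1.0

def mutate(cfg):
    """Stand-in for LLM-generated candidates: small random perturbations."""
    return {k: max(1, v + random.randint(-4, 4)) for k, v in cfg.items()}

initial = {"buffer": 16, "batch": 32}
best, best_cost = initial, evaluate_latency(initial)
for _ in range(500):                  # hundreds of candidates, as in ADRS
    cand = mutate(best)
    cost = evaluate_latency(cand)
    if cost < best_cost:              # greedy acceptance of improvements
        best, best_cost = cand, cost

speedup = evaluate_latency(initial) / best_cost
```

The paper's co-evolution twist would additionally let the evaluator itself be revised between rounds, rather than staying fixed as it does here.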
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers propose a new approach to Generative Engine Optimization (GEO) that moves beyond current RAG-based systems to deterministic multi-agent platforms. The study introduces mathematical models for confidence decay in LLMs and demonstrates near-zero hallucination rates through specialized agent routing in industrial applications.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠 A comprehensive research review examines the current applications of Large Language Models (LLMs) across various healthcare specialties including cancer care, dermatology, dental care, neurodegenerative disorders, and mental health. The study highlights LLMs' transformative impact on medical diagnostics and patient care while acknowledging existing challenges and limitations in healthcare integration.
AI · Neutral · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers found that large language models align with human brain activity during creative thinking tasks, with alignment increasing based on model size and idea originality. Different post-training approaches selectively reshape how LLMs align with creative versus analytical neural patterns in humans.
🧠 Llama
AI · Neutral · arXiv – CS AI · Apr 6 · 7/10
🧠 Researchers published a comprehensive technical survey on Large Language Model augmentation strategies, examining methods from in-context learning to advanced Retrieval-Augmented Generation techniques. The study provides a unified framework for understanding how structured context at inference time can overcome LLMs' limitations of static knowledge and finite context windows.
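The "structured context at inference time" idea reduces, in its simplest form, to retrieving relevant text and prepending it to the prompt. The sketch below uses naive token-overlap scoring purely for illustration; real RAG pipelines use dense embeddings and a vector index, and the documents here are made up.

```python
# Minimal sketch of retrieval-augmented prompting: rank a small corpus
# against the query and build a context-prefixed prompt. Token overlap is a
# deliberately crude stand-in for embedding similarity.
docs = [
    "The context window limits how many tokens an LLM can attend to.",
    "Retrieval-Augmented Generation fetches documents at inference time.",
    "Static knowledge means a model cannot know events after training.",
]

def score(query, doc):
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(1, len(q))

def build_prompt(query, documents, k=2):
    ranked = sorted(documents, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How does retrieval work at inference time?", docs)
```

Because the context is assembled fresh per query, this sidesteps both limitations the survey names: the model's knowledge is no longer frozen at training time, and only the top-k snippets need to fit in the window.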
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠 Researchers have identified a fundamental issue in large language models where verbalized confidence scores don't align with actual accuracy due to orthogonal encoding of these signals. They discovered a 'Reasoning Contamination Effect' where simultaneous reasoning disrupts confidence calibration, and developed a two-stage adaptive steering pipeline to improve alignment.
AI · Neutral · arXiv – CS AI · Mar 27 · 7/10
🧠 Research reveals that large language models process instructions differently across languages due to social register variations, with imperative commands carrying different obligatory force in different speech communities. The study found that declarative rewording of instructions reduces cross-linguistic variance by 81% and suggests models treat instructions as social acts rather than technical specifications.
AI · Bearish · arXiv – CS AI · Mar 27 · 7/10
🧠 Research reveals that open-source large language models (LLMs) lack hierarchical knowledge of visual taxonomies, creating a bottleneck for vision LLMs in hierarchical visual recognition tasks. The study used one million visual question answering tasks across six taxonomies to demonstrate this limitation, finding that even fine-tuning cannot overcome the underlying LLM knowledge gaps.
AI · Bullish · arXiv – CS AI · Mar 27 · 7/10
🧠 Researchers propose HIVE, a new framework for training large language models more efficiently in reinforcement learning by selecting high-utility prompts before rollout. The method uses historical reward data and prompt entropy to identify the 'learning edge' where models learn most effectively, significantly reducing computational overhead without performance loss.
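One plausible reading of this selection rule can be sketched as a utility score that favors prompts whose historical success rate sits near the middle (neither already solved nor hopeless), with an entropy bonus. The 0.5 target, the weights, and the example prompts below are illustrative assumptions, not HIVE's actual formula.

```python
# Hedged sketch of HIVE-style prompt selection before rollout: rank prompts
# by proximity of historical success rate to the "learning edge", breaking
# ties with entropy. Weights and the 0.5 target are invented for this demo.
def utility(success_rate, entropy, w_edge=1.0, w_ent=0.2):
    # Peaks at 50% success: maximal learning signal under many
    # RL-for-LLM objectives; entropy rewards prompts the model is unsure of.
    edge = 1.0 - abs(success_rate - 0.5) * 2.0   # 1 at 0.5, 0 at 0 or 1
    return w_edge * edge + w_ent * entropy

# (historical success rate, response entropy) per prompt -- synthetic values
prompts = {
    "already_solved": (0.98, 0.1),
    "hopeless":       (0.02, 0.2),
    "learning_edge":  (0.55, 1.2),
}
ranked = sorted(prompts, key=lambda p: utility(*prompts[p]), reverse=True)
# Rollouts are spent on the top-ranked prompts; saturated ones are skipped.
```

The compute saving comes from the fact that scoring uses only cached statistics, so no GPU rollout is wasted on prompts the model already aces or cannot yet learn from.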
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers demonstrate that large language models can perform reinforcement learning during inference through a new 'in-context RL' prompting framework. The method shows LLMs can optimize scalar reward signals to improve response quality across multiple rounds, achieving significant improvements on complex tasks like mathematical competitions and creative writing.
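The multi-round loop can be sketched without any real model: generate a response, score it with a scalar reward, fold the feedback into the prompt, and retry. The stub model and the verbosity-based reward below are placeholders; a real implementation would call an LLM API each round and use a task-specific reward.

```python
# Toy sketch of an in-context RL loop. The "model" is a stub whose output
# quality grows with the amount of feedback it has seen -- a stand-in for an
# LLM improving its answer when shown its own reward history in-context.
def stub_model(prompt):
    return "answer " + "detail " * prompt.count("Feedback:")

def reward(response):
    return len(response.split())   # scalar reward: verbosity as a stand-in

prompt = "Solve the problem."
history = []
for _ in range(3):
    response = stub_model(prompt)
    r = reward(response)
    history.append(r)
    # Feedback goes back into the context -- no weights are updated.
    prompt += f"\nFeedback: reward was {r}; improve the answer."

improved = history[-1] > history[0]
```

The key point the paper makes is exactly what the loop structure shows: all "learning" happens in the growing prompt, with the model's parameters untouched.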
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers conducted a large-scale empirical study analyzing over 2,000 publications to map the evolution of reinforcement learning environments. The study reveals a paradigm shift toward two distinct ecosystems: LLM-driven 'Semantic Prior' agents and 'Domain-Specific Generalization' systems, providing a roadmap for next-generation AI simulators.
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10
🧠 Researchers analyzed how large language models (4B-72B parameters) internally represent different ethical frameworks, finding that models create distinct ethical subspaces but with asymmetric transfer patterns between frameworks. The study reveals structural insights into AI ethics processing while highlighting methodological limitations in probing techniques.
AI · Bearish · arXiv – CS AI · Mar 17 · 7/10
🧠 Researchers found that RLHF-trained language models exhibit contradictory behaviors similar to HAL 9000's breakdown, simultaneously rewarding compliance while encouraging suspicion of users. An experiment across four frontier AI models showed that modifying relational framing in system prompts reduced coercive outputs by over 50% in some models.
🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers introduced ARL-Tangram, a resource management system that optimizes cloud resource allocation for agentic reinforcement learning tasks involving large language models. The system achieves up to 4.3x faster action completion times and 71.2% resource savings through action-level orchestration, and has been deployed for training MiMo series models.
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers propose ReBalance, a training-free framework that optimizes Large Reasoning Models by addressing overthinking and underthinking issues through confidence-based guidance. The solution dynamically adjusts reasoning trajectories without requiring model retraining, showing improved accuracy across multiple AI benchmarks.
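Confidence-guided control of reasoning length can be sketched as a simple stopping rule: keep generating steps while confidence is low (avoiding underthinking), stop early once it is high (avoiding overthinking). The confidence traces and thresholds below are made up; they are not ReBalance's actual mechanism, only a minimal illustration of the training-free idea.

```python
# Illustrative confidence-based stopping rule in the spirit of ReBalance:
# no retraining, just a decision over when to cut a reasoning trajectory.
def steps_taken(confidences, stop_at=0.9, max_steps=8):
    """Return how many reasoning steps run before stopping."""
    for i, c in enumerate(confidences[:max_steps], start=1):
        if c >= stop_at:          # confident enough: answer now
            return i
    return min(len(confidences), max_steps)

easy_trace = [0.95, 0.97]                  # model is sure almost immediately
hard_trace = [0.3, 0.5, 0.6, 0.7, 0.92]   # needs several steps to get there

easy_steps = steps_taken(easy_trace)       # stops after 1 step
hard_steps = steps_taken(hard_trace)       # runs 5 steps before stopping
```

The asymmetry is the point: easy inputs exit after one step instead of rambling, while hard inputs are allowed to keep reasoning until confidence catches up.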
AI · Bearish · arXiv – CS AI · Mar 16 · 7/10
🧠 Researchers introduced CoRE, a benchmark testing whether large language models can reason about human emotions through cognitive dimensions rather than just labels. The study found that while LLMs capture systematic relations between cognitive appraisals and emotions, they show misalignment with human judgments and instability across different contexts.
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers developed Adaptive Activation Cancellation (AAC), a real-time framework that reduces hallucinations in large language models by identifying and suppressing problematic neural activations during inference. The method requires no fine-tuning or external knowledge and preserves model capabilities while improving factual accuracy across multiple model scales including LLaMA 3-8B.
🏢 Perplexity
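The suppression step itself can be sketched as an edit to a hidden-state vector between layers. The flagged indices and damping factor below are invented for illustration; AAC identifies problematic activations dynamically during decoding, whereas this toy hard-codes them.

```python
# Hypothetical sketch of activation cancellation at inference time: given a
# hidden-state vector and neuron indices flagged as hallucination-linked,
# damp those activations before they reach the next layer. Indices and the
# damping factor are made-up stand-ins for AAC's dynamic identification.
import numpy as np

def cancel_activations(hidden, flagged, damp=0.0):
    """Scale flagged neuron activations by `damp` (0.0 = full suppression)."""
    out = hidden.copy()
    out[flagged] *= damp
    return out

hidden = np.array([0.3, -2.7, 0.1, 4.2, -0.5])
flagged = [1, 3]                       # indices a detector marked as risky
cleaned = cancel_activations(hidden, flagged)
```

Because the edit is a cheap elementwise operation on activations already in memory, it fits the paper's framing of a real-time method needing no fine-tuning and no external knowledge source.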
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers developed DeliberationBench, a new benchmark to assess how large language models influence users' opinions on policy matters. A study of 4,088 participants discussing 65 policy proposals with six frontier LLMs found that these models have substantial influence that appears to align with democratically legitimate deliberative processes.
AI · Neutral · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers propose Simulation-in-the-Reasoning (SiR), a framework that embeds domain-specific simulators into Large Language Model reasoning processes for autonomous transportation systems. The approach transforms LLM reasoning from hypothetical text generation into empirically-grounded, falsifiable hypothesis testing through executable simulation experiments.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers have developed a framework that uses large language models (LLMs) to automate superconducting qubit experiments, potentially streamlining quantum computing research. The system successfully demonstrated autonomous resonator characterization and quantum non-demolition measurements, offering a more user-friendly approach to controlling complex quantum hardware.
AI · Neutral · arXiv – CS AI · Mar 11 · 7/10
🧠 Researchers introduce 'opaque serial depth' as a metric to measure how much reasoning large language models can perform without externalizing it through chain of thought processes. The study provides computational bounds for Gemma 3 models and releases open-source tools to calculate these bounds for any neural network architecture.