350 articles tagged with #language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers propose MDS (Multi-turn Dialogue Selection), a framework for improving instruction-tuned language models by intelligently selecting high-quality multi-turn dialogue data. The method combines global coverage analysis with local structural evaluation to filter noisy datasets, demonstrating superior performance across multiple benchmarks compared to existing selection approaches.
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers identify a critical failure mode in non-autoregressive diffusion language models caused by proximity bias, where the denoising process concentrates on adjacent tokens, creating spatial error propagation. They propose a minimal-intervention approach using a lightweight planner and temperature annealing to guide early token selection, achieving substantial improvements on reasoning and planning tasks.
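The temperature-annealing half of that recipe can be sketched generically: sample cold early so the planner's confident first token choices are near-greedy, then relax toward normal sampling as denoising proceeds. The schedule shape, parameter names, and default values below are illustrative assumptions, not the paper's.

```python
import math
import random

def annealed_temperature(step: int, total_steps: int,
                         t_start: float = 0.2, t_end: float = 1.0) -> float:
    """Linear annealing schedule: low temperature at the first denoising
    steps, relaxing to t_end by the last step (values are illustrative)."""
    frac = step / max(total_steps - 1, 1)
    return t_start + frac * (t_end - t_start)

def sample_token(logits: list[float], temperature: float,
                 rng: random.Random) -> int:
    """Temperature-scaled softmax sampling over one position's logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    r = rng.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(exps) - 1
```

At step 0 the temperature is 0.2, so a logit gap of 20 is effectively deterministic; by the final step sampling is at full temperature.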
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers introduce a Cross-Lingual Mapping Task during LLM pre-training to improve multilingual performance across languages with varying data availability. The method achieves significant improvements in machine translation, cross-lingual question answering, and multilingual understanding without requiring extensive parallel data.
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers applied computational lesions to multilingual large language models to identify how multilingual processing is organized inside them. By selectively disabling parameters, they found that a shared computational core handles 60% of multilingual processing, while language-specific components fine-tune predictions for individual languages, providing new insights into how multilingual AI aligns with human neurobiology.
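The lesioning methodology itself is simple to sketch: zero out one component's parameters and measure the score drop. The toy "model" below (a score that is just a sum of weights) and the 60/40 split are purely illustrative, chosen to mirror the summary's headline number, not real measurements.

```python
def lesion_effect(score_fn, params: dict, component: str) -> float:
    """Computational "lesion": zero out one named component's weights and
    report the resulting drop in a task score. `score_fn` and the flat
    per-component weight lists are toy stand-ins for a real model."""
    baseline = score_fn(params)
    lesioned = {name: ([0.0] * len(w) if name == component else w)
                for name, w in params.items()}
    return baseline - score_fn(lesioned)

# Toy model: the "multilingual score" is just the sum of all weights.
score = lambda p: sum(sum(w) for w in p.values())
toy = {"shared_core": [0.6], "language_specific": [0.4]}
print(lesion_effect(score, toy, "shared_core"))  # the core's share of the toy score
```

In the real study the score would be a behavioral or brain-alignment metric and the components would be parameter groups identified inside the network.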
AI · Bullish · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers have optimized the Bielik v3 language models (7B and 11B parameters) by replacing universal tokenizers with Polish-specific vocabulary, addressing inefficiencies in morphological representation. This optimization reduces token fertility, lowers inference costs, and expands effective context windows while maintaining multilingual capabilities through advanced training techniques including supervised fine-tuning and reinforcement learning.
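Token fertility, the metric being optimized here, is simply the average number of subword tokens produced per word: lower fertility means cheaper inference and a longer effective context window for the same token budget. A sketch with toy tokenizers (any callable mapping a string to a token list works):

```python
def token_fertility(tokenize, corpus: list[str]) -> float:
    """Average number of tokens produced per whitespace-separated word."""
    n_tokens = sum(len(tokenize(s)) for s in corpus)
    n_words = sum(len(s.split()) for s in corpus)
    return n_tokens / max(n_words, 1)

# Toy tokenizers: split each word into 2-char chunks vs. keep whole words.
char_pairs = lambda s: [w[i:i + 2] for w in s.split() for i in range(0, len(w), 2)]
whole_words = lambda s: s.split()

print(token_fertility(char_pairs, ["język polski"]))   # 6 chunks / 2 words = 3.0
print(token_fertility(whole_words, ["język polski"]))  # 2 tokens / 2 words = 1.0
```

A Polish-specific vocabulary moves a real tokenizer toward the second case for Polish text, which is the efficiency gain the summary describes.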
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers demonstrate that five mature small language model architectures (1.5B-8B parameters) share nearly identical emotion vector representations despite exhibiting opposite behavioral profiles, suggesting emotion geometry is a universal feature organized early in model development. The study also deconstructs prior emotion-vector research methodology into four distinct layers of confounding factors, revealing that single correlations between studies cannot safely establish comparability.
🧠 Llama
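Comparing emotion-vector geometry across models typically reduces to cosine similarity between per-model direction vectors in a shared probe space. A minimal sketch with toy vectors (not real model activations): two directions can be geometrically near-identical even when their magnitudes, and hence downstream behavior, differ.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: the standard measure for comparing two models'
    emotion/steering directions independent of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "anger" directions from two hypothetical models: same direction,
# twice the norm -> identical geometry, potentially different behavior.
model_a = [0.8, 0.1, -0.3]
model_b = [1.6, 0.2, -0.6]
print(round(cosine(model_a, model_b), 6))  # 1.0
```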
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers introduce Cross-lingual Speech Language Models (CSLM), an efficient training method for building multilingual speech AI systems using discrete speech tokens. The approach achieves cross-modal and cross-lingual alignment through continual pre-training and instruction fine-tuning, enabling effective speech LLMs without requiring massive datasets.
AI · Bullish · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers propose a method for training open-source language models to simulate how programming students learn and debug code, using authentic student data serialized into conversational formats. This approach addresses privacy and cost concerns with proprietary models while demonstrating improved performance in replicating student problem-solving behavior compared to existing baselines.
AI · Bullish · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers introduce MEDS, a memory-enhanced reward shaping framework that addresses a critical reinforcement learning failure mode where language models repeatedly generate similar errors. By tracking historical behavioral patterns and penalizing recurring mistake clusters, the method achieves consistent performance improvements across multiple datasets and models while increasing sampling diversity.
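One plausible reading of memory-enhanced reward shaping: keep a running count of error signatures and subtract a penalty that grows each time the same mistake cluster recurs. The class below is a speculative sketch; how errors are clustered and the exact penalty form are assumptions, not the paper's rule.

```python
from collections import Counter

class MemoryShapedReward:
    """Sketch: track how often each error "cluster" (here just a hashable
    signature) has appeared, and penalize repeats progressively harder.
    The first occurrence of an error is not penalized; only recurrence is."""

    def __init__(self, penalty: float = 0.1):
        self.penalty = penalty
        self.seen = Counter()  # error signature -> occurrence count

    def shape(self, base_reward: float, error_signature=None) -> float:
        if error_signature is None:   # correct rollout: reward passes through
            return base_reward
        self.seen[error_signature] += 1
        # Penalty scales with how many times this cluster has recurred.
        return base_reward - self.penalty * (self.seen[error_signature] - 1)
```

Usage: `shape(0.0, "off_by_one")` returns 0.0 the first time and -0.1 the second, nudging the policy away from repeating the same failure and toward more diverse samples.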
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers propose Triadic Suffix Tokenization (TST), a novel tokenization scheme that changes how large language models represent numbers: digits are grouped into three-digit chunks, each paired with an explicit magnitude marker. By preserving decimal structure and positional information, the method aims to improve arithmetic and scientific reasoning in LLMs, with two implementation variants offering scalability across 33 orders of magnitude.
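Based only on the description above, grouping digits into three-digit chunks with magnitude markers might look like the following. The token names (`<e0>`, `<neg>`) and left-padding convention are invented for illustration, not TST's actual vocabulary.

```python
def triadic_tokenize(number_str: str) -> list[str]:
    """Split an integer's digits into three-digit groups, each preceded by
    a marker <ek> meaning "this group is scaled by 10^k" (hypothetical
    token scheme sketched from the paper's description)."""
    digits = number_str.lstrip("-")
    sign = ["<neg>"] if number_str.startswith("-") else []
    pad = (-len(digits)) % 3           # left-pad so length is a multiple of 3
    digits = "0" * pad + digits
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    n = len(groups)
    return sign + [tok for j, g in enumerate(groups)
                   for tok in (f"<e{3 * (n - 1 - j)}>", g)]

print(triadic_tokenize("1234567"))
# ['<e6>', '001', '<e3>', '234', '<e0>', '567']
```

Unlike BPE's arbitrary digit splits, every group carries its magnitude explicitly, which is the positional information the summary says TST preserves.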
AI · Neutral · arXiv — CS AI · 2d ago · 6/10
🧠 Researchers conducted a mechanistic analysis of looped reasoning language models, discovering that these recurrent architectures learn inference stages similar to feedforward models but execute them iteratively. The study reveals that recurrent blocks converge to distinct fixed points with stable attention behavior, providing architectural insights for improving LLM reasoning capabilities.
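The fixed-point behavior described can be illustrated with a generic iterate-until-converged loop; `f` stands in for a recurrent block (here a scalar contraction map for simplicity, not a real transformer block).

```python
def iterate_to_fixed_point(f, x0: float, tol: float = 1e-8,
                           max_iters: int = 100):
    """Apply f repeatedly until the state stops changing, mirroring the
    observation that looped blocks settle into stable fixed points.
    Returns (final_state, iterations_used)."""
    x = x0
    for i in range(max_iters):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt, i + 1
        x = nxt
    return x, max_iters

# Contraction with fixed point 2.0: each loop iteration halves the error.
x, n = iterate_to_fixed_point(lambda v: 0.5 * v + 1.0, 0.0)
print(round(x, 6))  # 2.0
```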
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers present a novel approach using agentic language model feedback frameworks to generate planning domains from natural language descriptions augmented with symbolic information. The method employs heuristic search over model space optimized by various feedback mechanisms, including landmarks and plan validator outputs, to improve domain quality for practical deployment.
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers propose StaRPO, a reinforcement learning framework that improves large language model reasoning by incorporating stability metrics alongside task rewards. The method uses Autocorrelation Function and Path Efficiency measurements to evaluate logical coherence and goal-directedness, demonstrating improved accuracy and reasoning consistency across four benchmarks.
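A plain lag-k autocorrelation function, applied to a per-step score trace along a reasoning chain, is enough to convey the stability signal; how StaRPO weights it against the task reward is not shown here, and the "per-step trace" framing is an assumption.

```python
def autocorrelation(xs: list[float], lag: int = 1) -> float:
    """Lag-k autocorrelation of a per-step trace (e.g. step-level scores
    along a reasoning chain). High positive values indicate a locally
    coherent, smoothly-evolving trajectory."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0  # constant trace: no variation to correlate
    cov = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return cov / var

print(autocorrelation([1, 2, 3, 4, 5]))  # 0.4 for this short monotone trace
```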
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers analyzed how large language models decide whether to act on predictions or escalate to humans, finding that models use inconsistent and miscalibrated thresholds across five real-world domains. Supervised fine-tuning on chain-of-thought reasoning proved most effective at establishing robust escalation policies that generalize across contexts, suggesting escalation behavior requires explicit characterization before AI system deployment.
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers introduce Litmus (Re)Agent, an agentic system that predicts how multilingual AI models will perform on tasks lacking direct benchmark data. Using a controlled benchmark of 1,500 questions across six tasks, the system decomposes queries into hypotheses and synthesizes predictions through structured reasoning, outperforming competing approaches particularly when direct evidence is sparse.
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers introduce PerMix-RLVR, a training method that lets large language models retain persona flexibility while preserving task robustness. It addresses a fundamental trade-off in reinforcement learning with verifiable rewards (RLVR): standard RLVR training makes models less responsive to persona prompts even as their performance on objective tasks improves.
AI · Bullish · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers introduce RecaLLM, a post-trained language model that addresses the 'lost-in-thought' phenomenon where retrieval performance degrades during extended reasoning chains. The model interleaves explicit in-context retrieval with reasoning steps and achieves strong performance on long-context benchmarks using training data significantly shorter than existing approaches.
AI · Neutral · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers introduce ASPECT, a novel reinforcement learning framework that uses large language models as semantic operators to enable zero-shot transfer learning across novel tasks. By conditioning a text-based VAE on LLM-generated task descriptions, the approach allows agents to reuse policies on structurally similar but previously unseen tasks without discrete category constraints.
AI · Bearish · arXiv — CS AI · 3d ago · 6/10
🧠 Researchers conducted a large-scale computational analysis comparing 17,790 articles from Grokipedia, Elon Musk's AI-generated encyclopedia, against Wikipedia. The study found that Grokipedia articles are longer but contain fewer citations, with some entries showing systematic rightward political bias in media sources, particularly in history, religion, and arts sections.
🏢 xAI · 🧠 Grok
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers investigate in-context learning (ICL) in speech language models, revealing that speaking rate significantly affects model performance and acoustic mimicry, while induction heads play a causal role identical to the one they play in text-based ICL. The study bridges the gap between text and speech domains by analyzing how models learn from demonstrations in text-to-speech tasks.
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers have developed Luwen, an open-source Chinese legal language model built on Baichuan that uses continual pre-training, supervised fine-tuning, and retrieval-augmented generation to excel at legal tasks. The model outperforms baselines on five legal benchmarks including judgment prediction, judicial examination, and legal reasoning, demonstrating effective domain adaptation for specialized legal applications.
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers evaluated how well large language models can perform formal grammar-based translation tasks using in-context learning, finding that LLM translation accuracy degrades significantly with grammar complexity and sentence length. The study identifies specific failure modes including vocabulary hallucination and untranslated source words, revealing fundamental limitations in LLMs' ability to apply formal grammatical rules to translation tasks.
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers introduce Commander-GPT, a modular framework that orchestrates multiple specialized AI agents for multimodal sarcasm detection rather than relying on a single LLM. The system achieves 4.4-11.7% F1 score improvements over existing baselines on standard benchmarks, demonstrating that task decomposition and intelligent routing can overcome LLM limitations in understanding sarcasm.
🧠 GPT-4 · 🧠 Gemini
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers introduce a framework for studying how emotional states affect decision-making in small language models (SLMs) used as autonomous agents. Using activation steering techniques grounded in real-world emotion-eliciting texts, they benchmark SLMs across game-theoretic scenarios and find that emotional perturbations systematically influence strategic choices, though behaviors often remain unstable and misaligned with human patterns.
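The difference-of-means recipe is the most common way to build such steering vectors: average the model's activations on emotion-eliciting texts, subtract the average on neutral texts, then add a scaled copy of that direction to a hidden state at inference. A toy sketch, with real activations replaced by tiny lists; whether this matches the paper's exact procedure is an assumption.

```python
def emotion_direction(emotional_acts: list[list[float]],
                      neutral_acts: list[list[float]]) -> list[float]:
    """Difference-of-means steering vector: mean activation on
    emotion-eliciting texts minus mean activation on neutral texts."""
    dim = len(emotional_acts[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    return [mean(emotional_acts, j) - mean(neutral_acts, j) for j in range(dim)]

def steer(hidden: list[float], direction: list[float],
          alpha: float) -> list[float]:
    """Shift a hidden state along the emotion direction; alpha sets intensity."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

direction = emotion_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(direction)                          # [2.0, 0.0]
print(steer([1.0, 1.0], direction, 0.5))  # [2.0, 1.0]
```

Sweeping `alpha` is how such studies probe whether a stronger emotional perturbation shifts strategic choices further, as the summary reports.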
AI · Neutral · arXiv — CS AI · 6d ago · 6/10
🧠 Researchers propose T-STAR, a novel reinforcement learning framework that structures multi-step agent trajectories as trees rather than independent chains, enabling better credit assignment for LLM agents. The method uses tree-based reward propagation and surgical policy optimization to improve reasoning performance across embodied, interactive, and planning tasks.