311 articles tagged with #language-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers developed a framework using cognitive models from psychology to analyze value trade-offs in language models, revealing how AI systems balance competing priorities like politeness and directness. The study shows LLMs' behavioral profiles shift predictably when prompted to prioritize certain goals and are influenced by reasoning budgets and training dynamics.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers have introduced PiKV, an open-source KV cache management framework designed to optimize memory and communication costs for Mixture of Experts (MoE) language models across multi-GPU and multi-node inference. The system uses expert-sharded storage, intelligent routing, adaptive scheduling, and compression to improve efficiency in large-scale AI model deployment.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers introduce MENLO, a new framework for evaluating native-like quality in large language model responses across 47 languages. The study reveals significant improvements in multilingual LLM performance through reinforcement learning and fine-tuning, though gaps with human judgment persist.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers introduce soft-masking (SM), a novel approach for diffusion-based language models that improves upon traditional binary masked diffusion by blending mask token embeddings with predicted tokens. Testing on models up to 7B parameters shows consistent improvements in performance metrics and coding benchmarks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4
🧠 Researchers found that fine-tuning large language models with explanations attached to labels significantly improves classification accuracy compared to label-only training. Surprisingly, even random token sequences that mimic explanation structure provide similar benefits, suggesting the improvement comes from increased token budget and regularization rather than semantic meaning.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers have developed GeoBPE, a new protein structure tokenization method that converts protein backbone structures into discrete geometric tokens, achieving over 10x compression and data efficiency improvements. The approach uses geometry-grounded byte-pair encoding to create hierarchical vocabularies of protein structural primitives that align with functional families and enable better multimodal protein modeling.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠 Researchers introduce WavefrontDiffusion, a new dynamic decoding approach for Diffusion Language Models that improves text generation quality by expanding from finalized positions rather than using fixed blocks. The method achieves state-of-the-art performance on reasoning and code generation benchmarks while maintaining computational efficiency equivalent to existing block-based methods.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 2
🧠 Researchers propose an inference technique called "inner loop inference" that improves pretrained transformer models' performance by repeatedly applying selected layers at inference time, with no additional training. The method yields consistent but modest accuracy improvements across benchmarks by giving internal representations more refinement steps.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 16
🧠 Researchers introduce PseudoAct, a new framework that uses pseudocode synthesis to improve large language model agent planning and action control. The method achieves significant performance improvements over existing reactive approaches, with a 20.93% absolute gain in success rate on the FEVER benchmark and new state-of-the-art results on HotpotQA.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 12
🧠 Researchers developed a new discriminative AI model based on Qwen3-0.6B that can efficiently segment ultra-long documents of up to 13k tokens for better information retrieval. The model outperforms generative alternatives while delivering two orders of magnitude faster inference on the Wikipedia WIKI-727K dataset.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 15
🧠 Research reveals that reward model accuracy alone doesn't determine effectiveness in RLHF systems. The study proves that low reward variance can create flat optimization landscapes, making even perfectly accurate reward models inefficient teachers that underperform less accurate models with higher variance.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 14
🧠 Researchers introduce Latent Self-Consistency (LSC), a new method for improving Large Language Model output reliability across both short and long-form reasoning tasks. LSC uses learnable token embeddings to select semantically consistent responses with only 0.9% computational overhead, outperforming existing consistency methods like Self-Consistency and Universal Self-Consistency.
AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 16
🧠 Researchers introduce DiffuMamba, a new diffusion language model built on a Mamba backbone that achieves up to 8.2x higher inference throughput than Transformer-based models while maintaining comparable performance. The model scales linearly with sequence length and represents a significant advancement in efficient AI text generation systems.
AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 17
🧠 Researchers introduce RooflineBench, a framework for measuring performance capabilities of Small Language Models on edge devices using operational intensity analysis. The study reveals that sequence length significantly impacts performance, model depth causes efficiency regression, and structural improvements like Multi-head Latent Attention can unlock better hardware utilization.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠 Researchers have developed SmartChunk retrieval, a query-adaptive framework that improves retrieval-augmented generation (RAG) systems by dynamically adjusting chunk sizes and compression for document question answering. The system uses a planner to predict optimal chunk abstraction levels and a compression module to create efficient embeddings, outperforming existing RAG baselines while reducing costs.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8
🧠 Researchers introduce a quantum-inspired sequence modeling framework that uses complex-valued wave functions and quantum interference for language processing. The approach shows theoretical advantages over traditional recurrent neural networks by utilizing quantum dynamics and the Born rule for token probability extraction.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7
🧠 Researchers have developed Decoder-based Sense Knowledge Distillation (DSKD), a new framework that integrates lexical resources into decoder-style large language models during training. The method enhances knowledge distillation performance while enabling generative models to inherit structured semantics without requiring dictionary lookup during inference.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠 Researchers demonstrated that prompt optimization using Genetic-Pareto (GEPA) significantly improves language models' ability to detect errors in medical notes. The technique boosted accuracy from 0.669 to 0.785 with GPT-5 and from 0.578 to 0.690 with Qwen3-32B, achieving state-of-the-art performance on medical error detection benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠 Researchers developed an AI-powered text summarization system using GPT-4o to create dyslexia-friendly content for the approximately 10% of the global population who struggle with reading fluency. The system generates readable summaries of news articles within four attempts, achieving stable performance across 2,000 samples with readability scores that meet accessibility targets.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5
🧠 Researchers introduce dLLM, an open-source framework that unifies core components of diffusion language modeling including training, inference, and evaluation. The framework enables users to reproduce, finetune, and deploy large diffusion language models like LLaDA and Dream while providing tools to build smaller models from scratch with accessible compute resources.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4
🧠 Researchers have developed Hierarchy-of-Groups Policy Optimization (HGPO), a new reinforcement learning method that improves AI agents' performance on long-horizon tasks by addressing context inconsistency issues in stepwise advantage estimation. The method shows significant improvements over existing approaches when tested on challenging agentic tasks using Qwen2.5 models.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8
🧠 Researchers developed a new framework called 'Stitching Noisy Diffusion Thoughts' that improves AI reasoning by combining the best parts of multiple solution attempts rather than selecting a single complete answer. The method achieves up to 23.8% accuracy improvement on math and coding tasks while reducing computation time by 1.8x compared to existing approaches.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 11
🧠 Researchers identify why Diffusion Language Models (DLMs) struggle with parallel token generation, finding that training data structure forces autoregressive-like behavior. They propose NAP, a data-centric approach using multiple independent reasoning trajectories that improves parallel decoding performance on math benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠 Researchers developed an unbiased sliced Wasserstein RBF kernel with rotary positional embedding to improve audio captioning systems by addressing exposure bias and temporal relationship issues. The method shows significant improvements in caption quality and text-to-audio retrieval accuracy on AudioCaps and Clotho datasets, while also enhancing audio reasoning capabilities in large language models.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7
🧠 Researchers have identified 'modal difference vectors' in language models that can distinguish between possible, impossible, and nonsensical statements, revealing better modal categorization abilities than previously thought. The study shows these vectors emerge consistently as models become more capable and can even predict human judgment patterns about event plausibility.