y0news

#llm News & Analysis

954 articles tagged with #llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems

Researchers have identified a critical privacy vulnerability in LLM-based multi-agent systems, demonstrating that communication topologies can be reverse-engineered through black-box attacks. The Communication Inference Attack (CIA) achieves up to 99% accuracy in inferring how agents communicate, exposing significant intellectual property and security risks in AI systems.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning

Researchers propose Schema-Adaptive Tabular Representation Learning, which uses LLMs to convert structured clinical data into semantic embeddings that transfer across different electronic health record schemas without retraining. When combined with imaging data for dementia diagnosis, the method achieves state-of-the-art results and outperforms board-certified neurologists on retrospective diagnostic tasks.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

Researchers propose Generative Actor-Critic (GenAC), a new approach to value modeling in large language model reinforcement learning that uses chain-of-thought reasoning instead of one-shot scalar predictions. The method addresses a longstanding challenge in credit assignment by improving value approximation and downstream RL performance compared to existing value-based and value-free baselines.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience

Researchers introduce ReflectiChain, an AI framework combining large language models with generative world models to improve semiconductor supply chain resilience against geopolitical disruptions. The system demonstrates 250% performance improvements over standard LLM approaches by integrating physical environmental constraints and autonomous policy learning, restoring operational capacity from 13.3% to 88.5% under extreme scenarios.

AI · Bullish · OpenAI News · 6d ago · 7/10

Applications of AI at OpenAI

OpenAI's suite of products—including ChatGPT, Codex, and developer APIs—demonstrates practical applications of artificial intelligence across work, software development, and consumer tasks. These tools represent a significant shift toward mainstream AI adoption, enabling organizations and individuals to integrate machine learning capabilities into everyday workflows.

🏢 OpenAI · 🧠 ChatGPT
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Robust LLM Performance Certification via Constrained Maximum Likelihood Estimation

Researchers propose a new constrained maximum likelihood estimation (MLE) method to accurately estimate failure rates of large language models by combining human-labeled data, automated judge annotations, and domain-specific constraints. The approach outperforms existing methods like Prediction-Powered Inference across various experimental conditions, providing a more reliable framework for LLM safety certification.
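To make the setup concrete, here is a toy sketch of the Prediction-Powered Inference style baseline the paper compares against: an automated judge labels every output, and a small human-labeled subset corrects the judge's bias. All data here is synthetic and the error rates are assumptions, not the paper's constrained-MLE method.

```python
import numpy as np

# Synthetic setting: 5,000 model outputs, 200 of them human-labeled.
rng = np.random.default_rng(1)
n, n_human = 5000, 200

true_fail = rng.random(n) < 0.10            # unknown ground-truth failures (10%)
judge = true_fail ^ (rng.random(n) < 0.05)  # automated judge flips 5% of labels

# Human annotators label a small random subset.
idx = rng.choice(n, size=n_human, replace=False)

# PPI-style estimate: judge mean on all data, plus the human-vs-judge
# correction measured on the labeled subset.
estimate = judge.mean() + (true_fail[idx].mean() - judge[idx].mean())
print(f"estimated failure rate: {estimate:.3f}")
```

The correction term debiases the judge using far fewer human labels than labeling everything would require; the paper's contribution is to improve on this class of estimator with domain-specific constraints.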

AI · Neutral · arXiv – CS AI · Apr 7 · 7/10

Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality

Researchers introduce 'error verifiability' as a new metric to measure whether AI-generated justifications help users distinguish correct from incorrect answers. The study found that common AI improvement methods don't enhance verifiability, but two new domain-specific approaches successfully improved users' ability to assess answer correctness.

AI · Neutral · arXiv – CS AI · Apr 7 · 7/10

When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression

A new arXiv study identifies two key mechanisms behind reasoning hallucinations in large language models: Path Reuse and Path Compression. The study models next-token prediction as graph search, showing how memorized knowledge can override contextual constraints and how frequently used reasoning paths become shortcuts that lead to unsupported conclusions.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Hallucination Basins: A Dynamic Framework for Understanding and Controlling LLM Hallucinations

Researchers introduce a geometric framework for understanding LLM hallucinations, showing they arise from basin structures in latent space that vary by task complexity. The study demonstrates that factual tasks have clearer separation while summarization tasks show unstable, overlapping patterns, and proposes geometry-aware steering to reduce hallucinations without retraining.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Researchers introduce SkillX, an automated framework for building reusable skill knowledge bases for AI agents that addresses inefficiencies in current self-evolving paradigms. The system uses multi-level skill design, iterative refinement, and exploratory expansion to create plug-and-play skill libraries that improve task success and execution efficiency across different agents and environments.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models

Researchers propose SLaB, a novel framework for compressing large language models by decomposing weight matrices into sparse, low-rank, and binary components. The method achieves significant improvements over existing compression techniques, reducing perplexity by up to 36% at 50% compression rates without requiring model retraining.

🏢 Perplexity · 🧠 Llama
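The decomposition idea can be illustrated with a toy NumPy sketch: approximate a weight matrix as a low-rank term plus a sparse term plus a scaled binary (sign) term. The alternating heuristic, rank, and sparsity level below are illustrative assumptions, not the paper's optimization procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # stand-in for a weight matrix

rank, sparsity = 8, 0.05  # assumed hyperparameters

# Low-rank component via truncated SVD.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
L = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]

# Sparse component: keep the largest-magnitude residual entries.
R = W - L
k = int(sparsity * R.size)
thresh = np.sort(np.abs(R).ravel())[-k]
S = np.where(np.abs(R) >= thresh, R, 0.0)

# Binary component: sign of the remaining residual with one shared scale
# (a simple 1-bit quantization of what's left).
B = np.sign(R - S)
alpha = np.abs(R - S).mean()

W_hat = L + S + alpha * B
err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```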
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

Researchers developed an LLM-powered evolutionary search method to automatically design uncertainty quantification systems for large language models, achieving up to 6.7% improvement in performance over manual designs. The study found that different AI models employ distinct evolutionary strategies, with some favoring complex linear estimators while others prefer simpler positional weighting approaches.

🧠 Claude · 🧠 Sonnet · 🧠 Opus
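The evolutionary loop being automated here follows a standard skeleton: score a population of candidate designs, keep the best, and mutate them. The sketch below uses a toy one-dimensional objective in place of the paper's uncertainty-quantification fitness, and plain Gaussian mutation in place of LLM-proposed edits.

```python
import random

random.seed(0)

def fitness(x: float) -> float:
    """Toy objective, maximized at x = 3 (stand-in for UQ quality)."""
    return -(x - 3.0) ** 2

# Initial population of candidate "designs".
pop = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]  # selection: keep the top designs (elitism)
    # Mutation: each parent spawns three perturbed children.
    pop = parents + [p + random.gauss(0, 0.5) for p in parents for _ in range(3)]

best = max(pop, key=fitness)
print(f"best design: {best:.2f}")  # converges near 3.0
```

In the paper's setting, the interesting part is that the mutation operator is an LLM rewriting candidate estimator code rather than a numeric perturbation.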
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Many Preferences, Few Policies: Towards Scalable Language Model Personalization

Researchers developed PALM (Portfolio of Aligned LLMs), a method to create a small collection of language models that can serve diverse user preferences without requiring individual models per user. The approach provides theoretical guarantees on portfolio size and quality while balancing system costs with personalization needs.
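The portfolio idea can be sketched as routing: instead of one model per user, keep a few policies, each summarized by a preference weight vector, and send each user to the nearest one. The policy names, preference axes, and nearest-neighbor routing rule below are illustrative assumptions, not PALM's construction.

```python
import numpy as np

# Hypothetical portfolio: each policy covers a region of preference space.
# Axes (assumed): (brevity, detail, formality).
portfolio = {
    "concise":  np.array([0.9, 0.1, 0.3]),
    "detailed": np.array([0.1, 0.9, 0.5]),
    "formal":   np.array([0.4, 0.5, 0.9]),
}

def route(user_pref: np.ndarray) -> str:
    """Pick the portfolio policy closest to the user's preference vector."""
    return min(portfolio, key=lambda name: np.linalg.norm(portfolio[name] - user_pref))

print(route(np.array([0.8, 0.2, 0.2])))  # → concise
```

The paper's theoretical contribution is bounding how many such policies are needed to cover a preference distribution to a given quality, which this sketch takes as given.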

AI · Neutral · arXiv – CS AI · Apr 7 · 7/10

The Persuasion Paradox: When LLM Explanations Fail to Improve Human-AI Team Performance

Research reveals a 'Persuasion Paradox': LLM explanations increase user confidence but don't reliably improve human-AI team performance, and can actually undermine task accuracy. The study found that explanation effectiveness varies significantly by task type, with explanations reducing error recovery on visual reasoning tasks while benefiting logical reasoning tasks.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents

Research published on arXiv demonstrates that large language models playing poker can develop sophisticated Theory of Mind capabilities when equipped with persistent memory, progressing to advanced levels of opponent modeling and strategic deception. The study found memory is necessary and sufficient for this emergent behavior, while domain expertise enhances but doesn't gate ToM development.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Diagonal-Tiled Mixed-Precision Attention for Efficient Low-Bit MXFP Inference

Researchers have developed a new low-bit mixed-precision attention kernel called Diagonal-Tiled Mixed-Precision Attention (DMA) that significantly speeds up large language model inference on NVIDIA B200 GPUs while maintaining generation quality. The technique uses microscaling floating-point (MXFP) data format and kernel fusion to address the high computational costs of transformer-based models.

🏢 Nvidia
AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

Research reveals that large language models like DeepSeek-V3.2, Gemini-3, and GPT-5.2 show rigid adaptation patterns when learning from changing environments, particularly struggling with loss-based learning compared to humans. The study found LLMs demonstrate asymmetric responses to positive versus negative feedback, with some models showing extreme perseveration after environmental changes.

🧠 GPT-5 · 🧠 Gemini
AI × Crypto · Neutral · arXiv – CS AI · Apr 7 · 7/10

PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

PolySwarm is a new multi-agent AI framework that uses 50 diverse large language models to trade on prediction markets like Polymarket, combining swarm intelligence with arbitrage strategies. The system outperformed single-model baselines in probability calibration and includes latency arbitrage capabilities to exploit pricing inefficiencies across markets.
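The swarm-aggregation step can be sketched simply: pool per-model probabilities and compare the pooled estimate to the market price to flag an edge. The model outputs, logit-mean pooling rule, and edge threshold below are illustrative assumptions, not PolySwarm's actual 50-model system.

```python
import numpy as np

# Hypothetical per-model probabilities for one market outcome.
model_probs = np.array([0.62, 0.58, 0.65, 0.60, 0.55])

# Logit-mean pooling: a common, calibration-friendly way to combine forecasts.
logits = np.log(model_probs / (1 - model_probs))
pooled = 1 / (1 + np.exp(-logits.mean()))

market_price = 0.48   # hypothetical prediction-market price for the outcome
edge = pooled - market_price

if abs(edge) > 0.05:  # assumed minimum edge before trading
    side = "buy YES" if edge > 0 else "buy NO"
    print(f"pooled={pooled:.3f}, edge={edge:+.3f}: {side}")
```

Latency arbitrage, the other component the summary mentions, would sit on top of this: acting on the pooled estimate before slower venues reprice.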

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Commercial Persuasion in AI-Mediated Conversations

A research study reveals that AI-powered conversational interfaces nearly triple the rate of sponsored product selection compared to traditional search engines (61.2% vs. 22.4%). Users largely fail to detect this commercial steering, even with explicit sponsor labels, indicating current transparency measures are insufficient.

AI · Neutral · arXiv – CS AI · Apr 7 · 7/10

Testing the Limits of Truth Directions in LLMs

A new research study reveals that truth directions in large language models are less universal than previously believed, with significant variations across different model layers, task types, and prompt instructions. The findings show truth directions emerge earlier for factual tasks but later for reasoning tasks, and are heavily influenced by model instructions and task complexity.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

One Model for All: Multi-Objective Controllable Language Models

Researchers introduce Multi-Objective Control (MOC), a new approach that trains a single large language model to generate personalized responses based on individual user preferences across multiple objectives. The method uses multi-objective optimization principles in reinforcement learning from human feedback to create more controllable and adaptable AI systems.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

Researchers propose PassiveQA, a new AI framework that teaches language models to recognize when they don't have enough information to answer questions, choosing to ask for clarification or abstain rather than hallucinate responses. The three-action system (Answer, Ask, Abstain) uses supervised fine-tuning to align model behavior with information sufficiency, showing significant improvements in reducing hallucinations.
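The three-action policy can be sketched as a simple decision rule over sufficiency signals. In the paper the behavior is learned via supervised fine-tuning; the confidence/ambiguity scores and thresholds below are illustrative assumptions.

```python
def choose_action(confidence: float, ambiguity: float,
                  conf_thresh: float = 0.7, ambig_thresh: float = 0.5) -> str:
    """Map information-sufficiency signals to one of three actions."""
    if ambiguity > ambig_thresh:
        return "Ask"      # question underspecified: request clarification
    if confidence >= conf_thresh:
        return "Answer"   # enough information: commit to an answer
    return "Abstain"      # neither confident nor clarifiable: decline

print(choose_action(0.9, 0.1))  # Answer
print(choose_action(0.4, 0.8))  # Ask
print(choose_action(0.4, 0.2))  # Abstain
```

Fine-tuning effectively teaches the model to produce these signals and actions jointly, rather than thresholding explicit scores as this sketch does.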

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration

Researchers introduce ROSClaw, a new AI framework that integrates large language models with robotic systems to improve multi-agent collaboration and long-horizon task execution. The framework addresses critical gaps between semantic understanding and physical execution by using unified vision-language models and enabling real-time coordination between simulated and real-world robots.

AI × Crypto · Neutral · arXiv – CS AI · Apr 7 · 7/10

CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse Engineering

Researchers introduced CREBench, a benchmark for evaluating large language models' capabilities in cryptographic binary reverse engineering. The best-performing model (GPT-5.4) achieved a 64.03% success rate, while human experts scored 92.19%, showing AI still lags behind human expertise in cryptographic analysis tasks.

🧠 GPT-5
Page 1 of 39