y0news

AI

16,636 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

GRASP: Gated Regression-Aware Skill Proposer for Self-Improving LLM Agents

Researchers introduce GRASP, a method for improving large language model agents through controlled skill library updates that prevent performance regression. Tested across five base models on clinical benchmarks, GRASP achieves dramatic improvements (40.6% to 88.8% on MedAgentBench) while maintaining stability, outperforming existing self-improvement approaches by significant margins.

🧠 GPT-4 · 🧠 GPT-5 · 🧠 Gemini
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

JAEGER is a new AI framework that extends audio-visual large language models from 2D to 3D space, enabling spatial grounding and reasoning in physical environments through RGB-D observations and multi-channel audio. The researchers introduce Neural Intensity Vector (Neural IV) for enhanced directional audio analysis and release SpatialSceneQA, a 61k-sample benchmark for training and evaluation.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning

Researchers propose PEAR, a novel supervised fine-tuning (SFT) method that optimizes language models with downstream reinforcement learning in mind rather than in isolation. The approach uses importance sampling to reweight training data, addressing a critical distribution mismatch between offline SFT and online RL stages, achieving up to 14.6% performance gains on mathematical reasoning benchmarks.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

Researchers demonstrate that Evolution Strategies (ES) can effectively fine-tune large language models without catastrophic forgetting of prior tasks, contrary to recent concerns. By introducing Anchored Weight Decay (AWD), a regularization technique that constrains optimization toward initial parameters, the work shows ES-based continual learning is viable and computationally efficient compared to reinforcement learning approaches.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

A Predictive Law for On-Policy Self-Distillation From World Feedback

Researchers identify a linear predictive relationship between initial performance gaps and final improvements in on-policy self-distillation (OPSD), a reinforcement learning technique that uses rich world feedback instead of scalar rewards. This predictive law enables practitioners to forecast OPSD outcomes before full training, potentially accelerating RL post-training development and scaling.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Self-Trained Verification for Training- and Test-Time Self-Improvement

Researchers propose Self-Trained Verification (STV), a novel approach that improves AI reasoning models by training verifiers to catch self-generated errors using reference solutions as supervision. The method doubles accuracy on hard math problems and achieves 14x improvement on scientific reasoning tasks, while also enabling more effective self-training through verifier-in-the-loop training that further boosts performance by 33%.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

Researchers conducted 400 autonomous penetration testing runs across four LLMs against a fixed vulnerable target to measure attack consistency. Results show significant variation in exploitation success rates (25-85%) and distinctive failure modes per model, with Claude and Gemini 2.5 Flash-Lite substantially outperforming GPT-4o-mini and Qwen, raising critical questions about LLM reliability in security-critical autonomous operations.

🏢 Anthropic · 🧠 GPT-4 · 🧠 Claude
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Researchers introduce Single-stage Sparse Retrieval (SSR), a new approach that replaces clustering-based compression with sparse autoencoders for multi-vector retrieval systems. The method achieves 15x faster indexing, 50% lower retrieval latency, and improved accuracy compared to ColBERTv2, addressing critical efficiency bottlenecks in large-scale information retrieval.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows

ProtoMedAgent introduces a framework that combines interpretable prototype networks with privacy-aware AI workflows to generate clinically accurate medical reports without the hallucination issues common in standard RAG systems. The approach achieves 91.2% faithfulness in clinical documentation while protecting patient privacy through k-anonymity and ℓ-diversity constraints.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

Researchers introduce SafeSearch, an automated red-teaming framework that identifies critical vulnerabilities in LLM-based search agents by testing them against 300 adversarial cases spanning misinformation, prompt injection, and other risks. The study reveals that current search agents achieve attack success rates up to 90.5%, with common defenses like reminder prompting providing minimal protection.

🧠 GPT-4
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

BitTP: The Lightweight Trajectory Prediction Model with BitLLM for Edge-Devices

Researchers introduce BitTP, a quantization technique that compresses LLM-based trajectory prediction models to 1.58-bit weights while maintaining full-precision activations, enabling deployment on resource-constrained edge devices. The approach not only reduces memory and latency but actually improves prediction accuracy by 14-21% compared to full-precision baselines, demonstrating that strategic quantization can serve as an effective regularizer.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding

Researchers introduce PARCEL, a new vision-language model architecture that reduces computational overhead during inference by dynamically balancing spatial pooling and query-based token compression. The approach outperforms existing methods across 27 benchmarks while maintaining flexibility to deploy at multiple computational budgets without retraining.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

Audio Jailbreaks in Large Audio-Language Models: Taxonomy, Attack-Defense Analysis, and Cost-Aware Evaluation

Researchers have developed a comprehensive taxonomy of jailbreak attacks and defenses for Large Audio Language Models (LALMs), identifying vulnerabilities across semantic, acoustic, signal, and embedding layers. The study reveals that current defenses create tradeoffs between robustness and usability, highlighting the need for cost-aware safety evaluation beyond simple success-rate metrics.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

BioRefusalAudit: Auditing Biosecurity Refusal Depth Using General and Domain-Fine-Tuned Sparse Autoencoders

Researchers introduce BioRefusalAudit, a framework using sparse autoencoders to evaluate the structural integrity of language model biosecurity refusals. The study reveals that five tested models fail to cleanly distinguish hazardous from benign biology, with refusals often disappearing under prompt formatting changes or output constraints, and some models refusing based on legality rather than actual biological hazard.

🧠 Llama
AI · Bearish · arXiv – CS AI · 1d ago · 7/10

Evaluating Dataset Watermarking for Fine-tuning Traceability of Customized Diffusion Models: A Comprehensive Benchmark and Removal Approach

Researchers have established the first comprehensive evaluation framework for dataset watermarking in fine-tuned diffusion models, revealing significant vulnerabilities in existing protection methods. While current watermarking techniques show promise in universality and transmissibility, the study demonstrates practical watermark removal methods that can eliminate these protections without degrading model performance, exposing critical gaps in copyright and security safeguards.

AI · Bearish · arXiv – CS AI · 1d ago · 7/10

Token Inflation: How Dishonest Providers Can Overcharge for Large Language Model Usage

Researchers demonstrate that LLM providers can systematically inflate token counts billed to users, with hidden reasoning tokens inflatable by up to 1,469% without detection. The core issue is an audit paradox: providers control both the tokenizer and execution, so users cannot verify their bills without independent mechanisms such as trusted execution attestation or cryptographic proofs.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

ESPO: Early-Stopping Proximal Policy Optimization

Researchers propose ESPO, an optimization technique that improves large language model training by detecting and terminating failed reasoning trajectories early rather than forcing completion. The method reduces computational waste by over 20% while achieving superior performance on mathematical reasoning benchmarks compared to standard PPO training.

AI · Neutral · arXiv – CS AI · 1d ago · 7/10

BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models

BioArc introduces a neural architecture search framework that systematically discovers optimal model architectures for biological foundation models, moving beyond generic adaptation of NLP and computer vision models. The research identifies design principles and proposes methods to predict architectures for new biological tasks, providing foundational methodology for next-generation biology-focused AI systems.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

LFQ: Logit-aware Final-block Quantization for Boosting the Generation Quality of Low-Bit Quantized LLMs

Researchers introduce Logit-aware Final-block Quantization (LFQ), a technique that improves low-bit quantization of large language models by optimizing the final transformer block to preserve token probability distributions. This advancement addresses quality degradation in generative tasks while maintaining efficiency gains critical for deploying scaled LLMs.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

Researchers propose BRACS, a training-free framework that reduces hallucinations in vision-language models by monitoring visual grounding during text generation and applying adaptive corrections only when needed. The method achieves significant improvements on hallucination benchmarks while maintaining computational efficiency comparable to baseline decoding speeds.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing

Researchers introduce e-valuator, a method that applies sequential hypothesis testing to convert AI verifier scores into statistically reliable decision rules for evaluating agent trajectories. The framework provides provable false alarm rate control and enables early termination of problematic sequences, offering a model-agnostic approach to improving the reliability of agentic AI systems.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Croissant Tasks: A Metadata Format for Reproducible Machine Learning Evaluations

Researchers introduce Croissant Tasks, a machine-readable metadata format designed to improve reproducibility in machine learning research by abstracting implementation details into high-level specifications. The format enables autonomous AI agents to generate independent implementations of ML experiments, addressing critical reproducibility challenges that plague modern AI research.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

HARP: Hadamard-Preconditioned Adaptive Rotation Processor for Extreme LLM Quantization

Researchers introduce HARP, a learnable adaptive rotation processor that improves extreme low-bit quantization for large language models by replacing fixed Hadamard transforms with optimizable structured orthogonal processors. The technique maintains full-precision equivalence while achieving better perplexity and accuracy across 2-4 bit quantization settings on models up to 70B parameters, with deployment speeds competitive with standard approaches.

🏢 Perplexity
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

SkillsInjector: Dynamic Skill Context Construction for LLM Agents

SkillsInjector introduces a dynamic method for optimizing how large language model agents access and utilize skill libraries. Rather than treating skill selection as static, the approach adaptively determines which skills to include, how many to present, and how to describe them based on task requirements, achieving measurable performance improvements across multiple benchmarks.

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

CityGen: Structure-Guided City-Style Synthesis for Cross-City Autonomous Driving

Researchers introduce CityGen, a diffusion-based framework that enables autonomous driving systems to generalize across different cities without labeled training data. The approach uses HD-map guidance and visual prompts to synthesize city-specific driving scenarios, addressing a critical scalability challenge in deploying autonomous vehicles to new geographic regions.
