AI Pulse News

Models, papers, tools. 15,801 articles with AI-powered sentiment analysis and key takeaways.


AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Path Regularization: A Near-Complete and Optimal Nonasymptotic Generalization Theory for Multilayer Neural Networks and Double Descent Phenomenon

Researchers propose a new nonasymptotic generalization theory for multilayer neural networks using path regularization, proving near-minimax optimal error bounds without requiring a bounded loss function or infinite network dimensions. The theory notably explains the double descent phenomenon and solves an open problem in approximation theory for neural networks.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications

A comprehensive survey of generative AI and large language models as of early 2026 has been published, covering frontier open-weight models like DeepSeek and Qwen alongside proprietary systems, with detailed analysis of architectures, deployment protocols, and applications across fifteen industry sectors.

🏢 Anthropic · 🧠 GPT-5 · 🧠 Claude

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

DiffSketcher is a novel AI algorithm that generates vector sketches from text prompts by leveraging pre-trained text-to-image diffusion models. The method optimizes Bézier curves using an extended Score Distillation Sampling loss and introduces a stroke initialization strategy based on attention maps, achieving superior results in sketch quality and controllability.
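The variables DiffSketcher optimizes are the control points of Bézier strokes. As a minimal, generic sketch (not DiffSketcher's code), a cubic Bézier stroke can be evaluated from its four control points with the Bernstein basis. In the method described above, the sampled strokes would then be rasterized and their control points updated by gradients of the extended SDS loss; this sketch omits both steps, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1] (Bernstein form)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_stroke(ctrl: Sequence[Point], n: int = 5) -> List[Point]:
    """Sample n evenly spaced parameter values along one stroke (n >= 2)."""
    if n < 2:
        raise ValueError("need at least two samples")
    return [cubic_bezier(*ctrl, t=i / (n - 1)) for i in range(n)]


# One stroke: four control points forming an arch.
stroke = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
points = sample_stroke(stroke, n=3)
```

The curve always passes through the first and last control points, while the two inner points only pull it toward them, which is why a small set of control points can describe a smooth, editable vector stroke.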

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

ATBench: A Diverse and Realistic Agent Trajectory Benchmark for Safety Evaluation and Diagnosis

Researchers introduce ATBench, a comprehensive benchmark for evaluating the safety of LLM-based agents across realistic multi-step interactions. The 1,000-trajectory dataset addresses critical gaps in existing safety evaluations by incorporating diverse risk scenarios, detailed failure classification, and long-horizon complexity that mirrors real-world deployment challenges.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

ConfusionPrompt: Practical Private Inference for Online Large Language Models

Researchers introduce ConfusionPrompt, a privacy framework for large language models that decomposes user prompts into smaller sub-prompts and mixes them with pseudo-prompts before sending them to cloud servers. The method protects user privacy while maintaining higher utility than existing perturbation-based approaches and works with existing black-box LLMs without modification.
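The mixing step described above can be pictured with a minimal client-side sketch. It assumes the decomposition into sub-prompts has already happened (for example, by some local model, not shown) and that decoys come from a hypothetical pool of pseudo-prompts; `confuse` and `recombine` are illustrative names, not the paper's API, and the real method's decomposition and decoy generation are likely more elaborate.

```python
import random
from typing import List, Optional, Tuple


def confuse(sub_prompts: List[str], pseudo_pool: List[str], n_pseudo: int,
            seed: Optional[int] = None) -> Tuple[List[str], List[int]]:
    """Mix real sub-prompts with pseudo-prompts and shuffle them.

    Returns the batch to send to the server plus, for each real sub-prompt
    (in original order), its position in the batch. The positions stay on
    the client.
    """
    rng = random.Random(seed)
    pseudo = rng.sample(pseudo_pool, n_pseudo)  # requires n_pseudo <= len(pseudo_pool)
    # Tag real items with their original index and decoys with None.
    tagged: List[Tuple[str, Optional[int]]] = (
        [(p, i) for i, p in enumerate(sub_prompts)] + [(p, None) for p in pseudo]
    )
    rng.shuffle(tagged)
    batch = [text for text, _ in tagged]
    positions = [0] * len(sub_prompts)
    for pos, (_, orig) in enumerate(tagged):
        if orig is not None:
            positions[orig] = pos
    return batch, positions


def recombine(answers: List[str], positions: List[int]) -> List[str]:
    """Keep only answers to real sub-prompts, restored to their original order."""
    return [answers[pos] for pos in positions]
```

In this sketch the server sees each sub-prompt on its own and interleaved with decoys, so no single request carries the full prompt; whether this matches the paper's exact threat model is not stated in the summary.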

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Faithful-First Reasoning, Planning, and Acting for Multimodal LLMs

Researchers propose Faithful-First RPA, a framework that improves multimodal AI reasoning by prioritizing faithfulness to visual evidence. The method uses FaithEvi for supervision and FaithAct for execution, achieving up to 24% improvement in perceptual faithfulness without sacrificing task accuracy.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Towards provable probabilistic safety for scalable embodied AI systems

Researchers propose a shift from deterministic to probabilistic safety verification for embodied AI systems, arguing that provable probabilistic guarantees offer a more practical path to large-scale deployment in safety-critical applications like autonomous vehicles and robotics than the infeasible goal of absolute safety across all scenarios.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

The ATOM Report: Measuring the Open Language Model Ecosystem

A comprehensive study of the open language model ecosystem reveals that Chinese AI models, including Qwen and DeepSeek, overtook U.S.-developed models such as Meta's Llama in summer 2025, and the gap continues to widen. The research analyzes about 1,500 mainline open models across adoption metrics, market share, and performance to document this significant shift in the geography of AI development.

$ATOM · 🏢 Hugging Face · 🧠 Llama

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

Information as Structural Alignment: A Dynamical Theory of Continual Learning

Researchers introduce the Informational Buildup Framework (IBF), a new approach to continual learning that nearly eliminates catastrophic forgetting by treating information as structural alignment rather than as stored parameters. The framework demonstrates superior performance across multiple domains, including chess and image classification, achieving near-zero forgetting without replaying raw data.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

🧠

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Researchers introduce TraceSafe-Bench, a benchmark evaluating how well LLM guardrails detect safety risks across multi-step tool-using trajectories. The study reveals that guardrail effectiveness depends more on structural reasoning capabilities than semantic safety training, and that general-purpose LLMs outperform specialized safety models in detecting mid-execution vulnerabilities.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Researchers introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-tuning framework that improves how large vision-language models learn from visual inputs by strategically allocating learning signals to vision-dependent tokens rather than treating all tokens equally. Testing on the Qwen2.5-VL series demonstrates an average 18.7% performance boost across multimodal reasoning benchmarks.

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

🧠

Self-Preference Bias in Rubric-Based Evaluation of Large Language Models

Researchers reveal that large language models (LLMs) exhibit self-preference bias when evaluating other LLMs, systematically favoring outputs from themselves or related models even when scoring against objective rubric-based criteria. The bias can reach 50% on objective benchmarks and 10-point score differences on subjective medical benchmarks, potentially distorting model rankings and hindering AI development.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Q-Zoom: Query-Aware Adaptive Perception for Efficient Multimodal Large Language Models

Q-Zoom is a new framework that improves the efficiency of multimodal large language models (MLLMs) by adaptively processing high-resolution visual inputs. Using query-aware adaptive perception, the system achieves 2.5–4.4× faster inference on document and high-resolution tasks while maintaining or exceeding baseline accuracy across multiple MLLM architectures.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

The AI Skills Shift: Mapping Skill Obsolescence, Emergence, and Transition Pathways in the LLM Era

Researchers benchmark four frontier LLMs against 263 text-based tasks to measure skill automation feasibility, finding that mathematics and programming face the highest displacement risk while active listening and reading comprehension remain relatively resilient. The study reveals a critical inversion: skills most demanded in AI-exposed jobs are those LLMs perform worst at, suggesting augmentation rather than pure automation will dominate the near-term labor market.

🏢 Anthropic · 🧠 Gemini

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

🧠

Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings

Researchers conducted the first large-scale study comparing bias in skin-toned emoji representations across specialized emoji models and four major LLMs (Llama, Gemma, Qwen, Mistral), finding that while LLMs handle skin tone modifiers well, popular emoji embedding models exhibit severe deficiencies and systemic biases in sentiment and meaning across different skin tones.

🧠 Llama

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

🧠

Physical Adversarial Attacks on AI Surveillance Systems: Detection, Tracking, and Visible–Infrared Evasion

This research paper examines physical adversarial attacks on AI surveillance systems through a surveillance-oriented lens, emphasizing that robustness cannot be assessed from isolated image benchmarks alone. The study highlights critical gaps in current evaluation practices, including temporal persistence across frames, multi-modal sensing (visible and infrared), realistic attack carriers, and system-level objectives that must be tested under actual deployment constraints.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

WRAP++: Web discoveRy Amplified Pretraining

WRAP++ is a new pretraining technique that enhances language model training by discovering cross-document relationships through web hyperlinks and synthesizing multi-document question-answer pairs. By amplifying ~8.4B tokens into 80B tokens of relational QA data, the method enables models like OLMo to achieve significant performance improvements on factual retrieval tasks compared to single-document approaches.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Researchers demonstrate that large speech language models contain significant redundancy in their token representations, particularly in deeper layers. By introducing Affinity Pooling, a training-free token merging technique, they achieve a 27.48% reduction in prefilling FLOPs and up to 1.7× memory savings while maintaining semantic accuracy, challenging the necessity of fully distinct tokens for acoustic processing.
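The summary does not give Affinity Pooling's exact merging rule. As a hypothetical illustration of training-free token merging in general, the sketch below greedily averages runs of adjacent token vectors whose cosine similarity to the running group mean meets a threshold; `merge_similar_neighbors` and the 0.9 threshold are assumptions, not the paper's method. Fewer tokens entering later layers is what reduces prefill FLOPs and memory.

```python
import math
from typing import List

Vec = List[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vec(group: List[Vec]) -> Vec:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(group) for col in zip(*group)]


def merge_similar_neighbors(tokens: List[Vec], threshold: float = 0.9) -> List[Vec]:
    """Greedily merge adjacent tokens that are near-duplicates of their group.

    Training-free: only the sequence of hidden states changes, no weights.
    """
    if not tokens:
        return []
    merged: List[Vec] = []
    group = [tokens[0]]
    for tok in tokens[1:]:
        if cosine(mean_vec(group), tok) >= threshold:
            group.append(tok)          # redundant with the current group
        else:
            merged.append(mean_vec(group))  # close the group, start a new one
            group = [tok]
    merged.append(mean_vec(group))
    return merged
```

A real system would also need to track how many original positions each merged token covers (for example, for attention masks or position encodings), which this sketch leaves out.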

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

Researchers introduce MoBiE, a novel binarization framework designed specifically for Mixture-of-Experts large language models that achieves significant efficiency gains through weight compression while maintaining model performance. The method addresses unique challenges in quantizing MoE architectures and demonstrates over 2× inference speedup with substantial perplexity reductions on benchmark models.
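MoBiE's MoE-specific techniques are not detailed in this summary. As background on what post-training binarization does to a weight matrix, here is a generic sketch in the style of XNOR-Net: each row is reduced to signs in {-1, +1} plus one floating-point scale. It is not MoBiE's algorithm, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def binarize_rows(weights: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Binarize each row to {-1, +1} with a per-row scale alpha = mean(|w|).

    For a fixed sign pattern b = sign(w), alpha = mean(|w|) minimizes the
    squared error ||w - alpha * b||^2, since sum(b_i * w_i) = sum(|w_i|).
    """
    signs: List[List[int]] = []
    scales: List[float] = []
    for row in weights:
        signs.append([1 if w >= 0 else -1 for w in row])  # sign(0) taken as +1
        scales.append(sum(abs(w) for w in row) / len(row))
    return signs, scales


def dequantize(signs: List[List[int]], scales: List[float]) -> Matrix:
    """Reconstruct the approximate full-precision weights alpha * b."""
    return [[a * s for s in row] for row, a in zip(signs, scales)]
```

Storing one bit per weight plus one scale per row is where the compression comes from; the summary's point is that MoE layers pose extra challenges for this kind of quantization, which the generic version above does not address.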

🏢 Perplexity

AI · Bearish · arXiv – CS AI · Apr 10 · 7/10

🧠

SkillTrojan: Backdoor Attacks on Skill-Based Agent Systems

Researchers have identified SkillTrojan, a novel backdoor attack targeting skill-based agent systems by embedding malicious logic within reusable skills rather than model parameters. The attack leverages skill composition to execute attacker-defined payloads with up to 97.2% success rates while maintaining clean task performance, revealing critical security gaps in AI agent architectures.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

OmniTabBench: Mapping the Empirical Frontiers of GBDTs, Neural Networks, and Foundation Models for Tabular Data at Scale

OmniTabBench introduces the largest tabular data benchmark with 3,030 datasets to evaluate gradient boosted decision trees, neural networks, and foundation models. The comprehensive analysis reveals no universally superior approach, but identifies specific conditions favoring different model categories through decoupled metafeature analysis.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees

Researchers propose an expert-wise mixed-precision quantization strategy for Mixture-of-Experts models that assigns bit-widths based on router gradient changes and neuron variance. The method achieves higher accuracy than existing approaches while reducing inference memory overhead on large-scale models like Switch Transformer and Mixtral with minimal computational overhead.
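Expert-wise mixed precision means each expert gets its own bit-width. The summary names the two signals (router gradient changes and neuron variance) but not how they are combined, so the sketch below assumes a simple rule: min-max normalize each signal, sum them, and give the top fraction of experts more bits. The scoring rule, bit-widths, and `allocate_bits` name are all hypothetical.

```python
from typing import List


def _minmax(xs: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def allocate_bits(grad_change: List[float], neuron_var: List[float],
                  high_bits: int = 4, low_bits: int = 2,
                  high_frac: float = 0.25) -> List[int]:
    """Assign a bit-width to each expert from two per-expert sensitivity signals.

    Hypothetical scoring: normalized gradient change + normalized variance.
    The top `high_frac` experts by score get `high_bits`; the rest get `low_bits`.
    """
    if len(grad_change) != len(neuron_var) or not grad_change:
        raise ValueError("need one value of each signal per expert")
    scores = [g + v for g, v in zip(_minmax(grad_change), _minmax(neuron_var))]
    n_high = max(1, round(high_frac * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    high = set(ranked[:n_high])
    return [high_bits if i in high else low_bits for i in range(len(scores))]
```

Spending extra bits only on the experts that look most sensitive keeps the average bit-width, and thus memory, close to the low setting.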

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Inference-Time Code Selection via Symbolic Equivalence Partitioning

Researchers propose Symbolic Equivalence Partitioning, a novel inference-time selection method for code generation that uses symbolic execution and SMT constraints to identify correct solutions without expensive external verifiers. The approach improves accuracy on HumanEval+ by 10.3% and on LiveCodeBench by 17.1% at N=10 without requiring additional LLM inference.
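The idea of partitioning N candidate programs into equivalence classes can be sketched with a simpler stand-in. The paper uses symbolic execution and SMT constraints; the sketch below swaps those for comparing concrete outputs on a few probe inputs, which only approximates equivalence. Picking a representative of the largest class is likewise an assumption for illustration, not necessarily the paper's selection rule.

```python
from collections import defaultdict
from typing import Callable, List, Sequence


def partition_by_behavior(candidates: List[Callable], probes: Sequence) -> List[List[int]]:
    """Group candidate programs whose outputs agree on every probe input.

    Stand-in for symbolic equivalence: agreement on finitely many probes
    approximates true equivalence, which SMT-based checking can establish.
    """
    classes = defaultdict(list)
    for idx, fn in enumerate(candidates):
        signature = []
        for x in probes:
            try:
                signature.append(("ok", repr(fn(x))))
            except Exception as exc:  # crashing behaviour is part of the signature
                signature.append(("err", type(exc).__name__))
        classes[tuple(signature)].append(idx)
    # Largest class first; sorted() is stable, so ties keep first-seen order.
    return sorted(classes.values(), key=len, reverse=True)


def select(candidates: List[Callable], probes: Sequence) -> int:
    """Return the index of one representative from the largest class."""
    return partition_by_behavior(candidates, probes)[0][0]


candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x ** 2, lambda x: 2 * x]
best = select(candidates, probes=[0, 1, 3])
```

No extra LLM calls are needed in this scheme: selection uses only the N already-generated programs, which mirrors the summary's claim about the real method.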

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

AI-Driven Research for Databases

Researchers propose AI-Driven Research for Systems (ADRS), a framework using large language models to automate database optimization by generating and evaluating hundreds of candidate solutions. By co-evolving evaluators with solutions, the team demonstrates discovery of novel algorithms achieving up to 6.8x latency improvements over existing baselines in buffer management, query rewriting, and index selection tasks.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

🧠

The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

Researchers prove mathematically that no continuous input-preprocessing defense can simultaneously maintain utility, preserve model functionality, and guarantee safety against prompt injection attacks in language models with connected prompt spaces. The findings establish a fundamental trilemma showing that defenses must inevitably fail at some threshold inputs, with results verified in Lean 4 and validated empirically across three LLMs.
