y0news
#arxiv31 articles
31 articles
AINeutralarXiv – CS AI · 4h ago4
🧠

Planning under Distribution Shifts with Causal POMDPs

Researchers propose a new theoretical framework for AI planning under changing conditions using causal POMDPs (Partially Observable Markov Decision Processes). The framework represents environmental changes as interventions, enabling AI systems to evaluate and adapt plans when underlying conditions shift while maintaining computational tractability.

AINeutralarXiv – CS AI · 4h ago5
🧠

LemmaBench: A Live, Research-Level Benchmark to Evaluate LLM Capabilities in Mathematics

Researchers have developed LemmaBench, a new benchmark for evaluating Large Language Models on research-level mathematics by automatically extracting and rewriting lemmas from arXiv papers. Current state-of-the-art LLMs achieve only 10-15% accuracy on these mathematical theorem proving tasks, revealing a significant gap between AI capabilities and human-level mathematical research.

AIBullisharXiv – CS AI · 4h ago2
🧠

Learning Flexible Job Shop Scheduling under Limited Buffers and Material Kitting Constraints

Researchers developed a deep reinforcement learning approach using heterogeneous graph networks to solve Flexible Job Shop Scheduling Problems with limited buffers and material kitting constraints. The method outperforms traditional heuristics by improving buffer utilization and decision quality through better modeling of complex dependencies in production scheduling.

AIBullisharXiv – CS AI · 4h ago3
🧠

A Minimal Agent for Automated Theorem Proving

Researchers propose a minimal baseline architecture for AI-based theorem proving that achieves competitive performance with state-of-the-art systems while using significantly simpler design. The open-source implementation demonstrates that iterative proof refinement approaches are more sample-efficient and cost-effective than single-shot generation methods.

AIBullisharXiv – CS AI · 4h ago2
🧠

Evidential Neural Radiance Fields

Researchers introduce Evidential Neural Radiance Fields, a new probabilistic approach that enables uncertainty quantification in 3D scene modeling while maintaining rendering quality. The method addresses critical limitations in existing NeRF technology by capturing both aleatoric and epistemic uncertainty from a single forward pass, making neural radiance fields more suitable for safety-critical applications.

AINeutralarXiv – CS AI · 4h ago2
🧠

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

Researchers introduce DLEBench, the first benchmark specifically designed to evaluate instruction-based image editing models' ability to edit small-scale objects that occupy only 1%-10% of image area. Testing on 10 models revealed significant performance gaps in small object editing, highlighting a critical limitation in current AI image editing capabilities.

AIBullisharXiv – CS AI · 4h ago2
🧠

See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Researchers introduce Sea² (See, Act, Adapt), a novel approach that improves AI perception models in new environments by using an intelligent pose-control agent rather than retraining the models themselves. The method keeps perception modules frozen and uses a vision-language model as a controller, achieving significant performance improvements of 13-27% across visual tasks without requiring additional training data.

AIBullisharXiv – CS AI · 4h ago2
🧠

Preference Packing: Efficient Preference Optimization for Large Language Models

Researchers propose 'preference packing,' a new optimization technique for training large language models that reduces training time by at least 37% through more efficient handling of duplicate input prompts. The method optimizes attention operations and KV cache memory usage in preference-based training methods like Direct Preference Optimization.

AIBullisharXiv – CS AI · 4h ago2
🧠

A Mixed Diet Makes DINO An Omnivorous Vision Encoder

Researchers have developed an 'Omnivorous Vision Encoder' that creates consistent feature representations across different visual modalities (RGB, depth, segmentation) of the same scene. The framework addresses the poor cross-modal alignment in existing vision encoders like DINOv2 by training with dual objectives to maximize feature alignment while preserving discriminative semantics.

AIBullisharXiv – CS AI · 4h ago5
🧠

Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification

Researchers have developed a new method to extract interpretable causal mechanisms from neural networks using structured pruning as a search technique. The approach reframes network pruning as finding approximate causal abstractions, yielding closed-form criteria for simplifying networks while maintaining their causal structure under interventions.

AINeutralarXiv – CS AI · 4h ago4
🧠

Memory Caching: RNNs with Growing Memory

Researchers introduce Memory Caching (MC), a technique that enhances recurrent neural networks by allowing their memory capacity to grow with sequence length, bridging the gap between fixed-memory RNNs and growing-memory Transformers. The approach offers four variants and shows competitive performance with Transformers on language modeling and long-context tasks while maintaining better computational efficiency.

AIBullisharXiv – CS AI · 4h ago4
🧠

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Researchers developed CUDA Agent, a reinforcement learning system that significantly outperforms existing methods for GPU kernel optimization, achieving 100% faster performance than torch.compile on benchmark tests. The system uses large-scale agentic RL with automated verification and profiling to improve CUDA kernel generation, addressing a critical bottleneck in deep learning performance.

AIBullisharXiv – CS AI · 4h ago7
🧠

MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Researchers introduce MITS (Mutual Information Tree Search), a new framework that improves reasoning capabilities in large language models using information-theoretic principles. The method uses pointwise mutual information for step-wise evaluation and achieves better performance while being more computationally efficient than existing tree search methods like Tree-of-Thought.

AIBullisharXiv – CS AI · 4h ago6
🧠

Automating the Refinement of Reinforcement Learning Specifications

Researchers introduce AutoSpec, a framework that automatically refines reinforcement learning specifications to help AI agents learn complex tasks more effectively. The system improves coarse-grained logical specifications through exploration-guided strategies while maintaining specification soundness, demonstrating promising improvements in solving complex control tasks.

AIBullisharXiv – CS AI · 4h ago5
🧠

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Researchers introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that improves AI reasoning efficiency by helping large reasoning models know when to stop thinking. The approach addresses the problem of redundant, lengthy reasoning chains that don't improve accuracy while reducing computational costs and response times.

AIBullisharXiv – CS AI · 4h ago7
🧠

Stop Unnecessary Reflection: Training LRMs for Efficient Reasoning with Adaptive Reflection and Length Coordinated Penalty

Researchers developed ARLCP, a reinforcement learning framework that reduces unnecessary reflection in Large Reasoning Models, achieving 53% shorter responses while improving accuracy by 5.8% on smaller models. The method addresses computational inefficiencies in AI reasoning by dynamically balancing efficiency and accuracy through adaptive penalties.

AIBullisharXiv – CS AI · 4h ago5
🧠

Context and Diversity Matter: The Emergence of In-Context Learning in World Models

Researchers investigate in-context learning (ICL) in world models, identifying two core mechanisms - environment recognition and environment learning - that enable AI systems to adapt to new configurations. The study provides theoretical error bounds and empirical evidence showing that diverse environments and long context windows are crucial for developing self-adapting world models.

AIBullisharXiv – CS AI · 4h ago6
🧠

Activation Function Design Sustains Plasticity in Continual Learning

Researchers from arXiv demonstrate that activation function design is crucial for maintaining neural network plasticity in continual learning scenarios. They introduce two new activation functions (Smooth-Leaky and Randomized Smooth-Leaky) that help prevent models from losing their ability to adapt to new tasks over time.

$LINK
AIBearisharXiv – CS AI · 4h ago4
🧠

The False Promise of Zero-Shot Super-Resolution in Machine-Learned Operators

Research reveals that machine-learned operators (MLOs) fail at zero-shot super-resolution, unable to accurately perform inference at resolutions different from their training data. The study identifies key limitations in frequency extrapolation and resolution interpolation, proposing a multi-resolution training protocol as a solution.

AIBullisharXiv – CS AI · 4h ago5
🧠

Thompson Sampling via Fine-Tuning of LLMs

Researchers developed ToSFiT (Thompson Sampling via Fine-Tuning), a new Bayesian optimization method that uses fine-tuned large language models to improve search efficiency in complex discrete spaces. The approach eliminates computational bottlenecks by directly parameterizing reward probabilities and demonstrates superior performance across diverse applications including protein search and quantum circuit design.

AIBullisharXiv – CS AI · 4h ago10
🧠

Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

Researchers developed Speculative Verdict (SV), a training-free framework that improves large Vision-Language Models' ability to reason over information-dense images by combining multiple small draft models with a larger verdict model. The approach achieves better accuracy on visual question answering benchmarks while reducing computational costs compared to large proprietary models.

AIBullisharXiv – CS AI · 4h ago4
🧠

WisPaper: Your AI Scholar Search Engine

WisPaper is a new AI-powered academic search system that combines semantic search capabilities with automated paper validation and organization tools. The system achieved 22.26% recall on TaxoBench and 93.70% validation accuracy, addressing key limitations in current academic search engines by integrating discovery, organization, and monitoring workflows.

Page 1 of 2Next →