AI

21,449 AI articles curated from 50+ sources with AI-powered sentiment analysis, importance scoring, and key takeaways.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

WavefrontDiffusion: Dynamic Decoding Schedule for Improved Reasoning

Researchers introduce WavefrontDiffusion, a new dynamic decoding approach for Diffusion Language Models that improves text generation quality by expanding from finalized positions rather than using fixed blocks. The method achieves state-of-the-art performance on reasoning and code generation benchmarks while maintaining computational efficiency equivalent to existing block-based methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Protein Structure Tokenization via Geometric Byte Pair Encoding

Researchers have developed GeoBPE, a new protein structure tokenization method that converts protein backbone structures into discrete geometric tokens, achieving over 10x compression and data efficiency improvements. The approach uses geometry-grounded byte-pair encoding to create hierarchical vocabularies of protein structural primitives that align with functional families and enable better multimodal protein modeling.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

Researchers conducted the first comprehensive analysis of open-source direct preference optimization (DPO) datasets used to align large language models, revealing significant quality variations. They created UltraMix, a curated dataset that's 30% smaller than existing options while delivering superior performance across benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Hard-constraint physics-residual networks enable robust extrapolation for hydrogen crossover prediction in PEM water electrolyzers

Researchers developed a hard-constraint physics-residual network (PR-Net) that significantly improves hydrogen crossover prediction in water electrolyzers for green hydrogen production. The AI model achieves 99.57% accuracy and maintains performance when extrapolating beyond training conditions, outperforming traditional neural networks and physics-informed networks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

AIRMap: AI-Generated Radio Maps for Wireless Digital Twins

Researchers developed AIRMap, a deep-learning framework that generates radio maps for wireless network simulation over 100x faster than traditional ray tracing methods. The AI model achieves under 4 dB RMSE accuracy in 4 ms per inference and significantly outperforms traditional simulators when calibrated with field measurements.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 4

Spurious Correlation-Aware Embedding Regularization for Worst-Group Robustness

Researchers propose SCER (Spurious Correlation-Aware Embedding Regularization), a new deep learning approach that improves AI model robustness by regularizing feature representations to suppress spurious correlations. The method demonstrates superior performance in worst-group accuracy across vision and language tasks compared to existing state-of-the-art approaches.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation

Researchers developed LSPRAG, a new framework that uses Language Server Protocol backends to help Large Language Models generate unit tests across multiple programming languages in real time. The system achieved significant improvements in test coverage, with increases of up to 213% for Java, 174% for Go, and 31% for Python compared to existing methods.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3

WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality

Researchers introduced WebDevJudge, a benchmark for evaluating how well AI models can judge web development quality compared to human experts. The study reveals significant gaps between AI judges and human evaluation, highlighting fundamental limitations in AI's ability to assess complex, interactive web development tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems

Researchers introduce SupervisorAgent, a lightweight framework that reduces token consumption in Multi-Agent Systems by 29.68% while maintaining performance. The system provides real-time supervision and error correction without modifying base agent architectures, validated across multiple AI benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning

Researchers introduce Fly-CL, a bio-inspired framework for continual representation learning that significantly reduces training time while maintaining performance comparable to state-of-the-art methods. The approach, inspired by fly olfactory circuits, addresses multicollinearity issues in pre-trained models and enables more efficient similarity matching for real-time applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Soft-Masked Diffusion Language Models

Researchers introduce soft-masking (SM), a novel approach for diffusion-based language models that improves upon traditional binary masked diffusion by blending mask token embeddings with predicted tokens. Testing on models up to 7B parameters shows consistent improvements in performance metrics and coding benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3

Digital Companionship: Overlapping Uses of AI Companions and AI Assistants

Research analyzing 202 ChatGPT and Replika users reveals emerging patterns of digital companionship, where users engage with AI systems for both task-based assistance and emotional support. The study finds users appreciate both humanlike qualities (emotional resonance) and non-humanlike features (constant availability), but struggle with the psychological tensions of forming attachments to entities they don't consider truly human.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Reliable Fine-Grained Evaluation of Natural Language Math Proofs

Researchers have developed ProofGrader, a new AI system that can reliably evaluate natural language mathematical proofs generated by large language models on a fine-grained 0-7 scale. The system was trained using ProofBench, the first expert-annotated dataset of proof ratings covering 145 competition math problems and 435 LLM solutions, achieving significant improvements over basic evaluation methods.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 3

OBsmith: LLM-Powered JavaScript Obfuscator Testing

Researchers introduce OBsmith, an LLM-powered framework that tests JavaScript obfuscators for correctness bugs that can silently alter program functionality. The tool discovered 11 previously unknown bugs that existing JavaScript fuzzers failed to detect, highlighting critical gaps in obfuscation quality assurance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Latent Diffusion Model without Variational Autoencoder

Researchers introduce SVG, a new latent diffusion model that eliminates the need for variational autoencoders by using self-supervised representations. The approach leverages frozen DINO features to create semantically structured latent spaces, enabling faster training, fewer sampling steps, and better generative quality while maintaining semantic capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

DISCO: Diversifying Sample Condensation for Efficient Model Evaluation

Researchers introduce DISCO, a new method for efficiently evaluating machine learning models by selecting samples that maximize disagreement between models rather than relying on complex clustering approaches. The technique achieves state-of-the-art results in performance prediction while reducing the computational cost of model evaluation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

TTOM: Test-Time Optimization and Memorization for Compositional Video Generation

Researchers introduce TTOM (Test-Time Optimization and Memorization), a training-free framework that improves compositional video generation in Video Foundation Models during inference. The system uses layout-attention optimization and parametric memory to better align text prompts with generated video outputs, showing strong transferability across different scenarios.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3

Training Large Language Models To Reason In Parallel With Global Forking Tokens

Researchers developed Set Supervised Fine-Tuning (SSFT) and Global Forking Policy Optimization (GFPO) methods to improve large language model reasoning by enabling parallel processing through 'global forking tokens.' The techniques preserve diverse reasoning modes and demonstrate superior performance on math and code generation benchmarks compared to traditional fine-tuning approaches.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 4

Reference Grounded Skill Discovery

Researchers developed Reference-Grounded Skill Discovery (RGSD), a new AI algorithm that enables high-dimensional agents to learn complex skills by grounding discovery in semantically meaningful reference data. The method successfully taught a simulated humanoid with 359-dimensional observations to imitate and vary behaviors like walking, running, and punching while outperforming traditional imitation learning approaches.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

TiTok: Transfer Token-level Knowledge via Contrastive Excess to Transplant LoRA

TiTok is a new framework for transferring LoRA (Low-Rank Adaptation) parameters between different Large Language Model backbones without requiring additional training data or discriminator models. The method uses token-level contrastive learning to achieve 4-10% performance gains over existing approaches in parameter-efficient fine-tuning scenarios.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 4

EgoNight: Towards Egocentric Vision Understanding at Night with a Challenging Benchmark

Researchers introduce EgoNight, the first comprehensive benchmark for nighttime egocentric vision understanding, featuring day-night aligned videos and visual question answering tasks. The benchmark reveals significant performance drops in state-of-the-art multimodal large language models when operating under low-light conditions.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

Solving the Granularity Mismatch: Hierarchical Preference Learning for Long-Horizon LLM Agents

Researchers introduce Hierarchical Preference Learning (HPL), a new framework that improves AI agent training by using preference signals at three granularities: trajectory, group, and step. The method addresses limitations in existing Direct Preference Optimization approaches and demonstrates superior performance on challenging agent benchmarks through a dual-layer curriculum learning system.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing

DragFlow introduces the first framework to leverage FLUX's DiT priors for drag-based image editing, addressing distortion issues that plagued earlier Stable Diffusion-based approaches. The system uses region-based editing with affine transformations instead of point-based supervision, achieving state-of-the-art results on benchmarks.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3

Culture In a Frame: C³B as a Comic-Based Benchmark for Multimodal Culturally Awareness

Researchers introduce C³B (Comics Cross-Cultural Benchmark), a new benchmark to test cultural awareness capabilities in Multimodal Large Language Models using over 2,000 comic images and 18,000 QA pairs. Testing revealed significant performance gaps between current MLLMs and human performance, highlighting the need for improved cultural understanding in AI systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

MENLO: From Preferences to Proficiency -- Evaluating and Modeling Native-like Quality Across 47 Languages

Researchers introduce MENLO, a new framework for evaluating native-like quality in large language model responses across 47 languages. The study reveals significant improvements in multilingual LLM performance through reinforcement learning and fine-tuning, though gaps with human judgment persist.
