#lvlm News & Analysis

10 articles tagged with #lvlm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

10 articles

AIBullisharXiv – CS AI · Jun 97/10

🧠

Zero-Shot Learning in Industrial Scenarios: New Large-Scale Benchmark, Challenges and Baseline

Researchers introduce MMIO, a large-scale industrial dataset with 80K+ samples, and RTVP, a refined prompt method for zero-shot defect detection in manufacturing. The work addresses the gap between general-purpose Large Visual Language Models and industrial applications, achieving state-of-the-art performance through improved text-visual prompt interactions and domain adaptation.

AIBullisharXiv – CS AI · Mar 177/10

🧠

EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment

Researchers introduce EcoAlign, a new framework for aligning Large Vision-Language Models that treats alignment as an economic optimization problem. The method balances safety, utility, and computational costs while preventing harmful reasoning disguised with benign justifications, showing superior performance across multiple models and datasets.

AIBullisharXiv – CS AI · Mar 117/10

🧠

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

Researchers developed EyExIn, a new AI framework that addresses critical gaps in large vision language models for medical diagnosis by anchoring them with domain-specific expert knowledge. The system uses dual-stream encoding and deep expert injection to improve accuracy in ophthalmic diagnosis, outperforming existing proprietary systems across four benchmarks.

AIBullisharXiv – CS AI · Mar 37/103

🧠

OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

Researchers have developed OmniCT, a unified AI model that combines slice-level and volumetric analysis for CT scan interpretation, addressing a major limitation in medical imaging AI. The model introduces spatial consistency enhancement and organ-level semantic features, outperforming existing methods across clinical tasks.

AIBullisharXiv – CS AI · May 46/10

🧠

Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Researchers propose Persistent Visual Memory (PVM), a lightweight module that addresses visual signal degradation in Large Vision-Language Models by maintaining consistent visual perception during long text generation. Integrated into Qwen3-VL models, PVM demonstrates measurable accuracy improvements with minimal computational overhead, particularly benefiting complex reasoning tasks.

AIBullisharXiv – CS AI · Mar 176/10

🧠

Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

Researchers propose 'Two Birds, One Projection,' a new inference-time defense method for Large Vision-Language Models that simultaneously improves both safety and utility performance. The method addresses modality-induced bias by projecting cross-modal features onto the null space of identified bias directions, breaking the traditional safety-utility tradeoff.

AIBullisharXiv – CS AI · Mar 166/10

🧠

Visual-ERM: Reward Modeling for Visual Equivalence

Researchers introduce Visual-ERM, a multimodal reward model that improves vision-to-code tasks by evaluating visual equivalence in rendered outputs rather than relying on text-based rules. The system achieves significant performance gains on chart-to-code tasks (+8.4) and shows consistent improvements across table and SVG parsing applications.

AIBullisharXiv – CS AI · Mar 37/107

🧠

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Researchers have developed CT-Flow, an AI framework that mimics how radiologists actually work by using tools interactively to analyze 3D CT scans. The system achieved 41% better diagnostic accuracy than existing models and 95% success in autonomous tool use, potentially revolutionizing clinical radiology workflows.

AIBullisharXiv – CS AI · Mar 36/104

🧠

ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations

Researchers propose ChainMPQ, a training-free method to reduce relation hallucinations in Large Vision-Language Models (LVLMs) by using interleaved text-image reasoning chains. The approach addresses the most common but least studied type of AI hallucination by sequentially analyzing subjects, objects, and their relationships through multi-perspective questioning.

AIBullisharXiv – CS AI · Mar 35/105

🧠

Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning

Researchers developed Cross-modal Identity Mapping (CIM), a reinforcement learning framework that improves image captioning in Large Vision-Language Models by minimizing information loss during visual-to-text conversion. The method achieved 20% improvement in relation reasoning on the COCO-LN500 benchmark using Qwen2.5-VL-7B without requiring additional annotations.