y0news

#lvlm News & Analysis

8 articles tagged with #lvlm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

EcoAlign: An Economically Rational Framework for Efficient LVLM Alignment

Researchers introduce EcoAlign, a new framework for aligning Large Vision-Language Models that treats alignment as an economic optimization problem. The method balances safety, utility, and computational costs while preventing harmful reasoning disguised with benign justifications, showing superior performance across multiple models and datasets.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge

Researchers developed EyExIn, an AI framework that anchors large vision-language models with domain-specific expert knowledge to address critical gaps in medical diagnosis. The system uses dual-stream encoding and deep expert injection to improve accuracy in ophthalmic diagnosis, outperforming existing proprietary systems across four benchmarks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 3

OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

Researchers have developed OmniCT, a unified AI model that combines slice-level and volumetric analysis for CT scan interpretation, addressing a major limitation in medical imaging AI. The model introduces spatial consistency enhancement and organ-level semantic features, outperforming existing methods across clinical tasks.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

Researchers propose 'Two Birds, One Projection,' a new inference-time defense method for Large Vision-Language Models that simultaneously improves both safety and utility performance. The method addresses modality-induced bias by projecting cross-modal features onto the null space of identified bias directions, breaking the traditional safety-utility tradeoff.
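The projection step can be illustrated with a minimal sketch. The summary only states that features are projected onto the null space of bias directions; how the paper identifies those directions and where in the model it applies the projection are not given here. The function names `orthonormalize` and `project_out`, and the toy vectors in the usage note, are hypothetical and chosen purely for illustration.

```python
import math


def orthonormalize(directions, eps=1e-12):
    """Gram-Schmidt: turn (possibly correlated) bias directions into an orthonormal basis."""
    basis = []
    for d in directions:
        v = list(d)
        # Remove the components already covered by earlier basis vectors.
        for b in basis:
            coeff = sum(x * y for x, y in zip(v, b))
            v = [x - coeff * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > eps:  # skip directions that are linearly dependent
            basis.append([x / norm for x in v])
    return basis


def project_out(feature, bias_directions):
    """Project `feature` onto the subspace orthogonal to all bias directions.

    Assumption: bias directions are given as plain vectors; this is generic
    linear algebra, not the paper's actual procedure.
    """
    out = list(feature)
    for b in orthonormalize(bias_directions):
        coeff = sum(x * y for x, y in zip(out, b))
        out = [x - coeff * y for x, y in zip(out, b)]
    return out
```

For example, `project_out([3.0, 4.0, 5.0], [[1.0, 0.0, 0.0]])` removes the first coordinate and leaves the other two unchanged, so the result has zero dot product with the bias direction.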

AI · Bullish · arXiv – CS AI · Mar 16 · 6/10

Visual-ERM: Reward Modeling for Visual Equivalence

Researchers introduce Visual-ERM, a multimodal reward model that improves vision-to-code tasks by evaluating visual equivalence in rendered outputs rather than relying on text-based rules. The system achieves significant performance gains on chart-to-code tasks (+8.4) and shows consistent improvements across table and SVG parsing applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Researchers have developed CT-Flow, an AI framework that mirrors radiologists' working practice by using tools interactively to analyze 3D CT scans. The system reportedly achieved 41% better diagnostic accuracy than existing models and a 95% success rate in autonomous tool use, which could change clinical radiology workflows.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 4

ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations

Researchers propose ChainMPQ, a training-free method that reduces relation hallucinations in Large Vision-Language Models (LVLMs) by using interleaved text-image reasoning chains. The approach targets what the authors describe as the most common but least studied type of hallucination, analyzing subjects, objects, and their relationships in sequence through multi-perspective questioning.

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 5

Cross-modal Identity Mapping: Minimizing Information Loss in Modality Conversion via Reinforcement Learning

Researchers developed Cross-modal Identity Mapping (CIM), a reinforcement learning framework that improves image captioning in Large Vision-Language Models by minimizing information loss during visual-to-text conversion. The method achieved a 20% improvement in relation reasoning on the COCO-LN500 benchmark using Qwen2.5-VL-7B, without requiring additional annotations.