MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets
MM-LIMA demonstrates that multimodal large language models can achieve superior performance using only 200 high-quality instruction examples—6% of the data used in comparable systems. Researchers developed quality metrics and an automated data selector to filter vision-language datasets, showing that strategic data curation outweighs raw dataset size in model alignment.