Name | # Original Samples | # Rewritten | # Verified |
---|---|---|---|
CLEVR_Math | 15,000 | 14,748 | 9,771 |
GeoQA170K | 14,019 | 11,745 | 7,794 |
Synthesis | 29,998 | 29,998 | 26,672 |
ArxivQA | 14,992 | 14,810 | 14,109 |
ALLaVA-LAION | 36,977 | 30,191 | 18,123 |
Total | 110,986 | 101,492 | 76,469 |
Dataset Statistics. These datasets cover questions from multiple domains (math and general) and of different types (close-ended and open-ended). For datasets with duplicated images, we keep only unique images for higher diversity. More data is on the way.
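As a quick sanity check, the per-dataset counts from the statistics table above can be loaded and aggregated in a few lines; the numbers below are copied directly from the table, and `retention` is a hypothetical helper, not part of any released tooling:

```python
# Per-dataset counts from the statistics table above.
datasets = {
    "CLEVR_Math":   {"original": 15_000, "rewritten": 14_748, "verified": 9_771},
    "GeoQA170K":    {"original": 14_019, "rewritten": 11_745, "verified": 7_794},
    "Synthesis":    {"original": 29_998, "rewritten": 29_998, "verified": 26_672},
    "ArxivQA":      {"original": 14_992, "rewritten": 14_810, "verified": 14_109},
    "ALLaVA-LAION": {"original": 36_977, "rewritten": 30_191, "verified": 18_123},
}

def retention(counts: dict) -> float:
    """Fraction of original samples that survive verification."""
    return counts["verified"] / counts["original"]

# Totals across all datasets.
total_original = sum(d["original"] for d in datasets.values())
total_verified = sum(d["verified"] for d in datasets.values())
print(total_original, total_verified)
```

Summing the columns gives 110,986 original and 76,469 verified samples, matching the Total row.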
Pipeline Overview. We propose a four-step procedure for data generation: Captioning, Visual-Language CoT Generation, Answer Rewriting, and Answer Verification.
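The four steps above can be sketched as a simple filtering pipeline. This is a minimal illustration only: the function names and the trivial substring "verifier" are hypothetical placeholders, standing in for the LVLM/LLM calls the real pipeline would make at each step.

```python
# Hypothetical sketch of the four-step data-generation pipeline.
# Each placeholder below stands in for a model call in the real pipeline.

def caption_image(image_id: str) -> str:
    # Step 1: Captioning — describe the image in text.
    return f"caption for {image_id}"

def generate_cot(caption: str, question: str) -> str:
    # Step 2: Visual-Language CoT Generation — produce a reasoning chain
    # grounded in the caption and the question.
    return f"reasoning over '{caption}' given '{question}'"

def rewrite_answer(cot: str) -> str:
    # Step 3: Answer Rewriting — rewrite the raw chain into a clean answer.
    return f"rewritten: {cot}"

def verify_answer(answer: str, reference: str) -> bool:
    # Step 4: Answer Verification — keep only answers consistent with a
    # reference; a substring check stands in for the actual verifier.
    return reference in answer

def process_sample(image_id: str, question: str, reference: str):
    """Run one sample through all four steps; return None if it fails verification."""
    caption = caption_image(image_id)
    cot = generate_cot(caption, question)
    answer = rewrite_answer(cot)
    return answer if verify_answer(answer, reference) else None
```

Samples that fail the final verification step are dropped, which is why the Verified column in the table above is smaller than the Rewritten column.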
We thank the Microsoft Accelerate Foundation Models Research Program for supporting our computing needs.
@misc{vl-thinking2025,
  title        = {VL-Thinking: An R1-Derived Visual Instruction Tuning Dataset for Thinkable LVLMs},
  author       = {Hardy Chen and Haoqin Tu and Hui Liu and Xianfeng Tang and Xinya Du and Yuyin Zhou and Cihang Xie},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/UCSC-VLAA/VL-Thinking}},
}