Fully-Open Vision Encoders • Generative Pretraining
OpenVision 2
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning.
OpenVision 2: A Family of Generative Pretrained Visual Encoders. It removes the text encoder and the contrastive loss and trains with caption-only supervision.
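Caption-only supervision can be illustrated with a minimal sketch. It assumes the objective is a standard autoregressive captioning loss, in which a text decoder conditioned on the image encoder's visual tokens predicts each caption token and is scored by cross-entropy. The function names and the per-step logits interface are hypothetical, and the decoder itself is not shown. Unlike a CLIP-style contrastive loss, this per-image loss needs no separate text encoder and no in-batch negatives.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def caption_loss(step_logits: Sequence[Sequence[float]],
                 caption_ids: Sequence[int]) -> float:
    """Mean next-token cross-entropy over one image's caption (hypothetical API).

    step_logits[t] is assumed to be the decoder's vocabulary logits for
    predicting caption_ids[t], conditioned on the visual tokens and on
    caption_ids[:t] (teacher forcing). Only this image's own caption is
    used: there is no text encoder and no contrastive negative.
    """
    if len(step_logits) != len(caption_ids):
        raise ValueError("need one logit vector per caption token")
    nll = 0.0
    for logits, target in zip(step_logits, caption_ids):
        nll -= log_softmax(logits)[target]
    return nll / len(caption_ids)
```

With uniform logits over a 4-token vocabulary, every step costs log 4, so `caption_loss([[0, 0, 0, 0]] * 3, [1, 2, 3])` equals `math.log(4)`; a decoder that confidently predicts the correct token drives the loss toward zero.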
Quick Links (OpenVision)
- Project page: OpenVision
- ArXiv: arXiv:2505.04601
- Code: GitHub
- HF Collection: OpenVision Collection
Quick Links (OpenVision 2)
- Project page: OpenVision 2
- ArXiv: arXiv:2509.01644
- HF Collection: OpenVision 2 Collection
- Dataset: ReCap-DataComp-1B v2
- Training speed: 1.5–2× faster
- Memory footprint: ~1.8× lower
- Scale: up to 1B+ params
- Benchmarks: OCR/TextVQA ↑