Easily fine-tune, evaluate, and deploy Qwen3, DeepSeek-R1, Llama 4, or any open-source LLM / VLM!
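For orientation, a minimal supervised fine-tuning sketch of the kind such toolkits wrap, written against the Hugging Face transformers API rather than this repo's own interface (which may differ); the model and dataset names are placeholders:

```python
# Generic SFT sketch using Hugging Face transformers -- NOT this repo's own
# interface; model and dataset names are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder: any open-source causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # some tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder instruction dataset; any (prompt, response) pairs work.
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def tokenize(example):
    text = example["instruction"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False => causal LM objective; the collator pads and builds labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```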
An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)
Official repository for VisionZip (CVPR 2025)
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Scala client for the OpenAI API and other major LLM providers
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
[CVPR 2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments
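To make the task concrete, here is a sketch of the evaluation loop such a benchmark implies: sample video frames, run an abstracted VLM OCR call, and score with character error rate (CER). `run_vlm_ocr` and `ground_truth` are hypothetical stand-ins, not this repo's API:

```python
# Sketch of a video OCR evaluation loop; `run_vlm_ocr` is a hypothetical
# model call, not part of this repository's actual interface.
import cv2

def cer(ref: str, hyp: str) -> float:
    """Character error rate via Levenshtein edit distance (rolling 1-D table)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,              # deletion
                        dp[j - 1] + 1,          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

def sample_frames(path: str, every_n: int = 30):
    """Yield (index, frame) for every n-th frame of a video file."""
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            yield idx, frame
        idx += 1
    cap.release()

# for idx, frame in sample_frames("clip.mp4"):
#     pred = run_vlm_ocr(frame)                # hypothetical VLM OCR call
#     print(idx, cer(ground_truth[idx], pred)) # ground_truth is a placeholder
```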
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
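As a point of reference, a common linear-probe baseline for real-vs-synthetic detection on frozen CLIP features; this is an illustrative sketch, not necessarily the paper's method, and the file lists are placeholders:

```python
# Linear probe on frozen CLIP image features -- an illustrative baseline only,
# not necessarily the method of the paper above.
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed(paths):
    """L2-normalized CLIP image embeddings for a list of image paths."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# real_paths / fake_paths are placeholder file lists.
# X = numpy.vstack([embed(real_paths), embed(fake_paths)])
# y = [0] * len(real_paths) + [1] * len(fake_paths)
# clf = LogisticRegression(max_iter=1000).fit(X, y)
```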
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Models
A collection of VLM papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.
[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
This repository collects research papers on large Foundation Models for Scenario Generation and Analysis in Autonomous Driving, and is continuously updated to track the latest work.
BobVLM – a 1.5B-parameter multimodal model built from scratch and pre-trained on a single P100 GPU, capable of image description and moderate question answering. 🤗🎉
Code for VLM4Bio, a benchmark dataset of scientific question-answer pairs used to evaluate pretrained VLMs for trait discovery from biological images.
Official code for CVPR 2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"