GitHub Chinese Community

vlm-ocr

bytedance / Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Topics: document-analysis, layout-analysis, OCR, Parser, pdf, pdf-converter, pdf-parser, Python, vlm-ocr
Python · 5.8k stars · Updated 16 days ago

vlm-run / vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

Topics: artificial intelligence, machine vision, etl, genai, JSON, multimodal, pydantic, vlm, vlm-ocr
Python · 533 stars · Updated 4 months ago

video-db / ocr-benchmark

Benchmarking Vision-Language Models on OCR tasks in Dynamic Video Environments

Topics: arxiv, benchmark, easyocr, OCR, rapidocr, research-paper, vlm-ocr, vlms
Python · 44 stars · Updated 7 months ago

OmarSamirz / ImageFromTextGenerator

IFTG (ImageFromTextGenerator) is a Python package that simplifies creating robust datasets for OCR models. Generate images from text, apply over 10 built-in noise effects, and customize fonts and layo...

Topics: Image, OCR, synthetic, text, data-augmentation, dataset-generation, image processing, synthetic-data, synthetic-data-generation, noise, optical-character-recognition, augmentation, vlm-ocr
Python · 16 stars · Updated 5 months ago

Niraya666 / DocuLingo

DocuLingo is a powerful document parsing tool built with multimodal large language models to enhance RAG (Retrieval Augmented Generation) workflows.

Topics: rag, vlm-ocr
Python · 0 stars · Updated 4 months ago

Takk8IS / CyberTechVLMDetector

The CyberTech VLM Detector is a computer vision system designed to run entirely on edge devices, without requiring cloud access. The system uses vision-language models (VLM) to detect and locate objec...

Topics: camera, detector, Python, readview, vlm, vlm-ocr, vlms
Python · 0 stars · Updated 2 months ago