GitHub Chinese Community

Topic: document-data-extraction

NanoNets / docext

#Natural Language Processing# An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

document, document-analysis, extraction, large language models, machine learning, natural language processing, OCR, rag, unstructured-data, vlms, onprem, document-data-extraction, ocr-onpremise, llm-ocr, onprem-ocr, onprem-vision, onpremise, table-extraction, document-information-extraction, ocr-benchmark
Python 1.44k
7 days ago
ucbepic / TWIX

TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

document-data-extraction, document-processing
Python 192
1 month ago