GitHub Chinese Community


#layout-analysis

opendatalab / MinerU

A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.

extract-data, layout-analysis, OCR, Parser, pdf, pdf-converter, Python, document-analysis, pdf-parser, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, ai4science
Python 35.05 k
1 day ago
Layout-Parser / layout-parser

#Computer Science# A Unified Toolkit for Deep Learning Based Document Image Analysis

layout-analysis, deep learning, object-detection, OCR, layout-parser, detectron2, document-layout-analysis, computer vision, document-image-processing, layout-detection
Python 5.29 k
10 months ago
breezedeus / Pix2Text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowerin...

OCR, LaTeX, Python, PyTorch, layout-analysis, math-ocr
Jupyter Notebook 2.46 k
1 month ago
UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox, pdf, pdf-document, C#, netstandard, pdf-extractor, pdf-document-processor, pdf-files, alto-xml, hocr, layout-analysis, document-analysis, page-xml, pdf-generation
C# 2.04 k
14 days ago
mittagessen / kraken

OCR engine for all the languages

OCR, neural-networks, alto-xml, hocr, handwritten-text-recognition, layout-analysis, optical-character-recognition, page-xml
Python 835
10 days ago
kotaro-kinoshita / yomitoku

#Computer Science# Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.

deep learning, layout-analysis, OCR, Python, PyTorch
Python 811
4 days ago
BobLd / DocumentLayoutAnalysis

Document Layout Analysis resources repos for development with PdfPig.

document-layout-analysis, layout-analysis, table-extraction, pdf, C#, hocr, page-xml, alto-xml
C# 619
2 years ago
mindspore-lab / mindocr

#Computer Science# A toolbox of OCR models and algorithms based on MindSpore

OCR, deep learning, text-detection, text-recognition, crnn, dbnet, key-information-extraction, layout-analysis, layoutxlm, table-recognition
Python 275
2 months ago
RapidAI / RapidLayout

Analysis of Chinese and English layouts

layout, layout-analysis, pp-structure
Python 214
24 days ago
RapidAI / RapidDoc

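📝 Extracts content from document images and reproduces each image one-to-one in Word or Txt for further use or processing. Planned future support: PDF/image input, with output in JSON, Txt, Word, and Markdown formats.

layout-analysis
Python 195
7 months ago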
andreagemelli / doc2graph

#Natural Language Processing# Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.

deep learning, document-understanding, geometric-deep-learning, gnn, key-information-extraction, layout-analysis, natural language processing, table-detection, PyTorch
Jupyter Notebook 123
2 years ago
ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis

document-analysis, layout-analysis, ultralytics, yolo, yolov8
Python 114
3 months ago
NormXU / Layout2Graph

An official implementation of the paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"

layout-analysis
Python 78
2 years ago
xushengfeng / eSearch-OCR

A Node.js library based on PaddleOCR

Node.js, onnx, paddleocr, layout-analysis, OCR
TypeScript 73
20 days ago
JPLeoRX / detectron2-publaynet

#Computer Science# Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset

object-detection, instance-segmentation, computer vision, detectron2, Python, machine learning, neural network, document-classification, document-layout-analysis, layout-analysis, document-analysis, neural-networks, artificial intelligence, deep learning, faster-rcnn, PyTorch
Python 49
2 years ago
MaitySubhajit / SelfDocSeg

[ICDAR 2023] SelfDocSeg: A self-supervised vision-based approach towards Document Segmentation (Oral)

computer vision, layout-analysis, self-supervised-learning
Python 41
2 years ago
CycloneBoy / pdf_table

A Unified Toolkit for Deep Learning-Based Table Extraction

artificial intelligence, document-parsing, pdf, layout-analysis, OCR, table, table-recognition
Python 39
7 months ago
dell-research-harvard / HJDataset

A Large Dataset of Historical Japanese Documents with Complex Layouts

dataset, detectron2, Python, layout-analysis
Jupyter Notebook 34
3 years ago
CaseDrive / publaynet-models

#Computer Science# Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset

artificial intelligence, computer vision, deep learning, detectron2, document-analysis, document-classification, document-layout-analysis, faster-rcnn, instance-segmentation, layout-analysis, machine learning, neural network, neural-networks, object-detection, Python, PyTorch
Python 28
2 years ago
BobLd / PdfPigMLNetBlockClassifier

#Computer Science# Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a pdf document page as either title, text, list, table and ...

lightgbm, pdf, document-layout-analysis, classifier, machine learning, C#, pdf-document, pdf-document-processor, layout-analysis
C# 28
5 years ago