page-xml

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox pdf pdf-document C# netstandard pdf-extractor pdf-document-processor pdf-files alto-xml hocr layout-analysis document-analysis page-xml pdf-generation

C# 2.2k

17 hours ago

mittagessen / kraken

OCR engine for all the languages

OCR neural-networks alto-xml hocr handwritten-text-recognition layout-analysis optical-character-recognition page-xml

Python 881

11 days ago

BobLd / DocumentLayoutAnalysis

Document Layout Analysis resources repos for development with PdfPig.

document-layout-analysis layout-analysis table-extraction pdf C# hocr page-xml alto-xml

C# 624

2 years ago

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

OCR hocr page-xml validation transformation

JavaScript 196

4 months ago

lquirosd / P2PaLA

Page to PAGE Layout Analysis Tool

deep-neural-networks handwritten-text-recognition document-layout-analysis page-xml PyTorch pix2pix Generative Adversarial Network machine-vision image-segmentation

Python 191

4 years ago

cneud / ocr-conversion

Conversions between various OCR formats

alto-xml hocr page-xml OCR

2 years ago

qurator-spk / dinglehopper

An OCR evaluation tool

OCR alto-xml page-xml page

Python 66

24 days ago

kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML

OCR page-xml

Python 12

1 year ago

UB-Mannheim / blatt

NLP-helper for OCR-ed pages in PAGE XML format

page-xml

Python 10

9 months ago

VRI-UFPR / page-xml-draw

A powerful CLI tool for visualization and encoding of PAGE-XML files

page-xml visualization OpenCV OCR layout-analysis segmentation

Python 6

4 years ago

slub / textract2page

Convert AWS Textract JSON to PRImA PAGE XML

OCR page-xml Python

Python 6

7 months ago

Heresta / OCR17plus

Data for layout analysis and HTR.

XML alto-xml page-xml png dataset OCR segmentation

Python 4

4 years ago

IMAGO-Catalogues-Jjanes / cataloguesSegmentationOCR

Dataset and models for catalogs' layout analysis and HTR

OCR segmentation page-xml alto-xml catalog

Python 2

4 years ago

qurator-spk / ocrd_repair_inconsistencies

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

OCR page-xml page

Python 2

2 years ago

OCR-D / gt_structure_1_4

The repo gt_structure_1_4 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

ground-truth page-xml repository segmentation

1 year ago

OCR-D / gt_structure_1_3

The repo gt_structure_1_3 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

ground-truth repository segmentation page-xml

1 year ago

VRI-UFPR / ocrd-page-xml-draw

OCR-D wrapper for page-xml-draw

visualization segmentation layout-analysis OCR page-xml

Python 0

4 years ago

OCR-D / gt_structure_1_2

The repo gt_structure_1_2 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

ground-truth page-xml repository segmentation

1 year ago

Lemmbraalemao-DPB / German-Brazilian-Newspapers-Dataset_1

The GBN Dataset consists of German-Brazilian historical newspapers, along with their digital and binarized images and ground truth files.

ground-truth OCR page-xml training

5 months ago

OCR-D / gt_structure_1_1

The repo gt_structure_1_1 is part of the OCR-D Ground Truth Structure corpus. Only the structure of the printed page is annotated. The corpus was created as a result of the DFG project OCR-D.

ground-truth page-xml repository segmentation

1 year ago
