hocr

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox pdf pdf-document C# netstandard pdf-extractor pdf-document-processor pdf-files alto-xml hocr layout-analysis document-analysis page-xml pdf-generation

C# 2.2k

17 hours ago

manisandro / gImageReader

A Gtk/Qt front-end to tesseract-ocr.

Qt OCR pdf-document C++ tesseract-ocr GTK hocr scanner

C++ 1.82 k

6 days ago

mittagessen / kraken

OCR engine for all the languages

OCR neural-networks alto-xml hocr handwritten-text-recognition layout-analysis optical-character-recognition page-xml

Python 881

11 days ago

BobLd / DocumentLayoutAnalysis

Document layout analysis resources and repos for development with PdfPig.

document-layout-analysis layout-analysis table-extraction pdf C# hocr page-xml alto-xml

C# 624

2 years ago

scribeocr / scribeocr

Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents.

OCR proofreading tesseract hocr

JavaScript 214

7 days ago

UB-Mannheim / ocr-fileformat

Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

OCR hocr page-xml validation transformation

JavaScript 196

4 months ago

cneud / ocr-conversion

Conversions between various OCR formats

alto-xml hocr page-xml OCR

2 years ago

dbmdz / mirador-textoverlay

Text Overlay plugin for Mirador 3

OCR optical-character-recognition hocr alto-xml

JavaScript 58

1 month ago

filak / hOCR-to-ALTO

Convert between Tesseract hOCR and ALTO XML using XSL stylesheets

hocr

XSLT 55

4 months ago
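
The filak/hOCR-to-ALTO entry above performs the conversion with XSL stylesheets. To illustrate only the core coordinate mapping, here is a small Python sketch (not that project's code) using the standard-library xml.etree.ElementTree instead of XSLT. It assumes well-formed XHTML hOCR whose title attributes carry a bbox and Tesseract's x_wconf, pixel units on the ALTO side, and the ALTO v4 namespace. It emits a bare TextLine with String children, omitting SP (space) elements and the surrounding Page/PrintSpace/TextBlock structure, so it is not a complete, schema-valid ALTO file. The names hocr_title_to_alto, hocr_line_to_alto and SAMPLE_LINE are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v4 namespace; coordinates are taken to be pixels.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"

# Hypothetical sample: one Tesseract-style hOCR line with two words.
SAMPLE_LINE = (
    '<span class="ocr_line" title="bbox 36 92 200 116; baseline 0 -5">'
    '<span class="ocrx_word" title="bbox 36 92 96 116; x_wconf 95">Hello</span> '
    '<span class="ocrx_word" title="bbox 110 92 200 116; x_wconf 88">world</span>'
    "</span>"
)


def hocr_title_to_alto(title):
    """Map an hOCR title ('bbox x0 y0 x1 y1; x_wconf 95; ...') to ALTO attributes."""
    props = {}
    for part in title.split(";"):
        fields = part.split()  # naive split; quoted values are not handled
        if fields:
            props[fields[0]] = fields[1:]
    x0, y0, x1, y1 = (int(v) for v in props["bbox"])
    attrs = {
        "HPOS": str(x0),        # left edge
        "VPOS": str(y0),        # top edge
        "WIDTH": str(x1 - x0),  # hOCR stores two corners, ALTO stores a size
        "HEIGHT": str(y1 - y0),
    }
    if "x_wconf" in props:
        # Tesseract's x_wconf is 0-100; ALTO WC is 0.0-1.0
        attrs["WC"] = f"{int(props['x_wconf'][0]) / 100:.2f}"
    return attrs


def hocr_line_to_alto(line_el):
    """Convert one hOCR ocr_line element into an ALTO TextLine element."""
    line_attrs = hocr_title_to_alto(line_el.get("title", ""))
    line_attrs.pop("WC", None)  # keep word confidence on String elements only
    text_line = ET.Element(f"{{{ALTO_NS}}}TextLine", line_attrs)
    for el in line_el.iter():
        if "ocrx_word" in (el.get("class") or "").split():
            attrs = hocr_title_to_alto(el.get("title", ""))
            attrs["CONTENT"] = "".join(el.itertext()).strip()
            ET.SubElement(text_line, f"{{{ALTO_NS}}}String", attrs)
    return text_line


if __name__ == "__main__":
    ET.register_namespace("", ALTO_NS)
    alto_line = hocr_line_to_alto(ET.fromstring(SAMPLE_LINE))
    print(ET.tostring(alto_line, encoding="unicode"))
```

The main design point is that hOCR stores boxes as two corners (x0 y0 x1 y1), while ALTO stores an origin plus WIDTH and HEIGHT, so each box needs a subtraction rather than a straight copy.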

UB-Mannheim / ocr-gt-tools

Ergonomic line-by-line transcription of scanned text.

OCR hocr transcription ground-truth web-interface

JavaScript 54

5 years ago

dmi3kno / hocr

Text-to-tibble

OCR tesseract tesseract-ocr R rstats hocr

R 36

5 years ago

fakabbir / OCR

Probabilistic key-value pair extraction from invoices (non-searchable PDFs) using word weights

OCR hocr tesseract Python

Python 18

4 years ago

macabeus / pyslibtesseract

✏️ Integration of Tesseract for Python using a shared library

tesseract hocr OCR

Python 12

9 years ago

GeReV / hocr-editor-ts

A visual hOCR file editor

OCR hocr tesseract-ocr

TypeScript 9

1 year ago

iilei / hocr-to-json

OCR hocr

JavaScript 4

3 years ago

GeReV / HocrEditor

A visual editor for .hocr files.

hocr tesseract-ocr OCR

C# 4

7 months ago

hadro / new-york-city-directories

Some basic data and text extraction from the New York City Directories

digital-humanities pdfs OCR hocr

8 years ago

mayurcybercz / AI-Exam-evaluation

CLI tool to recognise handwritten text from answer sheets using Tesseract OCR, and to evaluate marks from the extracted text using NLP.

tesseract-ocr hocr natural-language-processing command-line-interface JSON Python nltk

Jupyter Notebook 3

7 years ago

hadro / brewery-guides

The data for guides to breweries across the United States from 1896 to 1918

hocr data dataset digital-humanities Open Data

8 years ago

jlieth / hocr-parser

Python parser for hOCR files using lxml

Python hocr OCR parsing-library

Python 3

5 years ago
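
The jlieth/hocr-parser entry above parses hOCR with lxml. As a dependency-free illustration of what such parsing involves, here is a minimal sketch using the standard-library xml.etree.ElementTree. This is not that library's API. It assumes the input is well-formed XHTML (as Tesseract's hOCR output is) and that each ocrx_word carries a bbox and, optionally, Tesseract's x_wconf in its title attribute. The names extract_words, parse_title and SAMPLE_HOCR are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the XHTML namespace, as Tesseract emits it.
SAMPLE_HOCR = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <div class="ocr_page" title="bbox 0 0 600 800">
    <span class="ocr_line" title="bbox 36 92 200 116">
      <span class="ocrx_word" title="bbox 36 92 96 116; x_wconf 95">Hello</span>
      <span class="ocrx_word" title="bbox 110 92 200 116; x_wconf 88"><strong>world</strong></span>
    </span>
  </div>
</body>
</html>"""


def parse_title(title):
    """Split an hOCR title like 'bbox 1 2 3 4; x_wconf 95' into a dict of lists."""
    props = {}
    for part in title.split(";"):
        fields = part.split()  # naive split; quoted values such as image paths are not handled
        if fields:
            props[fields[0]] = fields[1:]
    return props


def extract_words(hocr_text):
    """Return (text, bbox, confidence) tuples for every ocrx_word element."""
    root = ET.fromstring(hocr_text)
    words = []
    # Matching on the class attribute works whether or not tags are namespaced.
    for el in root.iter():
        if "ocrx_word" not in (el.get("class") or "").split():
            continue
        props = parse_title(el.get("title", ""))
        bbox = tuple(int(v) for v in props.get("bbox", []))
        conf = int(props["x_wconf"][0]) if "x_wconf" in props else None
        # itertext() also collects text nested in <strong>/<em> inside the word
        text = "".join(el.itertext()).strip()
        words.append((text, bbox, conf))
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_HOCR):
        print(word)
```

Plain HTML hOCR that is not well-formed XML would need a more forgiving parser, which is one reason a project like hocr-parser relies on lxml.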
