GitHub Chinese Community

document-parsing

PaddlePaddle / PaddleOCR

PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into production.

OCR, crnn, ocrlite, database, chineseocr, pdf2markdown, pp-ocr, pp-structure, document-parsing, chatocr, document-translation, kie
Python 53.75 k
4 days ago
docling-project / docling

Get your documents ready for gen AI

artificial intelligence, convert, documents, pdf, tables, document-parser, document-parsing, docx, HTML, Markdown, pdf-converter, pdf-to-json, pdf-to-text, pptx, xlsx
Python 38.56 k
3 days ago
Unstructured-IO / unstructured

#natural language processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...

deep learning, document-parsing, machine learning, natural language processing, OCR, information-retrieval, data-pipelines, preprocessing, pdf-to-text, pdf, pdf-to-json, document-image-analysis, donut, document-image-processing, document-parser, docx, langchain, large language models
HTML 12.65 k
4 days ago
run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

document, Parsing, pdf, pdf-document-processor, pptx, structured-data, document-parser, document-parsing, docx-to-markdown, pdf-to-excel, pdf-to-json, pdf-to-text, ppt-to-json, tables, ppt-to-markdown, pdf-to-markdown
TypeScript 4.14 k
2 days ago
enoch3712 / ExtractThinker

#natural language processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

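artificial intelligence, large language models, natural language processing, OCR, openai, Python, document-image-analysis, document-intelligence, document-parsing, document-processing, langchain, machine learning, pdf, pdf-to-text
Python 1.4 k
18 days ago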
NanoNets / docstrange

#large language models# Extract and convert data from any document, image, PDF, Word document, PowerPoint file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

large language models, Markdown, OCR, pdf-to-markdown, structured-data, artificial intelligence, document-parser, document-parsing, pdf-parser, pdf-to-json, tables
Python 546
3 days ago
edenai / edenai-apis

#natural language processing# Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

aggregator, artificial intelligence, API, computer vision, document-parsing, image processing, machine-translation, natural language processing, OCR, optical-character-recognition, pre-trained-model, Python, speech-recognition, speech-to-text, text-to-speech, video-recognition
Python 456
4 days ago
harishdeivanayagam / rowfill

#large language models# Open-source platform for processing unstructured data (PDFs, images, audio files), built for knowledge workers

document, document-parsing, langgraph, llama, large language models, Next, OCR, ollama, openaivision, pdf, pdfs, unstructured, unstructured-data
TypeScript 363
6 months ago
GiftMungmeeprued / document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts...

data-pipeline, document-image-processing, document-parser, document-parsing, langchain, OCR, pdf, pdf-to-text, preprocessing
149
2 months ago
AdemBoukhris457 / Docs_Parsing_Techniques

Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)

artificial intelligence, genai, OCR, document-parsing
Jupyter Notebook 63
13 days ago
papercast-dev / papercast

#natural language processing# A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines...

arxiv, Python, dag, natural language processing, pdf-converter, pdf-document-processor, pipeline, document-parser, document-parsing, pdf-to-text, podcast, tts
Python 52
6 months ago
CycloneBoy / pdf_table

A Unified Toolkit for Deep Learning-Based Table Extraction

artificial intelligence, document-parsing, pdf, layout-analysis, OCR, table, table-recognition
Python 49
10 months ago
Unstructured-IO / community

#computer science# Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

community, data-pipeline, deep learning, document-ai, document-parsing, machine learning, nlp-parsing, ocr-python, Open Source
29
2 years ago
docling-project / docling4j

Docling4j brings the functionalities of Docling in document understanding to Java® projects

artificial intelligence, document-parser, document-parsing, document-understanding, documents, Java, pdf, pdf-converter, pdf-to-json
Java 16
5 months ago
aimagelab / mugat

Official implementation of our ECCVW paper "μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context"

document-parsing, OCR, transformer
Python 11
1 year ago
acenji / ats

Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring...

applicant-tracking-system, ats, document-parsing, generative-ai, keyword-extraction, natural language processing, Node.js, React, resume-analysis, sorting-algorithms
JavaScript 8
5 months ago
opendataloader-project / opendataloader-pdf

Safe, Open, High-Performance — OpenDataLoader PDF for AI

JSON, Markdown, pdf, artificial intelligence, document-parser, document-parsing, documents, HTML, ocr-recognition, pdf-converter, pdf-to-json, pdf-to-markdown, pdf-to-text, recognition, tables, dataloader, SDK
Java 7
2 days ago
baughmann / tikara

#natural language processing# The metadata and text content extractor for almost every file type.

content-extraction, document-parsing, document-processing, docx, image-to-text, Java, language-detection, large language models, metadata, machine learning, natural language processing, OCR, pdf-to-text, retrieval-augmented-generation, text-extraction, text-mining
Python 4
7 months ago
J-sephB-lt-n / pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

bank, banking, document-parsing, financial-analysis, pdf-parser, pdf-parsing, Python
Python 4
10 months ago
renswickd / document-parser-collection

A collection of document parsers and hands-on examples for constructing structured data for your RAG applications.

amazon-textract, document-parsing
Python 3
1 month ago