GitHub Chinese Community
document-parsing

PaddlePaddle / PaddleOCR

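PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into production.

OCR, crnn, ocrlite, database, chineseocr, pdf2markdown, pp-ocr, pp-structure, document-parsing
Python 50.46 k
11 hours ago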
docling-project / docling

Get your documents ready for gen AI

artificial intelligence, convert, documents, pdf, tables, document-parser, document-parsing, docx, HTML, Markdown, pdf-converter, pdf-to-json, pdf-to-text, pptx, xlsx
Python 31.83 k
2 days ago
Unstructured-IO / unstructured

#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...

deep learning, document-parsing, machine learning, natural language processing, OCR, information-retrieval, data-pipelines, preprocessing, pdf-to-text, pdf, pdf-to-json, document-image-analysis, donut, document-image-processing, document-parser, docx, langchain, large language models
HTML 11.49 k
2 days ago
run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

document, Parsing, pdf, pdf-document-processor, pptx, structured-data, document-parser, document-parsing, docx-to-markdown, pdf-to-excel, pdf-to-json, pdf-to-text, ppt-to-json, tables, ppt-to-markdown, pdf-to-markdown
Python 4.01 k
5 days ago
enoch3712 / ExtractThinker

#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

artificial intelligence, large language models, natural language processing, OCR, openai, Python, document-image-analysis, document-intelligence, document-parsing, document-processing, langchain, machine learning, pdf, pdf-to-text
Python 1.28 k
6 days ago
edenai / edenai-apis

#Natural Language Processing# Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

aggregator, artificial intelligence, API, computer vision, document-parsing, image processing, machine-translation, natural language processing, OCR, optical-character-recognition, pre-trained-model, Python, speech-recognition, speech-to-text, text-to-speech, video-recognition
Python 449
4 days ago
harishdeivanayagam / rowfill

#Large Language Models# Open-source unstructured data (PDFs, Images, Audiofiles) processing platform built for knowledge workers

document, document-parsing, langgraph, llama, large language models, Next, OCR, ollama, openai, vision, pdf, pdfs, unstructured, unstructured-data
TypeScript 282
3 months ago
papercast-dev / papercast

#Natural Language Processing# A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines...

arxiv, Python, dag, natural language processing, pdf-converter, pdf-document-processor, pipeline, document-parser, document-parsing, pdf-to-text, podcast, tts
Python 50
3 months ago
CycloneBoy / pdf_table

A Unified Toolkit for Deep Learning-Based Table Extraction

artificial intelligence, document-parsing, pdf, layout-analysis, OCR, table, table-recognition
Python 39
7 months ago
Unstructured-IO / community

#Computer Science# Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

community, data-pipeline, deep learning, document-ai, document-parsing, machine learning, nlp-parsing, ocr-python, Open Source
28
2 years ago
docling-project / docling4j

Docling4j brings the functionalities of Docling in document understanding to Java® projects

artificial intelligence, document-parser, document-parsing, document-understanding, documents, Java, pdf, pdf-converter, pdf-to-json
Java 10
3 months ago
acenji / ats

Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring...

applicant-tracking-system, ats, document-parsing, generative-ai, keyword-extraction, natural language processing, Node.js, React, resume-analysis, sorting-algorithms
JavaScript 4
2 months ago
ziming / laravel-docparser

Docparser OCR Package for PHP Laravel

document-parsing, Laravel, OCR, PHP
PHP 3
1 month ago
J-sephB-lt-n / pdf-bank-statement-parser

Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data

bank, banking, document-parsing, financial-analysis, pdf-parser, pdf-parsing, Python
Python 3
8 months ago
baughmann / tikara

#Natural Language Processing# The metadata and text content extractor for almost every file type.

content-extraction, document-parsing, document-processing, docx, image-to-text, Java, language-detection, large language models, metadata, machine learning, natural language processing, OCR, pdf-to-text, retrieval-augmented-generation, text-extraction, text-mining
Python 2
4 months ago
rithulkamesh / docproc

#Computer Science# Opinionated and Sophisticated Document Region Analyzer.

pdf-processing, document-analysis, text-extraction, Python, OCR, machine learning, layout-analysis, content-extraction, text-classification, data-extraction, document-parsing
Python 2
2 months ago
azzubair01 / zubairhub

ZubairHub is a Streamlit-based application that integrates various functionalities, including social graph visualization, object detection, document parsing, text extraction, generative AI interaction...

document-parsing, generative-ai, object-detection, optical-character-recognition, Streamlit
Python 1
1 month ago
anyparser / anyparser_crewai

Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extractio...

artificial intelligence, crewai, document-parser, document-parsing, kag, knowledge-graph, Python, rag, retrieval-augmented-generation, TypeScript
Python 1
4 months ago
cr4yfish / docling-js

Parsing Documents to one datatype (Typescript port of Docling) (NOT STARTED!)

document-parsing, genai, document-parser, pdf-converter, pdf-to-text
1
7 months ago
augustweinbren / PhraseSpeaker

PhraseSpeaker: Effortlessly dictate specific sections of text files with macOS's text-to-speech. Perfect for navigating and audibly extracting key content from large documents!

Web Accessibility (a11y), Bash, document-parsing, educational, macOS, productivity-tools, text-to-speech
Shell 1
1 year ago