GitHub Chinese Community

pdf-to-text

docling-project / docling

Get your documents ready for gen AI

AI, convert, documents, pdf, tables, document-parser, document-parsing, docx, HTML, Markdown, pdf-converter, pdf-to-json, pdf-to-text, pptx, xlsx
Python, 34.96k stars
4 hours ago
Unstructured-IO / unstructured

#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...

deep learning, document-parsing, machine learning, natural language processing, OCR, information-retrieval, data-pipelines, preprocessing, pdf-to-text, pdf, pdf-to-json, document-image-analysis, donut, document-image-processing, document-parser, docx, langchain, large language models
HTML, 12.14k stars
3 days ago
run-llama / llama_cloud_services

Knowledge Agents and Management in the Cloud

document, Parsing, pdf, pdf-document-processor, pptx, structured-data, document-parser, document-parsing, docx-to-markdown, pdf-to-excel, pdf-to-json, pdf-to-text, ppt-to-json, tables, ppt-to-markdown, pdf-to-markdown
TypeScript, 4.07k stars
7 hours ago
enoch3712 / ExtractThinker

#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

AI, large language models, natural language processing, OCR, openai, Python, document-image-analysis, document-intelligence, document-parsing, document-processing, langchain, machine learning, pdf, pdf-to-text
Python, 1.31k stars
8 days ago
Academic-Hammer / SciTSR

Table structure recognition dataset of the paper: Complicated Table Structure Recognition

table-structure-recognition, pdf-to-text
Python, 373 stars
5 years ago
pd3f / pd3f

#Computer Science# 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

pdf, text-extraction, pdf-to-text, pipeline, machine learning, OCR, language-model, extract-text, parsr, Python
HTML, 323 stars
2 years ago
shoryasethia / markdrop

#Large Language Models# A Python package for converting PDFs to Markdown while extracting images and tables, and generating text descriptions for extracted tables/images using several LLM clients. And many more functio...

Open Source, pypi-package, image-to-text, large language models, pdf-to-markdown, pdf-to-text, table-to-text, agents
Python, 135 stars
1 month ago
GiftMungmeeprued / document-parsers-list

A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts...

data-pipeline, document-image-processing, document-parser, document-parsing, langchain, OCR, pdf, pdf-to-text, preprocessing
118 stars
17 days ago
NanoNets / ocr-python

OCR library to extract text & tables from PDF files and images. Convert any image or PDF to CSV / TXT / JSON / Searchable PDF.

OCR, tesseract, pdf, Python, pdf-to-json, pdf-to-text, image-to-text
Jupyter Notebook, 110 stars
3 years ago
nainiayoub / pdf-text-data-extractor

PDF text data extraction web app with OCR for scanned documents

pdf-to-text, Streamlit, streamlit-webapp, text-extraction, Python, OCR, ocr-python, pdf
Python, 88 stars
1 year ago
datalogics / adobe-pdf-library-samples

Sample code for the Datalogics C++, Java, and .NET interfaces of the Adobe PDF Library

OCR, pdf, pdf-conversion, pdf-converter, pdf-document, pdf-generation, pdf-lib, pdf-manipulation, pdf-merger, pdf-parser, pdf-to-text, pdf-tools, pdfa
C#, 82 stars
2 years ago
BitMiracle / Docotic.Pdf.Samples

C# and VB.NET samples for Docotic.Pdf library

pdf-library, pdf-to-text, pdf-signature, pdf-generation, extract-text, net-core, pdf-manipulation, pdf-parser, html-to-pdf
Visual Basic .NET, 78 stars
1 month ago
galkahana / pdf-text-extraction

CLI for extracting text from PDF files (and possibly tables)

pdf, pdf-to-text
C++, 76 stars
2 months ago
papercast-dev / papercast

#Natural Language Processing# A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GROBID, LangChain, listen as podcast. Customize your own pipelines...

arxiv, Python, dag, natural language processing, pdf-converter, pdf-document-processor, pipeline, document-parser, document-parsing, pdf-to-text, podcast, tts
Python, 51 stars
4 months ago
mbzuai-oryx / KITAB-Bench

[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

arabic, benchmark, layout-detection, OCR, pdf-to-text, table-detection, vlms, vqa
Python, 42 stars
2 months ago
iditectweb / converter

Standalone .NET converter library that requires neither Adobe Acrobat components nor Microsoft Office Interop assemblies, for converting PDF, DOCX, XLSX, HTML, image, CSV, RTF, and TXT files in the .NET Framework

pdf-to-text, html-to-pdf
C#, 40 stars
7 years ago
seinecle / nocodefunctions-web-app

#Natural Language Processing# The code base of the front-end of nocodefunctions.com

data science, Java, no-code, Web app, network-analysis, natural language processing, sentiment-analysis, topic-modeling, data-processing, pdf-to-text, text-mining
Java, 40 stars
14 days ago
shine-jayakumar / Extract-Data-From-PDF-In-Python

Batch-convert PDFs to text and extract data from PDFs in Python

pdf-converter, pdf-to-text, pdf-tools, pdf-parser, pypdf2, data-extraction, Regular expression, data-cleaning, pdf-to-excel, pandas
Python, 30 stars
4 years ago
asika32764 / php-pdf-2-text

Simple PHP PDF to Text class

pdf, pdf-to-text
PHP, 24 stars
2 years ago
graphlit / graphlit

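#Natural Language Processing# Graphlit Platform

chatbot, copilot, data, framework, large language models, rag, vector-database, document-parser, information-retrieval, natural language processing, pdf-to-json, pdf-to-text
21 stars
1 year ago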