
pdf-extractor

torakiki / pdfsam

PDFsam, a desktop application to split, merge, mix, rotate PDF files and extract pages

pdf-extractor extract split JavaFX Java merge splitter combine rotate pdf pdf-manipulation

Java 3.97 k

13 days ago

UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox pdf pdf-document C# netstandard pdf-extractor pdf-document-processor pdf-files alto-xml hocr layout-analysis document-analysis page-xml pdf-generation

C# 2.2 k

17 hours ago

DocumindHQ / documind

Open-source platform for extracting structured data from documents using AI.

Artificial Intelligence Large Language Model Open Source pdf-extractor developer-tools OCR document-analysis extract-data Parser pdf pdf-converter pdf-extractor-llm

JavaScript 1.41 k

4 months ago

GowenGit / docnet

DocNET is a fast PDF editing and reading library for modern .NET applications

pdf netstandard netcore C# jpeg pdf-document pdf-converter pdf-document-processor pdf-extractor pdf-conversion pdf-files

C# 551

1 year ago

pdftables / python-pdftables-api

Python library to interact with the https://pdftables.com API

pdf-to-excel pdftables pdf pdf-extractor pdf-converter pdf-conversion

Python 88

7 days ago

asepmaulanaismail / pdf-to-txt-python

Simple pdf to text with python using PDFtk and PyPDF2

Python pdf pdftk pypdf2 text-extraction pdf-extractor pdf-to-text

Python 21

2 years ago

Siltaar / doc_crawler.py

#Web Crawler# Explore a website recursively and download all the wanted documents (PDF, ODT…)

Crawler Downloader recursive pdf-extractor web-crawler file-download

4 years ago

Madgrades / madgrades-extractor

UW-Madison course and grade distribution data extraction tool.

pdf-extractor CSV SQL Java Database

Java 16

2 years ago

deep-diver / neurips2024

#Large Language Model# Read and Listen to NeurIPS 2024 Papers

Artificial Intelligence gemini Large Language Model pdf-extractor vertex-ai

HTML 13

7 months ago

codad5 / pdfz

Your Rust PDF Document Text Extractor

pdf pdf-extractor rabbitmq Rust

Rust 11

7 months ago

talrand / DocnetExtended

DocNetExtended is a small extension library built upon the DocNet library, designed to extract text in a readable order from PDFs

pdf C# netstandard pdf-extractor

C# 10

4 years ago

xiaoyao9184 / docker-marker

Docker implementation of the Marker PDF-to-Markdown converter

Docker Image OCR pdf-extractor

Python 8

9 days ago

bytescout / pdf-extractor-sdk-samples

ByteScout PDF Extractor SDK source code samples

pdf-extractor pdf extractor Parser pdf-to-text pdf-to-json pdf-to-excel pdf-files

C# 8

8 months ago

SR-Sujon / llamachirp

#Large Language Model# Engage in dynamic conversations with PDFs to extract and comprehend information using locally hosted LLM variants of Ollama by integrating RAG.

Chatbot Large Language Model ollama Open Source pdf-extractor rag

Python 7

1 year ago

hrbrmstr / fish-stocking-pdf-data-wrangling

🐠 A fishy example of how to do PDF data wrangling in R

data-wrangling pdf pdf-extractor R

R 7

3 years ago

pdftables / go-pdftables-api

Go example of using the PDFTables.com API

pdf-to-excel pdf-extractor pdf-conversion pdf-converter pdf pdftables

Go 6

2 years ago

renan-siqueira / python-pdf-tool

This project facilitates the extraction of text from PDF files using various Python libraries. It is designed to be flexible, allowing the choice among different text extraction libraries and supporti...

mit-license pdf pdf-extractor pdf-to-text pypdf2 Python

Python 6

2 years ago