pdf-parsing · GitHub Topics

py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

pypdf2 pdf Python pdf-parser pdf-parsing pdf-manipulation pdf-documents help-wanted

Python 9.4 k

Updated 3 days ago

jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

pdf pdf-parsing table-extraction

Python 8.28 k

Updated 2 months ago

galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams

pdf-generation pdf-parsing Node.js pdf-manipulation

C 1.17 k

Updated 2 months ago

adithya-s-k / marker-api

Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.

FastAPI pdf-converter pdf-files pdf-parser pdf-parsing API REST API

Python 894

Updated 1 year ago

drmingler / docling-api

Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) into Markdown. With support for both CPU and GPU processing, it is...

API FastAPI markdown-parser pdf-conversion pdf-converter pdf-parser pdf-parsing pdf-to-markdown

Python 688

Updated 6 months ago

jstockwin / py-pdf-parser

A Python tool to help extract information from structured PDFs.

pdf Parsing pdf-parsing

Python 411

Updated 1 month ago

chunyenHuang / hummusRecipe

A powerful PDF tool for NodeJS based on HummusJS.

pdf pdf-files pdf-generation pdf-parsing pdf-manipulation Node.js

JavaScript 349

Updated 2 years ago

thoqbk / traprange

(Java) A Method to Extract Tabular Content from PDF Files

Java pdf pdfbox Parser pdf-parsing pdf-manipulation pdf-files

HTML 335

Updated 2 years ago

ck-unifr / pdf_parsing

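PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit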

langchain LLM pdf pdf-parsing rwkv Python chatglm2-6b information-extraction chatpdf Streamlit

Python 206

Updated 2 years ago

ScientaNL / pdf-extractor

Node.js module for rendering PDF pages to images, SVGs, HTML files, text files, and JSON metadata

pdf-parsing Node.js image-generation

JavaScript 100

Updated 2 years ago

iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

document-conversion document-processing information-retrieval pdf-parsing pdf-to-markdown Python rag retrieval-augmented-generation text-extraction pdf-converter

Python 94

Updated 10 months ago

rostrovsky / pdf-table

Java utility for parsing PDF tabular data using Apache PDFBox and OpenCV

OpenCV opencv3 pdfbox tables table Java java-library pdf-parsing

Java 80

Updated 2 years ago

hellpanderrr / linkedin-pdf-parsing

Parsing LinkedIn résumés in PDF format

linkedin Python pdf-parsing resume-parser

Python 68

Updated 9 years ago

tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

content-extraction filepond Next pdf-parser pdf-parsing

TypeScript 63

Updated 2 years ago

dipietrantonio / pdf4py

A PDF parser written in Python 3 with no external dependencies.

pdf Parser pdf-parsing Python information-extraction

Python 57

Updated 5 years ago

abdullahshafiq-20 / ResumeTex

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaT...

automation developer-tools document-processing Express LaTeX Node.js Open Source pdf-parsing React resume Tailwind CSS TeX

JavaScript 37

Updated 12 days ago

DQ-Zhang / refchaser

Written in Python, for checking reference lists in systematic reviews and literature reviews; helps with reference-list searching both backward and forward by extracting references and creating search que...

research-paper text-mining pdf-parsing

Python 23

Updated 5 years ago

adrienjoly / npm-pdfreader-example

Example use of pdfreader: parsing a PDF résumé

pdf-parsing Example

JavaScript 16

Updated 3 years ago

malice-plugins / pdf

Malice PDF Plugin

malice Malware pdf plugin pdf-parsing Docker malware-analysis

Python 16

Updated 7 years ago

aimaster-dev / chatbot-using-rag-and-langchain

Chat with your PDFs using AI! This Streamlit app uses RAG, LangChain, FAISS, and OpenAI to let you ask questions and get answers with page and file references.

chatbot langchain LLM rag Streamlit AI chat-ui document-search embeddings faiss NLP openai pdf pdf-parsing Python semantic-search vector-store

Python 13

Updated 4 months ago