pdf-processing · GitHub Topics

Document chatbot — multiple files, topics, chat windows and chat history. Powered by GPT.

openai TypeScript gpt-3 gpt-4 langchain Mongoose Next openai-api chat chatbot document-embedding pdf-processing pinecone React Tailwind CSS vectorization

TypeScript 8641

2 years ago

allenai / papermage

#Natural Language Processing# Library supporting NLP and CV research on scientific papers

computer-vision machine-learning multimodal natural-language-processing pdf-processing scientific-papers Python

Python 782

10 months ago

ahmedkhemiri95 / PDFs-TextExtract

Multiple and Large PDF Documents Text Extraction.

pdf Parser data-science Python pdf-processing extract-text pdf-document pypdf2 pdfs

Python 131

7 months ago

Tele-AI / doc-ops-mcp

MCP server for seamless document format conversion and processing

document-conversion document-processing docx-to-pdf file-converter markdown-converter pdf-conversion watermark pdf-processing

TypeScript 99

7 days ago

postralai / masquerade

The Privacy Firewall for LLMs

claude mcp pdf-processing privacy mcp-server model-context-protocol anonymization

Python 73

1 month ago

aws-samples / document-processing-pipeline-for-regulated-industries

#Computer Science# A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

machine-learning Amazon Web Services cdk aws-lambda amazon-web-services amazon-textract amazon-dynamodb amazon-s3 amazon-sqs aws-cdk pdf-processing image-processing data-analytics data-lineage data-governance

Python 64

4 years ago

PSPDFKit / nutrient-dws-client-python

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

ocr-python pdf-converter pdf-document-processor pdf-generation pdf-processing Python

Python 54

7 days ago

PSPDFKit-labs / nutrient-dws-client-typescript

This library provides a type-safe and ergonomic interface for document processing operations including conversion, merging, compression, watermarking, and text extraction using Nutrient DWS Processor ...

pdf-converter pdf-document-processor pdf-generation pdf-processing TypeScript

TypeScript 35

25 days ago

autollama / autollama

#Large Language Models# Anthropic's Contextual Retrieval implementation with visual chunk comparison. Preview context enrichment before/after embedding.

artificial-intelligence automation chatbot Docker document-processing embeddings knowledge-base large-language-models Node.js openai pdf-processing rag React semantic-search vector-database

HTML 24

13 days ago

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. It also provides a script to query the Chroma ...

chromadb pdf-processing similarity-search text-extraction

Python 23

2 years ago

ranguy9304 / LangGraphRAG

#Natural Language Processing# LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP resea...

chatbot information-retrieval langgraph natural-language-processing openai-api pdf-processing Python rag vector-database web-scraping

Python 15

1 year ago

ManasMadan / pdf-actions

An npm package built on top of pdf-lib that provides functionality such as merging, rotating, splitting, and downloading PDFs to disk, and many more...

pdf pdf-merger React react-component pdf-processing pdf-lib JavaScript npm

JavaScript 14

2 years ago

DioCrafts / ai-book-summarizer

#Natural Language Processing# 📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, ...

artificial-intelligence automation document-analysis knowledge-extraction machine-learning Markdown natural-language-processing openai pdf pdf-processing Python study-materials text-analysis

Python 8

8 months ago

Remy2404 / Polymind

Polymind is a powerful multi-modal Telegram bot built with Gemini, DeepSeek, OpenRouter, and over 50 cutting-edge AI models. It offers seamless conversational intelligence, Mermaid diagram rendering, ...

gemini Telegram deepseek-r1 image-processing voice voice-recognition ai-assistant multi-model openrouter pdf-processing

Python 7

4 days ago

ManasMadan / PDFActions

Built with the pdf-actions npm package.

React pdf react-components react-component pdf-merger pdf-lib pdf-processing

JavaScript 7

1 year ago

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

application audio-processing file-management GUI image-processing OCR pdf-processing productivity Python Qt Rust speech-recognition video-processing youtube-downloader

Python 6

6 months ago

enesmanan / paper-bold

AI-powered RAG-based tool for summarizing, extracting insights, and answering questions about research papers with high accuracy

gemini-api langchain pdf-processing rag academic-paper

HTML 6

6 months ago

allanninal / document-summarizer

#Natural Language Processing# The Document Summarizer leverages Hugging Face’s facebook/bart-large-cnn model to transform lengthy documents into concise summaries. Built with ReactJS (Vite) for the frontend and Flask for the backe...

ai-tools Flask huggingface natural-language-processing pdf-processing React Vite

JavaScript 4

9 months ago