# document-processing

enoch3712/ExtractThinker
Python 1.4 k
19 days ago
https://static.github-zh.com/github_avatars/dhlab-epfl?size=40
Python 379
4 years ago
https://static.github-zh.com/github_avatars/ucbepic?size=40

TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the underlying visual template shared across documents.

Python 204
4 months ago
https://static.github-zh.com/github_avatars/formkiq?size=40

A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Pleas...

Java 141
2 days ago
https://static.github-zh.com/github_avatars/iamarunbrahma?size=40

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

Python 94
10 months ago
https://static.github-zh.com/github_avatars/awslabs?size=40

A Python framework for multi-modal document understanding with Amazon Bedrock

Python 94
14 days ago
https://static.github-zh.com/github_avatars/parsee-ai?size=40

#Large language models# Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.

Python 73
18 days ago
https://static.github-zh.com/github_avatars/PSPDFKit?size=40

A Model Context Protocol (MCP) server implementation that exposes document processing capabilities through natural language, supporting both direct human interaction and AI agent tool calling.

TypeScript 56
2 months ago
https://static.github-zh.com/github_avatars/jmanhype?size=40

#Natural language processing# An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

Python 45
1 year ago
https://static.github-zh.com/github_avatars/aws-solutions?size=40

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates...

JavaScript 40
4 days ago
https://static.github-zh.com/github_avatars/cburschka?size=40

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily; not affiliated with lyx.org).

C++ 39
2 years ago
https://static.github-zh.com/github_avatars/abdullahshafiq-20?size=40

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaT...

JavaScript 37
13 days ago
https://static.github-zh.com/github_avatars/afrozas?size=40

Semantic extraction from conference proceedings.

Python 31
5 years ago
https://static.github-zh.com/github_avatars/autollama?size=40
HTML 24
14 days ago
https://static.github-zh.com/github_avatars/MBAigner?size=40

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are retrieved formatted as CSV.

Python 23
5 years ago