GitHub Chinese Community
#document-processing

ucbepic / docetl

#Large Language Models# A system for agentic LLM-powered data processing and ETL

data, etl, Large Language Models, Python, data-pipelines, elt, workflow, agents, semantic-data, llm-data, document-processing
Python 2.14k
4 days ago
enoch3712 / ExtractThinker

#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

Artificial Intelligence, Large Language Models, Natural Language Processing, OCR, openai, Python, document-image-analysis, document-intelligence, document-parsing, document-processing, langchain, Machine Learning, pdf, pdf-to-text
Python 1.28k
6 days ago
dhlab-epfl / dhSegment

Generic framework for historical document processing

Tensorflow, segmentation, historical-data, Python, document-processing
Python 376
4 years ago
ucbepic / TWIX

TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents

document-processing
Python 188
17 days ago
awslabs / project-lakechain

#Natural Language Processing# ⚡ Cloud-native, AI-powered, document processing pipelines on AWS.

Amazon Web Services, Computer Vision, document-processing, generative-ai, Machine Learning, Natural Language Processing, retrieval-augmented-generation, Serverless, Hacktoberfest, aws-cdk
TypeScript 179
3 months ago
formkiq / formkiq-core

A full-featured Document Management Platform / Document Layer for your application, providing storage, discovery, processing, and retrieval. Deploys directly into your Amazon Web Services Cloud. Pleas...

amazon-web-services, Amazon Web Services, cloud-storage, dms, document-database, document-management, document-management-system, document-processing, headless, Serverless, OCR, optical-character-recognition
Java 129
5 days ago
awslabs / rhubarb

A Python framework for multi-modal document understanding with Amazon Bedrock

amazon-bedrock, document-processing, generative-ai, multi-modal
Python 90
7 days ago
iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced info...

document-conversion, document-processing, information-retrieval, pdf-parsing, pdf-to-markdown, Python, rag, retrieval-augmented-generation, text-extraction, pdf-converter
Python 83
7 months ago
parsee-ai / parsee-core

#Large Language Models# Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and HTML files. Extensive support of tabular data extraction and multimodal queries.

document-processing, Large Language Models, structured-data, multimodal
Python 71
25 days ago
steindani / pandoc-include

An include filter for Pandoc

pandoc, pandoc-filter, Markdown, document-processing
Haskell 62
5 years ago
aws-solutions / enhanced-document-understanding-on-aws

Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates...

document-analysis, document-processing
JavaScript 38
10 days ago
cburschka / lyx

Unofficial mirror of git://git.lyx.org/lyx.git (updates daily. not affiliated with lyx.org.)

mirror, document-processing, LaTeX
C++ 37
2 years ago
kili-technology / awesome-datasets

#Natural Language Processing# A comprehensive list of annotated training datasets classified by use case.

awesome-public-datasets, Datasets, Open Data, dataset, data, open-datasets, annotation, Natural Language Processing, entity-extraction, ner, entity-recognition, document-processing, OCR
34
3 years ago
jmanhype / DSPy-Multi-Document-Agents

#Natural Language Processing# An advanced distributed knowledge fabric for intelligent document processing, featuring multi-document agents, optimized query handling, and semantic understanding.

Artificial Intelligence, distributed-systems, document-processing, knowledge-management, Natural Language Processing, query-optimization, vector-search
Python 34
10 months ago
afrozas / proceedings

Semantic extraction from conference proceedings.

conferences, semantic, spaCy, document-processing
Python 31
5 years ago
abdullahshafiq-20 / ResumeTex

ResumeTex is an AI-powered tool that converts standard PDF resumes into professionally formatted LaTeX documents. This service helps you create elegant, structured resumes without needing to learn LaT...

Automation, developer-tools, document-processing, Express, LaTeX, Node.js, Open Source, pdf-parsing, React, resume, Tailwind CSS, TeX
JavaScript 31
19 days ago
MBAigner / PDFSegmenter

This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified and returned. Tables are retrieved formatted as a CSV.

pdf, document-processing, Python, layout-analysis, annotations, CSV, table
Python 22
5 years ago
ucbepic / BARGAIN

#Large Language Models# Low-Cost LLM-Powered Data Processing with Theoretical Guarantees

Artificial Intelligence, data, document-processing, Large Language Models
Python 18
1 month ago
greed2411 / tokyo

tokyo, a REST API, when given any type of document 📄, Identifies mime-type 🧐. Suggests extension 🦔. Alas Extracts text 💪.

document-processing, Clojure, ring, mime-types, Plugins, extract-text, filetype, text-extraction
Clojure 18
5 years ago
eklem / stopword-trainer

#Natural Language Processing# A module for creating stopword lists for any language, based on a set of documents.

Natural Language Processing, document-processing, information-retrieval
JavaScript 15
9 months ago