#document-analysis

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

extract-data, layout-analysis, OCR, Parser, pdf, pdf-converter, Python, document-analysis, pdf-parser, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, ai4science
Python 41.04 k
3 hours ago
bytedance / Dolphin

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

document-analysis, layout-analysis, OCR, Parser, pdf, pdf-converter, pdf-parser, Python
Python 4.43 k
21 days ago
UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox, pdf, pdf-document, C#, netstandard, pdf-extractor, pdf-document-processor, pdf-files, alto-xml, hocr, layout-analysis, document-analysis, page-xml, pdf-generation
C# 2.11 k
4 days ago
AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Artificial Intelligence, documentai, multimodal, multimodal-deep-learning, OCR, Computer Vision, vision-language-transformer, end-to-end-ocr, scene-text-detection, scene-text-detection-recognition, scene-text-recognition, text-detection, text-recognition, vision-language, document, document-analysis, document-recognition, document-understanding, document-intelligence, vision-language-model
C++ 1.75 k
4 months ago
NanoNets / docext

#Natural Language Processing# An on-premises, OCR-free toolkit for unstructured data extraction, Markdown conversion, and benchmarking. (https://idp-leaderboard.org/)

document, document-analysis, extraction, Large Language Models, Machine Learning, Natural Language Processing, OCR, rag, unstructured-data, vlms, onprem, document-data-extraction, ocr-onpremise, llm-ocr, onprem-ocr, onprem-vision, onpremise, table-extraction, document-information-extraction, ocr-benchmark
Python 1.57 k
1 month ago
tstanislawek / awesome-document-understanding

#Natural Language Processing# A curated list of resources on the topic of Document Understanding (DU)

Awesome Lists, Machine Learning, information-extraction, key-information-extraction, document-understanding, robotic-process-automation, document-analysis, document-layout-analysis, OCR, Natural Language Processing, Deep Learning, pdf, rpa, pdf-documents, document-intelligence, unstructured-data, document-ai
1.45 k
2 years ago
DocumindHQ / documind

Open-source platform for extracting structured data from documents using AI.

Artificial Intelligence, Large Language Models, Open Source, pdf-extractor, developer-tools, OCR, document-analysis, extract-data, Parser, pdf, pdf-converter, pdf-extractor-llm
JavaScript 1.36 k
3 months ago
Yuliang-Liu / Curve-Text-Detector

#Computer Science# This repository provides training and test code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

Deep Learning, document-analysis, object-detection, scene-text
Jupyter Notebook 648
5 years ago
wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

key-information-extraction, document-analysis, graph-neural-networks, graph-learning, document-understanding
Python 563
1 year ago
ispras / dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic...

doc, docx, odt, documents, excel, pdf, txt, OCR, scanned-documents, table-recognition, HTML, html-parser, pdf-parser, document-analysis
Python 519
3 days ago
jpWang / LiLT

#Natural Language Processing# Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Natural Language Processing, document-ai, document-analysis, document-understanding, information-extraction, multimodal-pre-trained-model
Python 351
3 years ago
CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis

malware-analysis, malware-research, malware-detection, Cybersecurity, incident-response, Malware, automation-framework, cert, cyber-security, document-analysis, Framework, Python, security-automation, Security
Python 337
6 days ago
lazyFrogLOL / llmdocparser

#Natural Language Processing# A package for parsing PDFs and analyzing their content using LLMs.

Large Language Models, Natural Language Processing, OCR, rag, chunking, document-analysis, pdf-parser
Python 272
1 year ago
pandora-analysis / pandora

Pandora is an analysis framework to discover if a file is suspicious and conveniently show the results

Cybersecurity, document-analysis, malware-detection
Python 265
6 days ago
masyagin1998 / robin

#Computer Science# RObust document image BINarization

Python, OpenCV, Keras, neural-networks, Deep Learning, OCR, Computer Vision, document-analysis
Python 183
1 year ago
chriswolfvision / local_adaptive_binarization

Local adaptive image binarization

Computer Vision, document-analysis
C++ 126
2 years ago
ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis

document-analysis, layout-analysis, ultralytics, yolo, yolov8
Python 123
7 days ago
mirabdullahyaser / Retrieval-Augmented-Generation-Engine-with-LangChain-and-Streamlit

#Natural Language Processing# Powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it i...

Artificial Intelligence, chat-application, document-analysis, generative-ai, langchain, large-language-models, Natural Language Processing, openai-chatgpt, question-answering, retrieval-augmented-generation, Streamlit, gpt-3
Python 122
1 year ago
anisha2102 / docvqa

#Computer Science# Document Visual Question Answering

visual-question-answering, Computer Vision, Deep Learning, document-analysis
Python 120
5 years ago
aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding

amazon-textract, huggingface-transformers, document-analysis, OCR
Python 97
8 months ago