GitHub Chinese Community

©2025 GitHub Chinese Community


document-analysis

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs into Markdown and JSON formats.

extract-data, layout-analysis, OCR, Parser, pdf, pdf-converter, Python, document-analysis, pdf-parser, pdf-extractor-llm, pdf-extractor-pretrain, pdf-extractor-rag, ai4science
Python 35.05 k
1 day ago
UglyToad / PdfPig

Read and extract text and other content from PDFs in C# (port of PDFBox)

pdfbox, pdf, pdf-document, C#, netstandard, pdf-extractor, pdf-document-processor, pdf-files, alto-xml, hocr, layout-analysis, document-analysis, page-xml, pdf-generation
C# 2.04 k
14 days ago
AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Artificial Intelligence, documentai, multimodal, multimodal-deep-learning, OCR, Machine Vision, vision-language-transformer, end-to-end-ocr, scene-text-detection, scene-text-detection-recognition, scene-text-recognition, text-detection, text-recognition, vision-language, document, document-analysis, document-recognition, document-understanding, document-intelligence, vision-language-model
C++ 1.73 k
2 months ago
tstanislawek / awesome-document-understanding

#Natural Language Processing# A curated list of resources on the Document Understanding (DU) topic

Awesome Lists, Machine Learning, information-extraction, key-information-extraction, document-understanding, robotic-process-automation, document-analysis, document-layout-analysis, OCR, Natural Language Processing, Deep Learning, pdf, rpa, pdf-documents, document-intelligence, unstructured-data, document-ai
1.42 k
2 years ago
DocumindHQ / documind

Open-source platform for extracting structured data from documents using AI.

Artificial Intelligence, Large Language Models, Open Source, pdf-extractor, developer-tools, OCR, document-analysis, extract-data, Parser, pdf, pdf-converter, pdf-extractor-llm
JavaScript 1.33 k
1 month ago
Yuliang-Liu / Curve-Text-Detector

#Computer Science# This repository provides training and test code, a dataset, detection and recognition annotations, an evaluation script, an annotation tool, and a ranking.

Deep Learning, document-analysis, object-detection, scene-text
Jupyter Notebook 648
5 years ago
NanoNets / docext

#Natural Language Processing# An on-premises, OCR-free unstructured data extraction and benchmarking toolkit. (https://idp-leaderboard.org/)

document, document-analysis, extraction, Large Language Models, Machine Learning, Natural Language Processing, OCR, rag, unstructured-data, vlms, table-extraction
Python 612
5 days ago
wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)

key-information-extraction, document-analysis, graph-neural-networks, graph-learning, document-understanding
Python 563
1 year ago
ispras / dedoc

Dedoc is a library (and service) that automates document parsing and brings documents into a uniform format. It automatically extracts content, logical structure, tables, and metadata from textual electronic...

doc, docx, odt, documents, excel, pdf, txt, OCR, scanned-documents, table-recognition, HTML, html-parser, pdf-parser, document-analysis
Python 478
25 days ago
jpWang / LiLT

#Natural Language Processing# Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Natural Language Processing, document-ai, document-analysis, document-understanding, information-extraction, multimodal-pre-trained-model
Python 352
3 years ago
CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis

malware-analysis, malware-research, malware-detection, Cybersecurity, incident-response, Malware, automation-framework, cert, cyber-security, document-analysis, Framework, Python, security-automation, Security
Python 326
4 days ago
lazyFrogLOL / llmdocparser

#Natural Language Processing# A package for parsing PDFs and analyzing their content using LLMs.

Large Language Models, Natural Language Processing, OCR, rag, chunking, document-analysis, pdf-parser
Python 271
10 months ago
pandora-analysis / pandora

Pandora is an analysis framework for determining whether a file is suspicious and conveniently displaying the results.

Cybersecurity, document-analysis, malware-detection
Python 263
5 days ago
masyagin1998 / robin

#Computer Science# RObust document image BINarization

Python, OpenCV, Keras, neural-networks, Deep Learning, OCR, Machine Vision, document-analysis
Python 181
10 months ago
chriswolfvision / local_adaptive_binarization

Local adaptive image binarization

Machine Vision, document-analysis
C++ 126
2 years ago
mirabdullahyaser / Retrieval-Augmented-Generation-Engine-with-LangChain-and-Streamlit

#Natural Language Processing# A powerful web application that combines Streamlit, LangChain, and Pinecone to simplify document analysis. Powered by OpenAI's GPT-3, RAG enables dynamic, interactive document conversations, making it i...

Artificial Intelligence, chat-application, document-analysis, generative-ai, langchain, large-language-models, Natural Language Processing, openai-chatgpt, question-answering, retrieval-augmented-generation, Streamlit, gpt-3
Python 120
1 year ago
anisha2102 / docvqa

#Computer Science# Document Visual Question Answering

visual-question-answering, Machine Vision, Deep Learning, document-analysis
Python 119
5 years ago
ppaanngggg / yolo-doclaynet

YOLO models trained on DocLayNet: power your document intelligence with layout analysis

document-analysis, layout-analysis, ultralytics, yolo, yolov8
Python 114
3 months ago
aws-samples / amazon-textract-transformer-pipeline

Post-process Amazon Textract results with Hugging Face transformer models for document understanding

amazon-textract, huggingface-transformers, document-analysis, OCR
Python 96
6 months ago
monniert / docExtractor

(ICFHR 2020 oral) Code for "docExtractor: An off-the-shelf historical document element extraction" paper

document-analysis, segmentation, historical-data, PyTorch
Python 88
2 years ago