PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into production.
Get your documents ready for gen AI
#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...
Knowledge Agents and Management in the Cloud
#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
#Large Language Models# Extract and convert data from any document, image, PDF, Word document, PPT file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.
#Natural Language Processing# Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
#Large Language Models# Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested for support of tables, equations, handwriting, two-column layouts, and multi-column layouts...
Jupyter notebooks testing different OCR models for document parsing (Dolphin, MonkeyOCR, Marker, Nanonets, ...)
#Natural Language Processing# A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, or PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines...
A Unified Toolkit for Deep Learning-Based Table Extraction
#Computer Science# Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Docling4j brings the functionalities of Docling in document understanding to Java® projects
Official implementation of our ECCVW paper "μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context"
Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring...
Safe, Open, High-Performance — OpenDataLoader PDF for AI
#Natural Language Processing# The metadata and text content extractor for almost every file type.
Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data
This is a collection of various document parsers and hands-on examples for constructing structured data for your RAG applications.