PaddleOCR aims to build a rich, leading, and practical OCR tool library that helps users train better models and put them into production.
Get your documents ready for gen AI
#Natural Language Processing# Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to...
Knowledge Agents and Management in the Cloud
#Natural Language Processing# ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
#Natural Language Processing# Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
#Large Language Models# Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers
#Natural Language Processing# A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, and PDF with GROBID and LangChain, or listen to them as a podcast. Customize your own pipelines...
A Unified Toolkit for Deep Learning-Based Table Extraction
#Computer Science# Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Docling4j brings the functionalities of Docling in document understanding to Java® projects
Applicant Tracking System (ATS): A powerful platform leveraging generative AI and soft-match algorithms to analyze resumes against job descriptions. Built with React and Node.js, it streamlines hiring...
Tool for converting First National Bank (FNB) bank statement PDFs into useful structured data
#Natural Language Processing# The metadata and text content extractor for almost every file type.
#Computer Science# Opinionated and Sophisticated Document Region Analyzer.
ZubairHub is a Streamlit-based application that integrates various functionalities, including social graph visualization, object detection, document parsing, text extraction, generative AI interaction...
Supercharge your AI workflows by combining Anyparser’s advanced content extraction with Crew AI. With this integration, you can effortlessly leverage Anyparser’s document processing and data extractio...
Parsing documents to one datatype (TypeScript port of Docling) (NOT STARTED!)
PhraseSpeaker: Effortlessly dictate specific sections of text files with macOS's text-to-speech. Perfect for navigating and audibly extracting key content from large documents!