#Natural Language Processing#Transforms PDF, Documents and Images into Enriched Structured Data
#Computer Science#Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Extract internal monitoring data from application logs for collection in a timeseries database
A library for audio and music analysis
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Provides functions to read and write from/to an object or array using a simple string notation
Extract files from any kind of container format
#Natural Language Processing#An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
#Natural Language Processing#Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
#Natural Language Processing#Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
#Large Language Models#🦜⛏️ Did you say you like data?
A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
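#Natural Language Processing#Stanford Open Information Extraction made simple!
#Natural Language Processing#A survey of the information extraction field by the natural language processing research team at Beihang University's Advanced Innovation Center for Big Data. It covers subtasks including entity recognition, relation extraction, and attribute extraction, and surveys each subtask across both academia and industry.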
File Injector is a script that allows you to store any file in an image using steganography
DataTool is a program that lets you extract models, maps, and files from Overwatch.
PHP URI Template (RFC 6570) supports both URI expansion & extraction