# html-extraction


A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now a live alternative)

HTML 205
1 year ago

#web-crawler# Domain-specific language for extracting structured data from HTML documents

C++ 54
4 months ago

Xtract-html is a tool for extracting the HTML display code from a website, which you can then reuse on your own website.

Python 5
7 months ago

Xtract-htmlV2 is a tool for fetching the HTML code of any website you choose. It is the successor to the previous version.

Python 4
7 months ago

Script for extracting units from http://vocab.nerc.ac.uk/collection/P06/current/ to easily add units to the database. (This should only be temporary, to demonstrate how units can work.)

HTML 0
5 years ago

Extracts and saves HTML, CSS, and JavaScript files from a specified URL.

C# 0
1 year ago

A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction...

Java 0
17 days ago