# html-extractor


A reworked version of the https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the actively maintained alternative)

HTML 205
1 year ago

#Web crawler# Automatically extract the main text content (and more) from an HTML document

Kotlin 118
3 years ago

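An optimized general-purpose algorithm for extracting the main text of web pages, based on the line-block distribution function and implemented in Python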

Python 60
6 years ago
Go 15
3 years ago

PHP library that determines which CSS is used by HTML snippets.

PHP 9
6 years ago

Xtract-html is a tool for extracting HTML display code from a website, which you can also use for your website.

Python 5
7 months ago

Xtract-htmlV2 is a tool for getting the HTML code from the website you want; it is the successor to the previous version.

Python 4
7 months ago

Media Graper is an open-source tool for Linux that extracts all the images, links, and videos from a webpage.

Shell 2
2 years ago

A simple extractor based on BeautifulSoup. You can use it to iterate through all the HTML files in the website root directory and get the text, placeholders, and other text.

Python 0
6 years ago

HTML-to-Anki Enhanced Human Explanation & Reasoning Tool (HEART). A Python CLI that leverages the OpenAI API to transform full UWorld vignettes into AI-enhanced Anki flashcards.

Python 0
3 months ago

A Java-based server leveraging Apache Tika to extract content and metadata from files (PDF, DOCX, TXT, etc.) in a local files-to-extract directory. Supports HTML (with CSS styling) and text extraction...

Java 0
17 days ago