GitHub Chinese Community
©2025 GitHub Chinese Community

#content-extraction

mendableai / firecrawl-mcp-server

🔥 Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

batch-processing, claude, content-extraction, data-collection, firecrawl, firecrawl-ai, llm-tools, mcp-server, model-context-protocol, search-api, web-crawler, web-scraping, javascript-rendering
JavaScript 3.99 k
7 days ago
graphlit / graphlit-mcp-server

Model Context Protocol (MCP) Server for Graphlit Platform

claude, content-extraction, data-collection, llm-tools, mcp-server, model-context-protocol, search-api, unstructured-data, web-crawler, web-scraping
TypeScript 342
18 hours ago
currentslab / extractnet

#Computer Science# A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package.

content-extraction, webscraping, web-scraping, text-mining, news, Machine Learning, Python
HTML 287
2 months ago
mvasilkov / readability2

Readability2 converts HTML to plain text.

JavaScript, readability, HTML, plaintext, content-extraction
TypeScript 108
7 years ago
tuffstuff9 / nextjs-pdf-parser

Next.js template for seamless PDF parsing using pdf2json and FilePond. Ideal for developers seeking a ready-to-use solution for PDF content extraction in Next.js projects.

content-extraction, filepond, Next, pdf-parser, pdf-parsing
TypeScript 62
2 years ago
gregors / boilerpipe-ruby

Pure Ruby implementation of the Boilerpipe content extraction algorithm, tuned for online articles.

content-extraction, webscraping, news
Ruby 43
4 years ago
oiwn / dom-content-extraction

#Web Crawler# DOM-Based Content Extraction via Text Density

scraping, content-extraction
Rust 34
3 months ago
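The text-density idea named in the entry above can be illustrated with a short, standard-library-only Python sketch. It is a simplified, hypothetical illustration, not the code or API of oiwn/dom-content-extraction: it assumes an element's text density is its visible character count divided by the number of tags in its subtree, and it picks as the main content the element whose children's densities sum highest. The names `extract_main_text`, `density`, and `density_sum` are made up for this example.

```python
from html.parser import HTMLParser

# Tags that never have children or a closing tag (simplified list).
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}
# Tags whose direct text is not visible page content.
SKIP_TEXT_TAGS = {"script", "style", "noscript"}


class Node:
    """A minimal DOM node: text strings interleaved with child Nodes."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []

    def children(self):
        return [i for i in self.items if isinstance(i, Node)]

    def text(self):
        # Direct and descendant text, in document order.
        return "".join(i if isinstance(i, str) else i.text() for i in self.items)

    def tag_count(self):
        # This element plus all descendant elements (always >= 1).
        return 1 + sum(c.tag_count() for c in self.children())

    def walk(self):
        yield self
        for c in self.children():
            yield from c.walk()


class TreeBuilder(HTMLParser):
    """Builds a Node tree from html.parser callbacks."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.items.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        if self.cur.tag not in SKIP_TEXT_TAGS:
            self.cur.items.append(data)


def density(node):
    # Text density: characters (whitespace collapsed) per tag in this subtree.
    chars = len(" ".join(node.text().split()))
    return chars / node.tag_count()


def density_sum(node):
    # Sum of the children's densities; favors containers of several dense blocks.
    return sum(density(c) for c in node.children())


def extract_main_text(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = max(builder.root.walk(), key=density_sum)
    return " ".join(best.text().split())


if __name__ == "__main__":
    page = """<html><body>
<div><a>Home</a> <a>About</a> <a>Blog</a></div>
<div>
<p>Content extraction isolates the main text of a web page.</p>
<p>Text density compares the amount of text to the number of tags.</p>
</div>
<div><a>Privacy</a> <a>Terms</a></div>
</body></html>"""
    print(extract_main_text(page))
```

Run on the sample page, this prints the two article paragraphs and drops the link-heavy navigation and footer blocks. Summing child densities rather than taking the single densest element is a deliberate choice here: the densest element is usually one `<p>`, whereas the sum favors the container holding all of the article's paragraphs.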
nikitautiu / learnhtml

#Computer Science# Web content extraction using machine learning

Deep Learning, HTML, content-extraction
HTML 33
4 years ago
spences10 / mcp-jinaai-reader

🔍 Model Context Protocol (MCP) tool for parsing websites using the Jina.ai Reader

content-extraction, documentation-tool, llm-tools, mcp, model-context-protocol, text-extraction, web-scraping
JavaScript 30
4 months ago
pdfix / pdfix_sdk_example_cpp

Make PDF files accessible, extract data from PDFs, convert PDF to HTML, fill in PDF forms, stamp PDFs, and more...

pdfua, digital-signature, pdf-converter, pdf-manipulation, extract-data, watermark, HTML, metadata, conversion, converter, tagging, wcag, sign, pdf, content-extraction, Web Accessibility (a11y)
C++ 20
5 months ago
gdamdam / sumo

#Natural Language Processing# A tool that extracts the text from web article URLs and computes word frequencies, entity recognition, automatic summaries, and more.

Natural Language Processing, content-extraction, nltk, entity-recognition, semantic-analysis
Python 20
7 years ago
timoteostewart / benson

Benson turns a list of URLs into mp3s of the contents of each web page - take control over your reading backlog!

content-extraction, web-scraping, productivity
Python 14
9 months ago
bencmc / youtube_video_summarizer

#Natural Language Processing# This repository houses a Python application for extracting YouTube video transcripts and summarizing their content.

content-extraction, gpt-35-turbo, natural, Natural Language Processing, openai, Python, text-processing, video-processing, youtube-api, langchain-python, Streamlit
Python 14
2 years ago
zeoagency / mobile-first-indexing-tool

Mobile First Indexing Tool

Search Engine Optimization (SEO), content-extraction, aws-lambda, lighthouse
Python 12
3 years ago
peremenov / seize

Seize is a lightweight web-page content extractor for Node or the browser, inspired by arc90 Readability and Safari Reader.

content-extraction, Document Object Model (DOM), readability, extract, reader
HTML 12
8 years ago
LandWhale2 / TD-Spider

#Web Crawler# A simple text-density-based web crawler written in Go

Go, web-crawler, content-extraction, data-mining, Document Object Model (DOM), Open Source, scraping
Go 12
2 years ago
vakharwalad23 / mark-minion

The Ultimate Web Content Extraction & Conversion Tool for AI/LLM Applications. Convert almost any web content into clean Markdown with intelligent AI processing.

TypeScript, ai-powered, cloudflare-worker, content-extraction, document-processing, Puppeteer, web-scraping
TypeScript 9
1 month ago
amirthfultehrani / Youtube-Transcript-Copier

A userscript that adds a button to YouTube video pages for copying the transcript with or without timestamps.

Web Accessibility (a11y), Automation, browser-extension, clipboard, content-extraction, data-extraction, Userscripts, helper, JavaScript, productivity, text-extraction, Tools, transcript, utilities, Video, Web, YouTube
JavaScript 8
6 months ago
kamjin3086 / Crawell

📸 Crawell: a free browser extension for one-click image and article extraction, Markdown conversion, and bulk download, with 100% local processing.

browser-extension, Chrome Extension, content-extraction, edge-extension, Firefox Extension, Markdown, privacy-first, React, Tailwind CSS, TypeScript, web-scraping
TypeScript 7
2 days ago
pinkpixel-dev / web-scout-mcp

#Web Crawler# A powerful MCP server extension providing web search and content extraction capabilities. Integrates DuckDuckGo search functionality and URL content extraction into your MCP environment, enabling AI a...

ai-assistant, ai-tools, cheerio, content-extraction, Crawler, DuckDuckGo, google-search, mcp, mcp-server, web-crawler, web-scraper, web-scraping, web-search
JavaScript 7
1 month ago