GitHub Chinese Community

#web-crawler

mendableai / firecrawl

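#Web crawler# Firecrawl is an API service that crawls a URL and converts it into clean Markdown or structured data.

artificial intelligence, crawler, data, Markdown, scraper, html-to-markdown, large language model, rag, scraping, web-crawler, ai-scraping, webscraping
TypeScript 39.99 k
2 days ago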
ScrapeGraphAI / Scrapegraph-ai

#Web crawler# Python scraper based on AI

scraping, scraping-python, automated-scraper, large language model, artificial intelligence, web-crawler, web-scraping, ai-scraping, crawler, html-to-markdown, Markdown, rag
Python 20 k
2 days ago
apify / crawlee

#Web crawler# Crawlee - a web crawling and browser automation library for Node.js

web-scraping, web-crawling, npm, headless-chrome, Puppeteer, automation, apify, scraping, crawling, crawler, headless, scraper, web-crawler, JavaScript, Node.js, Playwright, TypeScript
TypeScript 17.92 k
2 days ago
crawlab-team / crawlab

#Web crawler# Distributed web crawler admin platform for spider management, regardless of language or framework.

webcrawler, scrapy, crawlab, spiders-management, Go, scrapyd-ui, spider, crawler, webspider, web-crawler, Docker, platform, crawling-tasks
Go 11.77 k
2 days ago
ssssssss-team / spider-flow

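#Web crawler# A new-generation crawler platform that defines crawl workflows graphically, so a crawler can be built without writing code.

spider, crawler, jsoup, xpath, web-spider, webspider, webcrawler, web-crawler, spider-flow
Java 10.01 k
2 years ago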
BruceDone / awesome-crawler

#Web crawler# A collection of awesome web crawlers and spiders in different languages

web-crawler, crawler, web-scraper, spider, scraper, Awesome Lists
6.8 k
1 year ago
adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks

OCR, omniparser, parse-server, parser-library, vision-transformer, web-crawler
Python 6.58 k
4 days ago
apify / crawlee-python

#Web crawler# Crawlee - A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works...

apify, automation, beautifulsoup, crawler, crawling, headless, headless-chrome, pip, Playwright, Python, scraper, scraping, web-crawler, web-crawling, web-scraping, Hacktoberfest
Python 5.73 k
3 days ago
mendableai / firecrawl-mcp-server

Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.

batch-processing, claude, content-extraction, data-collection, firecrawl, firecrawl-ai, llm-tools, mcp-server, model-context-protocol, search-api, web-crawler, web-scraping, javascript-rendering
JavaScript 3.44 k
11 days ago
apache / nutch

#Web crawler# Apache Nutch is an extensible and scalable web crawler

Java, nutch, web-crawler, crawling, hadoop, apache
Java 3.03 k
3 months ago
sjdirect / abot

#Web crawler# Cross Platform C# web crawler framework built for speed and flexibility. Please star this project! +1.

C#, crawler, web-crawler, Parsing, spider, spiders, pluggable, Unit testing, netcore, netcore2, netcore3, netstandard20, cross-platform
C# 2.28 k
9 months ago
jasonxtn / Argus

The Ultimate Information Gathering Toolkit

dns-lookup, information-gathering, OSINT, recon-tools, reconnaissance, virustotal, web-crawler, whois-lookup
Python 2.08 k
8 months ago
xianhu / PSpider

#Web crawler# A simple, easy-to-use Python crawler framework. QQ discussion group: 597510560

crawler, spider, Python, proxies, web-spider, multi-threading, web-crawler, python-spider, multiprocessing
Python 1.84 k
3 years ago
MarginaliaSearch / MarginaliaSearch

#Search# Internet search engine for text-oriented websites. Indexing the small, old and weird web.

search engine, no-cloud, small-web, internet-search, indexer, language-processing, web-crawler, alt-search, no-ai-used, self-hosted
HTML 1.38 k
1 day ago
Algebra-FUN / WeReadScan

A crawler that scans books purchased on WeRead (微信读书) and downloads them as local PDFs

Selenium, weread, web-crawler, book-downloader
Python 954
2 years ago
apache / stormcrawler

#Web crawler# A scalable, mature and versatile web crawler based on Apache Storm

web-crawler, distributed, Java, crawler
Java 914
6 days ago
platonai / PulsarRPA

#Large language model# PulsarRPA: An AI-Enabled, Super-Fast, Thread-Safe Browser Automation Solution! 💖

ai-agents, browser-automation, large language model, browser-use, dom-api, dom-manipulation, rpa, web-crawler, web-crawling, web-scraper, web-scraping
Kotlin 882
3 days ago
gildas-lormeau / single-file-cli

#Web crawler# CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

command-line interface, Node.js, single-file, web-archiving, web-scraper, web-scraping, archiving, scraping-websites, crawler, web-crawler, Deno, Dockerfile
JavaScript 849
13 days ago
postmodern / spidr

#Web crawler# A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.

spider, Ruby, crawler, Web, scraper, web-scraping, web-spider, web-crawler, web-scraper
Ruby 818
4 months ago
webrecorder / browsertrix-crawler

#Web crawler# Run a high-fidelity browser-based web archiving crawler in a single Docker container

crawler, crawling, warc, web-archiving, web-crawler
TypeScript 802
4 days ago