Web crawler

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (web spidering).
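To make "systematically browses" concrete, here is a minimal sketch of a breadth-first crawl loop using only the Python standard library. It is an illustration under stated assumptions, not how any project listed on this page is implemented: the `crawl` function, the `LinkExtractor` class, and the in-memory `PAGES` site are hypothetical names introduced here, and the `fetch` callable stands in for a real HTTP client so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    frontier = deque([start_url])  # URLs waiting to be visited
    seen = {start_url}             # URLs already queued, to avoid loops
    visited = []                   # URLs actually fetched, in visit order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical in-memory "site" so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/a#top">A again</a>',
}

if __name__ == "__main__":
    print(crawl("http://example.test/", PAGES.get))
```

A real crawler would also honor robots.txt, rate-limit its requests, and restrict which hosts it follows; the frameworks listed below handle those concerns for you.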

scrapy / scrapy

#Crawler framework# A popular, efficient Python crawling framework with a rich ecosystem

Python scraping crawling framework crawler Hacktoberfest web-scraping web-scraping-python

Python 58.26 k

3 days ago

firecrawl / firecrawl

#Web crawler# Firecrawl is an API service that crawls a URL and converts it into clean Markdown or structured data

AI crawler data Markdown scraper html-to-markdown LLM rag scraping web-crawler ai-scraping webscraping

TypeScript 58.03 k

7 hours ago

#Web crawler# A command-line video downloader written in Go

downloader Go crawler scraper Video bilibili YouTube youku iqiyi tumblr qq download

Go 30.43 k

3 days ago

gocolly / colly

#Crawler framework# A fast and elegant crawling framework for Golang

Go scraper framework crawler scraping crawling spider

Go 24.66 k

1 month ago

jhao104 / proxy_pool

#Web crawler# Python ProxyPool for web spider

crawler proxy spider Redis HTTP

Python 22.8 k

7 months ago

ScrapeGraphAI / Scrapegraph-ai

#Web crawler# Python scraper based on AI

scraping scraping-python automated-scraper LLM AI web-crawler web-scraping ai-scraping crawler html-to-markdown Markdown rag

Python 21.32 k

1 month ago

apify / crawlee

#Web crawler# Crawlee - a web crawling and browser automation library for Node.js

web-scraping web-crawling npm headless-chrome Puppeteer automation apify scraping crawling crawler headless scraper web-crawler JavaScript Node.js Playwright TypeScript

TypeScript 19.51 k

17 hours ago

binux / pyspider

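#Crawler framework# A Python crawling framework. Easy to pick up, with a built-in web-based script editor and task management interface

Python 16.82 k

1 year ago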

codelucas / newspaper

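#Web crawler# A Python data collection framework that automatically extracts metadata from news and articles, such as title, keywords, authors, summary, and body text

Python news crawler crawling scraper news-aggregator

HTML 14.78 k

1 month ago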

shengqiangzhang / examples-of-web-crawlers

#Web crawler# Some interesting examples of Python crawlers that are friendly to beginners, mainly crawling Taobao, Tmall, WeChat, WeChat Reading, Douban, QQ, and similar sites.

crawler spider taobao tmall Example Python Selenium pyquery stock fund multithreading WeChat

HTML 14.4 k

3 months ago

projectdiscovery / katana

#Web crawler# A next-generation crawling framework

crawler web-spider gocrawler spider-framework CLI headless

Go 14.2 k

17 hours ago

s0md3v / Photon

#Capture the Flag (CTF) and cybersecurity resources# Incredibly fast crawler designed for OSINT.

crawler spider Python OSINT information-gathering

Python 11.93 k

6 months ago

crawlab-team / crawlab

#Web crawler# Distributed web crawler admin platform for spiders management regardless of languages and frameworks.

webcrawler scrapy crawlab spiders-management Go scrapyd-ui spider crawler webspider web-crawler Docker platform crawling-tasks

Go 11.92 k

1 day ago

code4craft / webmagic

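#Web crawler# webmagic is an open-source vertical crawler framework for Java. It aims to simplify crawler development so that developers can focus on their own logic. Its core is very simple yet covers the entire crawling workflow, which also makes it good material for learning crawler development.

crawler Java scraping framework

Java 11.63 k

1 month ago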

ssssssss-team / spider-flow

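#Web crawler# A new-generation crawler platform that defines crawling workflows graphically, so a crawler can be built without writing code.

spider crawler jsoup xpath web-spider webspider webcrawler web-crawler spider-flow

Java 10.92 k

2 years ago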

injetlee / Python

#Web crawler# Python scripts: simulated Zhihu login, crawlers, Excel operations, WeChat official accounts, remote power-on

Python crawler WeChat excel

Python 10.27 k

2 years ago

guyueyingmu / avbook

#Web crawler# AV movie management system with crawlers for avmoo, javbus, and javlibrary; an online AV film library and AV magnet-link database (Japanese Adult Video Library, Adult Video Magnet Links - Japanese Adult Video Database)

javbus avmoo javlibrary spider crawler Laravel scraper adult magnet-link magnet database adult-video guzzlehttp

PHP 9.81 k

1 year ago

TeamWiseFlow / wiseflow

#Web crawler# Use LLMs to dig out what you care about from massive amounts of information and a variety of sources daily.

crawler information-gathering LLM scraper

Python 7.8 k

4 days ago

Website
Wikipedia