webscraper · GitHub Topics

#Web crawler# Self-hosted webscraper.

Open Source self-hosted webscraper Docker helm Kubernetes Playwright Python scraping web-scraper web-scrapers web-scraping webscraping

TypeScript 4.26 k

2 months ago

anaskhan96 / soup

Web Scraper in Go, similar to BeautifulSoup

Go webscraper webscraping beautifulsoup web-scraper html-node

Go 2.21 k

2 years ago

any4ai / AnyCrawl

#Web crawler# AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP results from Google/Bing/Baidu/etc. Native multi-threading for bulk processing.

aitools crawl scrape webscraper ai-scraping data html-to-markdown rag scraping

TypeScript 2.14 k

10 hours ago

benibela / xidel

#Web crawler# Command-line tool to download and extract data from HTML/XML pages or JSON APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON docu...

xquery XML HTML JSON xpath command-line-interface HTTP Web REST API css-selector wget cURL httpie webscraper webscraping scraper datascraping data-processing

Pascal 816

7 months ago

scrapfly / scrapfly-scrapers

#Web crawler# Scalable Python web scraping scripts for 40+ popular domains

crawling Python crawler scraping web-scraping web-scraping-python antibot automation crawling-python datascraping proxies python-scraper scraper scraping-python spider twitter-scraper web-crawler webscraper webscraping

Python 623

2 days ago

rootVIII / proxy_requests

A class that uses scraped proxies to make HTTP GET/POST requests (Python requests)

Python requests-module requests proxy proxy-server proxy-list webscraping webscraper recursion HTTP http-proxy python-requests

Python 390

5 years ago

salimk / Rcrawler

#Web crawler# An R web crawler and scraper

R crawler scraper webcrawler webscraping webscraper webscrapping crawlers

R 357

3 years ago

onepointAI / onepoint

#Large language model# An AI assistant tool that integrates coding, writing, and reading functions. For better alternatives see https://monica.im/desktop

artificial-intelligence Electron ChatGPT all-in-one macOS toolkit React webscraper Code reading gpt-35-turbo

TypeScript 315

2 years ago

toby-p / rightmove_webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

webscraper pandas pandas-dataframe CSV Python data-science data-analysis data-mining

Python 272

2 years ago

intergalacticalvariable / reader

#Web crawler# 📚 This is an adapted version of Jina AI's Reader for local deployment using Docker. Convert any URL to an LLM-friendly input by adding a simple prefix: http://127.0.0.1:3000/https://website-to-scrape.com/

Docker large-language-model proxy rag scraper self-hosted webscraper webscraping website-screenshot website-screenshot-capturer

TypeScript 247

2 months ago

serpapi / lego-ai-parser

#Web crawler# Lego AI Parser is an open-source application that uses OpenAI to parse the visible text of HTML elements.

artificial-intelligence classification datascience gpt-3 HTML machine-learning openai Parser Parsing parser-library Python scraper tools Web app webscraper webscraping

Python 236

1 year ago

TBosak / mkfd

#Web crawler# RSS feed builder created with Bun🥖 and Hono🔥 that builds feeds from webpages, email folders, and REST API calls.

Bun feed Hono RSS TypeScript contributors-welcome help-wanted rss-generator scraper self-hosted webscraper Docker Dockerfile dockerhub

TypeScript 188

3 days ago

AliAkhtari78 / SpotifyScraper

#Web crawler# Spotify scraper that extracts all the information from Spotify and downloads MP3s with the song's cover art

webscraping webscraper spotify-downloader spotify-scraping scraper crawler Python free

Makefile 182

2 days ago

mehmetozkaya / DotnetCrawler

#Web crawler# DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output, based on .NET Core. This library is designed like other strong crawler libraries like Web...

.NET crawler crawling scraping scrapy entity-framework-core ddd-architecture C# webcrawler webscraping webscraper htmlagilitypack

C# 178

3 years ago