web-archiving · GitHub Topics

"Your own personal internet archive" (website archiving / crawler), a self-hosted website time machine

pocket wget browser-bookmarks pinboard Chromium Firefox backups RSS web-archiving Python wayback-machine youtube-dl self-hosted headless-browser digipres warc

Python 24.95 k

4 months ago

webrecorder / pywb

Core Python Web Archiving Toolkit for replay and recording of web archives

Python web-archiving

JavaScript 1.55 k

1 month ago

Rhizome-Conifer / conifer

Collect and revisit web pages.

web-archiving archives Python Docker warc

Python 1.52 k

8 months ago

webrecorder / archiveweb.page

A High-Fidelity Web Archiving Extension for Chrome and Chromium based browsers!

Chromium extension web-archiving archiving browser-extension warc

TypeScript 1.06 k

14 hours ago

gildas-lormeau / single-file-cli

#Web crawler# CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

CLI Node.js single-file web-archiving web-scraper web-scraping archiving scraping-websites crawler web-crawler Deno Dockerfile

JavaScript 974

3 months ago

bellingcat / auto-archiver

#Web crawler# Automatically archive links to videos, images, and social media content from Google Sheets (and more).

archive Docker open-source-research Python service scraping web-archiving

Python 931

12 days ago

webrecorder / browsertrix-crawler

#Web crawler# Run a high-fidelity browser-based web archiving crawler in a single Docker container

crawler crawling warc web-archiving web-crawler

TypeScript 872

2 days ago

Ray-D-Song / web-archive

Free web archiving and sharing service based on Cloudflare.

Cloudflare cloudflare-pages d1 free Hono self-hosted Serverless web-archive web-archiving

TypeScript 868

3 months ago

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

web-archiving web-archive wayback-machine warc service-worker

TypeScript 840

2 days ago

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

IPFS warc web-archiving Python service-worker Docker

Python 643

16 days ago

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

internet-archive wayback-machine web-archiving OSINT

Python 547

2 years ago

harvard-lil / perma

Indelible links

web-archiving Library

JavaScript 479

18 days ago

webrecorder / webrecorder-player

Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)

warc Electron web-archiving

JavaScript 448

5 years ago

rahiel / archiveror

Archiveror will help you preserve the webpages you love. 💾

archiving WebExtension browser-extension web-archiving Firefox extension Chrome extension JavaScript bookmark

JavaScript 446

6 years ago

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

web-archiving warc Python

Python 430

9 months ago

oduwsdl / archivenow

A Tool To Push Web Resources Into Web Archives

web-archiving internet-archive

Python 422

2 years ago

Florents-Tselai / WarcDB

#Web crawler# WarcDB: Web crawl data as SQLite databases.

crawling SQLite warc CLI database web-archiving

Python 406

1 year ago

machawk1 / wail

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

web-archiving Python GUI warc pyinstaller

Roff 377

6 months ago

ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

Chrome extension Firefox extension Svelte archiving browser-extension digipres web-archiving

JavaScript 353

4 months ago

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

archiving cloud warc web-archive web-archiving Kubernetes

TypeScript 328

2 days ago