Webrecorder
Web archiving for all!

The Webrecorder Project has developed many tools to help with web archive capture and replay.

Some tools are for anyone interested in creating or replaying web archives on their own. Others are more advanced tools that software developers can integrate into other workflows; these require a bit more technical expertise.

ArchiveWeb.page is a Chrome extension and standalone desktop app. The extension works in any Chromium-based browser (Chrome, Brave, Edge) and lets you archive interactively as you browse. The standalone desktop app provides the same interactive high-fidelity archiving functionality.

See the User guide for more info.

View ArchiveWeb.page Guide

ReplayWeb.page provides a web archive replay system as a single web site (which also works offline), allowing users to view web archives from anywhere, including a local computer or even Google Drive.

See the User guide for more info.

View ReplayWeb.page Documentation

The pywb toolkit is a full-featured, advanced web archive capture and replay framework for Python. It provides command-line tools and an extensible framework for high-fidelity web archive access and capture, including localization and access control. A subset of its features provides the basic functionality also known as a 'wayback machine', but pywb includes additional features to create new web archives and to manage existing collections.

View pywb Documentation

Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a single crawl in a single Docker container.

Browsertrix Crawler currently requires basic familiarity with the command line and Docker to run crawls.

View Browsertrix Crawler README

Browsertrix Cloud is our open-source, cloud-native, high-fidelity browser-based crawling system. It automates web archiving, can be deployed locally or in the cloud, and is designed to make web archiving easier and more accessible for everyone!

Browsertrix Cloud builds on Browsertrix Crawler and provides a full UI for creating, managing and viewing browser-based crawls.

Read more about Browsertrix Cloud

All Tools

In addition to the key tools above, we maintain numerous other smaller tools as part of the web archiving ecosystem. Take a look at these tools if you are interested in deploying web archiving tools on your own or integrating them into other projects.


All currently maintained Webrecorder tools are listed below. Select one of the categories to further filter this list.




archiveweb.page

A Chrome extension and desktop app for capturing and replaying pages directly using a browser

browsertrix-behaviors

A set of behaviors for automating interactions with the browser, including generic behaviors (playing video, scrolling) and site-specific behaviors, such as for social media

browsertrix-crawler

A self-contained crawling system that runs a high-fidelity crawl in a single Docker container

oldweb.today

An integrated browser emulation system for running in-browser emulators connected to web archives

pywb

The core web archive toolkit, including web archive replay, access and collection management

pywb-remote-browsers

A Docker Compose-based system for running remote browsers (including Flash and Java support) connected to web archives

remote-desktop-server

A set of Docker containers for VNC and WebRTC streaming. A component of pywb-remote-browsers.

replayweb.page

A serverless web and desktop app for viewing web archives directly in the browser

shepherd

A Docker container orchestration system for launching 'flocks' of Docker containers on demand. Part of the Remote Browser system.

shepherd-client

A JS frontend for embedding remote browsers in Conifer. Part of the Remote Browser system

wabac.js

A service worker-based web archive replay system. Backend for ReplayWeb.page

wacz-format

A new specification for a portable Web Archive Collection Zip (WACZ) format, and an accompanying Python library

warcio

A fast, standalone way to read and write the WARC format commonly used in web archives

warcio.js

A port of Python warcio to JavaScript. Supports reading and writing WARC files in the browser and in Node.

warcit

A command-line tool to convert on-disk directories of web documents (commonly HTML, web assets and any other data files) into ISO standard web archive (WARC) files.
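
To make the directory-to-WARC idea concrete, here is a minimal sketch using only the Python standard library. It is not warcit's implementation and does not use warcio; the function names (`make_resource_record`, `dir_to_warc`) and the URL prefix are hypothetical, chosen for illustration. It assumes each file becomes one uncompressed WARC/1.0 `resource` record whose target URI is the prefix plus the file's relative path.

```python
import mimetypes
import os
import uuid
from datetime import datetime, timezone


def make_resource_record(target_uri, payload, content_type):
    """Serialize one WARC/1.0 'resource' record as bytes (hypothetical helper)."""
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLF sequences.
    return head + payload + b"\r\n\r\n"


def dir_to_warc(root_dir, url_prefix, out):
    """Write one record per file under root_dir to the binary stream `out`.

    Returns the number of records written. The mapping from file path to
    URL (url_prefix + relative path) is an assumption made for this sketch.
    """
    count = 0
    for dirpath, dirnames, filenames in os.walk(root_dir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root_dir).replace(os.sep, "/")
            # Guess the MIME type from the file extension, with a generic fallback.
            ctype = mimetypes.guess_type(name)[0] or "application/octet-stream"
            with open(path, "rb") as fh:
                out.write(make_resource_record(url_prefix + rel, fh.read(), ctype))
            count += 1
    return count
```

For example, calling `dir_to_warc("site", "http://example.com/", out)` with `out` opened in binary write mode would write one record per file under `site`. A real tool would typically also gzip each record and handle details such as index pages and character sets, which this sketch leaves out.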

wombat.js

The client-side JavaScript rewriting system used in pywb and wabac.js