About Web Archiving

Web archiving can take many forms, but usually involves fully capturing web sites, storing the captured content, and faithfully reproducing, or ‘replaying’, the archived sites. Since the web at its core consists of HTTP requests and responses, this process is often done at the HTTP traffic level to ensure full fidelity.
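As a rough illustration of capture and replay at the HTTP level, the sketch below stores each response keyed by its request and later serves it back from storage instead of the live web. The names `Response`, `Archive`, and `fake_fetch` are hypothetical and are not taken from any Webrecorder tool; `fake_fetch` stands in for a real HTTP client so the example runs without network access. Real tools typically write records to a standard container format such as WARC rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Response:
    """A captured HTTP response: status line, headers, and raw body."""
    status: int
    headers: Tuple[Tuple[str, str], ...]
    body: bytes


class Archive:
    """Hypothetical in-memory store of request -> response pairs."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], Response] = {}

    def capture(self, method: str, url: str,
                fetch: Callable[[str, str], Response]) -> Response:
        # Perform the request via the supplied fetch function and
        # keep an exact copy of what came back.
        response = fetch(method, url)
        self._records[(method.upper(), url)] = response
        return response

    def replay(self, method: str, url: str) -> Optional[Response]:
        # Answer from storage only; never touch the live web.
        return self._records.get((method.upper(), url))


def fake_fetch(method: str, url: str) -> Response:
    # Stand-in for a real HTTP client (an assumption for this sketch).
    body = f"<html><body>Hello from {url}</body></html>".encode("utf-8")
    return Response(200, (("Content-Type", "text/html"),), body)


archive = Archive()
live = archive.capture("GET", "http://example.com/", fake_fetch)
replayed = archive.replay("get", "http://example.com/")
```

Because the stored record is the unmodified response, the replayed copy is byte-for-byte identical to what was originally received, which is the sense in which working at the HTTP level preserves fidelity.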

Traditionally, as pioneered by the Internet Archive and others, capture has been done by a web crawler which would automatically ‘crawl’ sites by making HTTP requests and discovering new links.
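The crawl loop described above can be sketched as a breadth-first traversal: fetch a page, keep it, extract its links, and queue any that have not been seen. This is a minimal sketch and not the Internet Archive's crawler; `LinkExtractor`, `crawl`, and the in-memory `SITE` dictionary are hypothetical, with `SITE.get` standing in for real HTTP requests so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML or None."""
    queue = deque([seed])
    seen = {seed}
    captured = {}
    while queue and len(captured) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # nothing to capture for this URL
        captured[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return captured


# A tiny made-up site used in place of the live web.
SITE = {
    "http://example.com/":
        '<a href="/about">About</a> <a href="news.html#top">News</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/news.html":
        '<a href="/about">About</a> <a href="/missing">Gone</a>',
}

pages = crawl("http://example.com/", SITE.get)
```

A crawler of this kind only sees links present in the fetched HTML, so content that appears after scripts run or after user interaction is not discovered by this loop.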

Another approach is to capture content through the web browser itself, recording it right before it reaches the browser (or even right after).

About Webrecorder

The Webrecorder project builds tools specializing in a ‘user-driven’ form of web archiving, in which the user directs the archiving process through their browser.

From the beginning, the goal of Webrecorder has been to build quality open source tools that enable ‘web archiving for all’, allowing anyone with a browser to create their own web archives and to accurately replay them at a later time.

The goal of Webrecorder tools is to provide highly accurate capture and replay of web sites, working with a variety of existing storage options and services. The Tools page provides more information on all tools developed as part of Webrecorder.

In addition to advancing web archive capture and replay, the Webrecorder project is focused on advancing open source software development and research in the following key areas.

Quality Open Source Web Archiving Tools

Having great open source tools is key to making life easier for those trying to archive the web, institutions and individuals alike.

The Webrecorder project aims to maintain existing open source tools and develop new ones.

Some tools are aimed at archivists and users, while others are aimed at developers.

The Tools page provides a comprehensive description of available tools.

Making Web Archiving More Accessible via Decentralized Technologies

The Webrecorder project intends to support the creation of many small, decentralized archives by individuals and institutions. Rather than supporting a single centralized silo, a key goal remains to support web archives in a variety of environments and storage scenarios. From users’ personal laptops, to Google Drive, to IPFS, Webrecorder tools will be designed to support creating and accessing web archives wherever they may be found.

ReplayWeb.page (https://replayweb.page/) is a major effort in this direction.

Intersection of Web Archiving and Software Emulation

In order to capture and replay web sites as accurately as possible, it may sometimes be necessary to preserve other key components of the experience of browsing the web: the web browser and sometimes the web server.

Other times, archiving the web site itself may be sufficient or the only available option.

Webrecorder will continue to develop tools that integrate web archiving with emulation, through ongoing work on the remote browser emulation system and other new technologies.

More information on research in this area will be added when available.