I’m really thrilled to announce that Webrecorder has received a two-year, $1.3M open source development grant from the Filecoin Foundation!

The grant will support our mission of developing quality open source web archiving tools for all!

This funding will help us to grow the Webrecorder team and make improvements across the broad Webrecorder ecosystem of tools.

(Check out the jobs page for more info on current and future job postings!)

From the beginning, Webrecorder’s mission has been to support decentralized web archiving that can be performed directly in the browser, where web archives can live anywhere that data can be stored. A key part of enabling decentralized web archiving is a system of decentralized or distributed storage. The IPFS protocol provides a powerful foundation, and the Filecoin storage network can go a long way toward making decentralized web archiving a reality.

Dietrich Ayala, IPFS Ecosystem Lead at Protocol Labs, agrees: “Our collaboration with Webrecorder is key to the IPFS and Filecoin mission: making a web that works for the most impacted users in critical situations, and ensuring the safety of the digital human experience for future generations. Webrecorder provides the specs, libraries, tools and services to build bridges between the HTTP web and any of these new technologies, and bring the last 30+ years of the web along too.”

I am very grateful for Filecoin Foundation’s continued support of the Webrecorder project and our web archiving mission, and thankful to everyone who has supported our efforts thus far!

You can also read our project spotlight on the Filecoin Foundation blog.

Highlights from Recent Work

This grant supersedes and expands on our previous open source development grant from Filecoin Foundation, which focused on design and research.

Here are a few highlights from the progress we’ve made in design, research and browser-based tool integration over the last few months:

UX Research with New Design Congress

As part of our previous grant, New Design Congress has been conducting extensive UX research on browser-based web archiving. They shared their initial findings during our last community call, and a more detailed report is forthcoming! We will continue to collaborate on this research and examine the use cases and risks associated with browser-based web archiving.

WACZ Spec + Use Cases Development

We have formalized the WACZ spec and added further web archiving specs, including a spec for the CDXJ format and a spec for signing WACZ files. The specs are available at specs.webrecorder.net, thanks in large part to the work of Ed Summers, our technical writer and editor for this effort. Work on the WACZ spec continues, focusing on full-text search, additional metadata, and further recommendations for WACZ storage on IPFS.

ArchiveWeb.page Browser Integration

Part of our work in making web archives more accessible is integrating web archiving directly into browsers. Mauve Software has released an update to their Agregore Browser which includes a proof-of-concept integration of web archiving via the ArchiveWeb.page extension.

The Agregore Browser natively supports IPFS as well as many other p2p protocols, and support for browser-based archiving will soon allow users to share the web archives they create directly through the browser itself.

Project Goals

This new funding will help us continue these existing efforts, as well as support our software development goals for the Webrecorder ecosystem in several key areas.

We will share a more detailed timeline later, but a few high-level goals for the next two years include:

  • Browsertrix - Continued development of our open-source, federated cloud-native SaaS service, with support for archival storage of data on IPFS/Filecoin as one option. The service will allow institutions as well as independent communities to create archives on their own.

  • Scalable web archive data model/specification and necessary tooling - Building on the WACZ file format to implement a robust data model for storing web archive collections of any size, from a single WACZ file to multi-TB or even multi-PB collections. The data model would support encryption, signing and storage of all data on IPFS/Filecoin, and an optional IPFS-based search index.

  • Web archive signing and validation framework - Building tools and specifications for signed and verifiable web archives, to prove identity and authenticity. To support a variety of use cases, this will include multiple approaches, such as PKI-based, DID, and possibly blockchain-based solutions for verifying authenticity. Standalone validator tools that are deployable by independent institutions will also be created.

  • Replay Viewer and Embedding - Continued development of the ReplayWeb.page viewer to keep up with the complexity of the web, including integration with additional CMS/digital preservation systems. Improvements for self-hosting the viewer and loading web archives from IPFS, and implementation of validation features in the viewer.

  • Browser-Based Web Archiving Tooling - Continued development and research around several browser-based archiving approaches, including our ArchiveWeb.page extension and extension-less archiving via ArchiveWeb.page Express.

Next Steps

The next two years will be an exciting time for Webrecorder, as we continue to build on this previous work and expand our efforts on Browsertrix, which has seen growing use over the last few months. (More details in an upcoming blog post!)

In the short term, we will also be looking for additional help! If you would like to work with Webrecorder on achieving our mission of web archiving for all, please do not hesitate to reach out!

EDIT 2024-05-22: “Browsertrix” was previously referred to here as “Browsertrix Cloud”. This post has been updated to reflect the new name.

Have thoughts or comments on this post? Feel free to start a discussion on our forum!