I’m excited to announce that Webrecorder has received a $100,000 Open Source Development Grant from the Filecoin Foundation to work on standardization and design around the creation of browser-based web archives.
The creation of web archives through the browser has been a key goal for the Webrecorder project, and this work will help bring that goal closer to reality. The grant will focus on three strands of work, explained in more detail below.
I especially wish to thank Dietrich Ayala of Protocol Labs for collaborating on and supporting this work!
The first area of work will be a more formal definition of the WACZ Format, a format designed to package standard WARC files along with other requisite components, such as CDXJ indexes, page lists and other metadata. (See our previous post about WACZ.) We hope this will also help more formally define the other formats that make web archives useful, such as CDXJ, but which are currently underspecified. The WACZ format allows random access to web archives of any size, so a single page can be retrieved efficiently from a much larger collection. This in turn makes it possible to efficiently load web archives from IPFS, Filecoin, or any other storage that supports random access. In this work, we hope to focus on browser-based web archives first, while also planning for how to store and access much larger crawl-based collections.
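To make the random access idea more concrete, here is a minimal Python sketch. It is not the WACZ specification, which this work aims to define. It assumes a toy ZIP package with one uncompressed data file and a CDXJ-style index whose JSON fields give each record's `filename`, `offset` and `length`. The paths `archive/data.warc` and `indexes/index.cdx`, the helper names, and the fake record contents are all illustrative assumptions.

```python
import io
import json
import zipfile


def parse_cdxj_line(line):
    """Split one CDXJ-style line into (search key, timestamp, JSON fields)."""
    key, timestamp, fields = line.strip().split(" ", 2)
    return key, timestamp, json.loads(fields)


def read_record(zf, entry):
    """Read only the bytes of one record, using the offset and length from the index."""
    with zf.open("archive/" + entry["filename"]) as member:
        member.seek(entry["offset"])
        return member.read(entry["length"])


def lookup(zf, key):
    """Scan the index for a key and return that record's bytes, or None if absent."""
    with zf.open("indexes/index.cdx") as index:
        for raw in index:
            found_key, _timestamp, fields = parse_cdxj_line(raw.decode("utf-8"))
            if found_key == key:
                return read_record(zf, fields)
    return None


def build_demo_archive():
    """Build a toy in-memory ZIP with two fake records and an index (assumed layout)."""
    records = [
        ("com,example)/", "20210101000000", b"RECORD-ONE\n"),
        ("com,example)/about", "20210101000005", b"RECORD-TWO-ABOUT\n"),
    ]
    data = b""
    index_lines = []
    for key, timestamp, payload in records:
        fields = {"filename": "data.warc", "offset": len(data), "length": len(payload)}
        index_lines.append(f"{key} {timestamp} {json.dumps(fields)}")
        data += payload
    buf = io.BytesIO()
    # Stored (uncompressed) members keep record offsets meaningful inside the ZIP.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", data)
        zf.writestr("indexes/index.cdx", "\n".join(index_lines) + "\n")
    buf.seek(0)
    return buf
```

Calling `lookup(zipfile.ZipFile(build_demo_archive()), "com,example)/about")` returns only the second record's bytes. The same pattern, an index lookup followed by a ranged read, is what lets a reader fetch one page from remote storage without downloading the whole collection. A real reader would issue ranged requests against the storage backend and would use a sorted index with binary search rather than the linear scan shown here.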
On this effort, I will be collaborating with Dr. Ed Summers, a long-time friend and colleague of Webrecorder, who will work as a technical writer and designer on the WACZ specification.
UX Research on Privacy-preserving Web Archiving
What if browsers could natively create web archives of anything you browse? How do we ensure that users’ privacy is protected, and that users are able to make intelligent choices about what to archive and what not to, where to store the data, and with whom to share it? The second strand of this work will focus on critical UX research around privacy-preserving web archiving, threat modeling, and UX design that takes into account different scenarios for user-based web archiving. We hope to focus on use cases and users outside the traditional web archiving communities, including users who face different and higher threat risks due to the nature of their work, such as journalists and human rights researchers.
The UX research will be led by Cade Diehm, along with New Design Congress (NDC), an independent research organization he founded which “recognises all infrastructure as expressions of power, and sees interfaces and technologies as social, economic, political and ecological accelerants.” I am super excited to be collaborating with Cade and NDC on this effort and supporting much-needed privacy research around new forms of web archiving.
Implementation and Browser Integration
The final strand of work will focus on beginning to integrate the design and research from the other strands into our existing tools. We will likely update tools such as py-wacz and ArchiveWeb.page to support the latest WACZ format specification and UX recommendations.
In this phase, we will also be joined by Mauve, a developer specializing in open source decentralized tools and the creator of Agregore Browser, a “minimal web browser for the distributed web” which already natively supports IPFS, hyper:// and other decentralized protocols. Mauve will work to integrate web archiving support into Agregore via our ArchiveWeb.page extension, combining a web browser, built-in web archiving support and native decentralized storage via IPFS.
Taken all together, I hope that this work will make a significant impact on the web archiving field, helping advance not only the technology for web archiving in more decentralized ways, but also our understanding of how more personalized archiving can empower users (and the risks involved). This grant helps support the core of Webrecorder’s mission of bringing ‘Web Archiving for All’!
I look forward to sharing more updates on this work in the upcoming months!
Have thoughts or comments on this post? Feel free to start a discussion on our forum!