Earlier this year, following an internal survey, members of the IIPC (International Internet Preservation Consortium) recommended adopting Webrecorder pywb as the primary replay system for members’ web archives. Webrecorder and the IIPC then established a multi-part collaboration to help with this transition and advance the development of pywb.
To meet these goals, I’m excited to announce the launch of an official guide for migrating from OpenWayback to Webrecorder pywb, available at:
This guide was created with input from IIPC members and marks the completion of the first package of the IIPC project on pywb. This guide is now part of the standard pywb documentation and provides examples of various OpenWayback configurations and how they can be adapted to analogous options in pywb. The guide covers updating the index, WARC storage and exclusion systems to run in pywb with minimal changes.
For best results, we recommend deploying OutbackCDX, an open-source standalone web archive indexing system developed by the National Library of Australia, alongside pywb to manage web archive indexes. See the guide for more details and additional options.
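As a rough illustration of what pointing a pywb collection at OutbackCDX might look like, here is a small Python sketch that generates a minimal config.yaml. The helper function is hypothetical and not part of pywb, and the collection name, OutbackCDX URL and WARC directory are placeholder values. The `index` and `archive_paths` keys and the `cdx+` prefix for a remote CDX server reflect my understanding of pywb's configuration format, so please check the guide for the authoritative options.

```python
def pywb_collection_config(name, outbackcdx_url, warc_dir):
    """Return YAML text for a pywb config.yaml with a single collection.

    Assumption: a remote CDX index is referenced with a "cdx+" prefix
    followed by the index server URL. All argument values passed in
    are illustrative placeholders, not defaults shipped with pywb.
    """
    # Join the OutbackCDX base URL and the collection name into one index URL.
    index_url = f"cdx+{outbackcdx_url.rstrip('/')}/{name}"
    lines = [
        "collections:",
        f"  {name}:",
        f"    index: {index_url}",
        f"    archive_paths: {warc_dir}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values: a local OutbackCDX instance and a WARC directory.
    print(pywb_collection_config("mycoll", "http://localhost:8080", "/data/warcs/"))
```

The output can be saved as config.yaml in the directory where pywb is started. Generating the file from a function like this is only a convenience for sites with many collections; writing the YAML by hand works just as well.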
Sample Deployment Configurations
Alongside the guide, pywb now also includes a few working sample deployments (via Docker Compose) that run pywb with Nginx, Apache and OutbackCDX.
These deployments will be part of the upcoming pywb release and will be updated as pywb and configuration options evolve.
Next on the immediate roadmap for pywb is an upcoming release, which will feature numerous fixes in addition to the guide. (See the pywb CHANGELIST for more details on upcoming and new features.)
The next iteration of pywb, which will be released in the first half of 2021, will include improved support for access controls, including a time-based access ‘embargo’, location-based access controls, and improved support for localization, in line with the work outlined in pywb project Package B.
We hope the guide will be useful for those updating from OpenWayback to pywb. We are also looking for input from IIPC members about any use cases for improved access control and localization for the next iteration.
If you have any questions, run into issues, or find anything missing, please send feedback via the IIPC mailing lists or directly to Webrecorder, via email or via the forum.