
Introducing GovArchive.us & Mirroring Entire Sites with Web Archives
We’re excited to announce the launch of GovArchive.us, a dedicated site for exploring our US Government Web Archive on Browsertrix. The project also introduces a brand new approach for viewing web archives: the ability to host a full-site “mirror” from any web archive, keeping original links intact while hosting them on a new domain.
One example of this is our archived version of the previous usaid.gov website, which is now accessible at usaid.govarchive.us. Unlike traditional web archive replay, this “mirror” archive preserves the original URL structure, making the site as easy to navigate and reference as the original site. For instance, the archived version of a page originally hosted at https://usaid.gov/about-us/mission-vision-values can be viewed at https://usaid.govarchive.us/about-us/mission-vision-values, by simply replacing the domain usaid.gov with usaid.govarchive.us.
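The domain swap described above can be sketched in a few lines of TypeScript. This is only an illustration of the URL mapping, not code from the GovArchive.us repos; the function name `toMirrorUrl` and its parameters are hypothetical.

```typescript
// Rewrite an original URL so that it points at the mirror domain.
// The path, query string, and fragment are left untouched.
// `originalHost` and `mirrorHost` are supplied by the caller (hypothetical parameters).
function toMirrorUrl(original: string, originalHost: string, mirrorHost: string): string {
  const url = new URL(original);
  if (url.hostname !== originalHost) {
    throw new Error(`${url.hostname} does not match ${originalHost}`);
  }
  // Only the host changes; everything after it stays the same.
  url.hostname = mirrorHost;
  return url.toString();
}

const mirrored = toMirrorUrl(
  "https://usaid.gov/about-us/mission-vision-values",
  "usaid.gov",
  "usaid.govarchive.us",
);
console.log(mirrored);
```

Because only the host is replaced, any link that worked on the original site has a predictable counterpart on the mirror.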
We’ve reserved the *.govarchive.us domain and subdomains to be able to dynamically add more archives of US Government sites from our collections to this system.
What is Available Now?
Here’s a selection of a few ‘mirror’ sites that we have available from govarchive.us. Each mirror is a static site that loads an archived version from our collection, hosted on a dedicated domain:
- usaid.govarchive.us as a mirror of usaid.gov
- cdc.govarchive.us as a mirror of cdc.gov
- fema.govarchive.us as a mirror of fema.gov
- epa.govarchive.us as a mirror of epa.gov
- climate.govarchive.us as a mirror of climate.gov
Check GovArchive.us for an up-to-date list as we add more mirrors from our archives!
Mirroring Sites with Web Archives — Getting Started
Anyone can use this approach to mirror a dynamic website as a static site powered by web archives!
If you run a particular domain, you can set up a web archive as a static site, and point the domain to the static version of the site instead!
Or, you can host a mirror elsewhere, as we have done. This can be used to migrate off costly or obsolete infrastructure, while still preserving a site at the highest fidelity!
We provide the following template to get started with a single site mirror created from a web archive:
Using the above template, you can host your own web archive mirror entirely on GitHub Pages!
How it works: GovArchive and Wildcard Subdomains
GovArchive.us demonstrates a more complex setup with wildcard subdomains.
We’ve set up a wildcard DNS to point to a static site for any *.govarchive.us.
(For this, we use Bunny CDN, as GitHub Pages does not support wildcard subdomains pointing to the same repo.)
Then, we dynamically choose the correct site to mirror in the browser based on the subdomain. A specific Browsertrix collection is chosen based on the current subdomain, allowing for maximum flexibility to add more collections.
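As a rough sketch of this selection step, the subdomain can be extracted from the host name and looked up in a table. The collection identifiers below are placeholders, and `mirrorSubdomain` and `collectionFor` are hypothetical names for illustration; the actual GovArchive.us code may be organized differently.

```typescript
// Hypothetical lookup table: mirror subdomain -> collection identifier.
// The identifiers are placeholders, not real Browsertrix collection IDs.
const COLLECTIONS = new Map<string, string>([
  ["usaid", "example-collection-usaid"],
  ["cdc", "example-collection-cdc"],
]);

const MIRROR_ROOT = "govarchive.us";

// Return the single-label subdomain of a mirror host,
// or null for the top-level site or an unrelated host.
function mirrorSubdomain(hostname: string, root: string = MIRROR_ROOT): string | null {
  const host = hostname.toLowerCase();
  const suffix = "." + root;
  if (!host.endsWith(suffix)) {
    return null;
  }
  const label = host.slice(0, -suffix.length);
  // Only one level of subdomain is expected, so reject empty or nested labels.
  return label.length > 0 && !label.includes(".") ? label : null;
}

// Pick the collection for the current host, or null if there is none.
function collectionFor(hostname: string): string | null {
  const sub = mirrorSubdomain(hostname);
  if (sub === null) {
    return null;
  }
  return COLLECTIONS.get(sub) ?? null;
}
```

In a browser this would be called as `collectionFor(window.location.hostname)`. Adding a new mirror then only requires a new table entry, with no DNS or hosting changes.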
Nested subdomains are flattened by replacing each "." with "-", so that more.subdomains.example.gov becomes a single level of subdomain, more-subdomains-example.govarchive.us. This lets us use a single wildcard SSL certificate, which covers only one level of subdomain.
For example, nca2023-globalchange.govarchive.us mirrors nca2023.globalchange.gov.
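The flattening rule, and the reason it matters for a wildcard certificate, can be sketched as follows. Both function names are hypothetical, and the sketch assumes the original host ends in .gov.

```typescript
// Flatten an original ".gov" host into a single-label mirror host,
// e.g. "nca2023.globalchange.gov" -> "nca2023-globalchange.govarchive.us".
function flattenToMirrorHost(originalHost: string, mirrorRoot: string = "govarchive.us"): string {
  const host = originalHost.toLowerCase();
  if (!host.endsWith(".gov")) {
    throw new Error(`${originalHost} is not a .gov host`);
  }
  const labels = host.slice(0, -".gov".length); // "nca2023.globalchange"
  return labels.split(".").join("-") + "." + mirrorRoot;
}

// Check whether a host is covered by a wildcard pattern such as "*.govarchive.us".
// The wildcard stands for exactly one label, so nested subdomains do not match.
function coveredByWildcard(host: string, pattern: string): boolean {
  if (!pattern.startsWith("*.")) {
    return host === pattern;
  }
  const suffix = pattern.slice(1); // ".govarchive.us"
  if (!host.endsWith(suffix)) {
    return false;
  }
  const label = host.slice(0, -suffix.length);
  return label.length > 0 && !label.includes(".");
}

const flat = flattenToMirrorHost("nca2023.globalchange.gov");
console.log(flat, coveredByWildcard(flat, "*.govarchive.us"));
```

Note that flattening is not reversible in general: a host whose labels already contain hyphens yields the same mirror name as a nested host, so going from a mirror name back to the original host needs an explicit lookup rather than a string replace.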
With GovArchive.us, we also provide a custom banner and loading screen. If the archive is already initialized, it will load right away; otherwise, the bootstrap script runs and a loading screen is shown while the service worker is being initialized. Finally, the top-level site just provides a landing page index, hosted in a different repo.
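The loading behavior described above might look roughly like this. It is a simplified sketch, not the actual bootstrap script: the `WorkerContainer` interface, the `startMirror` function, and the `./sw.js` path are all assumptions made for illustration. In a browser, `navigator.serviceWorker` has the `controller` property and `register` method used here.

```typescript
// Minimal shape of what this sketch needs.
// In a browser, navigator.serviceWorker can be passed in directly.
interface WorkerContainer {
  controller: unknown;
  register(scriptUrl: string): Promise<unknown>;
}

// If a service worker already controls the page, report "ready" right away.
// Otherwise show the loading screen, register the worker, and report "initialized".
async function startMirror(
  container: WorkerContainer,
  showLoading: () => void,
  workerUrl: string = "./sw.js", // hypothetical path to the service worker script
): Promise<"ready" | "initialized"> {
  if (container.controller) {
    return "ready";
  }
  showLoading();
  await container.register(workerUrl);
  // A real page would now wait for the worker to take control
  // (for example by reloading) before loading archived content.
  return "initialized";
}
```

Passing the container in as a parameter, rather than reading `navigator.serviceWorker` directly, keeps the decision logic easy to exercise outside a browser.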
As always, the whole thing is open source, and further details are available on our GitHub repos:
The replay itself is provided by our low-level browser-based replay engine, wabac.js, which is also used in ReplayWeb.page. (In the future, the mirror capability may be added to ReplayWeb.page itself.)
We hope GovArchive.us provides a much-needed resource, as well as an example of how web archive-powered site mirrors can be done at scale.
If you need help setting up your own web archive mirror, reach out and we’d be happy to support your efforts!