High-Fidelity Web Archiving, Built for Scale

Archive entire websites with Browsertrix, a cloud-native web archiving platform from Webrecorder. With high-fidelity crawling and advanced quality assurance tools, Browsertrix empowers you to create, view, curate, and share archives with confidence.

Screenshot of Browsertrix crawling a website with the current pages visible on screen.

Powerful, Observable, Automated Archiving

Create an interactive copy of any website

Capture a website exactly as you see it now, or schedule crawls at regular intervals to capture a site as it evolves.

Intuitive and granular configuration options enable you to set up and run your crawls with ease.

Set up your first Crawl in Browsertrix

Screenshot of part of Browsertrix’s crawl scoping settings with the Crawl Scope set to capture pages in the same directory on webrecorder.net. The max depth of this crawl is set to unlimited.

Watch crawls as they happen

Monitor running crawls in real time. Diagnose issues and ensure you are capturing exactly the content you want.

Exclude URLs on the fly

Exclude URLs without restarting the entire crawl. Stop runaway crawls from getting bogged down in crawler traps, such as websites that dynamically generate new URLs.

Use Exclusions to scope your crawls

Illustration of the Crawl Queue Editor. Two pending exclusions of type Matches Text (Talk: and Help:) are shown alongside the Wikipedia URLs they match, and the queue lists URLs 1 to 300 of 6,117.
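As a rough illustration of what a text-match exclusion does to a crawl queue, here is a minimal Python sketch. It is not Browsertrix's implementation: the function name `apply_text_exclusions` is hypothetical, the matching rule (plain substring match) is an assumption, and the `Talk:Neversoft` URL is invented for the example.

```python
def apply_text_exclusions(queue, exclusions):
    """Split queued URLs into (kept, excluded) lists.

    Assumption: a URL is excluded when it contains any exclusion
    string as a plain substring. Order of the queue is preserved.
    """
    kept, excluded = [], []
    for url in queue:
        if any(text in url for text in exclusions):
            excluded.append(url)
        else:
            kept.append(url)
    return kept, excluded


# A small stand-in queue; the Talk: URL is made up for illustration.
queue = [
    "https://en.wikipedia.org/wiki/Help:Category",
    "https://en.wikipedia.org/wiki/Skateboarding",
    "https://en.wikipedia.org/wiki/Talk:Neversoft",
    "https://en.wikipedia.org/wiki/Neversoft",
]

kept, excluded = apply_text_exclusions(queue, ["Help:", "Talk:"])
print("Still queued:", kept)
print("Excluded:", excluded)
```

Because the filter only inspects URLs that are still waiting in the queue, pages that were already captured are unaffected, which is why an exclusion like this can be applied without restarting the crawl.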

Get behind paywalls and capture content behind logins

Browsertrix’s innovative crawling system uses real browsers. Login sessions, cookies, and browser preferences (like ad-blocking and language preferences) allow you to crawl the web like a real user.

Crawl as a logged-in user with Browser Profiles

Screenshot of a login form that must be completed before the user is allowed to view content on a website.

Signed, sealed, authenticated

Crawl outputs are digitally signed to ensure a provable chain of custody.

Always on schedule

Schedule workflows to run crawls on a recurring basis and automatically collect snapshots of a website.

World traveler

Crawl from anywhere using proxies to collect region-specific content.

Available on Pro plans

Industry-Leading Quality Assurance Tools

Automatically analyze capture success

Get a better picture of the quality of your crawl. Run crawl analysis to compare screenshots, extracted text, and other page information from your crawl output with data gathered while crawling the live site.

Analyze crawl quality
Illustration of a Page Match Analysis in progress (26 / 50 pages analyzed). Screenshot and text comparisons are rated as Good Match, Moderate Inconsistencies, or Severe Inconsistencies, and the crawl results list counts of HTML pages, non-HTML files captured as pages, and failed pages.
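To make the idea of comparing extracted text concrete, here is a minimal Python sketch that scores how closely the text seen at crawl time matches the text seen at replay time. It is only an illustration, not how Browsertrix computes its results: the function `classify_text_match` and the thresholds 0.9 and 0.5 are hypothetical, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds chosen for illustration only.
GOOD_THRESHOLD = 0.9
MODERATE_THRESHOLD = 0.5


def classify_text_match(crawl_text, replay_text):
    """Return a (label, ratio) pair for two extracted-text strings.

    ratio is 1.0 for identical strings and 0.0 when nothing matches.
    """
    ratio = SequenceMatcher(None, crawl_text, replay_text).ratio()
    if ratio >= GOOD_THRESHOLD:
        label = "Good Match"
    elif ratio >= MODERATE_THRESHOLD:
        label = "Moderate Inconsistencies"
    else:
        label = "Severe Inconsistencies"
    return label, ratio


crawl_text = "Welcome to our archive of community newsletters."
print(classify_text_match(crawl_text, crawl_text))  # identical text
print(classify_text_match(crawl_text, ""))          # nothing replayed
```

A real analysis would also compare screenshots and other page information, as described above; text similarity is just the easiest part to show in a few lines.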

Collaboratively review key pages

Assess results and give your team a better idea of a crawl’s overall success with Browsertrix’s dedicated review interface.

Problematic pages are highlighted for ease of navigation. Mark key pages as successful or unsuccessful and leave comments for other curators.

Review crawl quality
Screenshot of a crawl being reviewed, showing a discrepancy between the page at crawl time and at replay time

Instantly download crawls

Download your crawls and analyses as WACZ files immediately after runs are complete.

Share curatorial notes

Mark crawls on a five-point scale of success to inform other team members about the quality of the crawl.

Integrate manual captures

Use ArchiveWeb.page to import specific pages, assets, or interactions that must be manually captured.

Curated, Shareable Web Archives

Organize and contextualize your archived content

Merge outputs from different crawls into a single collection. Add and track collection metadata together with your team.

Extend your collection by importing content from supported web archive formats, such as WACZ files generated by ArchiveWeb.page.

Create a Collection of archived content

Screenshot of a list of archived items that are present in the Webrecorder archives Browsertrix collection.

Keep it personal, or showcase your collection

Collections are private by default, but can be made public. Easily embed public collections in your own website using ReplayWeb.page.

Embed web archives with ReplayWeb.page

Screenshot of Browsertrix’s collection sharing settings with sharing enabled for this collection. A link to view the collection, and code to embed it is available within the dialog.

Upload existing archives

Import existing WACZ files into your collections and share them with others.

Best-in-class playback

View or embed a single high-fidelity playback of all web pages in your collection.

Collaborative workspace

Invite members from your organization to archive and curate content together.

Get Started Today!

Sign up for a hosted plan to get started with zero setup.

Starter

Everything you need to get started with web archiving.

$30 USD/month

Sign Up for Starter

What’s included

  • 180 minutes of crawling time
  • 100GB of disk space
  • Up to 2,000 pages per crawl

Standard

Our full suite of features, designed to scale with you.

$60 USD/month

Sign Up for Standard

What’s included

  • 360 minutes of crawling time
  • 220GB of disk space
  • Up to 5,000 pages per crawl
  • 2 concurrent crawls

Plus

Higher limits for larger crawling needs.

$120 USD/month

Sign Up for Plus

What’s included

  • 720 minutes of crawling time
  • 500GB of disk space
  • Up to 10,000 pages per crawl
  • 3 concurrent crawls

Pro

Increased crawling limits, storage space, concurrency, and dedicated support.

Schedule a Demo

What’s included

  • More crawling time
  • Additional disk space
  • Custom pages per crawl
  • Custom concurrent crawls
  • Regional proxies
  • Bundle options
  • Extended storage options
  • Optional dedicated support

Self-Hosting

Interested in trying Browsertrix before subscribing, or have specialized hosting requirements?

Browsertrix is open-source and can be set up on your own Kubernetes cluster.


Self-Hosting Support

Professional support with deployment, maintenance, and updates.

For those of us who enjoy deploying Kubernetes, but need some help sometimes.


Resources

Introduction and reference for using Browsertrix to automate web archiving.

User Guide

Instructions on how to install, set up, and deploy self-hosted Browsertrix.

Self-Hosting Docs

Support

We regularly read and respond to questions on the Webrecorder community forum.

Help Forum

View open issues, submit a change request, or report if something isn’t working as expected.

GitHub Issues

FAQ

Can I capture dynamic websites?

Yes! Browsertrix loads each page it captures in a real web browser. This means you can capture content that other tools might miss, like parts of the page that are dynamically loaded using JavaScript. Generally, if it loads in a browser, Browsertrix is able to capture it!

If you’re having an issue archiving dynamic interactions on a site, we might be able to help in the Community Forum.

Can I crawl sites behind logins?

Yes! With Browsertrix, you can create browser profiles, which include login sessions, browser settings, and extensions. With an active browser profile, Browsertrix can crawl sites using these logged-in accounts.

We always recommend using dedicated accounts for crawling.

Can I crawl social media?

Generally, yes! Social media sites (Instagram, Facebook, X.com, etc.) are quite complex and therefore difficult to archive. You can use browser profiles to get behind logins, though successful crawling and accurate replay of social media sites are always moving targets.

We always recommend using dedicated accounts for crawling.

Can I upload WARC files?

Currently, we support uploading only WACZ files, Webrecorder’s open standard format for portable, self-contained web archives. WACZ files contain WARC data, and conversion is lossless. To package your existing WARCs into WACZs, we recommend using our py-wacz command-line application.

Can I schedule automated crawls?

Yes, you can select the frequency of automated crawls on a daily, weekly, or monthly basis. Scheduled crawls will automatically launch even if you are not actively logged into Browsertrix.

Can I export my archived items?

Yes! Exporting content from Browsertrix is as easy as clicking the download button in the item’s actions menu. All content is downloaded as WACZ files, which can be unzipped into their WARC files with your favorite ZIP extractor tool. Your content will always be yours to export in standard formats.
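Since a WACZ is a ZIP file, the unzipping step can also be scripted. The following Python sketch pulls the WARC files out of a WACZ using only the standard library. It assumes the layout described in the WACZ specification, where WARC data is stored under an `archive/` directory; the function name `extract_warcs` and the stand-in file built for the demo are hypothetical, and the demo file only mimics the layout rather than being a valid WACZ.

```python
import io
import tempfile
import zipfile
from pathlib import Path

WARC_SUFFIXES = (".warc", ".warc.gz")


def extract_warcs(wacz, dest_dir):
    """Extract WARC files from a WACZ (path or file object).

    Assumes WARCs live under "archive/" inside the ZIP.
    Returns the paths of the extracted files.
    """
    extracted = []
    with zipfile.ZipFile(wacz) as zf:
        for name in zf.namelist():
            if name.startswith("archive/") and name.endswith(WARC_SUFFIXES):
                extracted.append(Path(zf.extract(name, dest_dir)))
    return extracted


# Build a tiny in-memory ZIP that mimics the WACZ layout for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("datapackage.json", "{}")
    zf.writestr("pages/pages.jsonl", "")
    zf.writestr("archive/data.warc.gz", b"placeholder, not real WARC data")
buf.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    warc_names = [path.name for path in extract_warcs(buf, tmp)]

print(warc_names)
```

With a real download, pass the path of the `.wacz` file in place of `buf` and a permanent folder in place of the temporary directory.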

I’m allergic to web apps. Can I run Browsertrix in my terminal?

You might want to check out Browsertrix Crawler, the command-line application responsible for all crawling operations in Browsertrix. For even bigger jobs that can’t run on your local machine, you can integrate the Browsertrix API into your own scripts.