High-Fidelity Web Archiving, Built for Scale
Archive entire websites with Browsertrix, a cloud-native web archiving platform from Webrecorder. With high-fidelity crawling and advanced quality assurance tools, Browsertrix empowers you to create, view, curate, and share archives with confidence.
Capture a website exactly as you see it now, or schedule crawls at regular intervals to capture a site as it evolves.
Intuitive and granular configuration options enable you to set up and run your crawls with ease.
Monitor running crawls in real time. Diagnose issues and ensure you are capturing exactly the content you want.
Exclude URLs without restarting the entire crawl. Stop runaway crawls from getting bogged down in crawler traps, such as sites that dynamically generate an endless stream of new URLs.
Browsertrix’s innovative crawling system uses real browsers. Login sessions, cookies, and browser settings (like ad blocking and language preferences) let you crawl the web like a real user.
Available on Pro plans
Get a better picture of the quality of your crawl. Run crawl analysis to compare screenshots, extracted text, and other page information from your crawl output with data gathered while crawling the live site.
Analyze crawl quality
Assess results and give your team a better idea of a crawl’s overall success with Browsertrix’s dedicated review interface.
Problematic pages are highlighted for ease of navigation. Mark key pages as successful or unsuccessful and leave comments for other curators.
Review crawl quality
Merge outputs from different crawls into a single collection. Add and track collection metadata together with your team.
Extend your collection by importing content from supported web archive formats, such as WACZ files generated by ArchiveWeb.page.
Collections are private by default, but can be made public. Easily embed public collections in your own website using ReplayWeb.page.
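For illustration, here is one hedged way to generate such an embed page from a script. The collection and page URLs are placeholders, and the script URL and <replay-web-page> attributes follow ReplayWeb.page’s embedding documentation, so verify them against the current docs before publishing.

```python
# Sketch: write a minimal HTML page that embeds a public Browsertrix
# collection with ReplayWeb.page. The source and url values are placeholders;
# substitute the shareable collection/WACZ link from your Browsertrix org.
from pathlib import Path

EMBED_PAGE = """<!doctype html>
<html>
  <body>
    <!-- ReplayWeb.page embed component -->
    <script src="https://cdn.jsdelivr.net/npm/replaywebpage/ui.js"></script>
    <replay-web-page
      source="https://example.com/my-collection.wacz"
      url="https://example.org/archived-page/">
    </replay-web-page>
  </body>
</html>
"""

Path("embed.html").write_text(EMBED_PAGE)
print("Wrote embed.html")
```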
Sign up for a hosted plan to get started with zero setup.
Everything you need to get started with web archiving.
$30 USD/month
Our full suite of features, designed to scale with you.
$60 USD/month
Higher limits for larger crawling needs.
$120 USD/month
Increased crawling limits, storage space, concurrency, and dedicated support.
Interested in trying Browsertrix before subscribing, or have specialized hosting requirements?
Browsertrix is open-source and can be set up on your own Kubernetes cluster.
Professional support with deployment, maintenance, and updates.
For those of us who enjoy deploying Kubernetes but sometimes need a little help.
User Guide: Introduction and reference for using Browsertrix to automate web archiving.
Self-Hosting Docs: Instructions on how to install, set up, and deploy self-hosted Browsertrix.
Help Forum: We regularly read and respond to questions on the Webrecorder community forum.
GitHub Issues: View open issues, submit a change request, or report if something isn’t working as expected.
Yes! Browsertrix loads each page it captures in a real web browser. This means you can capture content that other tools might miss, like parts of the page that are dynamically loaded using JavaScript. Generally, if it loads in a browser, Browsertrix is able to capture it!
If you’re having an issue archiving dynamic interactions on a site, we may be able to help on the Community Forum.
Yes! With Browsertrix, you can create browser profiles, which include login sessions, browser settings, and extensions. With an active browser profile, Browsertrix can crawl sites using these logged-in accounts.
We always recommend using dedicated accounts for crawling.
Generally, yes! Social media sites (Instagram, Facebook, X.com, etc.) are quite complex and therefore difficult to archive. You can use browser profiles to get behind logins, though successful crawling and accurate replay of social media sites is always a moving target.
We always recommend using dedicated accounts for crawling.
Currently, we only support uploading WACZ files, Webrecorder’s open standard format for portable, self-contained web archives. Because WACZ files contain WARC data, conversion is lossless. To package your existing WARCs into WACZ files, we recommend using our py-wacz command-line application.
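As a rough sketch of that packaging step from a script, the example below shells out to the wacz CLI. The file names are placeholders, and the exact options are best confirmed against the py-wacz README.

```python
# Sketch: package existing WARC files into a WACZ with the py-wacz CLI
# (installable via `pip install wacz`). File names are placeholders, and the
# exact flags should be confirmed against the py-wacz README.
import subprocess

warcs = ["crawl-part-1.warc.gz", "crawl-part-2.warc.gz"]  # your existing WARCs

subprocess.run(
    ["wacz", "create", "-o", "my-archive.wacz", *warcs],
    check=True,  # raise if the wacz command fails
)
print("Created my-archive.wacz, ready to upload to Browsertrix.")
```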
Yes, you can select the frequency of automated crawls on a daily, weekly, or monthly basis. Scheduled crawls will automatically launch even if you are not actively logged into Browsertrix.
Yes! Exporting content from Browsertrix is as easy as clicking the download button in the item’s actions menu. All content is downloaded as WACZ files, which can be unzipped to get at the WARC files inside with your favorite ZIP extraction tool. Your content will always be yours to export in standard formats.
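Because a WACZ is a ZIP package with its WARC data stored under an archive/ directory, you can also pull the WARCs out with a few lines of Python. A minimal sketch, assuming a downloaded file named my-crawl.wacz:

```python
# Sketch: extract the WARC files from a WACZ downloaded from Browsertrix.
# A WACZ is a ZIP package; WARC data is stored under its archive/ directory.
# "my-crawl.wacz" is a placeholder file name.
import zipfile
from pathlib import Path

out_dir = Path("warcs")
out_dir.mkdir(exist_ok=True)

with zipfile.ZipFile("my-crawl.wacz") as wacz:
    for name in wacz.namelist():
        if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
            wacz.extract(name, out_dir)
            print("extracted", name)
```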
You might want to check out Browsertrix Crawler, the command-line application responsible for all crawling operations in Browsertrix. For even bigger jobs that can’t run on your local machine, you can integrate the Browsertrix API into your own scripts.
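As a rough illustration of scripting against the API, the sketch below logs in and lists an org’s crawls with Python and requests. The endpoint paths, field names, and login flow shown here are assumptions for illustration only; check your deployment’s interactive API docs for the actual routes and authentication details.

```python
# Sketch: list an org's crawls via the Browsertrix API. The base URL, endpoint
# paths, and response fields below are illustrative assumptions -- consult your
# deployment's API documentation for the real schema and auth flow.
import requests

BASE = "https://app.browsertrix.com/api"  # or your self-hosted instance
ORG_ID = "your-org-id"                    # placeholder

# Obtain a bearer token (a JWT-style login endpoint is assumed here).
login = requests.post(
    f"{BASE}/auth/jwt/login",
    data={"username": "you@example.com", "password": "your-password"},
)
login.raise_for_status()
token = login.json()["access_token"]

# Fetch crawls for the org and print a short summary.
resp = requests.get(
    f"{BASE}/orgs/{ORG_ID}/crawls",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
for crawl in resp.json().get("items", []):
    print(crawl.get("id"), crawl.get("state"))
```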