Screenshot of the Page Behavior crawl workflow section in Browsertrix

Create, Use, and Automate Actions With Custom Behaviors in Browsertrix

It is now easier than ever to automate custom page actions in Browsertrix.

By Tessa Walsh / Senior Applications & Tools Engineer

We are thrilled to introduce a new feature available starting in Browsertrix 1.15 that will be very exciting for our Browsertrix power users: support for custom behaviors that let you automate in-page actions while crawling specific websites. You can now easily specify which custom behaviors to use directly in the crawl workflow editor. We’ve also updated our documentation to guide developers and advanced users in creating their own custom behaviors. Plus, we’ve added support for a new type of custom behavior that can be set up right in the Chrome DevTools, with no coding required.

The Story of Behaviors in Browsertrix

A big part of Browsertrix’s promise of high-fidelity web archiving is its ability to automate actions inside real browsers during crawling. This is made possible through behaviors: code or JSON documents (more on that later) that specify what actions the browser should take when visiting a web page.

Behaviors themselves aren’t new to Browsertrix. In fact, Browsertrix and Browsertrix Crawler have supported built-in behaviors for several years. A number of these are background behaviors, which quietly run on every web page, constantly checking for changes and taking action when needed. This includes some behaviors that always run on every web page no matter what, like autoplay, which plays video and audio on the page to ensure it is captured. It also includes some behaviors that can be enabled or disabled in Browsertrix, like autoscroll, which scrolls down the page until it hits the end or its timeout is reached.

Another type of built-in behavior that has long been supported in Browsertrix is site-specific behaviors. These only run on particular websites and are designed to perform actions tailored to those sites. This includes our built-in behaviors for social media sites like Instagram, Twitter/X, Facebook, and TikTok. You can find more detailed information about built-in behaviors in the Browsertrix Crawler documentation.

But here’s the exciting part: with this release, creating and using your own custom behaviors is easier than ever. And if you encounter a website that is tricky to crawl because it requires interactivity, you can now create and use your own behaviors immediately!

Browsertrix Support for Behaviors

You’ll now find everything related to behaviors in the new Page Behavior section of the crawl workflow editor. This update combines our new autoclick behavior and support for custom behaviors with existing settings like autoscroll, page timeouts, and delays (which were previously scattered across the workflow editor).

Introducing Autoclick Behavior

One exciting addition in the Page Behavior section is the autoclick behavior. This built-in feature automatically clicks on elements in the page without navigating away. By default, this will click on anchor (<a>) tags, which can be useful for websites that use these anchor links in non-standard ways. For example, some sites use JavaScript in place of the standard href attribute to create a hyperlink, while others use <a> tags in place of <button>s to reveal in-page content.

Need it to click on something other than links, like all the <button> elements on a page? No problem! Just specify a different CSS selector for autoclick directly in the workflow editor.
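To make the idea concrete, here is a minimal sketch of what "click every element that matches a CSS selector" looks like in plain JavaScript. It is an illustration only, not the actual autoclick implementation: the `clickAll` helper name is hypothetical, and the real behavior also takes care not to navigate away from the page.

```javascript
// Illustrative sketch only: NOT the Browsertrix autoclick implementation.
// `clickAll` is a hypothetical helper name.
function clickAll(root, selector = "a") {
  // querySelectorAll accepts any valid CSS selector, e.g. "a" or "button".
  const elements = Array.from(root.querySelectorAll(selector));
  for (const el of elements) {
    el.click();
  }
  // Report how many elements were clicked.
  return elements.length;
}

// In a browser console you could try: clickAll(document, "button");
```

The default of `"a"` mirrors the default described above; passing `"button"` corresponds to entering a different selector in the workflow editor.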

Custom Behaviors

Want to use new and existing custom behaviors in your crawls? Starting in Browsertrix 1.15, you can specify custom behaviors in crawl workflows by pointing to behavior files hosted at any public URL or in a public Git repository. This means you can not only create and use your own custom behaviors in your crawls, but also tap into the Browsertrix community’s shared behaviors.

How To Create Custom Behaviors

Adding support for custom behaviors in Browsertrix is just one part of the solution. We also want to make it easier for you to create them. That’s why we’ve created new documentation that walks you through two ways to build custom behaviors: JavaScript behaviors and JSON Flow behaviors.

Traditionally, custom behaviors in Browsertrix have been written in JavaScript. This approach is still the most flexible and powerful, but it does require coding skills. For developers, our updated documentation covers how to make a JavaScript behavior, including an overview of the expected format, as well as important references and helpful suggestions.
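As a rough orientation, a JavaScript behavior is a class that identifies itself, decides which pages it applies to, and yields progress updates while it runs. The sketch below is a simplified illustration of that shape, not a reference implementation. The class name, the example URL, and the `button.load-more` selector are hypothetical, and the `ctx.log`, `ctx.Lib.getState`, and `ctx.Lib.sleep` helpers are assumptions that should be checked against the documentation, which remains the authoritative source for the exact format.

```javascript
// Simplified sketch of a custom behavior. Class name, URL, and selector are
// hypothetical; the ctx helpers used here are assumptions to verify in the docs.
class LoadMoreBehavior {
  // Human-readable identifier for the behavior.
  static id = "Load More (example)";

  // Only run on the pages this behavior was written for.
  static isMatch() {
    return window.location.href.startsWith("https://example.com/");
  }

  // Optional initial setup; nothing is needed for this sketch.
  static init() {
    return {};
  }

  // Async generator: each `yield` reports progress while the behavior runs.
  async *run(ctx) {
    const maxClicks = 50; // safety bound so the loop always terminates
    for (let i = 0; i < maxClicks; i++) {
      const button = document.querySelector("button.load-more");
      if (!button) break; // nothing left to load
      button.click();
      yield ctx.Lib.getState(ctx, "Clicked load-more button", "clicks");
      await ctx.Lib.sleep(1000); // give the page time to load new content
    }
    ctx.log("No more content to load");
  }
}
```

Using an async generator keeps the behavior interruptible: the crawler can stop consuming updates when the page timeout is reached instead of waiting for the loop to finish.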

We’ve also added a much more accessible option that doesn’t require writing a single line of code: JSON Flow behaviors. Thanks to Chrome’s built-in DevTools Recorder, you can simply record your actions on a webpage: click around and interact with the content, and when you’re done, export the recording as a JSON file. Upload that file somewhere with a public URL, like a GitHub Gist, Pastebin, or a public Git repository, and you’re ready to go! Just point your crawl workflows to that JSON file and Browsertrix will replay your recorded actions on that page automatically while crawling.
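For orientation, an exported recording is a JSON document with a title and an ordered list of steps. The sketch below shows that general shape as a JavaScript object, along with a small sanity check you might run before uploading the file. The recording contents, the URL, and the `looksLikeRecording` helper are hypothetical, and the exact fields in a real export depend on your Chrome version.

```javascript
// Hypothetical example of the general shape of a DevTools Recorder export.
const recording = {
  title: "Open the next page of results",
  steps: [
    { type: "setViewport", width: 1280, height: 720 },
    { type: "navigate", url: "https://example.com/list" },
    { type: "click", selectors: [["a.next"]], offsetX: 10, offsetY: 10 },
  ],
};

// Hypothetical helper: a minimal sanity check, not a full schema validation.
function looksLikeRecording(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.title === "string" &&
    Array.isArray(value.steps) &&
    value.steps.every(
      (step) =>
        step !== null && typeof step === "object" && typeof step.type === "string"
    )
  );
}

// Serialize the object to the JSON text you would save and host publicly.
const json = JSON.stringify(recording, null, 2);
```

In practice you would not write this file by hand; the point is only to show what the Recorder produces so the export is less of a black box.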

For visual learners, we recommend checking out the following YouTube video, which demonstrates how to use the DevTools Recorder and download your recording as a JSON file:

Browsertrix will even extend some of the actions in your JSON Flow behavior. For example, if it detects that you repeated an action (like clicking “Next” in a paginated list) more than three times, it will keep repeating that step until it can no longer do so successfully.
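One way to picture this is as a scan for runs of identical consecutive steps. The sketch below only illustrates that idea under simplifying assumptions and is not the logic Browsertrix actually uses: the `findRepeatedRuns` helper is hypothetical, two steps count as identical when their JSON serializations match, and the default threshold of four encodes "more than three times" from the description above.

```javascript
// Illustration only: NOT Browsertrix's actual detection logic.
// Finds runs of identical consecutive steps that repeat at least `minRepeats` times.
function findRepeatedRuns(steps, minRepeats = 4) {
  const runs = [];
  let start = 0; // index where the current run begins
  for (let i = 1; i <= steps.length; i++) {
    const sameAsRunStart =
      i < steps.length &&
      JSON.stringify(steps[i]) === JSON.stringify(steps[start]);
    if (!sameAsRunStart) {
      // The run [start, i) has ended; record it if it is long enough.
      const count = i - start;
      if (count >= minRepeats) runs.push({ start, count });
      start = i;
    }
  }
  return runs;
}

// Hypothetical recording: one navigation, then "Next" clicked four times.
const clickNext = { type: "click", selectors: [["a.next"]] };
const steps = [
  { type: "navigate", url: "https://example.com/list" },
  clickNext,
  clickNext,
  clickNext,
  clickNext,
];

const repeated = findRepeatedRuns(steps); // one run: start 1, count 4
```

A step flagged this way is a candidate for being repeated until it no longer succeeds, which is what lets a short recording cover a paginated list of any length.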

Of course, for more complex behaviors that involve loops, such as scrolling through and loading comment threads on a social media site, or other complicated actions, JavaScript behaviors will still be the go-to solution. But we are happy to offer a simpler and more accessible way that lowers the barrier to entry for anyone wanting to create custom behaviors. 

Behaviors: one more way Browsertrix makes it easy to capture the web exactly the way you want.