You may have heard of regex but aren’t quite sure how to use it in SEO or if it fits into your particular strategy.
Regular expressions, or “regex,” work like a small programming language embedded in text searches: they let you build complex search strings with partial matches, wildcards, case-insensitive matching, and other advanced instructions.
You can think of them as looking for a pattern, not a specific string.
Therefore, they can help you find whole groups of search results that at first glance might seem to have little in common with each other.
Regular expressions are a language of their own, and the first time you see them they can look quite strange.
In this guide, you will learn common regex operators, how to use more advanced regex filters for SEO, how to use regex in Google Analytics and Google Search Console, and more.
You’ll find examples of regex in action in different ways in SEO as well.
What is the Regex format?
A regular expression typically includes a set of text that will match exactly in search results, along with several operators that act like wildcards to achieve a pattern match rather than an exact text match.
This can include a single-character wildcard, a match of one or more characters, a match of zero or more characters, as well as optional characters, sub-expressions enclosed in parentheses, and “or” functions.
By combining these elements, you can build a complex expression that produces far-reaching yet very specific results.
Common Regex operators
Some examples of common regex operators include:
. Matches any single character (a wildcard).
.* Matches zero or more characters.
.+ Matches one or more characters.
\d Matches any single digit from 0 to 9.
? Inserted after a character to make it an optional part of the expression.
| The vertical bar, or “pipe” character, indicates the “or” function.
^ Indicates the beginning of a string.
$ Indicates the end of a string.
( ) Used to nest a subexpression.
\ Inserted before an operator or special character to “escape” it.
g Returns all matches instead of just the first.
i Returns case-insensitive results.
m Activates multiline mode.
s Activates “dotall” mode.
u Activates full Unicode support.
y Searches at the specified text position (“sticky” mode).
As you can see, together these operators and signs begin to form a complex logical language, giving you the ability to achieve very specific results across large, unordered data sets.
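To make a few of these operators concrete, here is a minimal sketch using Python's built-in re module. The URL paths are invented for illustration, and re.IGNORECASE stands in for the “i” flag; other tools may spell their flags differently.

```python
import re

# Hypothetical URL paths, made up for this illustration.
paths = ["/blog/", "/blog/seo-tips", "/shop/item-42", "/Blog/news", "/shop/item-7?ref=x"]

# ^ anchors the start, ( ) groups the alternatives, | means "or".
starts_with_blog_or_shop = re.compile(r"^/(blog|shop)/")

# \d+ matches one or more digits, $ anchors the end of the string.
ends_with_number = re.compile(r"\d+$")

# \? escapes the question mark so it is matched literally.
has_query_string = re.compile(r"\?")

# re.IGNORECASE plays the role of the "i" flag.
blog_any_case = re.compile(r"^/blog/", re.IGNORECASE)

blog_or_shop = [p for p in paths if starts_with_blog_or_shop.search(p)]
numbered = [p for p in paths if ends_with_number.search(p)]
with_query = [p for p in paths if has_query_string.search(p)]
any_case_blog = [p for p in paths if blog_any_case.search(p)]

print(blog_or_shop)
print(numbered)
print(with_query)
print(any_case_blog)
```

Note that "/Blog/news" only shows up in the last list, because the other patterns are case-sensitive.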
How do you use Regex for SEO?
Regex can be used to explore queries used by different user segments, queries that are popular in certain content areas, queries that direct traffic to specific parts of your site, and more.
In this article, Hamlet Batista showed how to use regex in Python to parse server log files, for example.
And in this, Chris Long shows you how to use regex to extract the position, element, and name of breadcrumbs associated with each URL of your site as part of a scalable keyword search and segmentation process.
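As a rough idea of what parsing a server log with regex can look like, here is a minimal sketch. It is not taken from either article: it assumes the common Apache/Nginx “combined” log format, and the log line, the LOG_PATTERN name, and the field names are all made up for illustration.

```python
import re

# Invented example line in the assumed "combined" log format.
log_line = (
    '66.249.66.1 - - [17/Jun/2021:10:15:32 +0000] '
    '"GET /blog/regex-seo HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

# Named groups (?P<name>...) label each captured part of the line.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

match = LOG_PATTERN.match(log_line)
hit = match.groupdict() if match else {}

# A simple user-agent check; real bot verification needs more than this.
is_googlebot = "Googlebot" in hit.get("agent", "")

print(hit.get("path"), hit.get("status"), is_googlebot)
```

Real log formats vary by server configuration, so the pattern would need adjusting to match your own files.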
Here are some tips from SEO Twitter (you’ll notice it’s a pretty quiet hashtag – add your own examples if you have them!):
Use slug$ in a filter to see a list of every page/keyword ending in “slug”. Very important if you have to manage large websites 🖤# Thoughts
– hannes jeremia jaacks (@hannes jaacks) December 31, 2021
Using Regex in Google Analytics
One of the most popular uses of regex for SEO is in Google Analytics, where regular expressions can be used to set up filters so that you only see the data you want to see.
In this sense, the expression is used to exclude results, rather than to generate a set of comprehensive search results.
For example, if you want to exclude data from IP addresses on your local area network, you can filter on ^192\.168\.\d+\.\d+$ to remove the full range from 192.168.0.0 to 192.168.255.255. The backslashes make each dot match a literal dot rather than any character.
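The local-network exclusion can be tried out in Python before you rely on it in a report. This is only a sketch: the IP list is invented, and the pattern is one reasonable way to write the filter, not the only one.

```python
import re

# Escaped dots match literal dots; ^ and $ stop partial matches
# such as "1192.168.0.1". Note that \d{1,3} also accepts values
# above 255, which is loose but harmless for this purpose.
LOCAL_IP = re.compile(r"^192\.168\.\d{1,3}\.\d{1,3}$")

# Hypothetical visitor IPs for illustration.
visitor_ips = [
    "192.168.0.1",
    "192.168.255.255",
    "192.169.0.1",
    "203.0.113.7",
    "1192.168.0.1",
]

# Keep only the addresses that are NOT on the local network.
external_ips = [ip for ip in visitor_ips if not LOCAL_IP.search(ip)]

print(external_ips)
```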
More advanced Regex SEO filters
As a more complex example, let’s imagine you have two trademarks: regex247 and regex365.
You may want to filter for results that match any combination of URLs that contain these brand names, such as regex247.biz or www.regex365.org.
One way to do this is to use a fairly simple “or” statement:
.*regex247.*|.*regex365.*
This will remove all matching URLs from your Analytics data, including subfolder paths and page specific URLs that appear on these domain names.
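Here is a small sketch of that “or” statement in Python, written without spaces inside the pattern. The URL list is made up, and the shorter SHORT pattern is an assumption about how you might simplify it in tools that match anywhere in the string.

```python
import re

# The "or" expression for the two brand names.
BRAND_URLS = re.compile(r".*regex247.*|.*regex365.*")

# Hypothetical URLs for illustration.
urls = [
    "regex247.biz",
    "www.regex365.org",
    "https://www.regex365.org/pricing/annual",
    "https://example.com/blog/regex-basics",
]

matched = [u for u in urls if BRAND_URLS.search(u)]
remaining = [u for u in urls if not BRAND_URLS.search(u)]

# When the tool matches anywhere in the string, the leading and
# trailing .* are redundant and a group does the same job.
SHORT = re.compile(r"regex(247|365)")
same_result = matched == [u for u in urls if SHORT.search(u)]

print(matched)
print(remaining)
print(same_result)
```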
A word of warning
Note that – similar to your robots.txt file – a poorly written regex can easily filter out most or all of your data by including an unrestricted wildcard match.
The good news is that in many SEO cases the filter is only applied to your data at the reporting stage, and by editing or deleting the regex expression you can regain full visibility of your data.
You can also test regular expressions in a number of online testing tools to see if they achieve the desired result, allowing you to “sandbox” your regex expressions before letting them loose on your entire dataset.
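If you prefer to sandbox an expression yourself, a quick check like the following can flag a pattern that matches far more than you intended. It is a minimal sketch: the match_share helper and the sample pages are hypothetical.

```python
import re
from typing import Sequence


def match_share(pattern: str, samples: Sequence[str]) -> float:
    """Return the fraction of sample strings that the pattern matches."""
    compiled = re.compile(pattern)
    hits = sum(1 for s in samples if compiled.search(s))
    return hits / len(samples) if samples else 0.0


# Hypothetical page paths standing in for a slice of your data.
sample_pages = ["/", "/blog/", "/blog/regex", "/shop/", "/contact"]

# An unrestricted wildcard matches every row.
too_greedy = match_share(r".*", sample_pages)

# An anchored pattern matches only the blog section.
targeted = match_share(r"^/blog/", sample_pages)

print(too_greedy, targeted)
```

A share close to 1.0 on an exclude filter is a sign that the expression would hide most of your data.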
To create regex filters in Google Analytics, first go to the type of report you want to generate (for example, Behavior > Site Content > All Pages or Acquisition > All Traffic > Source/Medium).
Below the graph and above the data table, click “advanced” next to the search box to display the advanced filter options.
Here you can include or exclude data based on a specific dimension or metric. In the dropdown menu after selecting your dimension, choose Matching RegExp, then enter your expression in the text box.
“or” and “and” in the Google Analytics regular expression
To create an ‘or’ expression in Google Analytics, simply include the pipe character (|, the vertical bar symbol) between the appropriate parts of your expression.
Google Analytics regular expressions do not support ‘and’ clauses within a single regex; however, you can just add another filter to achieve this.
Below the first regular expression, just click Add a dimension or metric and enter your next regex. This way you can stack as many expressions as you want and they will be processed as a single boolean “and” statement when filtering your data.
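The stacking behavior can be imitated in Python to see how the two kinds of logic combine. This is a sketch of the idea only, not of how Google Analytics works internally; the rows, dimension names, and patterns are invented.

```python
import re

# Hypothetical report rows for illustration.
rows = [
    {"page": "/blog/regex-guide", "source_medium": "google / organic"},
    {"page": "/blog/regex-guide", "source_medium": "newsletter / email"},
    {"page": "/shop/checkout", "source_medium": "google / organic"},
    {"page": "/blog/seo-news", "source_medium": "bing / organic"},
]

# Each entry is one stacked filter: (dimension, regex).
# The pipe inside the second pattern is the "or".
filters = [
    ("page", re.compile(r"^/blog/")),
    ("source_medium", re.compile(r"google|bing")),
]

# all() is the "and": a row must pass every stacked filter.
filtered = [
    row for row in rows
    if all(rx.search(row[dim]) for dim, rx in filters)
]

for row in filtered:
    print(row["page"], "|", row["source_medium"])
```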
Using Regex in Google Search Console
In 2021, Google Search Console began supporting RE2 regular expression syntax, allowing webmasters to include and exclude data within the user interface.
You’ll find all metacharacters supported by Google Search Console in this RE2 regex syntax reference on GitHub.
At the time of writing, the character limit is 4,096 characters (which is usually enough…).
For example, in Search Console you can filter for queries that contain a specific brand and the variations users might type, such as Facebook:
.*facebook.*|face*book.*|fb.*|fbook.*|f*book.*
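To see what a whitespace-free form of this brand expression actually catches, you can run it against a few sample queries. The sketch below uses Python's re module rather than RE2 (these particular operators behave the same in both), and the queries are invented.

```python
import re

# The brand expression, with no spaces inside the pattern.
BRAND = re.compile(r".*facebook.*|face*book.*|fb.*|fbook.*|f*book.*")

# Hypothetical search queries for illustration.
queries = [
    "facebook login",
    "fb marketplace",
    "facbook ads",
    "fbook app",
    "book a flight",
    "instagram login",
]

brand_queries = [q for q in queries if BRAND.search(q)]

print(brand_queries)
```

One thing this surfaces: because * means “zero or more of the preceding character,” f*book also matches queries that simply contain “book,” such as “book a flight.” If that is too broad for your data, a stricter alternative like f+book may be worth testing.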
Filter for queries from users who find your website through “commercial” intent terms:
.*(best|top|alternative|alternatives|vs|versus|review*).*
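A commercial-intent filter of this kind is easy to test in Python as well. In this sketch, LOOSE is a lowercase, whitespace-free version of the expression, and STRICT is a hypothetical tighter variant that uses \b word boundaries; the queries are made up.

```python
import re

# Lowercase, whitespace-free form of the commercial-intent filter.
LOOSE = re.compile(r".*(best|top|alternative|alternatives|vs|versus|review*).*")

# Assumed stricter variant: \b requires whole words, and s? makes
# the plural optional.
STRICT = re.compile(r"\b(best|top|alternatives?|vs|versus|reviews?)\b")

# Hypothetical search queries for illustration.
queries = [
    "best seo tools",
    "ahrefs vs semrush",
    "laptop stand",
    "semrush reviews",
    "what is regex",
]

loose_hits = [q for q in queries if LOOSE.search(q)]
strict_hits = [q for q in queries if STRICT.search(q)]

print(loose_hits)
print(strict_hits)
```

The loose version picks up “laptop stand” because “top” appears inside “laptop”; the word-boundary version does not.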
Related: Google Search Console adds new Regex filter options
Why is Regex important for SEO?
Finally, why is all this important?
Well, it’s about taking control of your data and filtering out the parts that don’t help you improve your SEO – be it specific pages or parts of your website, traffic from a particular source or medium, or your local network data.
You can create very simple regex expressions to achieve a basic “include” or “exclude” filter, or write longer expressions that work much like programming code to achieve very complex and specific results.
And with the right regular expression for each campaign, you can verify that your SEO efforts are achieving your goals, ambitions, and results—a powerful way to prove a positive ROI on your future SEO investments.