Semantic Keyword Clustering For 10,000+ Keywords [With Script]

Semantic keyword groups can help take your keyword research to the next level.

In this article, you will learn how to use the Google Colaboratory notebook that is shared exclusively with Search Engine Journal readers.

This article will walk you through using the Google Colab notebook, give you a high-level view of how it works under the hood, and show you how to make adjustments to suit your needs.

But first, why cluster keywords at all?

Common use cases for keyword grouping

Here are some use cases for keyword grouping.

Faster keyword research:

  • Filter out branded keywords or keywords that have no commercial value.
  • Group related keywords together to create more in-depth articles.
  • Group related questions and answers together to create an FAQ.

Paid search campaigns:

  • Build negative keyword lists for ads from large datasets faster. Stop wasting money on spam keywords!
  • Group similar keywords into ad campaign ideas.

Here is an example of the script grouping similar questions together, perfect for an in-depth article!

Screenshot from Microsoft Excel, February 2022

Problems with earlier versions of this tool

If you’ve been following my work on Twitter, you’ll know that I’ve been experimenting with keyword clustering for a while now.

Previous versions of this script were built using the PolyFuzz library’s TF-IDF matching.

While it got the job done, there were always some edge cases where I felt the original results could be improved.

Words that shared a similar pattern of letters would be grouped even if they were not semantically related.

For example, it was unable to group words such as “bike” with “bicycle”.
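To see why letter-based matching behaves this way, here is a minimal sketch that compares keywords using plain character-trigram count vectors. This is a simplified stand-in for TF-IDF over character n-grams, not PolyFuzz’s actual implementation, and the helper names are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, e.g. "bike" -> {"bik": 1, "ike": 1}."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_cosine(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity between the character n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())  # missing grams count as 0
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


print(round(ngram_cosine("bike", "bicycle"), 2))  # 0.0: same meaning, no shared trigrams
print(round(ngram_cosine("bike", "hike"), 2))     # 0.5: different meaning, shares "ike"
```

Spelling overlap, not meaning, drives the score: “bike” looks closer to “hike” than to “bicycle”.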

Previous versions of the script had other issues as well:

  • It did not work well in languages other than English.
  • It left a large number of keywords unclustered.
  • There wasn’t a lot of control over how groups were created.
  • The script was limited to approximately 10,000 rows before it timed out due to a lack of resources.

Semantic keyword clustering using deep learning natural language processing (NLP)

Fast-forward four months to the latest version, which has been completely rewritten to take advantage of state-of-the-art deep learning sentence embeddings (sentence transformers).

Check out some of these great semantic clusters!

Notice that heating, thermal, and warm are included in the same keyword group?

Excel sheet showing an example of semantic keyword groups (Screenshot from Microsoft Excel, February 2022)

Or how about wholesale and bulk?

Excel sheet showing another example of semantic keyword grouping (Screenshot from Microsoft Excel, February 2022)

Dog, dachshund, Christmas and Xmas?

Excel sheet showing dachshund and dog keywords grouped together (Screenshot from Microsoft Excel, February 2022)

It can even group keywords in over a hundred different languages!

Excel sheet showing an example of semantic keyword grouping in French (Screenshot from Microsoft Excel, February 2022)

New script features compared with previous iterations

In addition to semantic keyword grouping, the following improvements have been added to the latest version of this script.

  • Support for clustering more than 10,000 keywords at a time.
  • Fewer unclustered keywords.
  • Ability to choose different pre-trained models (although the default model works just fine!).
  • Ability to choose how closely related the keywords in each cluster should be.
  • Ability to choose the minimum number of keywords in each cluster.
  • Automatic detection of character encoding and CSV delimiters.
  • Multilingual clustering.
  • Works out of the box with many popular keyword exports (Search Console, AdWords data, or third-party keyword tools like Ahrefs and Semrush).
  • Works with any CSV file with a column called “Keyword”.
  • Easy to use (the script works by appending a new column called Cluster Name to any uploaded keyword list).
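The automatic encoding and delimiter detection described above can be sketched with the Python standard library alone. This is a hypothetical helper, not the notebook’s own code: it assumes a BOM marks UTF-16 files, tries UTF-8 then Windows-1252 otherwise, and uses `csv.Sniffer` to guess the delimiter.

```python
import csv
import io


def decode_bytes(raw: bytes) -> str:
    """Guess the text encoding of an uploaded CSV (simple heuristic)."""
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        return raw.decode("utf-16")  # UTF-16 with a byte-order mark
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1")  # latin-1 maps every byte, so it never fails


def load_keywords(raw: bytes, column: str = "Keyword") -> list[str]:
    """Return the non-empty values of the keyword column (hypothetical helper)."""
    text = decode_bytes(raw)
    try:
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    except csv.Error:
        dialect = csv.excel  # no delimiter found (e.g. a single column): assume commas
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    # Match the keyword column case-insensitively, e.g. "Keyword" or "keyword".
    match = next((f for f in reader.fieldnames or [] if f.strip().lower() == column.lower()), None)
    if match is None:
        raise ValueError(f"no column named {column!r} found")
    return [row[match].strip() for row in reader if (row[match] or "").strip()]


raw = "Keyword;Volume\nalpaca socks;100\nwool socks;50\n".encode("utf-8")
print(load_keywords(raw))  # ['alpaca socks', 'wool socks']
```

Sniffing only the first few kilobytes keeps detection fast on large exports; a dedicated detector library could replace the encoding heuristic if needed.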

How to use the script in five steps (Quick Start)

To get started, click on this link, then choose the Open in Colab option as shown below.

How to open Google Colab from GitHub (Screenshot from Google Colaboratory, February 2022)

Change the runtime type to GPU by selecting Runtime > Change runtime type.

How to change the runtime type to GPU in Google Colab (Screenshot from Google Colaboratory, February 2022)

Select Runtime > Run all from the top navigation bar within Google Colaboratory (or press Ctrl+F9).

How to run all cells in Google Colab (Screenshot from Google Colaboratory, February 2022)

Upload a .csv file containing a column called “Keyword” when prompted.

How to upload a file using Google Colab (Screenshot from Google Colaboratory, February 2022)

Clustering should be fairly fast, but speed ultimately depends on the number of keywords and the model used.

In general, you should be fine with up to 50,000 keywords.

If you see a CUDA out-of-memory error, you are trying to cluster too many keywords at the same time!

(It is worth noting that this script can easily be adapted to run on a local machine, free of Google Colaboratory’s resource limits.)

Script output

The script will run and append the clusters to your original file in a new column called Cluster Name.

Cluster names are assigned using the shortest keyword in each cluster.

For example, the group name for the following keyword group is set as “alpaca socks” because that is the shortest keyword in the group.

Example output from the script showing the alpaca socks keywords clustered together (Screenshot from Microsoft Excel, February 2022)

Once clustering is complete, a new file is saved automatically: the original file with the clusters appended in a new column.
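The naming rule and the appended column can be sketched as follows. The helper names are hypothetical, and the `no_cluster` placeholder for keywords that did not join any cluster is an assumption for illustration, not necessarily the label the notebook uses.

```python
import csv
import io


def add_cluster_names(rows, clusters, placeholder="no_cluster"):
    """Append a "Cluster Name" field to each row (hypothetical helper)."""
    name_for = {}
    for cluster in clusters:
        # The shortest keyword in the cluster becomes the cluster's name.
        name = min(cluster, key=len)
        for keyword in cluster:
            name_for[keyword] = name
    for row in rows:
        row["Cluster Name"] = name_for.get(row["Keyword"], placeholder)
    return rows


def to_csv(rows) -> str:
    """Serialise the rows, original columns first, then the new Cluster Name column."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = [
    {"Keyword": "best alpaca socks", "Volume": "40"},
    {"Keyword": "alpaca socks", "Volume": "100"},
    {"Keyword": "alpaca wool socks", "Volume": "30"},
    {"Keyword": "dachshund", "Volume": "500"},
]
clusters = [["best alpaca socks", "alpaca socks", "alpaca wool socks"]]
print(to_csv(add_cluster_names(rows, clusters)))
```

Because `min` returns the first of several equally short keywords, ties are broken by the order in which keywords appear in the cluster.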

How does the keyword clustering work?

This script relies on a fast clustering algorithm and uses models that have been pre-trained at scale on large amounts of data.

This makes it easy to calculate the semantic relationships between keywords using ready-made models.
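The core idea can be sketched in pure Python, assuming the notebook follows the sentence-transformers “fast clustering” example (which uses the library’s `util.community_detection` function). The version below is a simplified re-implementation of that greedy community-detection idea, not the library’s code, and the toy 3-dimensional vectors stand in for real sentence embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def community_detection(embeddings, threshold=0.85, min_size=2):
    """Greedy community detection over cosine similarities (simplified sketch)."""
    n = len(embeddings)
    sims = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]

    # Step 1: every item with at least `min_size` neighbours (itself included)
    # at or above the threshold proposes a candidate community.
    candidates = []
    for i in range(n):
        neighbours = [j for j in range(n) if sims[i][j] >= threshold]
        if len(neighbours) >= min_size:
            candidates.append(neighbours)

    # Step 2: accept the largest candidates first, dropping members that are already
    # assigned; a candidate survives only if enough unassigned members remain.
    candidates.sort(key=len, reverse=True)
    assigned, communities = set(), []
    for candidate in candidates:
        members = [j for j in candidate if j not in assigned]
        if len(members) >= min_size:
            communities.append(members)
            assigned.update(members)
    return communities


# Toy "embeddings" standing in for real sentence-transformer output.
keywords = ["heating", "thermal", "warm", "wholesale", "bulk", "dachshund"]
embeddings = [
    [1.0, 0.1, 0.0], [0.95, 0.15, 0.0], [0.9, 0.2, 0.05],
    [0.0, 1.0, 0.1], [0.05, 0.95, 0.1],
    [0.0, 0.0, 1.0],
]
clusters = [[keywords[j] for j in c] for c in community_detection(embeddings)]
print(clusters)  # [['heating', 'thermal', 'warm'], ['wholesale', 'bulk']]
```

“dachshund” has no neighbour above the threshold, so it stays unclustered. Processing the largest candidates first keeps each keyword in the biggest group it qualifies for.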

(You don’t have to be a data scientist to use it!)

In fact, while I made it customizable for those who like to tinker and experiment, I chose some well-balanced defaults that should be reasonable for most people’s use cases.

Different models can be swapped in and out of the script according to your requirements (faster clustering, better multilingual support, better semantic performance, etc.).

After a lot of testing, I found that the all-MiniLM-L6-v2 sentence transformer struck the best balance between speed and accuracy.

If you would prefer to use your own model or simply want to experiment, you can replace the existing pre-trained model with any of the models listed here or on the Hugging Face Model Hub.

Swap in pre-trained models

Swapping in models is as easy as replacing the value of the variable with the name of your preferred sentence transformer.

For example, you can change the default model from all-MiniLM-L6-v2 to all-mpnet-base-v2 by editing:

adapter = 'all-MiniLM-L6-v2'

to:

adapter = 'all-mpnet-base-v2'

Here is where you can edit it in the Google Colaboratory notebook.

How to choose a sentence transformer for keyword clustering (Screenshot from Google Colaboratory, February 2022)

Trade-off between cluster accuracy and unclustered keywords

A common complaint about previous iterations of this script was that it generated a large number of unclustered results.

Unfortunately, it will always be a balancing act between cluster accuracy and the number of clusters.

A higher cluster accuracy setting will result in a higher number of unclustered results.

There are two variables that can directly affect the size and accuracy of all clusters:

  • Minimum cluster size
  • Cluster accuracy

I set a default value of 85 (/100) for cluster accuracy and a minimum cluster size of 2.

During testing, I found this to be the sweet spot, but feel free to experiment!

Here’s where to set those variables in the script.

How to set the minimum cluster size and cluster accuracy (Screenshot from Google Colaboratory, February 2022)

That’s it! I hope this keyword clustering script is useful for your business.

More resources:

  • Introduction to Python and machine learning for technical SEO
  • 6 SEO Tasks to Automate Using Python
  • Advanced Technical SEO: A Complete Guide

Featured image: Graphic Grid / Shutterstock
