
8 Python Libraries For SEO & How To Use Them

Python libraries are a fun and easy way to start learning and using Python for SEO.

A Python library is a collection of useful functions and code that allows you to complete a number of tasks without having to write the code from scratch.

There are more than 100,000 libraries available in Python, which can be used for tasks ranging from data analysis to video game creation.

In this article, you will find several different libraries that I have used to complete my SEO projects and tasks. All of them are beginner-friendly and you’ll find plenty of documentation and resources to help you get started.

Why are Python libraries useful for SEO?

Each Python library contains functions and variables of all kinds (arrays, dictionaries, objects, etc.) that can be used to perform various tasks.

In SEO, for example, they can be used to automate tasks, predict outcomes, and provide intelligent insights.

It is possible to work with just vanilla Python, but libraries can be used to make tasks easier and faster to write and complete.

Python libraries for SEO tasks

There are a number of Python libraries that are useful for SEO tasks including data analysis, web scraping, and insights visualization.

This is not an exhaustive list, but these are the libraries I find myself using the most for SEO purposes.

Pandas

Pandas is a Python library used to work with tabular data. It allows high-level data manipulation, where the main data structure is the DataFrame.

DataFrames are similar to Excel spreadsheets; however, they are not subject to row and byte limits, and they are also much faster and more efficient.

The best way to get started with Pandas is to take a simple CSV file of data (crawl your website, for example) and save it inside Python as a DataFrame.

Once this is stored in Python, you can perform a number of different analysis tasks including data aggregation, pivoting, and cleaning.

For example, if I have a full crawl of my website and I want to extract only those pages that are indexable, I’ll use Pandas’ built-in function to include only those URLs in my DataFrame.

import pandas as pd

# Load the crawl export into a DataFrame
df = pd.read_csv('/Users/rutheverett/Documents/Folder/file_name.csv')
df.head()

# Keep only the URLs flagged as indexable
indexable = df[(df.indexable == True)]
indexable

Requests

The next library is called Requests, and it is used to make HTTP requests in Python.

Requests uses different request methods, such as GET and POST, to make a request, and the results are stored in Python.

One example of this in practice is a simple GET request for a URL, and this will print the status code for the page:

import requests

# Make a GET request and print the response status
response = requests.get('https://www.deepcrawl.com')
print(response)

You can then use this result to create a decision function, where a status code of 200 means that the page is available but 404 means that the page does not exist.

if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')

You can also access the response headers, which display useful information about the page, such as the content type or how long the response was cached.

# View all of the response headers
headers = response.headers
print(headers)

# Access a specific header, such as the content type
response.headers['Content-Type']

There is also the ability to simulate a specific user agent, such as Googlebot, to extract the response that specific bot will see when crawling the page.

headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
ua_response = requests.get('https://www.deepcrawl.com/', headers=headers)
print(ua_response)

Beautiful Soup

Beautiful Soup is a library used to extract data from HTML and XML files.

Fun fact: the BeautifulSoup library is actually named after the poem from Alice’s Adventures in Wonderland by Lewis Carroll.

As a library, BeautifulSoup is used to understand web files and is often used for web scraping, as it can convert HTML documents into various Python objects.

For example, you can take a URL and use Beautiful Soup together with the Requests library to extract the page title.

from bs4 import BeautifulSoup
import requests

# Fetch the page and parse the HTML
url = 'https://www.deepcrawl.com'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

# Extract the <title> element
title = soup.title
print(title)


Additionally, using the find_all method, BeautifulSoup enables you to extract specific elements from a page, such as all href links on a page:

url="https://www.deepcrawl.com/knowledge/technical-seo-library/" 
req = requests.get(url) 
soup = BeautifulSoup(req.text, "html.parser")

for link in soup.find_all('a'): 
    print(link.get('href'))


Putting them all together

These three libraries can also be used together, with requests used to make an HTTP request to the page from which we want BeautifulSoup to extract information.

We can then convert that raw data into a Pandas DataFrame for further analysis.

# Fetch the blog page and parse the HTML
url = 'https://www.deepcrawl.com/blog/'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

# Collect every link element and store the results in a DataFrame
links = soup.find_all('a')

df = pd.DataFrame({'links': links})
df

Matplotlib and Seaborn

Matplotlib and Seaborn are two Python libraries that are used to create visualizations.

Matplotlib allows you to create a number of different data visualizations, such as bar charts, line graphs, and even heat maps.

For example, if I wanted to take some Google Trends data to display the most popular queries over 30 days, I could create a bar chart in Matplotlib to visualize all of those queries.
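For illustration, a minimal sketch of that kind of bar chart might look like this (the CSV file and its query and searches columns are hypothetical placeholders for an exported Google Trends dataset):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical Google Trends export with 'query' and 'searches' columns
trends_df = pd.read_csv('google_trends_30_days.csv')

# Plot each query against its search popularity
plt.bar(trends_df['query'], trends_df['searches'])
plt.xticks(rotation=45, ha='right')
plt.xlabel('Query')
plt.ylabel('Searches over 30 days')
plt.tight_layout()
plt.show()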


Seaborn, built on Matplotlib, provides more visualization patterns, such as scatter plots, box plots, and violin plots, as well as line and bar graphs.

It differs slightly from Matplotlib in that it requires less syntax and has built-in default themes.

One of the ways I’ve used Seaborn is to create line graphs to visualize a log file’s visits to certain parts of a website over time.


import seaborn as sns
import matplotlib.pyplot as plt
sns.lineplot(x="month", y="log_requests_total", hue="category", data=pivot_status)
plt.show()

This particular example takes data from a pivot table, which I created in Python using the Pandas library, and is another way these libraries work together to create an easy-to-understand picture of the data.
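As a rough sketch of that step, a pivot table like the one plotted above could be built with Pandas along these lines (the log file export and its column names here are hypothetical):

import pandas as pd

# Hypothetical log file export with one row per request
log_df = pd.read_csv('log_file_export.csv')

# Aggregate total requests per month and site category
pivot_status = pd.pivot_table(
    log_df,
    index=['month', 'category'],
    values='url',
    aggfunc='count'
).reset_index().rename(columns={'url': 'log_requests_total'})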

Advertools

Advertools is a library created by Elias Dabbas that can be used to help manage, understand, and make decisions based on the data we have as SEO professionals and digital marketers.

Sitemap analysis

This library allows you to perform a number of different tasks, such as downloading, parsing, and analyzing XML sitemaps, in order to extract patterns or analyze how often content is added or changed.
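For example, pulling a sitemap into a DataFrame might look something like this minimal sketch (the sitemap URL is a placeholder, and the lastmod column will only be populated if the sitemap provides it):

import advertools as adv

# Download and parse an XML sitemap (or sitemap index) into a DataFrame
sitemap_df = adv.sitemap_to_df('https://www.example.com/sitemap.xml')

# Use the lastmod dates to see how often content is added or changed
print(sitemap_df['lastmod'].describe())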

Robots.txt analysis

Another interesting thing you can do with this library is use a function to extract a website’s robots.txt file into a DataFrame, in order to easily understand and parse the set of rules.

You can also run a test within the library to check if a particular user agent is able to fetch specific URLs or folder paths.
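A minimal sketch of both ideas, using a placeholder domain and example paths, might look like this:

import advertools as adv

# Pull the robots.txt file into a DataFrame of directives
robots_df = adv.robotstxt_to_df('https://www.example.com/robots.txt')
print(robots_df.head())

# Check whether a given user agent can fetch specific URL paths
test_df = adv.robotstxt_test(
    'https://www.example.com/robots.txt',
    user_agents=['Googlebot'],
    urls=['/category/', '/blog/']
)
print(test_df)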

URL parsing

Advertools also enables you to parse and analyze URLs in order to extract information, better understand analytics and SERP data, and crawl certain groups of URLs.

You can also break URLs down using the library to determine things like the HTTP scheme used, the main path, additional parameters, and query strings.
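For example, splitting a list of URLs into their components might look something like this sketch (the URLs are placeholders):

import advertools as adv

urls = [
    'https://www.example.com/blog/post-name?utm_source=twitter',
    'https://www.example.com/category/page/',
]

# Split each URL into scheme, domain, path, and query string
url_df = adv.url_to_df(urls)
print(url_df[['scheme', 'netloc', 'path', 'query']])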

Selenium

Selenium is a Python library generally used for automation purposes. The most common use case is testing web applications.

A common example of Selenium automation is a script that opens a browser and performs a number of different steps in a set sequence, such as filling in forms or clicking certain buttons.

Selenium uses the same principle as the requests library we covered earlier.

However, it will not only send the request and wait for a response, but will also display the requested web page.

To get started with Selenium, you will need a WebDriver in order to perform interactions with the browser.

Each browser has its own WebDriver; Chrome has ChromeDriver and Firefox has GeckoDriver, for example.

It’s easy to download and set up with your Python code. Here is a useful article explaining the setup process, with an example project.
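As a simple illustration, once a WebDriver is installed, opening a page and reading its title might look like this minimal sketch (it assumes ChromeDriver is available, and the URL is just an example):

from selenium import webdriver

# Launch a Chrome browser session (requires ChromeDriver to be set up)
driver = webdriver.Chrome()

# Load the page in the browser, then read the rendered page title
driver.get('https://www.example.com')
print(driver.title)

driver.quit()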

Scrapy

The last library I wanted to cover in this article is Scrapy.

While we can use the Requests module to crawl and extract internal data from a web page, in order to parse that data and extract useful insights we also need to combine it with BeautifulSoup.

Scrapy basically allows you to do both things in one library.

Scrapy is also considerably faster and more powerful, completes crawl requests, extracts and parses data in a set sequence, and allows you to store the data.

Within Scrapy, you can specify a number of instructions such as the name of the domain you wish to crawl, the starting URL, and specific page folders that the spider is allowed or not allowed to crawl.

Scrapy can be used to extract all links on a given page and store them in the output file, for example.

from scrapy.spiders import CrawlSpider

class SuperSpider(CrawlSpider):
    name = 'extractor'
    allowed_domains = ['www.deepcrawl.com']
    start_urls = ['https://www.deepcrawl.com/knowledge/technical-seo-library/']
    base_url = 'https://www.deepcrawl.com'

    def parse(self, response):
        # Yield the absolute URL of every link found within paragraph elements
        for link in response.xpath('//div/p/a'):
            yield {
                "link": self.base_url + link.xpath('.//@href').get()
            }

You can take this a step further and follow the links found on a web page to extract information from all of the pages that are linked to from the start URL, a little like a small-scale replication of the way Google finds and follows links on a page.

from scrapy.spiders import CrawlSpider, Rule
 
 
class SuperSpider(CrawlSpider):
    name="follower"
    allowed_domains = ['en.wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Web_scraping']
    base_url="https://en.wikipedia.org"
 
    custom_settings = {
        'DEPTH_LIMIT': 1
    }
 
    def parse(self, response):
        for next_page in response.xpath('.//div/p/a'):
            yield response.follow(next_page, self.parse)
 
        for quote in response.xpath('.//h1/text()'):
            yield {'quote': quote.extract() }

You can learn more about these projects, among other example projects, here.

Final thoughts

As Hamlet Batista always said, “The best way to learn is by doing.”

I hope that discovering some of the available libraries has inspired you to start learning Python, or to deepen your knowledge.

Python Contributions From the SEO Community

Hamlet also likes to share resources and projects from the Python SEO community. To honor his passion for cheering others on, I wanted to share some of the amazing things I’ve seen from the community.

As a wonderful tribute to Hamlet and the SEO Python community he helped to nurture, Charly Wargnier created SEO Pythonistas to collect contributions from the amazing Python projects created by the SEO community.

Hamlet’s invaluable contributions to the SEO community are on display.

Moshe Ma-yafit created a great script for log file analysis, and in this post he explains how the script works. The visualizations it can display include Googlebot hits by device, daily hits by response code, response code % total, and more.

Koray Tuğberk GÜBÜR is currently working on a sitemap health checker. He also hosted a RankSense webinar with Elias Dabbas where he shared a script that records SERPs and analyzes algorithms.

It essentially records SERPs at regular time intervals, and you can crawl all of the landing pages, blend the data, and create some correlations.

John McAlpin wrote an article detailing how you can use Python and Data Studio to spy on your competitors.

JC Chouinard wrote a complete guide to using the Reddit API. With this, you can do things like extract data from Reddit and post to a subreddit.

Rob May is working on a new GSC analysis tool and is building a few new real domains/sites in Wix to benchmark them against a higher-level WordPress competitor, documenting it as he goes.

Masaki Okazawa also shared a script that analyzes Google Search Console data with Python.

SEJ’s 2021 Christmas Countdown:

  • #12 – The New Google Business Profile: A Complete Guide to Local SEO
  • #11 – How to Automate SEO Keyword Clustering by Search Intent Using Python
  • #10 – Get to Know Google Analytics 4: A Complete Guide
  • #9 – 7 things I wish I knew earlier in my SEO career
  • #8 – A Guide to Optimizing Google News, Top Stories, and Discovery
  • #7 – Keyword Combinations: How to Elevate Your SEO Content Strategy
  • #6 – Advanced Core Web Vitals: A Technical SEO Guide
  • #5 – How to use Google Sheets for web scraping and campaign building
  • #4 – Google Ads Ultimate Pacing Dashboard (Free Data Studio Template)
  • #3 – 8 Python SEO Libraries and How to Use Them

Featured image: jakkaje879 / Shutterstock
