Automated Content Generation for SEO: GPT-3 Possibilities & Pitfalls

Since the arrival of GPT-3, content generators have multiplied their SEO use cases. A fortnightly update to review new progress in the area of language models seems in order.

First of all, by the end of 2021, the club of very large language models had grown significantly.

Each country has attempted to showcase and make its technologies available through research papers and public or private demonstrations.

Here are the main competitors in the race:

  • United States: OpenAI – Turing NLG.
  • China: Wu Dao 2.0 – PanGu-Alpha.
  • South Korea: HyperCLOVA.
  • Israel: AI21 (Jurassic-1).
  • Europe: Aleph Alpha.
  • Open source: EleutherAI.

Each model has its strengths and weaknesses.

To test them out, many SEO editors and SEO agencies are now experimenting with these models.

How do you choose a GPT-3-type model?

You might think that the more parameters a model has, the better (editor’s note: a parameter roughly corresponds to a concept learned by the artificial intelligence).

But you’d be wrong.

The number one criterion is not at all the number of parameters, because you can get great results with lighter models.

Rather, it is the data that the model has been trained on.

Indeed, for a model to be effective, it must be able to understand a large number of disparate domains.

The first thing to do is find out how the model was trained. For GPT-3, the following diagram helps:

Screenshot from GPT-3, October 2021

We can see that GPT-3 is mainly trained using data from:

  • The Common Crawl web archive between 2016 and 2019.
  • WebText, which corresponds to data scraped from the web.
  • Wikipedia.
  • Books in English (Books 1).
  • Books in other languages (Books 2).

Now, if we look at how open source models are trained, we see that the sources are very different.

Sources based on The Pile Project. Screenshot from GPT-3, October 2021

It’s all based on The Pile Project, which is a dataset of 825 gigabytes of diverse English texts that are free and publicly available.

In The Pile, we find very diverse data such as books, GitHub repositories, web pages, chat logs, and articles in medicine, physics, mathematics, computer science, and philosophy.

In general, it will be important to test the language model on your language and especially on the vocabulary of your website.

Before we look at specific SEO use cases, let’s look at the pitfalls.

Pitfalls of creating SEO content with GPT-3

To create quality texts that interest users, it is important to know which pitfalls to avoid.

First of all, whatever model you choose, you must give it good examples as input so that it can imitate them and, above all, stick to a given type of text.

If you ask a language model to create content about “New York plumbers,” the model will go down different and often inappropriate paths:

  • Should it create a made-up directory?
  • Should it create content about New York plumbers?
  • Should it write a dialogue between plumbers in Paris?
  • Maybe a poem about plumbing in New York?

In short, the model will be lost.
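One way to picture "giving good examples as input" is a few-shot prompt: a handful of example pairs in a fixed layout, followed by the new request. The sketch below is only an illustration. The `Input:`/`Output:` labels, the `###` separator, and the sample texts are assumptions made for this example, not a format required by any particular model.

```python
def build_few_shot_prompt(examples, query, separator="\n\n###\n\n"):
    """Join (input, output) example pairs, then append the new input.

    The trailing "Output:" invites the model to continue in the same
    style as the examples instead of picking its own type of text.
    """
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return separator.join(blocks)


# Hypothetical examples, written by hand for illustration only.
examples = [
    ("Boston electricians", "Need an electrician in Boston? Compare local pros."),
    ("Chicago roofers", "Looking for a roofer in Chicago? Get quotes today."),
]
prompt = build_few_shot_prompt(examples, "New York plumbers")
print(prompt)
```

The more consistent the examples are in tone and length, the less room the model has to drift toward a directory, a dialogue, or a poem.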

Second, language models don’t handle duplicate content at all.

So, whatever text you generate, you will have to use a third-party tool to verify that the model hasn’t simply repeated something it learned; more specifically, that the text doesn’t already exist and is unique.

There are many tools available to confirm whether or not your content is unique. If not, just recreate the content.
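To give an idea of what such a check does, here is a minimal sketch that compares two texts using word shingles and Jaccard similarity. It is an assumption-laden illustration, not how any specific commercial tool works, and it can only compare a generated text against reference texts you already have, not against the whole web. The function names and the shingle size of 5 are choices made for this example.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=5):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


generated = "Looking for a plumber in New York? Compare local professionals today."
reference = "Our guide to the best pizza places you can find in New York."
print(jaccard_similarity(generated, generated))  # identical texts
print(jaccard_similarity(generated, reference))  # unrelated texts
```

A score close to 1.0 suggests the generated text largely repeats the reference; what threshold counts as "duplicate" is a judgment call.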

Additionally, content generation models do not optimize text for search at all.

Again, they are trained on a wide variety of sources, so you will have to guide them with the semantic tools available on the market.

You can also ask them to emphasize keywords, and explain your concepts in more detail.
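After asking the model to emphasize keywords, it is worth checking that they actually appear. The following sketch counts whole-word occurrences of each target keyword in a generated text. The function name and the sample text are hypothetical, and a real semantic tool does considerably more than count words.

```python
import re


def keyword_coverage(text, keywords):
    """Count case-insensitive whole-word occurrences of each keyword."""
    lowered = text.lower()
    return {
        kw: len(re.findall(r"\b" + re.escape(kw.lower()) + r"\b", lowered))
        for kw in keywords
    }


sample = "Find a smartphone at the best price. Smartphone deals on Samsung models!"
coverage = keyword_coverage(sample, ["smartphone", "Samsung", "Android"])
missing = [kw for kw, count in coverage.items() if count == 0]
print(coverage)
print("Missing keywords:", missing)
```

Keywords reported as missing can be fed back into the next prompt.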

Finally, the model can invent data. Indeed, models have a creativity setting.

If the model is set to allow high creativity, it can sometimes invent characteristics of an object, for example, which can lead to inconsistencies in your texts.
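Assuming the creativity setting mentioned here is what text generation APIs usually call temperature, the sketch below shows its effect on a toy example. The scores are made up for three candidate next words; dividing them by the temperature before the softmax makes the distribution sharper (low temperature) or flatter (high temperature), which is why unlikely and sometimes invented details get picked more often at high settings.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
logits = [4.0, 2.0, 1.0]
low = softmax_with_temperature(logits, 0.2)   # conservative setting
high = softmax_with_temperature(logits, 1.5)  # creative setting
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the low setting nearly all probability goes to the top candidate; with the high setting the less likely candidates get a real chance of being sampled.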

Content creation use cases for search engine optimization

Over the past 8 months, I’ve been testing many different content creation use cases.

With free models like GPT-Neo and GPT-J, we got very good results in the following use cases:

  • Extracting concepts (places, people, etc.).
  • Extracting questions from a text.
  • Answering questions based on a text.
  • Generating short texts, in English only, to create distinctive titles or descriptions.

On the other hand, with a paid model like GPT-3 with accessible prices, we were able not only to create long, specific texts but also to create complex text summaries.

Complex text summaries in OpenAI. Screenshot from GPT-3, October 2021

The image is not easy to read, but it shows the input examples that were submitted in bold, then the text generated by the model in italics.

The goal here is to create a motivational text to make people want to buy a product.

GPT-3 fully understands the required tone, the length of the text, and the words to be used. Text generated in less than 10 seconds!

This is the generated text:

Surrender to your childhood dreams. Do you still remember your first video game? Thanks to advances in technology, you can live it again. You can play with another player on the road via Nintendo DS, PlayStation or Xbox!

With Nintendo Switch, try fun games like Mario Kart 8 Deluxe – take on your rivals and enjoy magical adventures with Mario, Zelda and other characters. Enjoy an amazing collection of AAA games; Whether you prefer fighting games, adventure games or multiplayer games, PS4 offers a PS Plus membership to suit your needs!

Xbox also features unlimited access to entertainment apps like Netflix and Youtube for very little money. Whether you choose the new PS4, xbox, or Nintendo Switch, it’s easy to connect with family or friends via apps like Skype to have fun with your friends.

Make sure not to miss this opportunity!

Of course, proofreading is still required to improve the text, but this technology saves a lot of time and allows people to focus on problems with high added value.

Let’s move on to two revolutions in content creation.

Two revolutions for SEO

Before, GPT-3 might not have been able to generate certain content because it was not trained to do so, as we saw earlier.

Now, you have the ability to retrain (fine-tune) the model.

Just give it between 20 and 100 examples with your instructions as input and the perfect completion you want it to produce as output:

{"prompt":"CTA:\n\nsubject: smartphone\n\n words: smartphone, Samsung, mobile phones, Xiaomi, Samsung, Apple, Huawei, Sony, iPhone, intelligent mobile phones, Android\n\n ","completion":"Looking for a new smartphone? You've shortlisted a Samsung Galaxy or a Huawei model, but you haven't made a decision yet? Choose a smartphone in our product range and enjoy the best of the new mobile phones at an unbeatable price! You're guaranteed to find a smartphone at the best price, so don't miss out! It's never easy to find the best option among a multitude of smartphones: Xiaomi, Samsung, Apple, Huawei, Sony, or even Honor. Smartphones - intelligent mobile phones - aren't just synonymous with Apple's iPhone, but include many models. Smartphones using the Android operating system are the most common."}

The process is time-consuming, but you end up with a model optimized for your use case, especially if you need to use specific vocabulary for a particular niche, industry, or theme when optimizing for SEO.

This requires only two lines of Python code. The most time consuming part, of course, is creating this example file.
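Since building the example file is the slow part, a small helper can at least keep its format consistent. The sketch below writes one JSON object per line with the `prompt` and `completion` keys shown in the example above, then checks that every line parses and that the number of examples falls in the 20 to 100 range mentioned earlier. The function names are hypothetical, and the sketch does not call any fine-tuning API.

```python
import json


def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    lines = [
        json.dumps({"prompt": prompt, "completion": completion}, ensure_ascii=False)
        for prompt, completion in examples
    ]
    return "\n".join(lines) + "\n"


def check_jsonl(text, minimum=20, maximum=100):
    """Return (example_count, count_is_in_range); raise on malformed lines."""
    count = 0
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = {"prompt", "completion"} - set(record)
        if missing:
            raise ValueError(f"line {number} is missing {sorted(missing)}")
        count += 1
    return count, minimum <= count <= maximum


# A single shortened example, for illustration only.
jsonl_text = to_jsonl([
    ("CTA:\n\nsubject: smartphone\n\n words: smartphone, Samsung\n\n ",
     "Looking for a new smartphone?"),
])
print(check_jsonl(jsonl_text))
```

Line breaks inside a prompt are escaped as `\n` by `json.dumps`, so each example stays on a single line of the file.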

OpenAI example file. Screenshot from GPT-3, October 2021

Finally, let’s move on to the last topic I’ve been particularly excited about this month: code generation!

In fact, a new technology has been released where we give instructions and the new OpenAI Codex engine is able to generate Python code to solve our problems.

Let’s start by noting that it only handles small problems: it cannot replace developers, because we need to provide the AI with all the prepared code as well as all the technical constraints.

On the other hand, from a pedagogical point of view and especially in a no-code approach, it’s great to ask it to connect to a data source (MySQL, Excel, CSV, API, etc.) and generate the right visualizations in a few seconds.

Fetching a one-day NASA log file. Screenshot from GPT-3, October 2021

Here’s a mini example where I’m fetching a NASA log file for August 1, 1995, and asking for a bar graph of the total number of URLs visited per hour.

Then, using a simple text editor, you can see the result by copying and pasting the code.
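For readers who cannot make out the screenshot, here is a hand-written sketch of the kind of script such a request might lead to. It is not the code Codex produced. It assumes the log uses the Common Log Format, the sample lines are invented for illustration, and a text chart stands in for the bar graph so that the example needs no plotting library.

```python
import re
from collections import Counter

# Captures the date and the hour from a Common Log Format timestamp,
# e.g. [01/Aug/1995:13:05:10 -0400]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]*\]")


def hits_per_hour(lines, day="01/Aug/1995"):
    """Count the requests logged during each hour of the given day."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match and match.group(1) == day:
            counts[int(match.group(2))] += 1
    return counts


def text_bar_chart(counts):
    """Render one row per hour, with one '#' per request."""
    return "\n".join(
        f"{hour:02d}h {'#' * counts[hour]} {counts[hour]}" for hour in sorted(counts)
    )


# Invented sample lines; a real run would read the downloaded log file.
sample_lines = [
    'host1.example.com - - [01/Aug/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1839',
    'host2.example.com - - [01/Aug/1995:00:12:45 -0400] "GET /images/logo.gif HTTP/1.0" 200 786',
    'host1.example.com - - [01/Aug/1995:13:05:10 -0400] "GET /index.html HTTP/1.0" 304 0',
    'host3.example.com - - [02/Aug/1995:00:00:03 -0400] "GET /index.html HTTP/1.0" 200 1839',
]
counts = hits_per_hour(sample_lines)
print(text_bar_chart(counts))
```

The last sample line falls on August 2 and is ignored, which matches the request for a single day.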

In order to take the concept of no code even further, I’m setting up a web app where everything will be text driven.

The only limitation in using language models in SEO is your imagination. You can certainly create a complete SEO dashboard this way by breaking down every view you want, step by step.

Language models still have a lot of surprises in store, and many new uses are coming to market.

More resources:

  • How natural language generation is changing the SEO game
  • Do more with less: Create high-quality, automated content
  • Content Marketing: The Ultimate Guide for Beginners


Featured image: Vector Juice / Shutterstock
