Yandex Search Ranking Factors Leak: Insights

The search marketing community is trying to make sense of a leaked Yandex repository containing files that list what look like search ranking factors.

Some may be looking for actionable SEO tips, but that is probably not where the real value lies.

The general consensus is that the leak will help in gaining a broad understanding of how search engines work.

There is a lot to learn

Ryan Jones thinks this leak is a big deal.

He has already loaded some of the Yandex machine learning models onto his own machine for testing.

Ryan is convinced there is a lot to learn but it will take a lot more than just examining a list of ranking factors.

Ryan explains:

“Although Yandex is not Google, there is a lot we can learn from this in terms of similarities.

Yandex uses a lot of technologies invented by Google. They refer to PageRank by name, use Map Reduce, BERT, and many other things as well.

Obviously the factors will differ and the weights applied to them will vary as well, but the computer science methods of how to analyze relevancy and anchor text and perform the calculations will be very similar across search engines.

I think we can glean a lot of insights from the ranking factors, but just looking at the leaked list alone is not enough.

When you look at the default weights applied (before ML), there are negative weights on things that SEOs would assume are positive, or vice versa.

There are also a lot more ranking factors calculated in the code than are listed in the lists of ranking factors floating around.

It appears that this list is just static factors and does not take into account how relevance of a query is calculated or the many dynamic factors that relate to the result set for that query.”
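Ryan's point about default weights can be illustrated with a toy linear scorer. This is only a minimal sketch under stated assumptions, not Yandex's code: the factor names, values, and weights below are hypothetical and do not come from the leak. It shows how a negative default weight turns a factor that SEOs might assume helps into a penalty.

```python
# Hypothetical default weights; none of these names or numbers come from the leak.
DEFAULT_WEIGHTS = {
    "text_relevance": 0.6,
    "exact_match_anchor_share": -0.3,  # assumed negative, purely to illustrate the point
    "page_age_years": 0.1,
}


def static_score(factors, weights=DEFAULT_WEIGHTS):
    """Return the weighted sum of factor values; unknown factors contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in factors.items())


# Two pages that differ only in the share of exact-match anchors.
page_a = {"text_relevance": 0.8, "exact_match_anchor_share": 0.9, "page_age_years": 1.0}
page_b = {"text_relevance": 0.8, "exact_match_anchor_share": 0.1, "page_age_years": 1.0}

print(static_score(page_a) < static_score(page_b))
```

Under these made-up weights, the page with more exact-match anchors scores lower, which is the kind of surprise Ryan describes. As he notes, a static sum like this ignores the query-dependent factors entirely.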

More than 200 ranking factors

It is commonly repeated, based on the leak, that Yandex uses 1,923 ranking factors (some say fewer).

Christophe Semper, founder of Link Research Tools, says that friends have told him there are many more ranking factors.

Christophe shared:

Friends have seen:

  • 275 allocation factors
  • 220 “web novelty” factors
  • 3,186 image search factors
  • 2,314 video search factors

There is a lot more still to be mapped.

Perhaps the most surprising thing for many is that Yandex has hundreds of factors for links.

The point is, it’s a lot more than the 200+ ranking factors Google used to claim.

Even Google’s John Mueller said that Google has moved away from the “200+ ranking factors” figure.

So perhaps this will help the search industry move away from thinking of the Google algorithm in these terms.

Does anyone know the entire Google algorithm?

What is remarkable about the data leak is that the ranking factors were collected and organized in such a simple way.

The leak calls into question the notion that Google’s algorithm is closely guarded and that no one, even at Google, knows the entire algorithm.

Is it possible that Google has a spreadsheet listing more than a thousand ranking factors?

Christophe Semper questions the idea that no one knows Google’s algorithm.

Christophe commented to Search Engine Journal:

“Someone on LinkedIn said they couldn’t imagine Google “documenting” ranking factors quite like this.

But this is how such a complex system should be built. This leak is from a very reliable insider.

Google has code that could also be leaked.

The oft-repeated statement that not even Googlers know the ranking factors has always sounded ridiculous to a techie like me.

The number of people who have all the details will be very small.

But it has to be in the code, because the code is what runs the search engine.”

Which parts of Yandex are similar to Google?

Leaked Yandex files give a glimpse into how search engines work.

The data does not show how Google works. But it does offer a chance to see part of how one search engine (Yandex) ranks search results.

What is in the data should not be confused with what Google might use.

However, there are interesting similarities between the two search engines.

MatrixNet is not RankBrain

One of the interesting insights some are digging up relates to Yandex’s machine learning technology called MatrixNet.

MatrixNet is an older technology introduced in 2009 (archive.org link to the announcement).

Contrary to what some claim, MatrixNet is not Yandex’s version of Google’s RankBrain.

Google RankBrain is a limited algorithm that focuses on understanding the 15% of search queries that Google has not seen before.

A Bloomberg article revealed the existence of RankBrain in 2015. The article states that RankBrain was added to the Google algorithm that year, six years after Yandex MatrixNet was introduced (Archive.org snapshot of the article).

The Bloomberg article describes the limited purpose of RankBrain:

“If RankBrain sees a word or phrase it isn’t familiar with, the machine can guess what words or phrases might have the same meaning and filter the result accordingly, making it more efficient at dealing with previously unseen search queries.”

On the other hand, MatrixNet is a machine learning algorithm that does a lot of things.

One of the things it does is categorize a search query and then apply appropriate ranking algorithms to that query.

This is part of what the 2016 English announcement for the 2009 algorithm says:

“MatrixNet makes it possible to create a very long and complex ranking formula, which takes into account many different factors and their combinations.

Another important feature of MatrixNet is that it allows customizing a ranking formula for a specific class of search queries.

Incidentally, tweaking the ranking algorithm for music searches, for example, will not undermine the ranking quality for other types of queries.

A ranking algorithm is like complex machinery with dozens of buttons, switches, levers and gauges. Usually, any single turn of any single switch in a mechanism results in a sweeping change in the entire machine.

MatrixNet, however, allows tuning specific parameters for certain classes of queries without causing an overhaul of the entire system.

In addition, MatrixNet can automatically choose sensitivity for specific ranges of ranking factors.”

MatrixNet does a lot more than RankBrain, and it’s clearly not the same.

But the interesting thing about MatrixNet is how dynamic the ranking factors are: it classifies search queries and applies different factors to each class.
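The idea of classifying a query and then applying a class-specific formula can be sketched in a few lines. This is an illustrative toy under stated assumptions, not MatrixNet itself: the query classes, the keyword-based classifier, the factor names, and all weights are hypothetical, and a real system would use trained models rather than a keyword set.

```python
# Hypothetical query classes, each with its own ranking weights (illustrative only).
CLASS_WEIGHTS = {
    "music": {"text_relevance": 0.5, "has_audio_player": 0.5},
    "general": {"text_relevance": 0.8, "link_authority": 0.2},
}
MUSIC_TERMS = {"song", "lyrics", "album", "mp3"}


def classify_query(query):
    """Toy classifier: a query containing a music term is treated as a music query."""
    words = set(query.lower().split())
    return "music" if words & MUSIC_TERMS else "general"


def rank(query, pages):
    """Sort page names by score, using only the weights of the query's class."""
    weights = CLASS_WEIGHTS[classify_query(query)]

    def score(factors):
        return sum(w * factors.get(name, 0.0) for name, w in weights.items())

    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)


pages = {
    "streaming-page": {"text_relevance": 0.6, "has_audio_player": 1.0, "link_authority": 0.2},
    "news-article": {"text_relevance": 0.7, "has_audio_player": 0.0, "link_authority": 0.9},
}

print(rank("yesterday lyrics", pages))
print(rank("weather tomorrow", pages))
```

Because each class has its own weights, changing the "music" entry in `CLASS_WEIGHTS` cannot affect how "general" queries are ranked, which mirrors the tuning property the announcement describes.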

MatrixNet is referenced in some of the ranking factor docs, so it’s important to put MatrixNet into the right context so that ranking factors are shown in the right light and make more sense.

It may be useful to read more about the Yandex algorithm to help understand the Yandex leak.

Read: Artificial intelligence and machine learning algorithms from Yandex

Some Yandex factors match SEO practices

Dominic Woodman has some interesting notes about the leak.

Some of the leaked ranking factors match certain SEO practices, such as varying anchor text.

Alex Burak posted a huge thread on Twitter on the topic that contains echoes of familiar SEO practices.

One such factor that Alex highlights relates to optimizing internal links to reduce the crawl depth of important pages.

John Mueller of Google has long encouraged publishers to make sure important pages are prominently linked.

Mueller discourages burying important pages deep in the structure of the site.

John Mueller shared in 2020:

“So what’s going to happen is we’re going to see that the homepage is really important, and the things linked from the homepage in general are very important as well.

And then… as it moves away from the home page, we’ll think that’s probably less important.”

It is important to keep important pages close to the main pages through which visitors enter the site.

So if the links point to the home page, the pages linked from the home page will be considered more important.

John Mueller didn’t say crawl depth was a ranking factor. He simply said that it signals to Google which pages are important.

The Yandex rule cited by Alex uses crawl depth from the homepage as a ranking rule.

It makes sense to treat the home page as the starting point of importance and then assign less importance the more clicks away a page is from it.
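Click depth from the home page is straightforward to measure with a breadth-first search over a site's internal links. The following is a minimal sketch, not the Yandex implementation: the `click_depths` function and the example site map are hypothetical, and the leak is not claimed to compute depth this way.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/products/widget", "/archive/old-page"],
}

for page, depth in sorted(click_depths(site, "/").items(), key=lambda item: item[1]):
    print(depth, page)
```

In this example the archive page sits three clicks deep. Linking it from the home page or a category page would reduce its depth, which is the internal-linking practice discussed above.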

There are also Google research papers with similar ideas (the Reasonable Surfer model and the Random Surfer model), which calculated the probability that a random surfer would end up on a given web page just by following links.
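The random surfer idea can be sketched as a simple iteration: at each step the surfer either follows a random link on the current page or jumps to a random page. This is a simplified illustration of the Random Surfer model only (not the Reasonable Surfer model, which weights links differently). The function name and the three-page example graph are hypothetical; the damping value of 0.85 is the one commonly cited from the original PageRank paper.

```python
def random_surfer_rank(links, damping=0.85, iterations=50):
    """Estimate the probability that a random surfer is on each page."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's probability evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no links: the surfer jumps anywhere at random.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page site where every page links back to "home".
graph = {"home": ["a", "b"], "a": ["home"], "b": ["home", "a"]}
ranks = random_surfer_rank(graph)
print(max(ranks, key=ranks.get))
```

In this small graph the home page ends up with the highest probability because every other page links to it, which is consistent with treating the home page as the starting point of importance.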

Alex also found a factor that prioritizes important main pages.

The rule of thumb for SEO has always been to keep important content no more than a few clicks away from the homepage (or from internal pages that attract inbound links).

Yandex Vega update… related to expertise and authoritativeness?

Yandex updated its search engine in 2019 with an update called Vega.

The Yandex Vega update featured neural networks trained with subject matter experts.

The goal of the 2019 update was to provide search results with expert and trusted pages.

But search marketers sifting through the documents haven’t yet found anything associated with things like author biographies, which some believe are connected to the expertise and authoritativeness that Google looks for.

Learn, learn, learn

We are in the early days of the leak and I think it will lead to a greater understanding of how search engines work in general.


Featured image: Shutterstock / san4ezz
