Tuesday, October 27, 2015

FAQ: All About The New Google RankBrain Algorithm

Yesterday, news emerged that Google was using a machine-learning artificial intelligence system called “RankBrain” to help sort through its search results. Wondering how that works and fits in with Google’s overall ranking system? Here’s what we know about RankBrain.

The information covered below comes from three sources. First, the Bloomberg story that broke the news about RankBrain yesterday (see also our write-up of it). Second, additional information that Google has now provided directly to Search Engine Land. Third, our own knowledge and best assumptions in places where Google isn’t providing answers. Apart from general background information, we’ll make clear which of these sources is being used wherever that matters.

What Is RankBrain?

RankBrain is Google’s name for a machine-learning artificial intelligence system that’s used to help process its search results, as was reported by Bloomberg and also confirmed to us by Google.

What Is Machine Learning?

Machine learning is where a computer teaches itself how to do something, rather than being taught by humans or following detailed programming.

What Is Artificial Intelligence?

True artificial intelligence, or AI for short, is where a computer would be as smart as a human being, at least in the sense of acquiring knowledge both from being taught and from building on what it knows and making new connections.

True AI exists only in science fiction novels, of course. In practice, AI is used to refer to computer systems that are designed to learn and make connections.

How’s AI different from machine learning? In terms of RankBrain, it seems to us they’re fairly synonymous. You may hear them both used interchangeably or with machine learning used to describe the type of artificial intelligence approach being employed.

So RankBrain Is The New Way Google Ranks Search Results?

No. RankBrain is part of Google’s overall search “algorithm,” a computer program that’s used to sort through the billions of pages that it knows about and find the ones deemed most relevant for particular queries.

What’s The Name Of Google’s Search Algorithm?

It’s called Hummingbird, as we reported in the past. For years, the overall algorithm didn’t have a formal name. But in the middle of 2013, Google overhauled that algorithm and gave it a name, Hummingbird.

So RankBrain Is Part Of Google’s Hummingbird Search Algorithm?

That’s our understanding. Hummingbird is the overall search algorithm, just like a car has an overall engine in it. The engine itself may be made up of various parts, such as an oil filter, a fuel pump, a radiator and so on. In the same way, Hummingbird encompasses various parts, with RankBrain being one of the newest.

In particular, we know RankBrain is part of the overall Hummingbird algorithm because the Bloomberg article makes clear that RankBrain doesn’t handle all searches, as only the overall algorithm would.

Hummingbird also contains other parts with names familiar to those in the SEO space, such as Panda, Penguin and Payday designed to fight spam, Pigeon designed to improve local results, Top Heavy designed to demote ad-heavy pages, Mobile Friendly designed to reward mobile-friendly pages and Pirate designed to fight copyright infringement.

I Thought The Google Algorithm Was Called “PageRank”

PageRank is part of the overall Hummingbird algorithm that covers a specific way of giving pages credit based on the links from other pages pointing at them.

PageRank is special because it’s the first name that Google ever gave to one of the parts of its ranking algorithm, way back at the time the search engine began in 1998.
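To make the idea of link-based credit concrete, here is a minimal sketch of the textbook PageRank calculation on a made-up four-page web. It is not Google's production code: the page names, the toy link graph and the iteration count are assumptions for illustration, and the damping factor of 0.85 is simply a commonly used value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its credit evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its credit over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

scores = pagerank(toy_web)
best_page = max(scores, key=scores.get)
print(best_page, round(scores[best_page], 3))
```

Page "c" ends up with the highest score because it collects the most links, which is the "votes" idea in miniature.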

What About These “Signals” That Google Uses For Ranking?

Signals are things Google uses to help determine how to rank web pages. For example, it will read the words on a web page, so words are a signal. If some words are in bold, that might be another signal that’s noted. The calculations used as part of PageRank give a page a PageRank score that’s used as a signal. If a page is noted as being mobile-friendly, that’s another signal that’s registered.

All these signals get processed by various parts within the Hummingbird algorithm to ultimately figure out which pages Google shows in response to various searches.

How Many Signals Are There?

Google has fairly consistently spoken of having more than 200 major ranking signals that, in turn, might have up to 10,000 variations or sub-signals. It more typically just says “hundreds” of factors, as it did in yesterday’s Bloomberg article.

If you want a more visual guide to ranking signals, see our Periodic Table Of SEO Success Factors:

[Image: Periodic Table Of SEO Success Factors 2015]

It’s a pretty good guide, we think, to general things that search engines like Google use to help rank web pages.

And RankBrain Is The Third-Most Important Signal?

That’s right. From out of nowhere, this new system has become what Google says is the third most-important thing for ranking web pages. From the Bloomberg article:

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

What Are The First & Second-Most Important Signals?

Google won’t tell us what the first and second most important signals are. We asked. Twice.

It’s annoying and arguably a bit misleading that Google won’t explain the top two. The Bloomberg article was no accident. Google wants some PR about what it considers to be its machine-learning breakthrough.

But to really assess that breakthrough, it’s helpful to know what the other most important factors are that Google uses now, as well as what was knocked behind by RankBrain. That’s why Google should explain these.

By the way, my personal guess is that links remain the most important signal, the way that Google counts up those links in the form of votes. It’s also a terribly aging system, as I’ve covered in my Links: The Broken “Ballot Box” Used By Google & Bing article from the past.

As for the second-most important signal, I’d guess that would be “words,” where words would encompass everything from the words on the page to how Google’s interpreting the words people enter into the search box outside of RankBrain analysis.

What Exactly Does RankBrain Do?

From emailing with Google, we gather that RankBrain is mainly used as a way to interpret the searches that people enter to find pages that might not use the exact words they entered.

Didn’t Google Already Have Ways To Find Pages Beyond The Exact Query Entered?

Yes, Google has found pages beyond the exact terms someone enters for a very long time. For example, years and years ago, if you’d entered something like “shoe,” it might not have found pages that said “shoes,” because those are technically two different words. But “stemming” allowed it to get smarter, to understand that shoes is a variation of shoe, just like “running” is a variation of “run.”

Google also got synonym smarts, so that if you searched for “sneakers,” it might understand that you also meant “running shoes.” It even gained some conceptual smarts, to understand that there are pages about “Apple” the technology company versus “apple” the fruit.
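As a rough illustration of stemming and synonym expansion, here is a minimal sketch. The suffix rules and the one-entry synonym list are hand-written assumptions that only cover the examples above. Real stemmers are far more careful, and nothing here reflects how Google actually implements either feature.

```python
# Hypothetical hand-made synonym list, keyed by stemmed term.
SYNONYMS = {"sneaker": ["running shoe"]}


def naive_stem(word):
    """Very crude stemmer: handles 'shoes' -> 'shoe' and 'running' -> 'run'."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo a doubled final consonant: 'runn' -> 'run'.
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def expand_query(query):
    """Return the stemmed query plus any variants from the synonym list."""
    stems = [naive_stem(word) for word in query.split()]
    normalized = " ".join(stems)
    variants = {normalized}
    for term, alternatives in SYNONYMS.items():
        if term in stems:
            for alternative in alternatives:
                variants.add(normalized.replace(term, alternative))
    return variants


print(naive_stem("shoes"), naive_stem("running"))
print(sorted(expand_query("sneakers")))
```

Note that both the rules and the synonym list had to be written by a person, which is the kind of human effort the article goes on to contrast with RankBrain.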

And What About The Knowledge Graph?

The Knowledge Graph, launched in 2012, was a way that Google grew even smarter about connections between words. More important, it was how Google learned to search for “things not strings,” as Google has described it.

Strings means searching just for strings of letters, such as pages that match the spelling of “Obama.” Things means that instead, Google understands when someone searches for “Obama,” they probably mean US President Barack Obama, an actual person with connections to other people, places and things.

The Knowledge Graph is a database of facts about things in the world and the relationships between them. It’s why you can do a search like “when was the wife of obama born” and get an answer about Michelle Obama as below, without ever using her name:

[Screenshot: Google answer for “when was the wife of obama born”]
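The "things not strings" idea can be sketched as a tiny fact table that is followed one relationship at a time. This is only an illustration: the data layout, the `lookup` helper and the alias table are assumptions, and the step of turning the typed question into the relations "spouse" and "born" is taken as already done.

```python
# Hypothetical miniature fact store: (entity, relation) -> value.
FACTS = {
    ("Barack Obama", "spouse"): "Michelle Obama",
    ("Michelle Obama", "born"): "January 17, 1964",
}

# Maps a typed string to the "thing" it most likely refers to.
ALIASES = {"obama": "Barack Obama"}


def lookup(entity, *relations):
    """Resolve the string to an entity, then follow each relation in order."""
    current = ALIASES.get(entity.lower(), entity)
    for relation in relations:
        current = FACTS[(current, relation)]
    return current


# "when was the wife of obama born" becomes: obama -> spouse -> born
answer = lookup("obama", "spouse", "born")
print(answer)
```

The query never mentions Michelle Obama by name. The answer comes from hopping across two stored relationships.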

So How’s RankBrain Helping?

The things Google already uses to refine queries generally all flow back to some human being somewhere doing work, either having created the stemming lists or the synonym lists, or having made database connections between things. Sure, there’s some automation involved. But largely, it depends on human work.

The problem is that Google processes 3 billion searches per day. In 2007, Google said that 20-25% of those queries had never been seen before. In 2013, it brought that number down to 15%, which was used again in yesterday’s Bloomberg article and which Google reconfirmed to us. But 15% of 3 billion is still a huge number of queries never entered by any human searcher — 450 million per day.
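The arithmetic behind that figure, using only the numbers quoted above (the variable names are just for illustration):

```python
searches_per_day = 3_000_000_000

# Share of queries never seen before, in whole percent.
never_seen_percent_2013 = 15
never_seen_percent_2007 = (20, 25)

# Integer arithmetic avoids any floating-point rounding.
new_queries_per_day = searches_per_day * never_seen_percent_2013 // 100

# For comparison, the 2007 range applied to the same daily volume.
range_at_2007_rate = tuple(
    searches_per_day * percent // 100 for percent in never_seen_percent_2007
)

print(new_queries_per_day)
print(range_at_2007_rate)
```

The second calculation is only a comparison at today's volume. The article does not say how many searches Google handled per day in 2007.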

Among those can be complex, multi-word queries — also called “long-tail” queries. RankBrain is designed to help better interpret those queries and effectively translate them behind-the-scenes in a way to find the best pages for the searcher.

As Google told us, it can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other. This learning in turn allows it to better understand future complex searches and whether they’re related to particular topics. Most important, from what Google told us, it can then associate these groups of searches with results that it thinks searchers will like the most.

Google didn’t provide examples of groups of searches nor give details on how RankBrain guesses at what are the best pages. But the latter probably works because, if it can translate an ambiguous search into something more specific, it can then bring back better answers.

How About An Example?

While Google didn’t give groups of searches, the Bloomberg article did have a single example of a search where RankBrain is supposedly helping. Here it is:

What’s the title of the consumer at the highest level of a food chain

To a layperson like myself, “consumer” sounds like a reference to someone who buys something. However, it’s also a scientific term for something that consumes food. There are also levels of consumers in a food chain. That consumer at the highest level? The title — the name — is “predator.”

Entering that query into Google provides good answers even though the query itself sounds pretty odd:

[Screenshot: Google search results for the food chain consumer query]

Now consider how similar the results are for a search like “top level of the food chain,” as shown below:

[Screenshot: Google search results for “top level of the food chain”]

Imagine that RankBrain is connecting that original long and complicated query to this much shorter one, which is probably more commonly done. It understands that they are very similar. As a result, it can leverage all it knows about getting answers for the more common query to help improve what it provides for the uncommon one.

Let me stress that I don’t know that RankBrain is connecting these two searches. I only know that Google gave the first example. This is simply an illustration of how RankBrain may be used to connect an uncommon search to a common one as a way of improving things.
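To picture what "connecting an uncommon search to a common one" could look like, here is a deliberately simple sketch that matches queries by shared words. This is a swapped-in stand-in technique (Jaccard word overlap), not RankBrain's method, which Google has not detailed. The stopword list and the list of common queries are assumptions made up for the example.

```python
import re

# Hypothetical list of filler words to ignore when comparing queries.
STOPWORDS = {"a", "at", "how", "in", "many", "of", "s", "the", "what"}


def content_words(query):
    """Lowercase the query and keep only the non-filler words."""
    words = re.findall(r"[a-z]+", query.lower())
    return {word for word in words if word not in STOPWORDS}


def jaccard(query_a, query_b):
    """Shared words divided by all distinct words across both queries."""
    words_a, words_b = content_words(query_a), content_words(query_b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def closest_common_query(rare_query, common_queries):
    """Pick the common query with the highest word overlap."""
    return max(common_queries, key=lambda query: jaccard(rare_query, query))


rare = "What's the title of the consumer at the highest level of a food chain"
common = [
    "top level of the food chain",
    "how many tablespoons in a cup",
    "running shoes",
]

match = closest_common_query(rare, common)
print(match)
```

Word overlap only works when the two queries share vocabulary. The appeal of a learned system is that it could, in principle, link searches that share no words at all.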

Can Bing Do This Too With RankNet?

Back in 2005, Microsoft started using its own machine-learning system called RankNet as part of what became its Bing search engine of today. In fact, the chief researcher and creator of RankNet was recently honored. But over the years, Microsoft has barely talked about RankNet.

You can bet that will likely change. It’s also interesting that when I put the search above, given as an example of how great Google’s RankBrain is, into Bing, it gave me good results, including one listing that Google also returned:

[Screenshot: Bing search results for “What’s the title of the consumer at the highest level of a food chain”]

One query doesn’t mean that Bing’s RankNet is as good as Google’s RankBrain or vice versa. Unfortunately, it’s really difficult to come up with a list of queries to do this type of comparison.

Any More Examples?

Google did give us one fresh example: “How many tablespoons in a cup?” Google said that RankBrain favored different results in Australia versus the United States, because the measurements are different.

I tried to test this by searching at Google.com versus Google Australia. I didn’t see much difference, myself. And even without RankBrain, the results would often be different in this way just because of “old fashioned” means of favoring pages from known Australian sites for those using Google Australia.

So Does RankBrain Really Help?

Despite my two examples above being less than compelling as testimony to the greatness of RankBrain, I really do believe that it probably is making a big impact as Google is claiming. The company is fairly conservative with what goes into its ranking algorithm. It does small tests all the time. But it only launches big changes when it has a great degree of confidence.

Integrating RankBrain, to the degree that it’s supposedly the third most important signal, is a huge change. It’s not one that I think Google would do unless it really believed it was helping.

When Did RankBrain Start?

Google told us that there was a gradual rollout of RankBrain in early 2015 and that it’s been fully live and global for a few months now.

How Many Queries Are Impacted?

Google told Bloomberg that a “very large fraction” of queries are being processed by RankBrain. We asked for a more specific figure but were given the same large fraction statement.

Is RankBrain Always Learning?

All learning that RankBrain does is offline, Google told us. It’s given batches of historical searches and learns to make predictions from these.

Those predictions are tested and, if proven good, the latest version of RankBrain goes live. Then the learn-offline-and-test cycle is repeated.
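The cycle Google describes can be pictured with a toy sketch: learn from a batch of historical searches, test the result, and only swap it in if it beats what is live. Everything below is hypothetical, including the "model" (a lookup of the most-chosen result per query), the accuracy measure and the sample data. Google has not said how RankBrain is trained or evaluated.

```python
from collections import Counter, defaultdict


def train(history):
    """Learn, for each query, the result chosen most often in the batch."""
    counts = defaultdict(Counter)
    for query, chosen_result in history:
        counts[query][chosen_result] += 1
    return {query: c.most_common(1)[0][0] for query, c in counts.items()}


def accuracy(model, held_out):
    """Fraction of held-out searches where the model predicts the chosen result."""
    hits = sum(1 for query, chosen in held_out if model.get(query) == chosen)
    return hits / len(held_out)


def maybe_promote(live_model, history, held_out):
    """Train offline, then go live only if the candidate tests better."""
    candidate = train(history)
    if accuracy(candidate, held_out) > accuracy(live_model, held_out):
        return candidate
    return live_model


# Made-up historical batch and test set.
history = [
    ("top of food chain", "predator-page"),
    ("top of food chain", "predator-page"),
    ("top of food chain", "other-page"),
]
held_out = [("top of food chain", "predator-page")]

live_model = {}  # Nothing learned yet.
live_model = maybe_promote(live_model, history, held_out)
print(live_model)
```

The point of the sketch is the ordering: no learning happens while serving searches, and a new version replaces the old one only after passing a test.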

Does RankBrain Do More Than Query Refinement?

Typically, how a query is refined — be it through stemming, synonyms or now RankBrain — has not been considered a ranking factor or signal.

Signals are typically factors that are tied to content, such as the words on a page, the links pointing at a page, whether a page is on a secure server and so on. They can also be tied to a user, such as where a searcher is located or their search and browsing history.

So when Google talks about RankBrain as the third-most important signal, does it really mean as a ranking signal? Yes. Google reconfirmed to us that there is a component where RankBrain is directly contributing somehow to whether a page ranks well beyond just reformulating a query.

How exactly? Is there some type of RankBrain score that might assess quality? Perhaps, but it seems much more likely that RankBrain is somehow helping Google better classify pages based on the content they contain. RankBrain might somehow be able to better summarize what a page is about than Google’s existing systems have done.

Or not. Google isn’t saying anything other than there’s a ranking component involved.

How Do I Learn More About RankBrain?

Google told us people who want to learn about word “vectors” — the way words and phrases can be mathematically connected — should check out this blog post, which talks about how the system (which wasn’t named RankBrain in the post) learned the concept of capital cities of countries just by scanning news articles:

[Image: country and capital city word vectors, from the blog post]

There’s also a longer research paper that the post is based on. You can even play with your own machine learning project using Google’s word2vec tool. In addition, Google has an entire area with its AI and machine learning papers, as does Microsoft.
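For a feel of what "mathematically connected" means, here is a minimal sketch of the capital-city relationship using word vectors. The vectors are hand-made toy values chosen so the example works. Real systems such as word2vec learn vectors with hundreds of dimensions from text, and the `analogy` helper is a hypothetical name for this illustration.

```python
import math

# Hand-made toy vectors: one dimension per country, plus a "capital" dimension.
VECTORS = {
    "France": (1.0, 0.0, 0.0, 0.0),
    "Paris": (1.0, 0.0, 0.0, 1.0),
    "Italy": (0.0, 1.0, 0.0, 0.0),
    "Rome": (0.0, 1.0, 0.0, 1.0),
    "Japan": (0.0, 0.0, 1.0, 0.0),
    "Tokyo": (0.0, 0.0, 1.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by vector arithmetic: b - a + c."""
    target = tuple(
        vb - va + vc for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))


# France is to Paris as Italy is to ...?
result = analogy("France", "Paris", "Italy")
print(result)
```

In this toy setup the "capital of" relationship is literally one shared direction in the vector space, which is the pattern the blog post reports finding in vectors learned from news articles.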

The post FAQ: All About The New Google RankBrain Algorithm appeared first on Search Engine Land.
