The fact that Google frequently uses synonyms to boost search quality is nothing new. But Dan Petrovic brought an interesting example to my attention via Google+, which spawned a dialog that included Bill Slawski, Wissam Dandan and Steven Baker, Principal Software Engineer on the Search Ranking team.
It is conversations like these that make search so enjoyable. Hopefully you agree.
The Query
Dan’s question revolved around the query ‘the dreaming void plot’.
This query returned results for The Temporal Void as well as The Dreaming Void, both books by Peter F. Hamilton. The question was why?
Bold Words
First things first. Bold words in search results usually reflect the query terms. It’s one of the strongest signals of relevance that Google can provide to the user. Your eye naturally gravitates to those bolded words and they reinforce the fact that the result(s) matched your query.
Synonyms
However, Google has also been bolding synonyms when they’re returned in search results. The easiest way to see this is to combine a synonym operator (~) with a negative operator (-).
Here it’s easy to see that fantasy and sleep are bolded and are thus synonyms to dream according to Google. This makes complete sense.
The Diagnosis
Here’s where it gets interesting. The terms dreaming and temporal are not … regular synonyms. By that I mean that if you try the operator scenario above for dreaming you will not see temporal in bold.
A cursory look at your favorite dictionary will also tell you that these are not ‘grammatical’ synonyms.
The next thing I did was conduct a search using the root query: The Dreaming Void. That search did not return any results for The Temporal Void. I then looked at related searches, one of my favorite search features.
Lo and behold the ‘first’ related search is ‘temporal void’. This tells me that Google sees a very strong relationship between these two terms based on query patterns.
The related searches for the full ‘the dreaming void plot’ query do not include any temporal void terms. That’s not entirely unexpected for reasons I won’t go into here for the sake of brevity. Finally, I removed the related searches filter and tested the query using the new verbatim search.
Poof. All results for ‘The Temporal Void’ disappear. Though obvious, this confirms that the results for ‘The Temporal Void’ were returned because Google treated the terms as synonyms or similar terms, not because they matched the query literally.
Query Synonyms
This is what I refer to as a query synonym. The science behind these is actually incredibly interesting and complex. Because synonyms are not just about simple grammar, they’re about language, syntax and context as well.
Wissam Dandan offered this excerpt from a recent Google blog post on search quality changes.
Related query results refinements: Sometimes we fetch results for queries that are similar to the actual search you type. This change makes it less likely that these results will rank highly if the original query had a rare word that was dropped in the alternate query. For example, if you are searching for [rare red widgets], you might not be as interested in a page that only mentions “red widgets.”
Could this be related to Dan’s query? It might. The idea behind related queries is similar to synonyms. (Irony, huh?) The example provided by Google is that it will return results for ‘floral delivery’ when you search for ‘flower shops’. The change above will reduce the likelihood of false positives which may allow Google to increase the use of related query results refinements.
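One way to picture the change described in that excerpt is to weight an alternate query by how rare the words it dropped are. The sketch below is a toy illustration and not Google's method: the document frequencies, the `idf` helper and the `alternate_query_weight` function are all invented for the example.

```python
import math

# Hypothetical document frequencies in a toy corpus of 1,000,000 pages.
TOTAL_DOCS = 1_000_000
DOC_FREQ = {"rare": 2_000, "red": 300_000, "widgets": 50_000}


def idf(term: str) -> float:
    """Inverse document frequency: higher means the term is rarer."""
    # Unknown terms are treated as appearing in a single document.
    return math.log(TOTAL_DOCS / DOC_FREQ.get(term, 1))


def alternate_query_weight(original: str, alternate: str) -> float:
    """Return a weight in [0, 1] for results fetched via the alternate query.

    The weight is the share of the original query's total IDF that the
    alternate query keeps, so dropping a rare word costs more than
    dropping a common one.
    """
    original_terms = original.lower().split()
    kept = set(alternate.lower().split())
    total = sum(idf(t) for t in original_terms)
    retained = sum(idf(t) for t in original_terms if t in kept)
    return retained / total if total else 1.0
```

With these made-up numbers, `alternate_query_weight("rare red widgets", "red widgets")` comes out lower than `alternate_query_weight("rare red widgets", "rare widgets")`, which mirrors the idea that results from an alternate query should rank less highly when the dropped word was the rare one.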
In the case of ‘the dreaming void plot’ there don’t seem to be any rare query terms. In fact, most documents in the content corpus contain all of these words and the word ‘temporal’ as well. There’s a high degree of co-occurrence for the terms ‘dreaming’ and ‘temporal’ which makes sense since they are part of a series of books.
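To make co-occurrence concrete, here is a minimal sketch that counts how often two terms appear in the same document and compares that with chance using pointwise mutual information (PMI). The four-document corpus is invented, and PMI is an assumption chosen for the example. Nothing in the conversation says which measure Google actually uses.

```python
import math

# Invented toy corpus: each string stands in for one document.
DOCS = [
    "the dreaming void plot summary and the temporal void sequel",
    "peter f hamilton void trilogy the dreaming void the temporal void",
    "review of the dreaming void",
    "gardening tips for spring",
]


def pmi(term_a: str, term_b: str, docs: list[str]) -> float:
    """Pointwise mutual information of two terms at document level.

    Positive values mean the terms share documents more often than
    chance would predict; negative infinity means they never do.
    """
    token_sets = [set(d.split()) for d in docs]
    n = len(token_sets)
    count_a = sum(term_a in s for s in token_sets)
    count_b = sum(term_b in s for s in token_sets)
    count_ab = sum(term_a in s and term_b in s for s in token_sets)
    if count_ab == 0:
        return float("-inf")
    return math.log2((count_ab / n) / ((count_a / n) * (count_b / n)))
```

In this toy corpus `pmi("dreaming", "temporal", DOCS)` is positive, while `pmi("dreaming", "gardening", DOCS)` is negative infinity because the two words never share a document.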
But that’s the thing, what seems easy and straightforward to us is actually quite difficult for a machine.
The Science of Synonyms
Then the always smart Bill Slawski joined the conversation providing more examples of why synonyms are so difficult.
For instance, while we may often consider the words “auto” and “car” to be synonyms, that’s not the case when you set an alarm on “auto.” Even within longer phrases, words that we might consider to be synonyms might not be. So, “automobile” and “car” are synonyms when we search for a [ford car], but not when we search for a [railroad car].
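Bill's point can be sketched as a lookup keyed on the neighboring word as well as the term itself. The table below is hand-written for this example (a real system would presumably learn such rules from data), and `expand_query` is a hypothetical helper, not an actual API.

```python
# Hand-written, hypothetical rules: (context word, term) -> allowed synonyms.
CONTEXT_SYNONYMS = {
    ("ford", "car"): ["automobile", "auto"],
    # No entry for ("railroad", "car"): "automobile" is wrong in that context.
}


def expand_query(query: str) -> list[list[str]]:
    """Return, for each query word, the word plus any context-approved synonyms.

    Only the word immediately to the left is used as context, which is a
    deliberate simplification.
    """
    words = query.lower().split()
    expanded = []
    for i, word in enumerate(words):
        left = words[i - 1] if i > 0 else None
        expanded.append([word] + CONTEXT_SYNONYMS.get((left, word), []))
    return expanded
```

Here `expand_query("ford car")` adds "automobile" and "auto" as alternatives for "car", while `expand_query("railroad car")` leaves "car" alone.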
Bill went on to reference a number of patents that describe how Google might approach synonyms and related query refinement, five of which list Steven Baker as a co-inventor.
Search queries improved based on query semantic information
Identifying a synonym with N-gram agreement for a query phrase
Determining query term synonyms within query context
Identifying common co-occurring elements in lists
Longest-common-subsequence detection for common synonyms
Document-based synonym generation
Machine Translation for Query Expansion
While Bill and I sought out other science fiction series that might display this same behavior, Steven joined the conversation. While he wasn’t able to provide much detail, he did reference his blog post on synonyms.
An irony of computer science is that tasks humans struggle with can be performed easily by computer programs, but tasks humans can perform effortlessly remain difficult for computers. We can write a computer program to beat the very best human chess players, but we can’t write a program to identify objects in a photo or understand a sentence with anywhere near the precision of even a child.
The last statement is an odd sort of synonym for my own SEO philosophy and the name of this blog. The post also answered my question as to whether query synonyms are given the same bold treatment. (They are.)
TL;DR
Google is actively using complex methods to identify synonyms and related queries to improve search results. While this type of query results refinement is usually spot on and unnoticeable, it can sometimes be flawed. In those instances, you can remove these results using the verbatim search tool.
Comments About Query Synonyms
// 4 comments so far.
Dali // December 12th 2011
Hey AJ,
Very brainy discussion of Google Search results. Yes, regarding your sentence “synonyms are not just about simple grammar, they’re about language, syntax and context as well.” So, how does Googlebot learn context? Looks like it’s doing a better job!
It will be nice to see how Googlebot continues to improve its understanding of synonyms to give us humans what we are looking for. 🙂 Though my husband calls me a robot…I think I am human… 🙂
Dali
AJ Kohn // December 13th 2011
Dali,
Googlebot learns context through a variety of complex techniques that combine Natural Language Processing and Machine Learning (among other things). The implementation of Caffeine and the use of MapReduce allow them to use these techniques to much greater effect.
JWC // December 13th 2011
Hi AJ, I feel you are overdoing the essence of a synonym, when Google is merely showing “related”.
These could be drawn from a huge number of traffic metrics, and query refinement logs.
Of course, this is not news to you!
AJ Kohn // December 13th 2011
JWC,
Thanks for the comment. And you’re right, there’s definitely a fuzzy line between a highly related result and a synonym. That’s why I refer to these as ‘query synonyms’.
And query refinement or reformulations are a very intriguing source of data. There are a number of thought-provoking papers that address query reformulation. If you combine that type of data with a classification structure and the ability to process all of that data … well, it’s no wonder Google is the leader in search.