Google’s SMITH Algorithm Update


If you haven’t heard, Google recently published a research paper on the Siamese Multi-depth Transformer-based Hierarchical (SMITH) encoder, which reportedly outperforms a similar natural language processing (NLP) model known as BERT at interpreting longer documents and search queries.

If you read that and went, “huh?”, you’re not alone. We’re here to help break down NLP and explain how the BERT and SMITH algorithm updates help Google better understand the context of long documents and search queries. We’ll also compare BERT vs SMITH and explain their differences.

Natural Language Processing (NLP) and Search Engines

First and foremost, what is Natural Language Processing (NLP)? NLP is a branch of artificial intelligence concerned with computers processing and interpreting human language. NLP helps machines read, decipher, understand, and make sense of human language in a way that is valuable.

For SEO, this is especially important. Google’s algorithms look for context clues to help deliver the right results for longer search queries written in humans’ natural vocabulary. And as voice search becomes more common, NLP becomes even more critical. With NLP, Google is better able to understand things like the context of words and tone.

The BERT Algorithm

The BERT algorithm update, rolled out in Search in October 2019, was the first big move to help interpret the user intent behind longer search queries. BERT stands for Bidirectional Encoder Representations from Transformers. The key word here is “bidirectional,” which refers to how BERT interprets the entire set of words in a sentence or query, rather than just the order in which the words are written. Basically, it reads the surrounding words on both sides for context cues, rather than reading strictly left to right.

The example Google gives is:

…The word “bank” would have the same context-free representation in “bank account” and “bank of the river.” Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence “I accessed the bank account,” a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account.” However, BERT represents “bank” using both its previous and next context — “I accessed the … account” — starting from the very bottom of a deep neural network, making it deeply bidirectional.
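Google’s example can be made concrete with a toy sketch. This is not BERT itself, just an illustration of which context words a unidirectional model versus a bidirectional model can condition on when interpreting a target word like “bank”:

```python
# Toy illustration (not the actual BERT model): which surrounding words
# each type of model can use when representing the target word.

def unidirectional_context(tokens, target_index):
    """A left-to-right model only sees the words before the target."""
    return tokens[:target_index]

def bidirectional_context(tokens, target_index):
    """A bidirectional model sees the words on both sides of the target."""
    return tokens[:target_index] + tokens[target_index + 1:]

sentence = "I accessed the bank account".split()
i = sentence.index("bank")

print(unidirectional_context(sentence, i))  # ['I', 'accessed', 'the']
print(bidirectional_context(sentence, i))   # ['I', 'accessed', 'the', 'account']
```

Only the bidirectional version sees “account,” which is the clue that distinguishes a financial bank from a riverbank.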

BERT vs SMITH Algorithm 

BERT does have some limitations, though, which is why the SMITH algorithm has been introduced. BERT can only take in a limited amount of text at once (the original model caps input at 512 tokens), so it is not well suited for long-form documents. To put it simply, BERT understands words within a passage, but not full passages within a document.

This is where the SMITH algorithm comes in. In essence, SMITH is able to do with full passages what BERT can do with words. It compares sentences before, after, or even in a separate paragraph to better interpret the document. However, SMITH still relies on BERT to do its work, so the two are not mutually exclusive. SMITH takes the document through the following processes to interpret its meaning:

  • It breaks the document into passages or sentence blocks
  • It processes each sentence block individually
  • A transformer learns the context of each block and turns them into a larger document representation

Is the SMITH Algorithm in Use?

As far as we know, the SMITH algorithm is not in full effect, or any effect at all, for that matter. Some speculate that it rolled out with Google’s December 2020 update, but that hasn’t been validated. All we know for a fact at this point is that Google has released a research paper describing its key advantages but hasn’t officially announced its implementation.

What All This Means for You

Now, let’s take a step back and explain what all of this means for you and your website. Are there any immediate changes you need to make? No. The point to keep in mind is that Google rewards content that satisfies search intent and serves users well. While you can’t influence what long-tail queries users are searching for, you do have control over the content on your page. So, write your content naturally (aka, for a human, not a computer) and make it valuable to your audience. Avoid common SEO mistakes like keyword stuffing, and be the subject matter expert in whatever it is that’s on your web page. Lastly, maintain a deep understanding of what questions your audience has and how to address them, and you should be in good shape.

Have questions or want to know more about how you can improve your SEO strategy? Let’s chat!