Understanding the Index Coverage Report

If you’ve seen a message like this over the past couple of weeks, take a deep breath and keep reading! In this article I’ll break down why you’re seeing coverage issues from Google Search Console and how to go about fixing them.

What Is the Index Coverage Report?

With the unveiling of the revamped Google Search Console, there are a number of enhanced features to be aware of. One such feature is the Index Coverage Report, which outlines how many of your site’s URLs Google has cataloged (or indexed) and can show in Google search results. This report was formerly found under “Index Status” within the old Search Console interface:

OLD VS NEW

The Index Coverage Report now lives under the “Index” menu: click “Coverage.” The report shows which URLs have indexation Errors, which are Valid with Warnings, which are Valid, and which are Excluded from Google’s index. To properly understand this report, let’s explore what each of these groups means.

Error: These are pages that are not currently being served to searchers on Google, and Google believes that was not your intention.

Errors can be triggered in many different ways, including something as simple as your robots.txt blocking crawlers from a page you submitted for indexation to something more complicated like a server error (500 level error). The good news? Most of these errors can be fixed relatively easily with help from your webmaster, digital marketing agency, or web development team.

The more common errors we’ve been seeing at Top Floor are “Submitted URL blocked by robots.txt” and “Submitted URL marked ‘noindex.’” These are less sinister errors. They come about when a URL is submitted for indexation through Google Search Console while either the robots.txt file disallows crawling of that URL or the page contains a noindex tag. This is a conflicting signal: your site is telling search engines not to crawl or index the page while at the same time asking for it to be indexed. Google sees this as unintentional and flags it under Error rather than Excluded (intentional de-indexation).
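To make the conflict concrete, here is a minimal Python sketch that checks a list of submitted URLs against a set of robots.txt rules using the standard library's urllib.robotparser. The robots.txt content, the URLs, and the function name find_blocked_submissions are assumptions for illustration. Python's parser is not guaranteed to agree with Google's own matching in every case (wildcard patterns, for example), so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def find_blocked_submissions(robots_txt, submitted_urls, user_agent="Googlebot"):
    """Return the submitted URLs that the given robots.txt rules disallow."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in submitted_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and submitted URLs, for illustration only.
example_robots = """\
User-agent: *
Disallow: /private/
"""
example_urls = [
    "https://www.yoursite.com/blog/post",
    "https://www.yoursite.com/private/report",
]

blocked = find_blocked_submissions(example_robots, example_urls)
print(blocked)
```

Any URL this returns is one you are asking Google to index while also telling it not to crawl, which is the mixed signal described above.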

For other ways to encounter an Error, see below:

  • Server error (5xx): The server failed to fulfill the request for the URL and returned a 500 level error.
    • Solution: Talk to your development team to fix this server-side. There’s not much else that another team can do here without the proper experience.
  • Redirect error: There is a problem in the redirect chain associated with this URL. Either there is a redirect loop or a URL in the chain has a 400 or 500 level error.
    • Solution: Crawl the URL using Screaming Frog or another tool, and look for a redirect loop or a 400 or 500 level error at one of the URLs in the redirect chain. If unsure, talk to your digital marketing agency and they can check into this for you.
  • Submitted URL blocked by robots.txt: You submitted this page for indexing within Google Search Console, but the page is currently being blocked by the robots.txt file.
    • Solution: Double check your robots.txt file at www.yoursite.com/robots.txt. Make sure no line that starts with “Disallow” references the URL in question or a subfolder that contains it. Your digital marketing agency can help identify any problems in the robots.txt.
  • Submitted URL marked ‘noindex’: You submitted this page for indexing, but the page has a ‘noindex’ meta tag or HTTP header.
    • Solution: Evaluate the URL. Do you want this page to be found in Google searches? If you do, simply remove the meta tag or HTTP header. If you cannot find the noindex tag by looking at the page source (Control + U), ask your digital marketing agency or development team.
  • Submitted URL seems to be a Soft 404: You submitted this page for indexing, but Google believes this page is a soft 404.
    • Solution: A soft 404 is a page that displays 404 error content but returns a 200 status code instead of 404. Examine the page: is it truly supposed to be an error page with an incorrect status code? Sometimes Google will misinterpret a page with little content as a soft 404. In either case, consider a 301 redirect to take users to a relevant, live page.
  • Submitted URL not found (404): You submitted a 404 error code URL for indexing through Google Search Console.
    • Solution: You do not want error pages in the index, as they provide a bad user experience. If this page is not expected to become a live, 200 status page again in the future, 301 redirect it to relevant content on your site.
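The redirect error in the list above is the easiest one to reason about in code. The Python sketch below walks a redirect chain and reports whether it ends in a loop or in a 400 or 500 level error. It reads from a hypothetical in-memory map of URL to (status code, redirect target) instead of making live requests; the function name, the map, and the max_hops limit are all assumptions for illustration, not a description of how Google or Screaming Frog do this.

```python
def diagnose_redirect_chain(start_url, responses, max_hops=10):
    """Follow redirects through `responses` and describe how the chain ends.

    `responses` maps a URL to a (status_code, location) pair, where
    `location` is the redirect target, or None when there is no redirect.
    """
    seen = []
    url = start_url
    while len(seen) < max_hops:
        if url in seen:
            return "redirect loop at " + url
        seen.append(url)
        if url not in responses:
            return "unknown URL " + url
        status, location = responses[url]
        if 300 <= status < 400:
            if not location:
                return "redirect without target at " + url
            url = location  # keep following the chain
            continue
        if status >= 400:
            return "error %d at %s" % (status, url)
        return "ok after %d redirect(s)" % (len(seen) - 1)
    return "chain longer than %d hops" % max_hops


# Hypothetical crawl results, for illustration only.
example_responses = {
    "https://www.yoursite.com/old": (301, "https://www.yoursite.com/newer"),
    "https://www.yoursite.com/newer": (302, "https://www.yoursite.com/gone"),
    "https://www.yoursite.com/gone": (404, None),
    "https://www.yoursite.com/a": (301, "https://www.yoursite.com/b"),
    "https://www.yoursite.com/b": (301, "https://www.yoursite.com/a"),
}

for start in ("https://www.yoursite.com/old", "https://www.yoursite.com/a"):
    print(start, "->", diagnose_redirect_chain(start, example_responses))
```

In practice the map would be filled from a crawl export, and the fix is usually to point the first URL in the chain straight at a live, 200 status page.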

Warning: URLs shown under the Warning, or Valid with Warnings, section are described by Google as “Pages…might require your attention, and may or may not have been indexed, according to the specific result” [1].

So what does this mean? To me, this means Google is unsure of how to handle a URL, but reluctantly kept it indexed. Regardless of each URL’s situation, remove this uncertainty and dig into why each URL is being flagged. From what I’ve seen thus far, it comes down to someone using the robots.txt file as a de-indexation tool. This is incorrect for a couple of reasons.

The robots.txt file is used to set up rules that prevent search engines from crawling certain areas of your site. If a URL is being shown to searchers on Google and you want this to stop, do not add a disallow clause to your robots.txt file. That only tells Google to stop checking up on the page with its crawlers, spiders, or whatever name you want to give them; it does not remove the page from the index, and a page Google cannot crawl is a page whose noindex tag Google cannot see. To truly take a URL away from Google, or any search engine, give it a noindex tag either in the <head> of the HTML or through an HTTP header (X-Robots-Tag).
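For reference, the meta tag form is <meta name="robots" content="noindex"> and the header form is X-Robots-Tag: noindex. The Python sketch below checks a page for either one using only the standard library. The class name RobotsMetaParser, the function name has_noindex, and the sample inputs are assumptions for illustration; the function takes HTML and headers you have already fetched and does not request anything itself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # "googlebot" is the Google-specific variant of the robots meta tag.
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the HTML or the HTTP headers carry a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    if "noindex" in parser.directives:
        return True
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    return False


# Hypothetical page source and headers, for illustration only.
example_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(example_html))
print(has_noindex("<html><head></head></html>", {"X-Robots-Tag": "noindex"}))
```

Checking the header as well as the source matters because a noindex sent as an HTTP header never shows up in the page source.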

Although the focus of this article is on Errors and Warnings, let’s quickly cover the Valid and Excluded sections.

Valid: Quite simply, this is a list of URLs that have been successfully indexed. The only QA to be done here is to check your XML sitemap and to ensure nothing is indexed that you don’t want indexed. To do this, simply click the “Valid” box (remove confusion by only having one box highlighted at a time) and review the Details section.

You’ll be met with two main Detail Types, “Submitted and indexed” and “Indexed, not submitted in sitemap”:

  • Submitted and indexed: This is a list of URLs from your site that Google confirms are successfully indexed.
  • Indexed, not submitted in sitemap: These are indexed URLs that do not appear in your XML sitemap. Depending on your site, there should typically not be a large number of these. If you’d like a URL to be indexed, help out search engines, and yourself, by adding it to your XML sitemap. If unsure how to do this, please reach out to your digital marketing agency and they can QA your sitemap appropriately.
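The sitemap QA described above comes down to comparing two lists. The Python sketch below parses an XML sitemap and returns the indexed URLs that are missing from it. The sitemap content, the list of indexed URLs, and the function names are assumptions for illustration; the indexed list is assumed to come from wherever you keep it, and the sketch handles a plain urlset sitemap rather than a sitemap index file.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs listed in an XML sitemap string."""
    root = ET.fromstring(sitemap_xml)
    locs = root.findall("sm:url/sm:loc", SITEMAP_NS)
    return {loc.text.strip() for loc in locs if loc.text}


def indexed_but_not_submitted(indexed_urls, sitemap_xml):
    """Return indexed URLs that are missing from the sitemap, sorted."""
    return sorted(set(indexed_urls) - sitemap_urls(sitemap_xml))


# Hypothetical sitemap and indexed URLs, for illustration only.
example_sitemap = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursite.com/</loc></url>
  <url><loc>https://www.yoursite.com/services</loc></url>
</urlset>
"""
example_indexed = [
    "https://www.yoursite.com/",
    "https://www.yoursite.com/services",
    "https://www.yoursite.com/blog/old-post",
]

missing = indexed_but_not_submitted(example_indexed, example_sitemap)
print(missing)
```

Each URL in the result is either a page to add to the sitemap or a page you may not want indexed at all.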

Excluded: These are URLs intentionally left out of Google’s index. This can happen through many different avenues, but in each case Google believes your site took steps to keep these URLs from being searched. This is by no means a problem and is part of a healthy site. Internal resources, such as password-protected pages or image URLs generated by a WordPress site, should be left out of the index, as they make for a poor landing page experience for searchers.

Have Questions?

If you’ve read this article and are still unsure how to handle your Coverage issues, please give us a call and our search marketing team will be happy to help you sort it out.