Thursday, October 26, 2017

Tips to troubleshoot your technical SEO

There are lots of articles filled with checklists that tell you what technical SEO items you should review on your website. This is not one of those lists. What I think people need is not another best practice guide, but some help with troubleshooting issues.

info: search operator

Often, [info:http://ift.tt/1MSYKnR] can help you diagnose a variety of issues. This command will let you know if a page is indexed and how it is indexed. Sometimes, Google chooses to fold pages together in their index and treat two or more duplicates as the same page. This command shows you the canonicalized version — not necessarily the one specified by the canonical tag, but rather what Google views as the version they want to index.

If you search for your page with this operator and see another page, then you’ll see the other URL ranking instead of this one in results — basically, Google didn’t want two of the same page in their index. (Even the cached version shown is the other URL!) If you make exact duplicates across country-language pairs in hreflang tags, for instance, the pages may be folded into one version and show the wrong page for the locations affected.

Occasionally, you’ll see this with hijacking SERPs as well, where an [info:] search on one domain/page will actually show a completely different domain/page. I had this happen during Wix’s SEO Hero contest earlier this year, when a stronger and more established domain copied my website and was able to take my position in the SERPs for a while. Dan Sharp also did this with Google’s SEO guide earlier this year.

&filter=0 added to Google Search URL

Adding &filter=0 to the end of the URL in a Google search will remove filters and show you more websites in Google’s consideration set. You might see two versions of a page when you add this, which may indicate issues with duplicate pages that weren’t rolled together; they might both say they are the correct version, for instance, and have signals to support that.

Adding this parameter also shows you other pages on a website that are eligible to rank for the query. If you have multiple eligible pages, you likely have opportunities to consolidate pages or add internal links from these other relevant pages to the page you want to rank.
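If you check many queries, appending the parameter by hand gets tedious. The following is a minimal sketch of doing it in Python; the function name `add_filter_zero` and the example search URL are illustrative assumptions, not part of any Google tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_filter_zero(search_url: str) -> str:
    """Return the given search URL with filter=0 set exactly once."""
    parts = urlparse(search_url)
    # Keep the existing query parameters, dropping any earlier filter value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "filter"
    ]
    params.append(("filter", "0"))
    return urlunparse(parts._replace(query=urlencode(params)))


# Hypothetical example query.
print(add_filter_zero("https://www.google.com/search?q=technical+seo"))
# prints https://www.google.com/search?q=technical+seo&filter=0
```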

site: search operator

A [site:domain.com] search can reveal a wealth of knowledge about a website. I would be looking for pages that are indexed in ways I wouldn’t expect, such as with parameters, pages in site sections I may not know about, and any issues with pages being indexed that shouldn’t be (like a dev server).

site:domain.com keyword

You can use [site:domain.com keyword] to check for relevant pages on your site for another look at consolidation or internal link opportunities.

This search is also interesting because it will show whether your website is eligible for a featured snippet for that keyword. You can run the same search against many of the top websites to see what their eligible featured snippets include, which can help you find out what your website is missing or why one site may be showing over another.

If you use a “phrase” instead of a keyword, this can be used to check if content is being picked up by Google, which is handy on websites that are JavaScript-driven.
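The three query shapes discussed here (bare site:, site: plus keyword, and site: plus quoted phrase) can be generated with a small helper. This is only a sketch: `site_query` and `google_search_url` are made-up helper names, and example.com stands in for your own domain.

```python
from urllib.parse import quote_plus


def site_query(domain, term=None, exact=False):
    """Build a site: query, optionally with a keyword or an exact "phrase"."""
    query = f"site:{domain}"
    if term:
        # Quoting the term asks for the exact phrase instead of loose keywords.
        query += f' "{term}"' if exact else f" {term}"
    return query


def google_search_url(query):
    """Turn a query string into a search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


print(site_query("example.com"))
print(site_query("example.com", "shipping"))
print(google_search_url(site_query("example.com", "free shipping", exact=True)))
```

The exact-phrase form is the one to use when checking whether text injected by JavaScript has been picked up.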

Static vs. dynamic

When you’re dealing with JavaScript (JS), it’s important to understand that JS can rewrite the HTML of a page. If you’re looking at view-source or even Google’s cache, what you’re looking at is the unprocessed code. These are not great views of what may actually be included once the JS is processed.

Use “inspect” instead of “view-source” to see what is loaded into the DOM (Document Object Model), and use “Fetch and Render” in Google Search Console instead of Google’s cache to get a better idea of how Google actually sees the page.

Don’t tell people it’s wrong because it looks funny in the cache or something isn’t in the source; it may be you who is wrong. There may also be times when you look in the source and say something is right, but when the page is processed, something in the <head> section breaks and causes it to end early, throwing many tags like canonical or hreflang into the <body> section, where they aren’t supported.

Why aren’t these tags supported in the body? Likely because it would allow hijacking of pages from other websites.
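To make the "head ends early" problem concrete, here is a rough sketch that scans HTML for an element that is not allowed inside <head> and then reports any canonical or hreflang link tags that come after it. It only approximates how a browser parses the page, it assumes the markup has an explicit <head> tag, and it treats tags inside <noscript> like any other tag. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that may appear inside <head>. Any other start tag makes a
# browser-style parser close the head and start the body.
HEAD_ELEMENTS = {"title", "meta", "link", "script", "style", "base", "noscript", "template"}


class HeadLeakFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_breaker = None  # first tag that ended the head early
        self.leaked = []          # (rel, href) pairs found outside the head

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if self.in_head and tag not in HEAD_ELEMENTS:
            self.in_head = False
            self.head_breaker = tag
        if tag == "link" and not self.in_head:
            attributes = dict(attrs)
            rel = (attributes.get("rel") or "").lower()
            if rel == "canonical" or (rel == "alternate" and "hreflang" in attributes):
                self.leaked.append((rel, attributes.get("href")))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_leaked_tags(html):
    finder = HeadLeakFinder()
    finder.feed(html)
    return finder.head_breaker, finder.leaked


# Hypothetical page where a stray <div> ends the head before the canonical tag.
page = (
    "<html><head><title>t</title><div></div>"
    '<link rel="canonical" href="https://example.com/a">'
    "</head><body></body></html>"
)
print(find_leaked_tags(page))
```

Run it against the rendered DOM (copied from "inspect"), not view-source, since the break is often introduced by JavaScript.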

Check redirects and header responses

You can make either of these checks with Chrome Developer Tools, or, to make it easier, with extensions like Redirect Path or Link Redirect Trace. It’s important to see how your redirects are being handled. If you’re worried about whether signals are being consolidated along a certain redirect path, check the “Links to Your Site” report in Google Search Console. Look for links that point to pages earlier in the chain and see whether they appear in the report for the final page, shown as “Via this intermediate link.” If they do, it’s a safe bet Google is counting the links and consolidating the signals to the latest version of the page.
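If you prefer a script to a browser extension, a redirect chain can be traced hop by hop. The sketch below is one possible approach rather than what those extensions do internally: `trace_redirects` and `http_fetch` are hypothetical names, and the fetcher sends HEAD requests, which some servers answer differently from GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_fetch(url):
    """Request one URL without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    is_https = parts.scheme == "https"
    conn_cls = http.client.HTTPSConnection if is_https else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn.request("HEAD", path)
    response = conn.getresponse()
    result = (response.status, response.getheader("Location"))
    conn.close()
    return result


def trace_redirects(start_url, fetch=http_fetch, max_hops=10):
    """Return [(url, status), ...] for each hop, stopping on loops."""
    chain, seen, url = [], set(), start_url
    while len(chain) < max_hops:
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        seen.add(url)
        url = urljoin(url, location)  # resolve relative Location values
        if url in seen:
            chain.append((url, "loop"))
            break
    return chain


# Offline demonstration with canned responses instead of real requests.
canned = {
    "http://example.com/a": (301, "/b"),
    "http://example.com/b": (302, "https://example.com/c"),
    "https://example.com/c": (200, None),
}
print(trace_redirects("http://example.com/a", fetch=lambda u: canned[u]))
```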

For header responses, things can get interesting. While rare, you may see canonical tags and hreflang tags here that can conflict with other tags on the page. Redirects using the HTTP Header can be problematic as well. More than once I’ve seen people set the “Location:” for the redirect without any information in the field and then redirect people on the page with, say, a JS redirect. The user ends up on the right page, but Googlebot processes the empty Location: first and goes into the abyss: it is redirected to nothing before it can see the other redirect.
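Both header problems described here (an empty Location and canonical or hreflang values delivered through the Link header) can be spotted from the raw response headers. A minimal sketch follows, assuming the headers are available as a list of (name, value) pairs; `audit_headers` is a hypothetical helper and the regular expressions handle only the common, well-formed Link syntax.

```python
import re

# Matches one <url> plus the parameters that follow it in a Link header.
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")


def audit_headers(headers):
    """Collect SEO-relevant findings from (name, value) header pairs."""
    findings = {"empty_location": False, "canonical": [], "hreflang": [], "x_robots": []}
    for name, value in headers:
        name, value = name.lower(), value.strip()
        if name == "location" and not value:
            # A Location header with nothing in it sends crawlers nowhere.
            findings["empty_location"] = True
        elif name == "x-robots-tag":
            findings["x_robots"].append(value)
        elif name == "link":
            for url, params in LINK_RE.findall(value):
                params = params.lower()
                if re.search(r'rel\s*=\s*"?canonical', params):
                    findings["canonical"].append(url)
                match = re.search(r'hreflang\s*=\s*"?([a-z0-9-]+)', params)
                if match:
                    findings["hreflang"].append((match.group(1), url))
    return findings


# Hypothetical response headers.
example_headers = [
    ("Location", ""),
    ("Link", '<https://example.com/a>; rel="canonical", '
             '<https://example.com/de/a>; rel="alternate"; hreflang="de"'),
]
print(audit_headers(example_headers))
```

Compare the reported canonical and hreflang values with the ones in the page itself; any mismatch is a conflict worth resolving.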

Check for multiple sets of tags

Many tags can be in multiple locations, like the HTTP Header, the <head> section and the sitemap. Check for any inconsistencies between the tags. There’s nothing stopping multiple sets of tags on a page, either. Maybe your template added a meta robots tag for index, then a plugin had one set for noindex.

You can’t just assume there is one tag for each item, so don’t stop your search after the first one. I’ve seen as many as four sets of robots meta tags on the same page, with three of them set to index and one set as noindex, but that one noindex wins every time.
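Counting every robots meta tag, instead of stopping at the first, is easy to script. The sketch below collects all of them and reports whether any one carries noindex; the names are illustrative, and it also treats "none" as noindex since that value is shorthand for noindex, nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collect the directives of every robots (or googlebot) meta tag."""

    def __init__(self):
        super().__init__()
        self.tags = []  # one list of directives per meta tag found

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",") if d.strip()]
            self.tags.append(directives)


def robots_meta_summary(html):
    collector = RobotsMetaCollector()
    collector.feed(html)
    all_directives = [d for tag in collector.tags for d in tag]
    return {
        "tag_count": len(collector.tags),
        # A single noindex is enough, no matter how many tags say index.
        "noindex": "noindex" in all_directives or "none" in all_directives,
    }


# Hypothetical page with four sets of robots meta tags.
page = (
    '<head><meta name="robots" content="index, follow">'
    '<meta name="robots" content="index">'
    '<meta name="ROBOTS" content="NOINDEX">'
    '<meta name="robots" content="index"></head>'
)
print(robots_meta_summary(page))
# prints {'tag_count': 4, 'noindex': True}
```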

Change UA to Googlebot

Sometimes, you just need to see what Google sees. There are lots of interesting issues around cloaking, redirecting users and caching. You can change your user agent with Chrome Developer Tools or with a plugin like User-Agent Switcher. If you’re going to do this, I recommend doing it in Incognito mode. You want to check that Googlebot isn’t being redirected somewhere. For example, Googlebot may not be able to see a page for another country because it is being redirected to a different page based on its US IP address.
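The same check can be scripted by sending a request with Googlebot's user-agent string and comparing where you end up. This is a sketch with hypothetical helper names. Note that it only changes the user agent: your request still comes from your own IP address, so IP-based redirects and any server-side verification of real Googlebot traffic will not be reproduced.

```python
import urllib.request

# Googlebot's classic desktop user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url, user_agent):
    """Create a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def final_url(url, user_agent):
    """Fetch the URL (following redirects) and return where it ended up."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as response:
        return response.geturl()


def redirected_differently(url, browser_ua):
    """True if the Googlebot user agent lands on a different URL than a browser."""
    return final_url(url, GOOGLEBOT_UA) != final_url(url, browser_ua)
```

Calling `redirected_differently("https://example.com/", "Mozilla/5.0")` with your own URL and a real browser user-agent string gives a quick first signal before digging in with Developer Tools.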

Robots.txt

Check your robots.txt for anything that might be blocked. If you block a page from being crawled and put a canonical on that page to another page or a noindex tag, Google can’t crawl the page and can’t see those tags.
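Python's standard library can tell you whether a given URL is blocked by a set of robots.txt rules. The following is a minimal sketch with a hypothetical `is_blocked` helper. Be aware that `urllib.robotparser` does simple prefix matching and does not implement Google's wildcard extensions (`*` and `$` in paths), so treat its answer as a first pass, not the final word.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules and URLs.
rules = "User-agent: *\nDisallow: /private/\n"
print(is_blocked(rules, "https://example.com/private/page"))  # prints True
print(is_blocked(rules, "https://example.com/public"))        # prints False
```

Any page that is both blocked and carrying a canonical or noindex tag is a candidate for the problem described above.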

Another important tip is to monitor your robots.txt for changes. Someone may edit the file, a shared cache may unintentionally serve a dev server’s version, or any number of other issues may occur, so it’s important to keep an eye on changes to this file.
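Monitoring can be as simple as saving the last known copy of the file and diffing it against the current one on a schedule. A sketch under those assumptions: `robots_diff` is a hypothetical helper, and how you fetch and store the two versions is left to your own setup.

```python
import difflib


def robots_diff(previous, current):
    """Return unified-diff lines between two versions of robots.txt."""
    return list(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical case: a dev server's blanket disallow leaked into production.
saved = "User-agent: *\nDisallow: /cart/\n"
live = "User-agent: *\nDisallow: /\n"

changes = robots_diff(saved, live)
if changes:
    print("robots.txt changed:")
    print("\n".join(changes))
```

An empty list means nothing changed; anything else is worth an alert.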

You may have a problem with a page not being indexed and not be able to figure out why. Although not officially supported, a noindex via robots.txt will keep a page out of the index, and this is just another possible location to check.

Save yourself headaches

Any time you can set up any automated testing or remove points of failure — those things you just know that someone, somewhere will mess up — do it. Scale things as best you can because there’s always more work to do than resources to do it. Something as simple as setting a Content Security Policy for upgrade-insecure-requests when going to HTTPS will keep you from having to go tell all of your developers that they have to change all these resources to fix mixed content issues.
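As an illustration of the upgrade-insecure-requests idea, the header can be added in one place for every response. The sketch below does this with a small WSGI middleware in Python; it assumes your site runs as a WSGI application, and the names are made up. On most stacks the equivalent is a one-line setting in the web server or CDN configuration.

```python
def add_upgrade_insecure_requests(app):
    """Wrap a WSGI app so every response asks browsers to upgrade http:// resources."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            headers = list(headers) + [
                ("Content-Security-Policy", "upgrade-insecure-requests")
            ]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


# Tiny hypothetical app used to show the header being added.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers


body = add_upgrade_insecure_requests(demo_app)({}, fake_start_response)
print(captured["headers"])
```

With the header in place, browsers request http:// subresources over HTTPS on their own, so old hard-coded resource URLs stop triggering mixed content warnings while they are cleaned up.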

If you know a change is likely to break other systems, weigh the outcomes of that change with the resources needed for it and the chances of breaking something and resources needed to fix the system if that happens. There are always trade-offs with technical SEO, and just because something is right doesn’t mean it’s always the best solution (unfortunately), so learn how to work with other teams to weigh the risk/reward of the changes you’re suggesting.

Summing up

In a complex environment, there may be many teams working on projects. You might have multiple CMSs, infrastructures, CDNs and so on. You have to assume everything will change and everything will break at some point. There are so many points of failure that it makes the job of a technical SEO interesting and challenging.

The post Tips to troubleshoot your technical SEO appeared first on Search Engine Land.
