SEO, SSL, and Canonical URL Tags

Posted on 2016-11-25
Medium Priority
Last Modified: 2016-11-28
Hi all -
I have an ecommerce site where I noticed that *none* of the common XML Sitemap generators found any pages. The site is https://bachelorettepartystation.com and here's a typical Product Page:  https://bachelorettepartystation.com/bachelorette-party/bachelorette-bash-in-progress-3-quot-button
A credible source of mine took a look and came up with this:
1) All of my "http://" pages are redirected to their "https://" equivalent (via .htaccess)
2) My "canonical" URL tags are set up to point to the "http://" version of my pages (not the https version)
and he gave me a workaround and a tool to generate an XML sitemap.
I can get an XML Sitemap now, BUT this makes me think that the Search Engines won't find these pages either.
Should I correct items #1 and #2 above for SEO purposes, or do these things not matter to the Search Engines?
Question by:bleggee
LVL 65

Accepted Solution

btan earned 2000 total points
ID: 41902168
An XML sitemap is useful for SEO because bots crawl your site using the pages it declares. It helps get the site indexed: a sitemap makes it easier for Google to find your pages when it crawls your website, so that all of your pages can be ranked, not only the domain itself.

Note that once the generator has created your sitemap, you need to upload it to the root of your domain e.g. www.yoursite.com/sitemap.xml.

Also having a sitemap on your site passes more data to search engines. So it also:

- Lists all URLs from your site, including pages that search engines would not otherwise have found.

- Gives engines page priority and thus crawl priority. You can add a tag in your XML sitemap saying which pages are the most important, so bots focus on these priority pages first.
In your case, your sitemap should only contain the actual, existing pages. Google will not index a URL that does not serve a page of its own and only redirects elsewhere, as you described; the purpose of a redirect is to get Google to index the target page instead. URLs that are both redirected and named as the canonical will typically not be indexed, so listing them adds no value.
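
To make the priority tag concrete, here is a minimal Python sketch of a sitemap with a `<priority>` value per URL. The two URLs come from the question; the priority values (1.0 and 0.8) and the function name `build_sitemap` are assumptions for illustration only, not what any particular generator produces.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is a hint between 0.0 and 1.0, relative to your own pages.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical priorities; only list URLs that resolve without a redirect.
sitemap_xml = build_sitemap([
    ("https://bachelorettepartystation.com/", 1.0),
    ("https://bachelorettepartystation.com/bachelorette-party/"
     "bachelorette-bash-in-progress-3-quot-button", 0.8),
])
print(sitemap_xml)
```

The resulting text would be saved as sitemap.xml and uploaded to the root of the domain, as noted above.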

In short, below are some principles
- You can pick a canonical (preferred) URL for each of your pages, but don't specify different URLs as canonical for the same page.
(e.g. one URL in a sitemap and a different URL for that same page using rel="canonical")
- Your sitemap should only contain current pages you want indexed on your site, not the pages that you're redirecting to another site.
(e.g. if Google can't find the link on your site proper, it won't index it from the sitemap anyway)
- URLs included in an XML sitemap must use the same protocol and subdomain as the sitemap itself. This means that https URLs should not be listed in a sitemap served over http. It also means that URLs on sample.domain.com cannot be listed in the sitemap on www.domain.com, and so on.
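
The last principle can be checked mechanically. The following is only a rough sketch in Python, assuming you already have the list of URLs from your sitemap; the function name `mismatched_urls` and the sample URLs are made up for illustration.

```python
from urllib.parse import urlsplit


def mismatched_urls(sitemap_url, page_urls):
    """Return the page URLs whose protocol or host differs from the sitemap's."""
    base = urlsplit(sitemap_url)
    expected = (base.scheme, base.netloc.lower())
    return [
        url for url in page_urls
        if (urlsplit(url).scheme, urlsplit(url).netloc.lower()) != expected
    ]


# Hypothetical sitemap location and entries.
problems = mismatched_urls(
    "https://www.domain.com/sitemap.xml",
    [
        "https://www.domain.com/red_sports_cars",  # same protocol and host
        "http://www.domain.com/red_sports_cars",   # wrong protocol
        "https://sample.domain.com/page",          # wrong subdomain
    ],
)
print(problems)
```

Anything the function returns should either be corrected to match the sitemap's own protocol and host or moved to a sitemap hosted where it belongs.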

I suggest you run your existing site against the two below and review their findings and suggestions for improving website performance and SEO factors. These help keep the site healthy and friendlier to browsers and to the bots crawling its pages.


Another useful tool (not free) for testing a sitemap is the SEO Spider from Screaming Frog (see items 8 and 9 in https://www.screamingfrog.co.uk/10-features/). It loads your sitemap and crawls the URLs it contains, and you can view the results in real time as the crawl runs. If you have Graph View open during the crawl, you can also graph the results visually as the crawler collects data.
Once you have the XML file saved to your computer, go to the ‘Mode’ menu in Screaming Frog and select ‘List’. Then click ‘Select File’ at the top of the screen, choose your file and start the crawl. Once the spider has finished crawling, you’ll be able to find any redirects, 404 errors, duplicated URLs and more in the ‘Internal’ tab.

Author Comment

ID: 41902371
Great info, btan, thank you!   A couple of clarifications ...

1. So I should use all http:// in the sitemap (even though the .htaccess forces all pages to https). Is that correct?

2. In the case of using Mod Rewrite, I would want the rewritten URLs in the Sitemap ... meaning I should use the "User Friendly" version of the URL for Sitemap purposes, for example using "http://example.com/red_sports_cars" and not "http://example.com/pagenm?stuff_after_the_question_mark"
LVL 65

Expert Comment

ID: 41902413
1. Yes, to be consistent, as advised in my earlier post.
2. Yes, that will be fine, and this is analogous to the canonical URL too.

Author Comment

ID: 41902534
Great, thanks again!
LVL 18

Expert Comment

by:Lucas Bishop
ID: 41904524
Should I correct items #1 and #2 above for SEO purposes, or do these things not matter to the Search Engines?

Yes, as a matter of best practice, you should.

Currently you're 301 redirecting the pages that you're specifying as the canonical source. A 301 redirect means the page has permanently moved. When Google goes to the "new" location, your canonical tag specifies the source as a page that has moved permanently. This is technically a poor implementation of the canonical tag, and for the health of your site you should set it up correctly.

You really should update the canonical urls so they point to the https pages.
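
If you want to spot-check pages for this, something along these lines would work. It is a minimal Python sketch, not a full crawler: the HTML snippet is a made-up stand-in for one of your product pages, and `CanonicalFinder` and `canonical_of` are names invented here.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def canonical_of(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup that mirrors the problem described above.
page = (
    '<html><head><link rel="canonical" '
    'href="http://bachelorettepartystation.com/bachelorette-party/'
    'bachelorette-bash-in-progress-3-quot-button"></head></html>'
)
canonical = canonical_of(page)
# The canonical should name the https URL that is actually served.
needs_fix = canonical is not None and not canonical.startswith("https://")
print(canonical, needs_fix)
```

Any page where `needs_fix` comes out true has a canonical that points at a URL your .htaccess will redirect away from.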

Also, make sure the sitemap references the https pages. When I view the sitemap, it looks like most of the URLs are to the http version of the site.

Also, register both the http and https properties in Google Search Console, so you can control which version is indexed.
