Hi all -
I have an ecommerce site where I noticed that *none* of the common XML Sitemap generators found any pages. The site is and here's a typical Product Page:
A credible source of mine took a look and came up with this:
1) All of my "http://" pages are redirected to their "https://" equivalent (via .htaccess)
2) My "canonical" URL tags are set up to point to the "http://" version of my pages (not the https version)
and he gave me a workaround and a tool to generate an XML sitemap.
I can get an XML Sitemap now, BUT this makes me think that the Search Engines won't find these pages either.
Should I correct items #1 and #2 above for SEO purposes, or do these things not matter to the Search Engines?
btan (Exec Consultant) Commented:
An XML sitemap is useful for SEO because bots crawl your site using the pages declared in it. It helps with indexing: for example, a sitemap makes it easier for Google to find your pages when it crawls your website, so that all of your pages can be ranked, not just your domain as a whole.

Note that once the generator has created your sitemap, you need to upload it to the root of your domain e.g.

Also having a sitemap on your site passes more data to search engines. So it also:

Lists all URLs on your site, including pages that search engines would not otherwise have found.

Gives search engines a page priority and thus a crawl priority. You can add a <priority> tag to each URL in your XML sitemap to indicate which pages are most important, so bots can focus on those pages first.
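To make the priority idea concrete, here is a minimal sketch of building a small sitemap with a <priority> value per URL, using only Python's standard library. The example.com URLs and the build_sitemap helper are placeholders made up for illustration, not the actual site or any particular generator's output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, priority) pairs and return it as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is a hint between 0.0 and 1.0, written with one decimal.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: the home page and one product page.
sitemap_xml = build_sitemap([
    ("https://www.example.com/", 1.0),
    ("https://www.example.com/products/blue-widget", 0.8),
])
print(sitemap_xml)
```

The priority value is only a hint to crawlers, so treat it as a suggestion rather than a guarantee of crawl order.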
In your case, your sitemap should only contain the actual, existing pages. Google will not index a page that does not exist and merely redirects elsewhere, as in the setup you described; the purpose of a redirect is to get Google to index the target page instead. Because of the redirect combined with the conflicting canonical, Google will typically not index those pages, and listing them adds no value.

In short, here are some principles:
- Pick one canonical (preferred) URL for each of your pages, and don't specify different URLs as canonical for the same page.
(e.g. one URL in a sitemap and a different URL for that same page using rel="canonical")
- Your sitemap should only contain current pages that you want indexed on your site, not pages that you're redirecting elsewhere.
(e.g. if Google can't find the link on your site proper, it won't index it from the sitemap anyway)
- URLs included in an XML sitemap must use the same protocol and subdomain as the sitemap itself. This means that https URLs should not be listed in a sitemap served over http. Likewise, URLs on one subdomain cannot be listed in a sitemap hosted on a different subdomain, and so on.
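The protocol and subdomain rule above can be checked mechanically. The following is a rough sketch, assuming you already have the sitemap's own URL and the list of page URLs it contains; mismatched_urls is a hypothetical helper name and the example.com addresses are placeholders.

```python
from urllib.parse import urlsplit


def mismatched_urls(sitemap_url, page_urls):
    """Return the page URLs whose scheme or host differs from the sitemap's own."""
    base = urlsplit(sitemap_url)
    expected = (base.scheme, base.netloc.lower())
    bad = []
    for url in page_urls:
        parts = urlsplit(url)
        if (parts.scheme, parts.netloc.lower()) != expected:
            bad.append(url)
    return bad


# Hypothetical example: an https sitemap listing one http URL and one
# URL on a different subdomain.
problems = mismatched_urls(
    "https://www.example.com/sitemap.xml",
    [
        "https://www.example.com/products/a",
        "http://www.example.com/products/b",
        "https://shop.example.com/products/c",
    ],
)
print(problems)
```

An empty result means every listed URL matches the sitemap's protocol and host; anything returned is a candidate for fixing or removing.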

I suggest you run your existing site through the two tools below and review the findings and suggestions for improving website performance and SEO factors. These help keep the site healthy and friendlier to the browsers and bots crawling its pages.

Another useful tool (not free) for testing a sitemap is the SEO Spider from Screaming Frog (see items 8 and 9 in ).
Once you have the XML file saved to your computer, go to the ‘Mode’ menu in Screaming Frog and select ‘List’. Then click ‘Select File’ at the top of the screen, choose your file and start the crawl. It will then load your sitemap and begin crawling the URLs it contains. You can view the results of the crawl in real time, and if you have Graph View running during the crawl, you can graph the results visually as the crawler collects data. Once the spider has finished crawling, you’ll be able to find any redirects, 404 errors, duplicated URLs and more in the ‘Internal’ tab.
bleggee (Author) Commented:
Great info, btan, thank you! A couple of clarifications ...

1. So I should use all http:// in the sitemap (even though the .htaccess forces all pages to https). Is that correct?

2. In the case of using mod_rewrite, I would want the rewritten URLs in the sitemap ... meaning I should use the "User Friendly" version of the URL for sitemap purposes, for example using "" and not ""
btan (Exec Consultant) Commented:
1. Yes, to be consistent, as advised in my earlier post.
2. Yes, that will be fine; this is analogous to the canonical URL too.
bleggee (Author) Commented:
Great, thanks again!
Lucas Bishop (Click Tracker) Commented:
Should I correct items #1 and #2 above for SEO purposes, or do these things not matter to the Search Engines?

Yes, as a matter of best practice, you should.

Currently you're 301 redirecting the pages that you're specifying as the canonical source. A 301 redirect means the page has permanently moved. When Google follows the redirect to the "new" location, your canonical tag names as the source a page that has permanently moved. This is technically a poor implementation of the canonical tag, and for the health of your site you should set it up correctly.

You really should update the canonical URLs so they point to the https pages.
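If you want to spot-check a page's canonical tag, something along these lines could work. It is only a sketch using Python's built-in HTML parser; the CanonicalFinder class, the helper functions and the sample markup are all made up for illustration, and you would feed it the real page source instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def canonical_of(html):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def canonical_uses_https(html):
    """True only if a canonical tag exists and points to an https URL."""
    href = canonical_of(html)
    return href is not None and urlsplit(href).scheme == "https"


# Hypothetical page source with the kind of http canonical described above.
sample_page = '<html><head><link rel="canonical" href="http://www.example.com/p/1" /></head></html>'
print(canonical_of(sample_page), canonical_uses_https(sample_page))
```

A page served over https whose canonical check comes back False is one where the tag still needs updating.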

Also, make sure the sitemap references the https pages. When I view the sitemap, it looks like most of the URLs point to the http version of the site:

Also, register both the http and https properties in Google Search Console, so you can control which version is indexed.
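On the sitemap point, a quick way to see which entries still use http is to parse the file and filter the <loc> values. This is a minimal sketch, assuming a standard sitemaps.org urlset; the sample XML, http_locs and to_https are hypothetical names for illustration and do not reflect the real sitemap's contents.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping for the sitemaps.org schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def http_locs(sitemap_xml):
    """Return every <loc> value in the sitemap that still starts with http://."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return [url for url in locs if url.lower().startswith("http://")]


def to_https(url):
    """Swap a leading http:// for https://, leaving other URLs unchanged."""
    if url.lower().startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


# Hypothetical two-entry sitemap: one http URL, one https URL.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/a</loc></url>"
    "<url><loc>https://www.example.com/b</loc></url>"
    "</urlset>"
)
stale = http_locs(sample_sitemap)
print(stale, [to_https(u) for u in stale])
```

Regenerating the sitemap from the https URLs is usually cleaner than patching it by hand, but a check like this shows how much needs to change.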
