TrueBlue

Need the proper DOCTYPE for an XML file

Hi!

Does anyone have specific suggestions for the best DOCTYPE for this page?
http://www.topsecurityinc.com/sitemap.xml
Amick

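As it is, the file appears to be well-formed XML that complies with the sitemap protocol. Are you having a specific problem that you're trying to address?

The XML declaration at the top of the file,
<?xml version="1.0" encoding="UTF-8" ?>

is all the prolog it needs. Strictly speaking, that line is an XML declaration, not a DOCTYPE. Sitemap files don't need a DOCTYPE, because they can be validated against the sitemaps.org XML Schema instead.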
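To illustrate that a sitemap needs no DOCTYPE, here is a minimal Python sketch that parses a hypothetical one-URL sitemap (the example.com address is a placeholder) with the standard library. It checks that the root element is `urlset` in the sitemaps.org 0.9 namespace, the same element the W3C validator output later in this thread reports as `docElt`. Note that `ElementTree` only checks well-formedness; it does not validate against the XSD.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical minimal sitemap: XML declaration, no DOCTYPE, namespaced root element.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
  </url>
</urlset>
"""

def sitemap_locs(data: bytes) -> list[str]:
    """Parse sitemap bytes and return the <loc> values.

    Raises ET.ParseError if the XML is not well-formed, and ValueError
    if the root element is not the sitemaps.org <urlset>.
    """
    root = ET.fromstring(data)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]

print(sitemap_locs(SAMPLE))  # ['http://www.example.com/']
```

The namespace on the root element, not a DOCTYPE, is what identifies the document as a sitemap.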
TrueBlue

ASKER

I ran the sitemap through the validator listed below, and it said that I was missing a DOCTYPE for the sitemap.

http://www.htmlhelp.com/tools/validator/

•Line 1, character 1:
<?xml version="1.0" encoding="UTF-8" ?>
^Error: character ï not allowed in prolog

The validator at w3.org (the W3C, the web standards body) reports:
Schema validating with XSV 3.1-1 of 2007/12/11 16:20:05
•Target: http://www.topsecurityinc.com/sitemap.xml (Real name: http://www.topsecurityinc.com/sitemap.xml 
Length: 12457 bytes
Last Modified: Tue, 25 Jan 2011 18:24:39 GMT Server: Microsoft-IIS/6.0)
• docElt: {http://www.sitemaps.org/schemas/sitemap/0.9}urlset

•Validation was strict, starting with type [Anonymous]
• schemaLocs: http://www.sitemaps.org/schemas/sitemap/0.9 -> http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd 
The schema(s) used for schema-validation had no errors
No schema-validity problems were found

See for yourself at:
http://www.w3.org/2001/03/webdata/xsv?docAddrs=http%3A%2F%2Fwww.topsecurityinc.com%2Fsitemap.xml&style=xsl
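The query string in that link is just the sitemap URL percent-encoded as the `docAddrs` parameter, followed by `style=xsl`. A minimal Python sketch that builds the same kind of link for another URL is below; the function name is hypothetical, and the two parameters are copied from the link above rather than taken from any XSV documentation.

```python
from urllib.parse import urlencode

# Endpoint copied from the link above.
XSV_ENDPOINT = "http://www.w3.org/2001/03/webdata/xsv"

def xsv_check_url(doc_url: str) -> str:
    """Build a link with the same query layout as the XSV link above.

    urlencode() percent-encodes ':' and '/' in doc_url (e.g. ':' -> %3A).
    """
    return XSV_ENDPOINT + "?" + urlencode({"docAddrs": doc_url, "style": "xsl"})

print(xsv_check_url("http://www.topsecurityinc.com/sitemap.xml"))
```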

I suspect that the validator at htmlhelp.com is simply incomplete. You are standards compliant and there is really no need to worry.
One thing I noticed about your file is that, when viewed byte by byte, its first three bytes are EF BB BF. This may be what is causing htmlhelp.com's validator to complain. These bytes don't show up when the file is viewed as text. I was able to eliminate them by opening sitemap.xml in a text editor and copying the text into a new document. This probably isn't too important, but it does account for the prolog error message: read as Latin-1, the byte EF is the character ï.
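A quick way to check for those bytes without a hex editor is to look at the start of the raw file. This is a minimal Python sketch, assuming Python 3.8+ (for `bytes.hex` with a separator); the sample bytes are a hypothetical stand-in for the contents of sitemap.xml.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of the byte order mark U+FEFF

def first_bytes_hex(data: bytes, n: int = 3) -> str:
    """Return the first n bytes as uppercase hex pairs, the way a hex editor shows them."""
    return data[:n].hex(" ").upper()

def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)

# Hypothetical file contents; in practice read them with open("sitemap.xml", "rb").read().
sample = UTF8_BOM + b'<?xml version="1.0" encoding="UTF-8" ?>'
print(first_bytes_hex(sample))  # EF BB BF
print(has_utf8_bom(sample))     # True
```

Reading in binary mode (`"rb"`) matters here: a text-mode read with an encoding such as `utf-8-sig` would silently drop the BOM and hide the problem.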

•Line 1, character 1:
<?xml version="1.0" encoding="UTF-8" ?>
^Error: character ï not allowed in prolog

Those characters are the Unicode Byte Order Mark (BOM): http://en.wikipedia.org/wiki/Byte_order_mark . Note that Firefox, IE8, Chrome, Safari, and Opera all open that page without problems. Firefox and Opera report that there is no style sheet associated with it, while Chrome and Safari display only the text without the tags.
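The link between the bytes EF BB BF and the "ï" in the error message can be checked in a few lines of Python. This sketch assumes the htmlhelp.com validator was decoding the file as Latin-1 (ISO-8859-1); that would explain the message, but it is not confirmed anywhere in this thread.

```python
import codecs

BOM_CHAR = "\ufeff"  # U+FEFF, the Unicode byte order mark

# UTF-8 encodes U+FEFF as the three bytes EF BB BF (also available as codecs.BOM_UTF8).
assert BOM_CHAR.encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Misread as Latin-1, those bytes become three visible characters, starting with 'ï'.
misread = codecs.BOM_UTF8.decode("latin-1")
print(misread)  # ï»¿

# Python's "utf-8-sig" codec treats a leading BOM as a signature and drops it.
text = (codecs.BOM_UTF8 + b"<?xml").decode("utf-8-sig")
print(text)  # <?xml
```

A decoder that knows about the UTF-8 signature skips it, which is consistent with the browsers opening the file without complaint.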
Amick:
I found the same thing in a hex editor. I deleted the first three bytes and saved the file, but they came back. Then I changed them to 20 (spaces) and saved, but when I reopened the file they were back again.
I even cut and pasted from the old file into a new one and got the same three bytes.
Could you post the file where you removed them?
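If an editor keeps re-adding the BOM on save, one workaround (a sketch, not something posted in this thread) is to rewrite the file's bytes directly so that no editor is involved. Replacing the bytes with spaces would not help either, because XML does not allow anything, not even whitespace, before the XML declaration. This is a minimal Python sketch; the function names are hypothetical.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes EF BB BF

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; all other bytes are left untouched."""
    return data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data

def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file in place without its BOM. Returns True if a BOM was removed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False  # no BOM present; leave the file as it is
    p.write_bytes(cleaned)
    return True
```

Hypothetical usage, run from the directory that holds the sitemap: `strip_bom_in_file("sitemap.xml")`. Working on raw bytes avoids any editor's "save with BOM" setting; opening the result in such an editor and saving it again may put the BOM back.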
ASKER CERTIFIED SOLUTION
Dave Baldwin
