Link to home
Start Free TrialLog in
Avatar of Leo Torres
Leo TorresFlag for United States of America

asked on

Parsing XML files

I am trying to parse a file but since this file has names of other countries and people from Other country all the accents are causing the   to fail when I try to open it in SSIS or Excel fails due to these characters

Here is the xml file try to open it in Excel and it will fail..
http://xml.pinnaclesports.com/pinnacleFeed.aspx?

How can I read this file in SSIS so I an load to table.. That is my goal!
Avatar of Jim Horn
Jim Horn
Flag of United States of America image

<knee-jerk reaction>
If your data source has multiple collations (i.e. languages, accents), then your best bet would be to pump it in SSIS into a staging table that has all nvarchar's for your language columns.  

varchar() can handle one collation and is 1-byte per character, nvarchar() can handle all collations and is 2-bytes per character.

I can't speak to Excel.  It might be a good idea to add the Excel TA to this question, as there are some wicked good Excel experts on EE, and one of them may have come across this issue.

Good luck.
Jim
Avatar of Leo Torres

ASKER

OK so How do I run the code against this File..
Using of correct unicode on the XML will easier to solve the issue.
OK Dushan911, but I how do you accomplish that.. .. is there something I have to add to page or what?
Please try to specify it on the XML file like on below example with the correct encoding.

<?xml version="1.0" encoding="UTF-8"?>
When I open that XML with Excel I see this:
User generated image
but it otherwise opens OK.  Can you show us which data fails?

Rob.
Yes I put this on the first line of the file
<?xml version="1.0" encoding="UTF-8"?>

Nothing it still failed

I dont know what version you used of excel but it failed to open in 2007 and 2010
The data that is failing is the accents for sure once I removed them the file opened ..

Problem I cant move those accents in an automated fashion. i dont want to have to go in every day and have to find and replace accents
I used Excel 2007.  What I did was create a file called Test.html with this code in it:
<HTML>
<HEAD>
<TITLE>Test links</TITLE>
</HEAD>
<BODY>
<br><br>
<a href="http://xml.pinnaclesports.com/pinnacleFeed.aspx?">XML Link</a>
</BODY>
</HTML>

Open in new window


Then launch the file, right-click the link, and select Save Target As.  Then I could open the XML file in Excel 2007 by right-clicking it, selecting Open With, and choosing Microsoft Excel.

Regards,

Rob.
I hope you closed this opened XML tag properly.
UTF-8 is only a one standard. Please refer the correct encoding standard according you requirement.

http://www.w3schools.com/xml/xml_encoding.asp
Rob, Thanks

I did as you said but excel 2010 still wont open it
But 2007 open it..

 any why ssis xml source can open?
xmlfail3.GIF
xmlfail2.GIF
Hi, from the XML document's text, I removed the content between these two sections:

<?xml version="1.0" encoding="utf-8"?>

<!--
IMPORTANT CHANGE - June, 2008

and it opened fine in 2007 again (except for the aforementioned dialog).  If you remove that entire !DOCTYPE section from the XML file, does it open in 2010?  If so, we could write a script to remove that and open it in 2010.

Regards,

Rob.
Thanks rob your Spot on.. After I removed that I was able to open it in Excel 2010.. I thought once I could open it in excel SSIS would be able to open it since they were both complaining about the same thing.

But it wont generate an XSD file.
Any ideas.. See Error below..
xmlfail4.GIF
From what I've read, SSIS does not support the mixed content model of XML files, so we will need to transform the mixed content into non-mixed content.  An example of the different between the two is shown here:
http://opendocumentfellowship.com/introduction/odf_vs_oxml_part_II

I will need to spend some time to have a look through the XML source to see where the mixed content is.

I'll get back to you.

Rob.
Thank you!
ASKER CERTIFIED SOLUTION
Avatar of RobSampson
RobSampson
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you! Sorry About delay
No problem. Did you end up just removing the moneyline_value tags?