Remove illegal XML characters from file

I receive a large XML file (around 500 MB). The XML is not particularly well formed and can contain illegal characters, even inside the CDATA sections, such as a no-break space (hex code A0). To make the file processable, I have some VBA code that loops through the file and rewrites each line to another file, removing any illegal characters in the process. This works fine but is slow, as the file is 500 MB. Is there a way I can search the whole XML file for specific characters without rewriting the whole file?
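To answer the question as asked (finding specific characters without writing a second copy of the file), one option is to scan the file in large binary chunks and only report where the offending bytes are. This is a Python sketch rather than VBA, and it assumes the file uses a single-byte encoding such as Windows-1252, where the no-break space is the lone byte 0xA0. In a UTF-8 file that character is the two bytes C2 A0, and 0xA0 also occurs inside many other characters, so this byte-level search would give false hits there. The name `find_bad_bytes` and the exact byte set are illustrative assumptions.

```python
import re

# Single-byte values treated as "bad" here (an assumption for illustration):
# C0 control bytes that XML 1.0 never allows (everything below 0x20 except
# tab, LF and CR) plus 0xA0, the no-break space in Windows-1252/Latin-1.
BAD_BYTES = re.compile(rb"[\x00-\x08\x0B\x0C\x0E-\x1F\xA0]")


def find_bad_bytes(path, pattern=BAD_BYTES, chunk_size=1 << 20):
    """Yield the byte offset of every match of `pattern` in the file.

    The file is only read, never written. `pattern` must match single
    bytes, so a match can never straddle two chunks.
    """
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for m in pattern.finditer(chunk):
                yield offset + m.start()
            offset += len(chunk)
```

For example, `for pos in find_bad_bytes("input.xml"): print(pos)` (hypothetical file name) lists the offsets. The chunk size only affects memory use. Finding the bytes does not remove them, though: deleting bytes shifts everything after them, so removal still means rewriting the file from the first match onward; only a same-length substitution, such as overwriting each 0xA0 with a space (0x20), could in principle be patched in place.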
MrDavidThorn asked:
 
MaxOvrdrv2 commented:
OK, I think I see what's going on. In that case, I suggest the following (pseudo-code):

1) Read the file from the file system
2) Upload the raw content to SQL
3) Mark the new record as "Not Fixed" (you might need a new field in your DB) so that it doesn't show up anywhere you would ultimately use it
4) Use SQL to get the data (instead of the file system; this should be quicker)
5) Replace the bad characters using RegExp
6) Update the record (this should also be faster in SQL than on the file system)
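The steps above can be sketched as follows. The thread targets SQL Server, but to keep the example self-contained this sketch uses Python's built-in `sqlite3` module as a stand-in database, and the regular-expression replacement runs in Python rather than in SQL. The table `staged_xml`, its `status` column and the function `stage_and_clean` are hypothetical names; the final "Fixed" update corresponds to the step 7 MaxOvrdrv2 added in a later comment.

```python
import re
import sqlite3

# Characters outside the XML 1.0 Char production, plus U+00A0 (no-break
# space), which XML allows but which the question wants removed.
BAD_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]|\xA0"
)


def stage_and_clean(conn, raw_text):
    """Steps 2-7: stage the raw text, clean it with a regex, flag it fixed.

    Step 1 (reading the file into `raw_text`) is left to the caller.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged_xml ("
        " id INTEGER PRIMARY KEY, content TEXT, status TEXT)"
    )
    # Steps 2-3: upload the raw content, flagged so nothing consumes it yet.
    cur = conn.execute(
        "INSERT INTO staged_xml (content, status) VALUES (?, 'Not Fixed')",
        (raw_text,),
    )
    row_id = cur.lastrowid
    # Step 4: read the data back from the database.
    (content,) = conn.execute(
        "SELECT content FROM staged_xml WHERE id = ?", (row_id,)
    ).fetchone()
    # Step 5: strip the bad characters with a regular expression.
    cleaned = BAD_CHARS.sub("", content)
    # Steps 6-7: write the cleaned text back and mark the record as fixed.
    conn.execute(
        "UPDATE staged_xml SET content = ?, status = 'Fixed' WHERE id = ?",
        (cleaned, row_id),
    )
    conn.commit()
    return row_id
```

Holding a 500 MB document in one string and one row is memory-hungry, so this only illustrates the order of operations, not a tuned import.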

Hope this helps.
 
MaxOvrdrv2 commented:
If you load the file into a string, you can search it using regular expressions.
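Loading the file into a string and searching it could look like the Python sketch below. One detail worth knowing: the no-break space (U+00A0) is actually a legal XML 1.0 character. What XML 1.0 forbids is anything outside its `Char` production (most C0 control characters, the surrogate range, and U+FFFE/U+FFFF), so errors blamed on 0xA0 usually come from an encoding mismatch, for example a Windows-1252 file read as UTF-8. The pattern and the function name `find_illegal_chars` are illustrative assumptions.

```python
import re

# Matches any character outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_chars(text):
    """Return (index, code point) pairs for each illegal XML 1.0 character."""
    return [
        (m.start(), hex(ord(m.group())))
        for m in ILLEGAL_XML_CHARS.finditer(text)
    ]
```

To search a whole file, read it first, e.g. `text = open("input.xml", encoding="latin-1").read()` (hypothetical file name; use the file's real encoding). This keeps the entire file in memory at once, which for a 500 MB file is a real cost.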
 
MrDavidThorn (author) commented:
That's what I currently do, line by line, but I thought there might be an easier option.
 
MaxOvrdrv2 commented:
Well, not really. You can't, for example, edit the file directly on the file system. The best you can do is run a regular expression against the contents, modify them where appropriate, and then re-save/overwrite the old file.
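The "run a regexp, then re-save/overwrite the old file" approach can avoid holding the whole file in memory by streaming it through the regex in chunks into a temporary file, which then replaces the original. This is a Python sketch, not the author's VBA; `clean_in_place`, the chunk size, and the default `latin-1` encoding are assumptions (Latin-1 maps every byte to exactly one character, so untouched bytes round-trip unchanged; pass the file's real encoding if you know it).

```python
import os
import re
import tempfile

# Matches any character outside the XML 1.0 Char production.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_in_place(path, pattern=ILLEGAL_XML_CHARS, encoding="latin-1",
                   chunk_chars=1 << 20):
    """Strip every match of `pattern` from the file at `path`.

    The cleaned text is streamed into a temporary file in the same
    directory, which then replaces the original via os.replace.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Open dst first so the temp fd is always closed by the with block.
        # newline="" disables newline translation, so CRLF/LF survive as-is.
        with os.fdopen(fd, "w", encoding=encoding, newline="") as dst, \
                open(path, encoding=encoding, newline="") as src:
            while True:
                chunk = src.read(chunk_chars)  # up to chunk_chars characters
                if not chunk:
                    break
                dst.write(pattern.sub("", chunk))
        os.replace(tmp_path, path)  # swap in the fully written cleaned file
    except BaseException:
        os.remove(tmp_path)  # on failure, the original file is untouched
        raise
```

Reading in text mode lets the decoder handle chunk boundaries for multi-byte encodings, and because the pattern matches one character at a time, no match is split across chunks. To drop the no-break space as well, pass `pattern=re.compile(ILLEGAL_XML_CHARS.pattern + r"|\xA0")`.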

Why is it such a big XML file? Is it a sitemap?
 
MrDavidThorn (author) commented:
I'm trying to import the file into SQL Server, and can use .NET or T-SQL solutions.
 
MaxOvrdrv2 commented:
Adding to my list: 7) Set the record to "Fixed" so that it now shows up, properly formed, wherever you need it.
 
MrDavidThorn (author) commented:
Sorry Max, I already tried using SQL statements instead of rewriting the file. It works, but takes around 60 times longer. Having searched fairly extensively, I think my current solution is the recommended one.
 
MaxOvrdrv2 commented:
Really? Strange. For me it is much quicker to fetch and parse a DB record than to open a file and parse through it... but OK.
Question has a verified solution.