Solved

Remove illegal XML characters from file

Posted on 2011-05-11
Last Modified: 2012-05-11
I receive a large XML file (around 500 MB). The XML is not particularly well formed and can contain illegal characters, even inside CDATA sections, such as a no-break space (hex code A0). To be able to process the file, I have some VBA code that loops through it and rewrites each line to another file, removing any illegal characters in the process. This works fine but is slow because the file is 500 MB. Is there a way I can search the whole XML file for specific characters without rewriting the whole file?
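For the narrower task of locating the offending characters without writing anything, a read-only scan is one option. The following is a minimal sketch in Python rather than the poster's VBA; `find_byte_offsets`, `targets` and `chunk_size` are hypothetical names chosen for illustration. It reads the file in binary chunks and reports the byte offset of every occurrence of the target bytes, so memory use stays bounded regardless of file size. One caveat worth checking: U+00A0 is itself a legal XML 1.0 character, and a lone byte 0xA0 is only rejected when the parser decodes the file as UTF-8, so the underlying issue may be an encoding mismatch. That is an inference, not something the post confirms.

```python
import re


def find_byte_offsets(path, targets=b"\xa0", chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of any byte in `targets`.

    The file is opened read-only and scanned chunk by chunk; nothing is
    rewritten. Because each target is a single byte, a match can never
    straddle a chunk boundary.
    """
    # Character class over the raw target bytes, e.g. b"[\xa0]".
    pattern = re.compile(b"[" + re.escape(targets) + b"]")
    base = 0  # offset of the current chunk within the file
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            for match in pattern.finditer(chunk):
                yield base + match.start()
            base += len(chunk)
```

Calling `next(find_byte_offsets("feed.xml"), None)` (with a hypothetical file name) would stop at the first hit, which is enough to decide whether a cleaning pass is needed at all.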
Question by:MrDavidThorn
8 Comments
 
LVL 10

Expert Comment

by:MaxOvrdrv2
ID: 35737316
If you put the file in a string, then you can search it using regular expressions.
 

Author Comment

by:MrDavidThorn
ID: 35737332
That's what I do currently, line by line, but I thought there might be an easier option.
 
LVL 10

Expert Comment

by:MaxOvrdrv2
ID: 35737354
Well, not really... you can't, for example, edit the file directly on the file system. The best you can do is run a regexp against the contents and modify where appropriate, then re-save / overwrite the old file.

Why is it such a big XML file? Is it a sitemap?
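The "run a regexp against the contents, then re-save" approach could look roughly like the sketch below. It is written in Python for illustration, not in the VBA or .NET mentioned in the thread, and the names `ILLEGAL_XML_CHARS`, `strip_illegal_xml_chars` and `clean_file` are hypothetical. The pattern matches everything outside the XML 1.0 `Char` production. The default source encoding of Latin-1 is an assumption made so that a raw 0xA0 byte decodes to a legal U+00A0; the thread does not state the file's real encoding.

```python
import re

# Any character NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Return `text` with every illegal XML 1.0 character replaced."""
    return ILLEGAL_XML_CHARS.sub(replacement, text)


def clean_file(src_path, dst_path, src_encoding="latin-1", chunk_chars=1 << 20):
    """Copy src to dst as UTF-8, dropping illegal characters.

    Reads in fixed-size chunks instead of line by line. The pattern only
    matches single characters, so chunk boundaries cannot split a match.
    Returns the number of characters removed.
    """
    removed = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            cleaned = strip_illegal_xml_chars(chunk)
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed
```

This still rewrites the file, as the comment above says is unavoidable, but it avoids per-line overhead. If the file carries an XML declaration with an `encoding` attribute, that attribute would need to agree with the UTF-8 output.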

 

Author Comment

by:MrDavidThorn
ID: 35737362
I'm trying to import the file into SQL Server, and can use .NET or T-SQL solutions.
 
LVL 10

Accepted Solution

by:
MaxOvrdrv2 earned 2000 total points
ID: 35737395
OK, I think I see what's going on... in that case, I suggest the following (pseudo-code):

1) Read the file from the file system
2) Upload the raw content to SQL
3) Set the new record as @Not Fixed@ (you might need a new field in your DB), so that it doesn't show up anywhere you would ultimately use it
4) Use SQL to get the data (instead of the file system; should be quicker)
5) Replace the bad characters using RegExp
6) Update the record (that should also be faster in SQL than on the file system)

Hope this helps.
 
LVL 10

Expert Comment

by:MaxOvrdrv2
ID: 35737399
7) Set the record to @Fixed@ so that it now shows up where you need it to, properly formed.
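The staged steps 1 to 7 can be sketched as follows. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for SQL Server, and the table `xml_import`, its columns, and the function `stage_and_fix` are hypothetical names, not anything from the thread. Note that the original poster reports further down that doing the replacement through SQL was around 60 times slower for them than rewriting the file.

```python
import re
import sqlite3

# Characters outside the XML 1.0 "Char" production.
ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def stage_and_fix(conn, raw_text):
    """Store raw content flagged 'NotFixed', clean it, then flag it 'Fixed'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS xml_import ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, status TEXT NOT NULL)"
    )
    # Steps 2-3: upload the raw content, hidden from consumers by its status.
    cursor = conn.execute(
        "INSERT INTO xml_import (content, status) VALUES (?, 'NotFixed')",
        (raw_text,),
    )
    row_id = cursor.lastrowid
    conn.commit()
    # Step 4: fetch the data back through SQL.
    (content,) = conn.execute(
        "SELECT content FROM xml_import WHERE id = ?", (row_id,)
    ).fetchone()
    # Step 5: replace the bad characters with a regular expression.
    cleaned = ILLEGAL_XML_CHARS.sub("", content)
    # Steps 6-7: update the record and mark it as fixed.
    conn.execute(
        "UPDATE xml_import SET content = ?, status = 'Fixed' WHERE id = ?",
        (cleaned, row_id),
    )
    conn.commit()
    return row_id
```

Consumers would then filter on `status = 'Fixed'` so that half-processed rows never surface.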
 

Author Comment

by:MrDavidThorn
ID: 35737690
Sorry Max, I already tried using SQL statements instead of rewriting the file. It works but takes around 60 times longer. Having searched pretty extensively, I think my current solution is the recommended one.
 
LVL 10

Expert Comment

by:MaxOvrdrv2
ID: 35737713
Really? Strange... it is much quicker for me to parse through / get a DB record than it is to open a file and parse through it from there... but OK.
Question has a verified solution.
