Convert Pipe Delimited text file to XML

weklica
weklica used Ask the Experts™
on
Below is an example of a pipe-delimited text file I have.  I want to get this data into MySQL.  If I can get it converted to XML, then I can accomplish the rest.  So, any thoughts on how this could be converted to XML?  I expect a vbs script that I could execute via batch file perhaps?

The code strip below is actually number one copied three times.  The real 2, 3 and 4 may containt different data elements, but the point is that it is all pipe delimited.

Could java crank this out to XML?
1|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
2|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
3|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
4|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
What is the format required for XML data? I mean what should be the tags names?

Author

Commented:
For simplicity, you could just call it field1, field2, field3.  It isn't too important as long as it is consistent.  I plan on running it to another process to get it into MySQL database anyway.  Assuming it is easier just to mlae them up??  I can edit it afterwards if I want to. Have specific names.  

Will this work?  If not, I can generate a very specific list for you.  Thanks
If you have phpMyAdmin installed on your server then you can use it to import data from the CSV file (pipe delimited) into your MySQL database table.
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
Automatically?  I want it to monitor a queue or something.  As long as there is a way to never have to touch this then it would be a fine solution.  Is automated csv or pipe edlimited import an option?
Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015
Commented:
Well in case you don't get the other option mentioned worked out, here is some VbScript that shoudl get you the XML you mentioned.
DIM file
DIM data
DIM fso
DIM path
DIM lines
DIM columns

path = "C:\test.csv"   ' Replace this with your path

SET fso = CreateObject("Scripting.FileSystemObject")
SET file = fso.OpenTextFile(path, 1)

data = file.ReadAll

file.Close
SET file = Nothing

lines = Split(data, vbNewLine)

SET file = fso.OpenTextFile(path, 2)

file.WriteLine("<docRoot>")

For i = 0 To Ubound(lines) - 1
    columns = Split(lines(i), "|")
    
    file.WriteLine(vbTab & "<record" & i + 1 & ">")

    For j = 0 To Ubound(columns) - 1
        file.WriteLine(vbTab & vbTab & "<field" & j + 1 & ">" & columns(j) & "</field" & j + 1 & ">")
    Next
    
    file.WriteLine(vbTab & "</record" & i + 1 & ">")
Next

file.WriteLine("</docRoot>")

file.Close
SET file = Nothing

Open in new window

Author

Commented:
This is what I was hoping for!  I will test it as soon as possible (1-2 days).
ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
Cool. As posted, the elements are

    <docRoot>
        <record#>
            <field##></field##>
        </record#>
    </docRoot>

where # and ## correspond to the appropriate record/field number.

Author

Commented:
Absolutely perfect!  Brilliant!  Thank you very much!

Author

Commented:
Kaufmed,

One more question. Is there any way you know of to get each record into its OWN XML file?  Essentially, this will create hundreds of Record numbers and one long XML file.  I want to dump this file into a program that cannot handle looping or batching through the xml file.  Could I have each Record# essentially be its own xml file?
ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
Certainly. This should do it:
DIM file
DIM data
DIM fso
DIM path
DIM lines
DIM columns
DIM output

path = "C:\test.csv"   ' Replace this with your path
output = "C:\record"   ' Replace this with the path and the filename (without extension) for
                       '   resulting files

SET fso = CreateObject("Scripting.FileSystemObject")
SET file = fso.OpenTextFile(path, 1)

data = file.ReadAll

file.Close
SET file = Nothing

lines = Split(data, vbNewLine)

For i = 0 To Ubound(lines) - 1
    columns = Split(lines(i), "|")
    
    SET file = fso.CreateTextFile(output & i & ".xml")
    file.WriteLine("<record" & i + 1 & ">")

    For j = 0 To Ubound(columns) - 1
        file.WriteLine(vbTab & "<field" & j + 1 & ">" & columns(j) & "</field" & j + 1 & ">")
    Next
    
    file.WriteLine("</record" & i + 1 & ">")
    file.Close
    SET file = Nothing
Next

Open in new window

ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
The above results in a series of files in the C folder named "record0", "record1", ...

Author

Commented:
Once again.  PERFECT!  Hate to be overly annoying, but I do have one final iteration that would be very helpful as well.  As indicated above, there are 4 lines to a patient.  Sometimes 5, but almost always 4.  So, is there a way to make one xml file per patient?  i.e., one xml file containing the 4 or 5 rows based on the matching MRNUID?  

I would be more than happy to open a new case and forward it to you, but you seem to be so good in this area I would hate to post it and have you not see it to potentially respond.  Thanks again!
ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
Are the MRNUID's for a single patient back to back, or are they scattered throughout the file?

Author

Commented:
Both:

You would find perhaps hundreds of lines per file.  

The MRNUID for one particular study would be back to back...i.e., lines 1,2,3,4,(5) would be for one patient.  However, it is possible the same patient would have two or three studies on the same day.  if this is the case, one MRNUID could be 1,2,3,4 and a group in lines say 100+ could have another group of 4 or 5 lines.  Attached is an image of a fictitious example.  You will see 1|000xxxxx, 2|000xxxx
The four or five lines are prefixed with 1,2,3,4  If there are five lines it will be prefixed with 1,2,3,4,4 - or 1,2,3,3,4, etc. etc,. You will see what I mean in the example.

Not sure if this helps or not or if it is even possible.  If it doesn't work out then I can try and concatenate the files from your last examples or something.  I don't want to be overly bothersome, but your examples so far have been gold!

Thanks
Screen-shot-2010-01-29-at-3.14.4.png
ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
Questions:

    1) Does each file have *all* records for a patient, or just the 4/5 records for each study?
    2) Is it safe to assume that each record of the 4/5 for a particular study will have the same "000xxxxx"
ǩa̹̼͍̓̂ͪͤͭ̓u͈̳̟͕̬ͩ͂̌͌̾̀ͪf̭̤͉̅̋͛͂̓͛̈m̩̘̱̃e͙̳͊̑̂ͦ̌ͯ̚d͋̋ͧ̑ͯ͛̉Glanced up at my screen and thought I had coded the Matrix...  Turns out, I just fell asleep on the keyboard.
Most Valuable Expert 2011
Top Expert 2015

Commented:
If the answer to (1) is "all", then how would the records for one patient be recognized? Would it be the field mentioned in (2)?

Author

Commented:
The file represents, say, a hundred studies.  we get 1 file per day.  each study gets a chunk of four or five lines.  ,1,2,3,4 - 1,2,3,3,4 - 1,2,3,4,4 above would represent 3 studies.  Two of these studies may or may not belong to the same patient.  If they are for the same patient, then the 000xxxx would indicate this as 000xxx is another way to identify a particular patient.  each number is unique, but unique only to patient ... not to each study.  So, yes it is safe to assume that each record (4-5 lines of data) would have the same "000xxxx".

for uniqueness ... I suppose we could use two fields from line(s) that are preceded with the number 4|.

See attached.  field one is 4, field two is MRNUID, field nine is a unique alpha-numeric combo specific to each study.  So, perhaps we could match the previous 1's, 2's,3's,4's (prefixes) and group them together in a single XML as somehow with the help of field 9's uniqueness????

Sorry that this has turned into a pain.  if too troublesome, just advise and i will go with what I have already received via this post. Thanks
Screen-shot-2010-01-29-at-4.03.0.png

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial