asked on

Convert Pipe Delimited text file to XML

Below is an example of a pipe-delimited text file I have. I want to get this data into MySQL. If I can get it converted to XML, then I can accomplish the rest. So, any thoughts on how this could be converted to XML? I expect a vbs script that I could execute via batch file perhaps?

The code strip below is actually number one copied three times. The real 2, 3 and 4 may containt different data elements, but the point is that it is all pipe delimited.

Could java crank this out to XML?

1|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
2|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
3|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||
4|MRNUID|SSN|LASTNAME|FIRSTNAME|W|DOB|FIRST|LAST|W|Stree Address||City |State|zipcode||areacode|phonenumber|N|age|||||||

Open in new window

AJAY CHADHA

What is the format required for XML data? I mean what should be the tags names?

weklica

ASKER

For simplicity, you could just call it field1, field2, field3. It isn't too important as long as it is consistent. I plan on running it to another process to get it into MySQL database anyway. Assuming it is easier just to mlae them up?? I can edit it afterwards if I want to. Have specific names.

Will this work? If not, I can generate a very specific list for you. Thanks

AJAY CHADHA

If you have phpMyAdmin installed on your server then you can use it to import data from the CSV file (pipe delimited) into your MySQL database table.

weklica

ASKER

Automatically? I want it to monitor a queue or something. As long as there is a way to never have to touch this then it would be a fine solution. Is automated csv or pipe edlimited import an option?

ASKER CERTIFIED SOLUTION

kaufmed

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

weklica

ASKER

This is what I was hoping for! I will test it as soon as possible (1-2 days).

kaufmed

Cool. As posted, the elements are

<docRoot>
<record#>
<field##></field##>
</record#>
</docRoot>

where # and ## correspond to the appropriate record/field number.

weklica

ASKER

Absolutely perfect! Brilliant! Thank you very much!

weklica

ASKER

Kaufmed,

One more question. Is there any way you know of to get each record into its OWN XML file? Essentially, this will create hundreds of Record numbers and one long XML file. I want to dump this file into a program that cannot handle looping or batching through the xml file. Could I have each Record# essentially be its own xml file?

kaufmed

Certainly. This should do it:

DIM file
DIM data
DIM fso
DIM path
DIM lines
DIM columns
DIM output

path = "C:\test.csv"   ' Replace this with your path
output = "C:\record"   ' Replace this with the path and the filename (without extension) for
                       '   resulting files

SET fso = CreateObject("Scripting.FileSystemObject")
SET file = fso.OpenTextFile(path, 1)

data = file.ReadAll

file.Close
SET file = Nothing

lines = Split(data, vbNewLine)

For i = 0 To Ubound(lines) - 1
    columns = Split(lines(i), "|")
    
    SET file = fso.CreateTextFile(output & i & ".xml")
    file.WriteLine("<record" & i + 1 & ">")

    For j = 0 To Ubound(columns) - 1
        file.WriteLine(vbTab & "<field" & j + 1 & ">" & columns(j) & "</field" & j + 1 & ">")
    Next
    
    file.WriteLine("</record" & i + 1 & ">")
    file.Close
    SET file = Nothing
Next

Open in new window

kaufmed

The above results in a series of files in the C folder named "record0", "record1", ...

weklica

ASKER

Once again. PERFECT! Hate to be overly annoying, but I do have one final iteration that would be very helpful as well. As indicated above, there are 4 lines to a patient. Sometimes 5, but almost always 4. So, is there a way to make one xml file per patient? i.e., one xml file containing the 4 or 5 rows based on the matching MRNUID?

I would be more than happy to open a new case and forward it to you, but you seem to be so good in this area I would hate to post it and have you not see it to potentially respond. Thanks again!

kaufmed

Are the MRNUID's for a single patient back to back, or are they scattered throughout the file?

weklica

ASKER

Both:

You would find perhaps hundreds of lines per file.

The MRNUID for one particular study would be back to back...i.e., lines 1,2,3,4,(5) would be for one patient. However, it is possible the same patient would have two or three studies on the same day. if this is the case, one MRNUID could be 1,2,3,4 and a group in lines say 100+ could have another group of 4 or 5 lines. Attached is an image of a fictitious example. You will see 1|000xxxxx, 2|000xxxx
The four or five lines are prefixed with 1,2,3,4 If there are five lines it will be prefixed with 1,2,3,4,4 - or 1,2,3,3,4, etc. etc,. You will see what I mean in the example.

Not sure if this helps or not or if it is even possible. If it doesn't work out then I can try and concatenate the files from your last examples or something. I don't want to be overly bothersome, but your examples so far have been gold!

Thanks
Screen-shot-2010-01-29-at-3.14.4.png

kaufmed

Questions:

1) Does each file have *all* records for a patient, or just the 4/5 records for each study?
2) Is it safe to assume that each record of the 4/5 for a particular study will have the same "000xxxxx"

kaufmed

If the answer to (1) is "all", then how would the records for one patient be recognized? Would it be the field mentioned in (2)?

weklica

ASKER

The file represents, say, a hundred studies. we get 1 file per day. each study gets a chunk of four or five lines. ,1,2,3,4 - 1,2,3,3,4 - 1,2,3,4,4 above would represent 3 studies. Two of these studies may or may not belong to the same patient. If they are for the same patient, then the 000xxxx would indicate this as 000xxx is another way to identify a particular patient. each number is unique, but unique only to patient ... not to each study. So, yes it is safe to assume that each record (4-5 lines of data) would have the same "000xxxx".

for uniqueness ... I suppose we could use two fields from line(s) that are preceded with the number 4|.

See attached. field one is 4, field two is MRNUID, field nine is a unique alpha-numeric combo specific to each study. So, perhaps we could match the previous 1's, 2's,3's,4's (prefixes) and group them together in a single XML as somehow with the help of field 9's uniqueness????

Sorry that this has turned into a pain. if too troublesome, just advise and i will go with what I have already received via this post. Thanks
Screen-shot-2010-01-29-at-4.03.0.png