tonelm54
asked on
Splitting a string into elements
I've got an XML file which was written by a program and I can't get anything to read it. As it's 12GB in size, I don't want to go through it manually, line by line, trying to correct it. I've managed to figure out parsing the parts I need, however I'm stuck when it comes down to the individual elements.
I've got a line such as:-
<field name="createdby" value="0fb646e4-5590-e611-80e8-1458d05b422c" lookupentity="systemuser" lookupentityname="User 27" />
And I want to be able to split the line up into an array such as:-
field name="createdby"
value="0fb646e4-5590-e611-80e8-1458d05b422c"
lookupentity="systemuser"
lookupentityname="User 27"
But as the row sometimes has different elements, I wanted some way of doing this, but can't figure it out.
I thought I could load each line into simplexml_load_string and extract the data, but because the row isn't closed it complains about it. If I manually fix the field to:-
<field name="createdby" value="0fb646e4-5590-e611-80e8-1458d05b422c" lookupentity="systemuser" lookupentityname="User 27">NO Value</field>
It's happy, but I really don't want to go through a 12GB file and fix each line manually.
Any ideas?
@Jonathan: Still waiting to see if there is more test data, but for this sample, there is nothing that needs to be "fixed." It's a perfectly valid XML document -- it just has all of its information in the attributes, not between opening and closing tags. It doesn't even need to be turned into an array -- foreach() can iterate over the attributes.
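To illustrate the point about iterating over the attributes, here is a minimal sketch that uses the sample line from the question as-is. The variable names ($line, $field, $attrs) are only for illustration, and building the array is optional; it is shown only because the question asked for one.

```php
<?php
// The sample line from the question, with its self-closing tag left unchanged.
$line = '<field name="createdby" value="0fb646e4-5590-e611-80e8-1458d05b422c" lookupentity="systemuser" lookupentityname="User 27" />';

$field = simplexml_load_string($line);
if ($field === false) {
    throw new RuntimeException('Line is not well-formed XML');
}

// attributes() can be iterated directly with foreach(); each value is cast
// to string so the result is a plain associative array.
$attrs = [];
foreach ($field->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

print_r($attrs);
```

Because the keys come from whatever attributes are present, rows with a different set of attributes need no special handling.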
In terms of processing a 12GB file, while loading it whole may be possible, in theory, on a 64-bit machine, it seems unlikely, and the file will probably have to be taken in smaller bites.
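One way to take the file in smaller bites is PHP's streaming XMLReader, which walks the document node by node instead of loading it all. The sketch below is only an assumption about how that could look here: the helper name readFieldAttributes, the <entity> wrapper element, and the second sample row are made up for the demonstration, and a temporary file stands in for the real export.

```php
<?php
// Hypothetical helper: yields one attribute array per <field> element
// without holding the whole document in memory.
function readFieldAttributes(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'field') {
                $attrs = [];
                if ($reader->hasAttributes) {
                    // Step through the attributes of the current element.
                    while ($reader->moveToNextAttribute()) {
                        $attrs[$reader->name] = $reader->value;
                    }
                    // Return the cursor to the element itself.
                    $reader->moveToElement();
                }
                yield $attrs;
            }
        }
    } finally {
        $reader->close();
    }
}

// Small stand-in file; the <entity> root and second row are assumptions.
$tmp = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $tmp,
    '<entity>'
    . '<field name="createdby" value="0fb646e4-5590-e611-80e8-1458d05b422c" lookupentity="systemuser" lookupentityname="User 27" />'
    . '<field name="statecode" value="0" />'
    . '</entity>'
);

$rows = iterator_to_array(readFieldAttributes($tmp), false);
unlink($tmp);

print_r($rows);
```

With a generator, each row can be processed and discarded inside a foreach loop, so memory use stays roughly flat regardless of file size; collecting everything with iterator_to_array() is done here only to keep the demonstration short.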
Yeah, that was my thought, too - I mentioned that near the end of my first comment:
Since the /> is technically a valid ending, my assumption is that your original issues are either related to file size or to XML that doesn't conform to a WSDL or some other XML rule.
I was just thrown off by the remark that changing it to: "<field...>NO Value</field>" would work for him. If that's the case, maybe he was using some kind of poorly-built custom XML parser that didn't understand self-closing tags.
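If some rows really are rejected by the XML parser (for example, an opening tag that is never closed), a regular expression can still pull out the name="value" pairs from a single line. This is a rough fallback sketch, not a replacement for a real parser: the function name splitAttributes is hypothetical, and it assumes every value is double-quoted and contains no literal double quote.

```php
<?php
// Hypothetical fallback: extract name="value" pairs from one line of text
// without requiring the line to be well-formed XML.
function splitAttributes(string $line): array
{
    $attrs = [];
    // Assumes double-quoted values that contain no literal double quote.
    if (preg_match_all('/([\w:.-]+)\s*=\s*"([^"]*)"/', $line, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Decode XML entities such as &amp; back to plain characters.
            $attrs[$m[1]] = html_entity_decode($m[2], ENT_QUOTES | ENT_XML1, 'UTF-8');
        }
    }
    return $attrs;
}

// An unclosed row, which simplexml_load_string() would reject.
$line = '<field name="createdby" value="0fb646e4-5590-e611-80e8-1458d05b422c" lookupentity="systemuser" lookupentityname="User 27">';
$attrs = splitAttributes($line);

print_r($attrs);
```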
ASKER
Sorry, I was unable to supply any test data; however, the project has been cancelled, so I don't need this anymore.
Thank you for your support anyways
You can potentially save over 30 bytes for every GUID in that field, so if your XML file is full of these things, it might help. Also, you could potentially swap out long tag names with shortened versions, like <f>...</f> instead of <field>...</field> to further compress the XML file while preserving its structure (you'd just have to update any mappings that referenced "field" and change it to "f", for example).