Viclyn
asked on
I need a parsing script, or maybe a command with awk; I'm not sure which.
This is an XML file exported from my text messages. I need a script to parse it. Here are some example entries from the file:
<sms protocol="0" address="4053234432" date="1388706261627" type="2" subject="null" body="I'm heading home" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" date_sent="null" readable_date="Jan 2, 2014 5:44:21 PM" contact_name="Mom" />
<sms protocol="0" address="9726768860" date="1388786728946" type="1" subject="null" body="Tony_new_number - How are you?" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" date_sent="null" readable_date="Jan 3, 2014 4:05:28 PM" contact_name="Tony Comp" />
<sms protocol="0" address="9726768860" date="1388786847009" type="1" subject="null" body="Tony_new_number - Fine.just anticipating getting back. I'm ready. Although I'll miss her." toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" date_sent="null" readable_date="Jan 3, 2014 4:07:27 PM" contact_name="Tony Comp" />
I would like the output to read like the following:
4053234432 "I'm heading home" "Jan 2, 2014 5:44:21 PM" "Mom"
9726768860 "Tony_new_number - How are you?" "Jan 3, 2014 4:05:28 PM" "Tony Comp"
9726768860 "Tony_new_number - Fine.just anticipating getting back. I'm ready. Although I'll miss her." "Jan 3, 2014 4:07:27 PM" "Tony Comp"
So I basically just need these fields: address="xxx", body="xxxx", readable_date="xxxxx" and contact_name="xxx"
Thanks for any help
lex/bison seems more appropriate for digging out the randomly ordered fields you have.
ASKER
I don't think the order is random. There are hundreds of entries, and they all follow this format:
<sms
protocol=""
address=""
date=""
type=""
subject=""
body=""
toa=""
sc_toa=""
service_center=""
read=""
status=""
locked=""
date_sent=""
readable_date=""
contact_name=""
/>
I just want these four fields:
address=""
body=""
readable_date=""
contact_name=""
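Since the four attributes are named and every entry follows the same layout, one possible approach is to collect all name="value" pairs on a line and pick out the wanted ones by name, so attribute order does not matter. The following is only a minimal sketch in Python, not the accepted solution from this thread. It assumes each entry sits on one line and that no value contains a raw double quote; the helper names `extract` and `format_fields` are made up for illustration, and the output layout (body, date and contact name quoted, address bare) is a guess at what was asked for.

```python
import re

# Matches name="value" pairs. Assumption: values never contain a raw
# double quote, so [^"]* is enough to capture a whole value.
ATTR = re.compile(r'(\w+)="([^"]*)"')

# The four attributes requested in the question, in output order.
WANTED = ("address", "body", "readable_date", "contact_name")


def extract(line):
    """Return the four wanted values from one <sms .../> line, or None."""
    if "<sms" not in line:
        return None
    attrs = dict(ATTR.findall(line))
    # Missing attributes become empty strings instead of raising.
    return [attrs.get(name, "") for name in WANTED]


def format_fields(fields):
    """Join the values in an assumed layout: address bare, the rest quoted."""
    address, body, when, contact = fields
    return f'{address} "{body}" "{when}" "{contact}"'


# First sample entry from the question, copied as-is.
sample_line = '''<sms protocol="0" address="4053234432" date="1388706261627" type="2" subject="null" body="I'm heading home" toa="null" sc_toa="null" service_center="null" read="1" status="-1" locked="0" date_sent="null" readable_date="Jan 2, 2014 5:44:21 PM" contact_name="Mom" />'''

print(format_fields(extract(sample_line)))
```

To run this over a whole export, loop over the lines of the file, call `extract` on each, and skip any line where it returns `None`.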
ASKER CERTIFIED SOLUTION
My answer does assume a couple of things:
- that each entry is on one line, as in your example text
- that none of the fields contain double-quote characters. What happens if they do?
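If either assumption turns out not to hold, a real XML parser may be the safer route. In well-formed XML, a double quote inside a double-quoted attribute value has to be written as `&quot;`, and an element may span several lines; a parser handles both. The sketch below uses Python's standard `xml.etree.ElementTree` and is only an illustration: the `<export>` root element and the sample entry are hypothetical (the real file's root element is not shown in this thread), and `sms_rows` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The four attributes requested in the question, in output order.
WANTED = ("address", "body", "readable_date", "contact_name")


def sms_rows(xml_text):
    """Yield the four wanted attribute values for every <sms> element."""
    root = ET.fromstring(xml_text)
    for sms in root.iter("sms"):
        # The parser has already decoded entities such as &quot; here.
        yield [sms.get(name, "") for name in WANTED]


# Hypothetical input: a made-up <export> root wrapping one entry that
# spans several lines and carries an escaped double quote in its body.
sample = '''<export>
<sms
  address="9726768860"
  body="He said &quot;hi&quot;"
  readable_date="Jan 3, 2014 4:05:28 PM"
  contact_name="Tony Comp"
/>
</export>'''

rows = list(sms_rows(sample))
for row in rows:
    # Tab-separated output avoids ambiguity when a body contains quotes.
    print("\t".join(row))
```

The rows are printed tab-separated here rather than wrapped in quotes, because a body that itself contains double quotes would make quoted output ambiguous.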
ASKER
Very nice. Thanks