Solved

XML - reformat with end node name </XXXX> for each

Posted on 2014-12-23
Last Modified: 2015-01-17
Hi

I need to add the end tag name to each node (I think that is what it's called) in the XML file.

I was able to do this using XML::TreeBuilder, but I found out I can't install any Perl modules on the PC, so no extra modules. I do have XML::Smart and XML::Simple already on the PC, though; I don't know if that helps.

Example

Before

<JOB APPLICATION="TEST_00002" APPL_TYPE="OS" >
      <RULE_BASED_CALENDARS NAME="*"/>
      <QUANTITATIVE NAME="TEST99990" ONFAIL="R" ONOK="R" QUANT="1"/>
      <QUANTITATIVE NAME="TEST99991" ONFAIL="R" ONOK="R" QUANT="1"/>
    </JOB>


needs to be

<JOB APPLICATION="TEST_00002" APPL_TYPE="OS" >
      <RULE_BASED_CALENDARS NAME="*"/></RULE_BASED_CALENDARS>
      <QUANTITATIVE NAME="TEST99990" ONFAIL="R" ONOK="R" QUANT="1"/></QUANTITATIVE>
      <QUANTITATIVE NAME="TEST99991" ONFAIL="R" ONOK="R" QUANT="1"/></QUANTITATIVE>
    </JOB>



Thanks
Question by:mikeysmailbox1
5 Comments
 
LVL 26

Accepted Solution

by:
wilcoxon earned 334 total points
ID: 40515187
Why? The first is perfectly valid XML. Also, the second is invalid XML: you would need to remove the / (changing QUANT="1"/> to QUANT="1">).

This should do what you want provided each XML element is on one line (and not split across lines). Perl numbers capture groups by the position of their opening parenthesis, so $1 is the outer group (the start tag up to, but not including, the /) and $2 is the nested group (the element name).

perl -i.bak -pe 's{(<(\w+)\b[^>]+)/>}{$1></$2>}g' input.xml

 
LVL 26

Expert Comment

by:wilcoxon
ID: 40515193
If you are unfamiliar with XML, the following two lines are equivalent:

<QUANTITATIVE NAME="something"/>
<QUANTITATIVE NAME="something"></QUANTITATIVE>

The /> at the end of the first line acts as a shortcut to avoid having to do the second line.  Further, the first line is the preferred form (you only need an explicit end tag if the element itself has a value such as <QUANTITATIVE>something</QUANTITATIVE>).
 
LVL 84

Assisted Solution

by:ozo
ozo earned 166 total points
ID: 40515199
perl -i.bak -pe 's#(<(\w+)[^>]*/>)(</$2>)?#$1</$2>#g' file.xml
 
LVL 26

Assisted Solution

by:wilcoxon
wilcoxon earned 334 total points
ID: 40537951
Both ozo's and my answer have minor problems.

Mine:
Will not work if the start tag does not have any attributes (I used + instead of *).
Will produce invalid XML if there is already an end tag (it will cause there to be two end tags).  However, the XML was already invalid if it had both <tag/> and </tag> anyway.

Ozo's:
Will produce invalid XML if there is not an end tag (the /> closer is left in as well as adding an end tag) which is the primary case you are asking about.

Here's a combined regex that fixes all issues I see:
perl -i.bak -pe 's{(<(\w+)[^>]*)/>(?:</\2>)?}{$1></$2>}g' input.xml


Question has a verified solution.