Help with correcting a SED or AWK statement to set XML element to off

Here is my data:

====(input.xml) ==============

<extractors>
   <extractor>
      <extractor_name>PS_VENDOR</extractor_name>
      <field>
         <name>SETID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
   <extractor>
      <extractor_name>PS_VENDOR_LOC</extractor_name>
      <field>
         <name>LOCID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
</extractors>

==================

What I am trying to do is replace the <field> tag with <field mode="off"> for the NAME1 field in the first extractor node (PS_VENDOR).  The problem is that the script below always modifies the first <field> tag it comes to.  If I ran it on this input, it would put mode="off" on the SETID field.

I'd prefer to stick with a SED solution, but if there is one that works with AWK, please suggest that as well.

sed "   :t
        /<extractor>/,/<\/extractor>/ {       # For each line between <extractor> and the first </extractor> tag
                 /<\/extractor>/!  {          # If we are not at the </extractor> tag
                         N;                   # add the Next line to the pattern space
                         b t;                 # and branch (loop back) to the :t label.
                 }                            # The </extractor> tag has been reached.
         /<extractor_name>PS_VENDOR<\/extractor_name>/ s/\(<field>\)\(.*NAME1<\/name>\)/<field mode=\"off\">\2/
        }" input.xml > output.xml

Asked by jrram.
 
Maciej S (sysadmin) commented:

sed '
/<extractor_name>PS_VENDOR<\/extractor_name>/,/<\/extractor>/ {
   /<field>/,/<\/field>/ {
      :t
      /<\/field>/! {
         N
         b t
      }
      s/<field>\(.*NAME1\)/<field mode="off">\1/
   }
}
' input.xml > output.xml

 
ozo commented:
awk 'BEGIN{RS="\0";}match($0,"<field>[[:space:]]*<name>NAME1<"){$0=substr($0,1,RSTART+5) " mode=\"off\"" substr($0,RSTART+6)}1' input.xml
 
jrram (author) commented:
Thanks for the response ozo,

The one thing I don't see in your script is where you check that we are updating the correct <extractor> node.  What if I wanted to update the NAME1 field in the 2nd extractor node?

This is what line 7 (/<extractor_name>PS_VENDOR<\/extractor_name>/) in my original code does.  It verifies the extractor section before trying to make the update.
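
For reference, one way the awk approach could be scoped to a single extractor is to track the current <extractor_name> line by line instead of reading the whole file as one record. This is only a minimal sketch, not ozo's method. It assumes each tag sits on its own line and that <name> is the line immediately after <field>, as in input.xml above. The names target, fname, result, and the file sample.xml are made up for illustration.

```shell
# Build a small sample file shaped like input.xml (hypothetical file name).
cat > sample.xml <<'EOF'
<extractors>
   <extractor>
      <extractor_name>PS_VENDOR</extractor_name>
      <field>
         <name>SETID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
   <extractor>
      <extractor_name>PS_VENDOR_LOC</extractor_name>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
</extractors>
EOF

result=$(awk -v target="PS_VENDOR" -v fname="NAME1" '
   # Remember whether the current extractor is the one we want.
   /<extractor_name>/ { in_target = (index($0, "<extractor_name>" target "</extractor_name>") > 0) }
   /<\/extractor>/    { in_target = 0 }
   # Inside the target, hold each <field> line back until the next line is seen.
   in_target && /<field>/ { held = $0; holding = 1; next }
   holding {
      # Only rewrite the held <field> line when the following line names our field.
      if (index($0, "<name>" fname "</name>") > 0)
         sub(/<field>/, "<field mode=\"off\">", held)
      print held
      holding = 0
   }
   { print }
   # Flush a held line if the file ends right after a <field> line.
   END { if (holding) print held }
' sample.xml)

printf '%s\n' "$result"
```

Changing target to PS_VENDOR_LOC would switch the edit to the second extractor, which is the check that line 7 of the original sed script performs.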
 
jrram (author) commented:
Thanks olkit, this works perfectly.  I have a similar question that I'll post below, and I'll award 200 more points if you can help me with it.
 
jrram (author) commented:
Here is my data:

====(input.xml) ==============

<entity>
        <name>AP_VENDOR</name>
</entity>
<entity>
        <name>AP_REVERSALS</name>
</entity>
<entity>
        <name>AP_HARDWARE</name>        
</entity>
==================

What I am trying to do is toggle the status of a particular entity based on the entity's name (the text between the <name></name> tags).

For example, in this scenario, I want to set the status to <entity mode="batch"> for the entity which has the name AP_VENDOR in it.  But, for all of my attempts below, every entity is being updated.

This is the result I'm getting:

====RESULT ==============

<entity mode="batch">
        <name>AP_VENDOR</name>
</entity>
<entity mode="batch">
        <name>AP_REVERSALS</name>
</entity>
<entity mode="batch">
        <name>AP_HARDWARE</name>        
</entity>
==================


STATUS=" mode=\"batch\""
 
=== Attempt #1 ==
 
sed -i "
/<entity/,/<name>AP_VENDOR/ {
   /<entity>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #2 ==
 
sed -i "
/<entity/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #3 ==
 
sed -i "
/<entity.*/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #4 ==
 
sed -i "
/<entity.*>/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #5 ==
 
sed -i "
/<entity.*>/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      /AP_VENDOR/ s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml

 
jrram (author) commented:
One thing to note: the reason I used .* for the entity tag is that it might not always be just <entity>.  It could be <entity mode="off">, <entity run="once">, etc.

The point is that I don't always know what the entity tag will contain, but I know I want to replace anything between "<entity" and ">" with the STATUS.
 
Maciej S (sysadmin) commented:
So, what should be the result of changing <entity run="once">?

Should it be <entity run="once" mode="batch"> or <entity mode="batch">?
 
Maciej S (sysadmin) commented:
Sorry, I see: it should be <entity mode="batch"> (you wrote that everything after "<entity" and before the closing ">" should be deleted).
An appropriate script is below.
#!/bin/sh
 
INPUT=./input2.xml
OUTPUT=./output2.xml
 
NAME=AP_VENDOR
#NAME=AP_REVERSALS
#NAME=AP_HARDWARE
MODE=batch
 
sed '
/<entity/,/<\/entity>/ {
   :t
   /<\/entity>/! {
      N
      b t
   }
   s/<entity[^>]*\(.*name.*'${NAME}'\)/<entity mode=\"'${MODE}'\"\1/
}
' ${INPUT} > ${OUTPUT}
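
The part of the script above that discards any existing attributes is <entity[^>]*. The bracket expression [^>]* matches everything up to, but not including, the first ">", whereas .* is greedy and runs to the last ">" in the pattern space. A minimal sketch of just that substitution, assuming each opening tag sits on one line. The sample tags and the variable names tags and normalized are made up for illustration.

```shell
# Three made-up opening tags with different attributes.
tags='<entity>
<entity mode="off">
<entity run="once">'

# [^>]* swallows whatever attributes are present, so every form is rewritten the same way.
normalized=$(printf '%s\n' "$tags" | sed 's/<entity[^>]*>/<entity mode="batch">/')

printf '%s\n' "$normalized"
```

The full script needs the N loop only because the decision depends on the <name> line, which comes after the opening tag.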

 
jrram (author) commented:
Thanks oklit.  I need to make the script a little more dynamic since it won't always be used with the same tags, so instead of hardcoding the tags I was working with, I put them in variables and tried to use them in the script. I've done it this way in previous scripts, but it doesn't appear to work with your solution.

Attempt #1, which is your script with just the variable names changed, works, but my Attempt #2 does not.  Can you help with this? Also, why did you have to execute the variables in your previous post in another shell with ${}?
#!/bin/bash
 
nodeTag="entity"
nodeUniqueIDTag="name"
 
ENTITYNAME="AP_VENDOR"
MODE=" mode=\"off\""
 
=== Attempt #1 ==
 
sed -i '
/<entity/,/<\/entity>/ {
   :t
   /<\/entity>/! {
      N
      b t
   }
   s/<entity[^>]*\(.*name.*'${ENTITYNAME}'<\)/<entity'${MODE}'\1/
}
' $1
 
=== Attempt #2 ==
 
sed -i '
/<$nodeTag/,/<\/$nodeTag>/ {
   :t
   /<\/$nodeTag>/! {
      N
      b t
   }
   s/<$nodeTag[^>]*\(.*$nodeUniqueIDTag.*'${ENTITYNAME}'<\)/<$nodeTag'${MODE}'\1/
}
' $1

 
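Maciej S (sysadmin) commented:
You have two options:
1. Surround your variables ($nodeTag and $nodeUniqueIDTag) with single quotes (' '), the same way '${ENTITYNAME}' and '${MODE}' are written, so that the shell expands them outside the single-quoted sed script.
2. Replace the opening and closing single quotes (' ') of the sed script with double quotes (" "). Also remove the single quotes from '${ENTITYNAME}' and '${MODE}' (do not replace these with double quotes).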
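
Both options come down to how the shell quotes the sed script: nothing is expanded inside single quotes, so in Attempt #2 the text $nodeTag reaches sed literally. ${VAR} is ordinary parameter expansion (the same as $VAR), not a second shell. A minimal sketch of both options on a simplified one-line substitution follows. The sample tag and the variable names sample, opt1, and opt2 are made up for illustration.

```shell
nodeTag="entity"
MODE=' mode="off"'
sample='<entity run="once">'

# Option 1: close the single quotes around each variable so the shell expands it.
# The expansion is double-quoted here so that a value containing a space (like MODE)
# stays part of the same argument instead of being split into two words.
opt1=$(printf '%s\n' "$sample" | sed 's/<'"${nodeTag}"'[^>]*>/<'"${nodeTag}${MODE}"'>/')

# Option 2: double-quote the whole sed script, so variables expand in place.
opt2=$(printf '%s\n' "$sample" | sed "s/<${nodeTag}[^>]*>/<${nodeTag}${MODE}>/")

printf '%s\n%s\n' "$opt1" "$opt2"
```

With option 2, any literal $ or backslash sequence that the shell treats specially inside double quotes would need escaping. This sketch contains none.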
 
jrram (author) commented:
You rock.  It works perfectly for me now.

And this is the cool thing about Linux: there are always multiple ways to do something, and always something new to learn.

Question has a verified solution.