Help with correcting a SED or AWK statement to set XML element to off

Posted on 2009-02-08
Last Modified: 2012-05-06
Here is my data:

====(input.xml) ==============

<extractors>
   <extractor>
      <extractor_name>PS_VENDOR</extractor_name>
      <field>
         <name>SETID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
   <extractor>
      <extractor_name>PS_VENDOR_LOC</extractor_name>
      <field>
         <name>LOCID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
</extractors>

==================

I am trying to replace the <field> tag with <field mode="off"> for the NAME1 field in the first extractor node (the one for PS_VENDOR).  The problem is that the script below always replaces the first <field> tag it comes to.  In this case, it would put mode="off" on the SETID field.

I'd prefer to stick with a SED solution, but if there is one that works with AWK, please suggest it as well.

sed "   :t
        /<extractor>/,/<\/extractor>/ {       # For each line between <extractor> and the first </extractor> tag
                 /<\/extractor>/!  {          # If we are not at the </extractor> tag
                         N;                   # add the Next line to the pattern space
                         b t;                 # and branch (loop back) to the :t label.
                 }                            # The </extractor> tag has been reached.
         /<extractor_name>PS_VENDOR<\/extractor_name>/ s/\(<field>\)\(.*NAME1<\/name>\)/<field mode=\"off\">\2/
        }" input.xml > output.xml
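
A likely reason the script above lands on SETID: once the loop has joined the whole extractor into the pattern space, <field> matches at the leftmost position it can, and .*NAME1<\/name> simply stretches from there to the NAME1 line. Below is a minimal sketch that reproduces this on a cut-down input. The six sample lines and the `joined` variable are made up for illustration; the substitution is the one from the question.

```shell
# Join six sample lines into one pattern space, then apply the same
# substitution the question uses.
joined=$(printf '%s\n' \
    '<field>' '<name>SETID</name>' '</field>' \
    '<field>' '<name>NAME1</name>' '</field>' |
  sed '
:a
$!{
   N
   b a
}
s/\(<field>\)\(.*NAME1<\/name>\)/<field mode="off">\2/
')
printf '%s\n' "$joined"
```

The attribute ends up on the first <field> (the SETID one), which matches the behaviour described in the question. Collecting one <field> block at a time, rather than the whole extractor, keeps .* from reaching across fields.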

Question by:jrram
 
LVL 85

Expert Comment

by:ozo
ID: 23586344
awk 'BEGIN{RS="\0";}match($0,"<field>[[:space:]]*<name>NAME1<"){$0=substr($0,1,RSTART+5) " mode=\"off\"" substr($0,RSTART+6)}1' input.xml
 
LVL 13

Author Comment

by:jrram
ID: 23586444
Thanks for the response ozo,

The one thing I don't see in your script is a check that we are updating the correct <extractor> node.  What if I wanted to update NAME1 in the second extractor node?

This is what line 7 (/<extractor_name>PS_VENDOR<\/extractor_name>/) in my original code does.  It verifies the extractor section before trying to make the update.
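
For comparison, here is one way an AWK version could carry out that extractor check. It is only a sketch, not ozo's method: it works line by line, remembers which extractor it is in, buffers each <field> block, and edits the block only when both the extractor and the field name match. The variable names (`target`, `fname`, `in_target`, `infield`, `hit`, `buf`) and the temporary sample file are assumptions made for this example, and it assumes each tag sits on its own line as in input.xml.

```shell
# Hypothetical sample file mirroring input.xml from the question.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<extractors>
   <extractor>
      <extractor_name>PS_VENDOR</extractor_name>
      <field>
         <name>SETID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
   <extractor>
      <extractor_name>PS_VENDOR_LOC</extractor_name>
      <field>
         <name>LOCID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
</extractors>
EOF

awk_out=$(awk -v target="PS_VENDOR" -v fname="NAME1" '
# Remember whether the current extractor is the requested one.
/<extractor_name>/ {
    in_target = (index($0, "<extractor_name>" target "</extractor_name>") > 0)
}
/<\/extractor>/ { in_target = 0 }

# Start buffering at <field> and keep going until </field>.
/<field>/ && !infield { buf = $0; hit = 0; infield = 1; next }
infield {
    buf = buf "\n" $0
    if (index($0, "<name>" fname "</name>") > 0) hit = 1
    if ($0 ~ /<\/field>/) {
        # Only the matching field inside the matching extractor is changed.
        if (in_target && hit) sub(/<field>/, "<field mode=\"off\">", buf)
        print buf
        infield = 0
    }
    next
}
{ print }
' "$xml")
printf '%s\n' "$awk_out"
rm -f "$xml"
```

Passing target="PS_VENDOR_LOC" instead should move the change to the NAME1 field of the second extractor.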
 
LVL 23

Accepted Solution

by:
Maciej S earned 2000 total points
ID: 23588510

sed '
/<extractor_name>PS_VENDOR<\/extractor_name>/,/<\/extractor>/ {
   /<field>/,/<\/field>/ {
      :t
      /<\/field>/! {
         N
         b t
      }
      s/<field>\(.*NAME1\)/<field mode="off">\1/
   }
}
' input.xml > output.xml
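
The script above works because each <field> ... </field> block is collected into the pattern space on its own, so .* can no longer reach back to an earlier field. A slightly simplified sketch of the same idea follows. It is an assumption-laden variant, not the accepted answer itself: it drops the inner range in favour of a plain /<field>/ address and matches the full <name>NAME1</name> element so that a name such as NAME10 would not be picked up. The temporary sample file and the `sed_out` variable are made up for the example.

```shell
# Hypothetical sample file mirroring input.xml from the question.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<extractors>
   <extractor>
      <extractor_name>PS_VENDOR</extractor_name>
      <field>
         <name>SETID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
   <extractor>
      <extractor_name>PS_VENDOR_LOC</extractor_name>
      <field>
         <name>LOCID</name>
      </field>
      <field>
         <name>NAME1</name>
      </field>
   </extractor>
</extractors>
EOF

sed_out=$(sed '
# Work only between the PS_VENDOR name line and the end of that extractor.
/<extractor_name>PS_VENDOR<\/extractor_name>/,/<\/extractor>/ {
   # On each opening <field>, pull in lines until the closing tag arrives.
   /<field>/ {
      :t
      /<\/field>/! {
         N
         b t
      }
      # The pattern space now holds exactly one field block.
      s/<field>\(.*<name>NAME1<\/name>\)/<field mode="off">\1/
   }
}
' "$xml")
printf '%s\n' "$sed_out"
rm -f "$xml"
```

Only the NAME1 field of the PS_VENDOR extractor should change; the NAME1 field under PS_VENDOR_LOC lies outside the address range and is left alone.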

 
LVL 13

Author Comment

by:jrram
ID: 23601167
Thanks olkit, this works perfectly.  I have a similar question that I'll post below, and I'll award 200 more points if you can help me with it.
 
LVL 13

Author Comment

by:jrram
ID: 23601534
Here is my data:

====(input.xml) ==============

<entity>
        <name>AP_VENDOR</name>
</entity>
<entity>
        <name>AP_REVERSALS</name>
</entity>
<entity>
        <name>AP_HARDWARE</name>        
</entity>
==================

What I am trying to do is toggle the status of a particular entity based on the entity's name (the text between the <name></name> tags).

For example, in this scenario, I want to set the status to <entity mode="batch"> for the entity which has the name AP_VENDOR in it.  But, for all of my attempts below, every entity is being updated.

This is the result I'm getting:

====RESULT ==============

<entity mode="batch">
        <name>AP_VENDOR</name>
</entity>
<entity mode="batch">
        <name>AP_REVERSALS</name>
</entity>
<entity mode="batch">
        <name>AP_HARDWARE</name>        
</entity>
==================


STATUS=" mode=\"batch\""
 
=== Attempt #1 ==
 
sed -i "
/<entity/,/<name>AP_VENDOR/ {
   /<entity>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #2 ==
 
sed -i "
/<entity/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #3 ==
 
sed -i "
/<entity.*/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #4 ==
 
sed -i "
/<entity.*>/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml
 
=== Attempt #5 ==
 
sed -i "
/<entity.*>/,/<name>AP_VENDOR<\/name>/ {
   /<entity.*>/ {
 
      /AP_VENDOR/ s/\(<entity\).*\(>\)/\1$STATUS\2/
   }
}
" input.xml

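
A possible explanation for the range-based attempts touching every entity: sed reads one line at a time, so when it is on an <entity> line it has not yet seen the <name> that follows. The range opens on the first <entity> line and closes at the AP_VENDOR name, but it opens again on each later <entity> line and, with no further AP_VENDOR line to close it, runs to the end of the file. The sketch below prints only the lines that the range address from Attempt #1 selects; the temporary file and the `selected` variable are made up for the example.

```shell
# Hypothetical copy of the entity sample from the question.
entities=$(mktemp)
cat > "$entities" <<'EOF'
<entity>
        <name>AP_VENDOR</name>
</entity>
<entity>
        <name>AP_REVERSALS</name>
</entity>
<entity>
        <name>AP_HARDWARE</name>
</entity>
EOF

# With -n and p, sed prints only the lines the range address selects.
selected=$(sed -n '/<entity/,/<name>AP_VENDOR/p' "$entities")
printf '%s\n' "$selected"
rm -f "$entities"
```

All three <entity> lines show up in the selection, so a substitution guarded only by this range is applied to each of them. Deciding per entity needs the whole <entity> ... </entity> block in the pattern space before the substitution runs.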
 
LVL 13

Author Comment

by:jrram
ID: 23601596
One thing to note: the reason I used .* for the entity tag is that it might not always be just <entity>.  It could be <entity mode="off">, <entity run="once">, etc.

The point is that I don't always know what the entity tag will contain, but I know I want to replace anything between "<entity" and ">" with the STATUS.
 
LVL 23

Expert Comment

by:Maciej S
ID: 23601667
So, what should be the result of changing <entity run="once">?

Should it be <entity run="once" mode="batch"> or <entity mode="batch">?
 
LVL 23

Assisted Solution

by:Maciej S
Maciej S earned 2000 total points
ID: 23601799
Sorry, I see: it should be <entity mode="batch"> (you wrote that everything after "entity" and before the closing ">" should be deleted).
The appropriate script is below.
#!/bin/sh
 
INPUT=./input2.xml
OUTPUT=./output2.xml
 
NAME=AP_VENDOR
#NAME=AP_REVERSALS
#NAME=AP_HARDWARE
MODE=batch
 
sed '
/<entity/,/<\/entity>/ {
   :t
   /<\/entity>/! {
      N
      b t
   }
   s/<entity[^>]*\(.*name.*'${NAME}'\)/<entity mode=\"'${MODE}'\"\1/
}
' ${INPUT} > ${OUTPUT}
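
To see how the [^>]* part discards whatever attributes the tag already carries, here is a small sketch along the same lines as the script above. It makes two changes that are assumptions of this example rather than part of the original answer: the shell variables are expanded inside double quotes, and the name is matched as the complete <name>...</name> element so that a longer name starting with AP_VENDOR is left alone. The AP_VENDOR_OLD entity, the temporary file and the `entity_out` variable are invented for the demonstration.

```shell
# Hypothetical sample: one entity with an existing attribute, one with a
# longer name that merely starts with AP_VENDOR.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<entity run="once">
        <name>AP_VENDOR</name>
</entity>
<entity>
        <name>AP_VENDOR_OLD</name>
</entity>
EOF

NAME=AP_VENDOR
MODE=batch

entity_out=$(sed '
/<entity/,/<\/entity>/ {
   :t
   /<\/entity>/! {
      N
      b t
   }
   s/<entity[^>]*\(.*<name>'"$NAME"'<\/name>\)/<entity mode="'"$MODE"'"\1/
}
' "$sample")
printf '%s\n' "$entity_out"
rm -f "$sample"
```

The first tag should come out as <entity mode="batch"> with run="once" gone, and the second entity should stay untouched.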

 
LVL 13

Author Comment

by:jrram
ID: 23603074
Thanks oklit.  I need to make the script a little more dynamic, since it won't always be used with the same tags. Instead of hardcoding the tags I was working with, I put them in variables and tried to use those in the script. I've done it this way in previous scripts, but it doesn't appear to work with your solution.

Attempt #1 works; it is your script with just the variable names changed. My attempt #2 does not.  Can you help with this? Also, why did you have to execute the variables in your previous post in another shell ('${}')?
#!/bin/bash
 
nodeTag="entity"
nodeUniqueIDTag="name"
 
ENTITYNAME="AP_VENDOR"
MODE=" mode=\"off\""
 
=== Attempt #1 ==
 
sed -i '
/<entity/,/<\/entity>/ {
   :t
   /<\/entity>/! {
      N
      b t
   }
   s/<entity[^>]*\(.*name.*'${ENTITYNAME}'<\)/<entity'${MODE}'\1/
}
' $1
 
=== Attempt #2 ==
 
sed -i '
/<$nodeTag/,/<\/$nodeTag>/ {
   :t
   /<\/$nodeTag>/! {
      N
      b t
   }
   s/<$nodeTag[^>]*\(.*$nodeUniqueIDTag.*'${ENTITYNAME}'<\)/<$nodeTag'${MODE}'\1/
}
' $1

 
LVL 23

Assisted Solution

by:Maciej S
Maciej S earned 2000 total points
ID: 23603178
You have two options:
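1. Surround your variables ($nodeTag and $nodeUniqueIDTag) with ' ', the same way '${ENTITYNAME}' and '${MODE}' are written. The shell does not expand variables inside single quotes, so each variable has to sit outside the quoted part of the script.
2. Replace the opening and closing ' ' of the script with " ", and remove the ' ' around '${ENTITYNAME}' and '${MODE}' (do not replace those with " ").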
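
The two options can be sketched side by side as follows. The variable names come from the script in the previous comment; the temporary sample file and the `opt1` and `opt2` variables are made up for the example. One detail is an addition of this sketch and not part of the advice above: in option 1 each variable is written as '"$var"' rather than '$var', so that a value containing spaces would not be split by the shell.

```shell
# Hypothetical copy of the entity sample from earlier in the thread.
f=$(mktemp)
cat > "$f" <<'EOF'
<entity>
        <name>AP_VENDOR</name>
</entity>
<entity>
        <name>AP_REVERSALS</name>
</entity>
<entity>
        <name>AP_HARDWARE</name>
</entity>
EOF

nodeTag="entity"
nodeUniqueIDTag="name"
ENTITYNAME="AP_VENDOR"
MODE=' mode="off"'

# Option 1: keep the script in single quotes, closing and reopening them
# around every variable so the shell can expand it.
opt1=$(sed '
/<'"$nodeTag"'/,/<\/'"$nodeTag"'>/ {
   :t
   /<\/'"$nodeTag"'>/! {
      N
      b t
   }
   s/<'"$nodeTag"'[^>]*\(.*'"$nodeUniqueIDTag"'.*'"$ENTITYNAME"'<\)/<'"$nodeTag$MODE"'\1/
}
' "$f")

# Option 2: put the whole script in double quotes; variables expand in place.
opt2=$(sed "
/<${nodeTag}/,/<\/${nodeTag}>/ {
   :t
   /<\/${nodeTag}>/! {
      N
      b t
   }
   s/<${nodeTag}[^>]*\(.*${nodeUniqueIDTag}.*${ENTITYNAME}<\)/<${nodeTag}${MODE}\1/
}
" "$f")

printf '%s\n' "$opt1"
rm -f "$f"
```

Both forms hand sed the same script text, so they should produce identical output. In the double-quoted form the braces in ${nodeTag} keep the variable name clearly separated from the characters that follow it. Either way, a value containing /, & or a backslash would need escaping before it is placed in the sed expression.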
 
LVL 13

Author Comment

by:jrram
ID: 23603576
You rock.  It works perfectly for me now.

And this is the cool thing about Linux: there are always multiple ways to do something and something new to learn.