• Status: Solved

Help with a 'grep' statement

I need help with a grep statement.  Suppose I have the below XML code snippet stored in a variable called 'indicator'.  I am using the grep statement below in a loop to extract the conditions one at a time.

export condition=`echo $indicator | grep -o "<condition cid=\"\$COND_NUM\">*.*<\/condition>"`

On the first pass of the loop, COND_NUM will equal 2, so I'm expecting to get only this condition, but everything gets returned.  I think the problem is b/c I am using *.*<\/condition> in the grep statement and it's recognizing the second <\/condition> at the end of the file instead of the first one it comes to.

How can I modify my grep statement to only get the first condition?
<condition cid="1">
        <description>TRN.MERCHANT_NAME1 = substr(VEN.SCRUB_NAME1,1,length(TRN.MERCHANT_NAME1)))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_6A', SCRUB_NAME1='VWXYZabcde' WHERE VENDOR_ID='VENID-6';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='VWXYZ', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-6') WHERE MERCHANT_ID='6';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='VWXYZ';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>
<condition cid="2">
        <description>(VEN.SCRUB_NAME1 = substr(TRN.MERCHANT_NAME1,1,length(VEN.SCRUB_NAME1))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_5A', SCRUB_NAME1='ABCDE' WHERE VENDOR_ID='VENID-5';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='ABCDEjihgf', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-5') WHERE MERCHANT_ID='5';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='ABCDEjihgf';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>

Asked by: jrram
1 Solution
 
Hugh Fraser (Consultant) commented:
Grep doesn't support multi-line patterns. Try this awk script as a starting point.

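# Quote "$indicator": unquoted, the shell collapses the newlines and awk sees a single line.
echo "$indicator" | awk 'BEGIN {x=0}
{
if ($0~"<condition cid=\"1\">") {x=1}   # opening tag: start printing
if (x==1) {print $0}
if ($0~"</condition>") {x=0}            # closing tag: stop printing
}'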

I'm not an expert awk programmer, so you may have to play with the substitution for cid="n", or just write a couple of scripts with different values.
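
For the cid="n" substitution mentioned above, one option is to pass the number in with awk's -v option instead of editing the script. This is only a sketch: the sample data and the print_condition function name are made up for illustration, and it assumes the XML really does contain newlines (so the variable must be quoted when it is piped to awk).

```shell
# Hypothetical sample data: two conditions, each spread over several lines.
indicator='<condition cid="1">
  <description>first</description>
</condition>
<condition cid="2">
  <description>second</description>
</condition>'

# print_condition N: print the lines from the opening tag for cid N
# through the next closing tag. The name is illustrative only.
print_condition() {
  printf '%s\n' "$indicator" | awk -v n="$1" '
    index($0, "<condition cid=\"" n "\">") { inside = 1 }   # opening tag seen
    inside { print }
    inside && index($0, "</condition>") { inside = 0 }      # closing tag seen
  '
}

print_condition 2
```

index() is used rather than a regex match so the tag text is compared literally. In a loop, the call would simply become print_condition "$COND_NUM".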
 
jrram (Author) commented:
The XML is stored in a variable, so I don't think (?) it's multi-line input.  My thinking is that if it were multi-line input, then it wouldn't work when trying to extract the 2nd condition either.

I think the question is how do I tell it to stop when it finds the first </condition>.
 
Hugh Fraser (Consultant) commented:
I see. So if this appears as a single line, the following works.

echo $y
This is a test <condition cid="1">more stuff</condition>More junk

echo $x
1

echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"
<condition cid="1">more stuff</condition>

More importantly, this looks suspiciously like your example except for the fact that your XML string prints out as multiple lines. Can you do an

echo $indicator

to see what it looks like.
 
jrram (Author) commented:
When I do an "echo $indicator | wc -l" it returns 1 so this confirms the input is only 1 line.

And yes, in the example test condition that you gave, the grep expression does work b/c you only have one </condition> in variable $y.  If you put a second one in there (see example below) and then run the grep statement, it returns too much.

Data Setup:

x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

Problem Statement:
echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"

=====

Expected Result:

<condition cid="1">test data 1</condition>

Actual Result:

<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>

Notes:

As it is, the grep statement correctly finds the <condition cid="1">, but I think because of the '*.*', it greedily ignores the first </condition> (expected stopping point) and includes everything up until the last </condition> value.

Does this make sense?  Do you know of any parameters or changes that can be made to the grep statement?
 
Hugh Fraser (Consultant) commented:
It does make sense. The *.* should be .*? to make it non-greedy, but that doesn't seem to work either. Are you bound to a grep solution, or are you willing to use an alternative?
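
A likely reason .*? "doesn't seem to work" is that the lazy quantifier is Perl syntax. In grep's default (basic) regex the ? is just a literal character, and with -E it only makes the preceding .* optional, so the match stays greedy. GNU grep accepts the Perl form when given -P. The sketch below assumes GNU grep built with PCRE support, which not every system has; the variable names greedy and lazy are just for illustration, and the stray >* from the original pattern is dropped since it only matches extra > characters.

```shell
x=1
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# Greedy: .* runs on to the LAST </condition> on the line.
greedy=$(printf '%s\n' "$y" | grep -o "<condition cid=\"$x\">.*</condition>")

# Lazy: .*? stops at the FIRST </condition>. Needs -P (PCRE), a GNU grep extension.
lazy=$(printf '%s\n' "$y" | grep -oP "<condition cid=\"$x\">.*?</condition>")

printf 'greedy: %s\n' "$greedy"
printf 'lazy:   %s\n' "$lazy"
```

If grep reports that -P is unsupported, the lazy variable ends up empty and a different tool is needed.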
 
jrram (Author) commented:
I'm open to using an alternative solution.  I chose grep b/c it seemed like a simple thing to do, but it doesn't appear that way anymore.  I also looked at sed, but that didn't work for me either (as a standalone solution), and I'm not that familiar with awk, but it seems like it could work.

I'm still interested in whatever alternate solution you can provide, but as a workaround I added a sed statement after the grep statement to chop off the unneeded data, and this works for me.

condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`
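
For anyone reusing this workaround: grep -o still over-matches through the last </condition>, and sed then deletes everything from the first </condition> onward, which means the closing tag itself is dropped too. The sketch below uses the two-condition test string from earlier in the thread to show that, plus a variant that puts the tag back. The with_tag variable name is made up for illustration, and >*.* is shortened to .* since the extra >* has no effect here.

```shell
COND_NUM=1
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# The workaround as posted: grep over-matches, sed cuts from the first </condition> onward.
condition=$(printf '%s\n' "$indicator" \
  | grep -o "<condition cid=\"$COND_NUM\">.*</condition>" \
  | sed "s/<\/condition>.*//")

# Variant that keeps the closing tag by substituting it back in.
with_tag=$(printf '%s\n' "$indicator" \
  | grep -o "<condition cid=\"$COND_NUM\">.*</condition>" \
  | sed "s/<\/condition>.*/<\/condition>/")

echo "$condition"   # <condition cid="1">test data 1
echo "$with_tag"    # <condition cid="1">test data 1</condition>
```

Whether the closing tag matters depends on what the loop does with the result afterwards.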

 
macker- commented:
Have you tried using -m to match just the first occurrence?

You could combine this in a bash script, with a for loop, to increment $i and loop thru the matches, assigning each to a corresponding numbered variable.
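
One caveat on -m: it limits the number of matching lines, not the number of matches within a line, so it cannot trim a one-line input. The loop idea can still be sketched without grep, using plain shell parameter expansion. This assumes conditions are numbered consecutively from 1, are not nested, and each has a closing tag; extract_condition is a made-up helper name.

```shell
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# extract_condition N: print condition N, from its opening tag to the
# first </condition> after it. Returns 1 if there is no such condition.
extract_condition() {
  open="<condition cid=\"$1\">"
  case $indicator in
    *"$open"*) ;;          # opening tag present, carry on
    *) return 1 ;;         # no condition with this cid
  esac
  rest=${indicator#*"$open"}        # drop everything up to and including the opening tag
  body=${rest%%"</condition>"*}     # keep only what precedes the first closing tag
  printf '%s%s</condition>\n' "$open" "$body"
}

# Walk the conditions in order until one is missing.
i=1
while condition=$(extract_condition "$i"); do
  echo "condition $i: $condition"
  i=$((i + 1))
done
```

The # expansion removes the shortest matching prefix and %% removes the longest matching suffix, which together give the "stop at the first closing tag" behavior the greedy grep pattern lacks.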
 
jrram (Author) commented:
macker,

I did try the -m option and it still brings back the entire thing.  The code below, which is a repeat of what I posted just before your first post, is what worked for me.

If there are no objections, I'm going to request a points refund.
condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`

 
Hugh Fraser (Consultant) commented:
Sorry for the delay, jrram. The solution you posted is classic Unix shell stuff, and I can't find a way to do better in shell code.