Help with a 'grep' statement

I need help with a grep statement. Suppose I have the below XML code snippet stored in a variable called 'indicator'. I am using the grep statement below in a loop to extract the conditions one at a time.

export condition=`echo $indicator | grep -o "<condition cid=\"\$COND_NUM\">*.*<\/condition>"`

On the first pass of the loop, COND_NUM will equal 2, so I'm expecting to get only this condition, but everything gets returned. I think the problem is b/c I am using *.*<\/condition> in the grep statement and it's matching up to the last </condition> at the end of the file instead of the first one it comes to.

How can I modify my grep statement to only get the first condition?

<condition cid="1">
        <description>TRN.MERCHANT_NAME1 = substr(VEN.SCRUB_NAME1,1,length(TRN.MERCHANT_NAME1)))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_6A', SCRUB_NAME1='VWXYZabcde' WHERE VENDOR_ID='VENID-6';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='VWXYZ', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-6') WHERE MERCHANT_ID='6';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='VWXYZ';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>
<condition cid="2">
        <description>(VEN.SCRUB_NAME1 = substr(TRN.MERCHANT_NAME1,1,length(VEN.SCRUB_NAME1))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_5A', SCRUB_NAME1='ABCDE' WHERE VENDOR_ID='VENID-5';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='ABCDEjihgf', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-5') WHERE MERCHANT_ID='5';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='ABCDEjihgf';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>

Hugh Fraser

Grep doesn't support multi-line patterns. Try this awk script as a starting point.

echo "$indicator" | awk 'BEGIN {x=0}
{
# x is 1 while we are inside the wanted condition block
if ($0~"<condition cid=\"1\">") {x=1}
if (x==1) {print $0}
if ($0~"</condition>") {x=0}
}'

I'm not an expert awk programmer, so you may have to play with the substitution for cid="n", or just write a couple of scripts with different values.
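
One way to avoid hard-coding the condition number is to pass it in with awk's -v option. This is only a sketch: it assumes the XML really is spread over several lines (so the variable has to be quoted when it is printed), and the sample data and the awk variable names n and inside are made up for illustration.

```shell
#!/bin/sh
# Hypothetical multi-line sample standing in for $indicator.
indicator='<condition cid="1">
  <description>first</description>
</condition>
<condition cid="2">
  <description>second</description>
</condition>'

COND_NUM=2

# -v n=... hands the shell value to awk as the variable n (a name chosen here).
# Quoting "$indicator" keeps the newlines so awk sees one tag per line.
condition=$(printf '%s\n' "$indicator" | awk -v n="$COND_NUM" '
  index($0, "<condition cid=\"" n "\">") { inside = 1 }   # start of wanted block
  inside                                  { print }        # copy lines while inside
  index($0, "</condition>")               { inside = 0 }   # end of block
')

printf '%s\n' "$condition"
```

Using index() rather than a regex match means the tag text is compared literally, so nothing in it needs escaping.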

jrram

ASKER

The XML is stored in a variable, so I don't think (?) it's multi-line input. My thinking is that if it were multi-line input, then it wouldn't work when trying to extract the 2nd condition either.

I think the question is how do I tell it to stop when it finds the first </condition>.

Hugh Fraser

I see. So if this appears as a single line, the following works.

echo $y
This is a test <condition cid="1">more stuff</condition>More junk

echo $x
1

echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"
<condition cid="1">more stuff</condition>

More importantly, this looks suspiciously like your example except for the fact that your XML string prints out as multiple lines. Can you do an

echo $indicator

to see what it looks like?

jrram

ASKER

When I do an "echo $indicator | wc -l" it returns 1 so this confirms the input is only 1 line.
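
One caveat about this check, shown as a small sketch with made-up data: because $indicator is unquoted, the shell splits it into words and echo joins them with single spaces, so wc -l reports 1 even when the variable holds several lines. Quoting the variable shows the real line count.

```shell
#!/bin/sh
# Hypothetical three-line value standing in for $indicator.
indicator='<condition cid="1">
  <description>demo</description>
</condition>'

# Unquoted: word splitting turns the newlines into spaces before echo runs.
unquoted=$(echo $indicator | wc -l | tr -d ' ')

# Quoted: the newlines survive, so wc -l sees every line.
quoted=$(echo "$indicator" | wc -l | tr -d ' ')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If the quoted form reports more than one line, the variable itself is multi-line and it is the unquoted echo that flattens it before grep sees it.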

And yes, in the example test condition that you gave, the grep expression does work b/c you only have one </condition> in variable $y. If you put a second one in there (see example below) and then run the grep statement, it returns too much.

Data Setup:

x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

Problem Statement:
echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"

=====

Expected Result:

<condition cid="1">test data 1</condition>

Actual Result:

<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>

Notes:

As it is, the grep statement correctly finds the <condition cid="1">, but I think because of the '*.*', it greedily ignores the first </condition> (expected stopping point) and includes everything up until the last </condition> value.

Does this make sense? Know of any parameters or changes that can be made to the grep statement?

Hugh Fraser

It does make sense. The *.* should be .*? to make it non-greedy, but that doesn't seem to work either. Are you bound to a grep solution, or are you willing to use an alternative?
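
For reference, the .*? form is Perl regex syntax, which grep's default pattern syntax does not understand. GNU grep accepts it with the -P option when it has been built with PCRE support; that availability is an assumption here, and other greps may not have -P at all. A sketch using a two-condition test string like the one above:

```shell
#!/bin/sh
# Test data modelled on the two-condition example in this thread.
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'
x=1

# -P selects Perl-compatible regexes (GNU grep only, if compiled in).
# .*? is the non-greedy form: it stops at the first </condition>.
condition=$(printf '%s\n' "$y" | grep -oP "<condition cid=\"$x\">.*?</condition>")

printf '%s\n' "$condition"
```

On a system where grep -P is available, this should return only the first condition, closing tag included.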

jrram

ASKER

I'm open to using an alternative solution. I chose grep b/c it seemed like a simple thing to do, but it doesn't appear that way anymore. I also looked at sed, but that didn't work for me either (as a standalone solution), and I'm not that familiar with awk, but it seems like it could work.

I'm still interested in whatever alternate solution you can provide, but as a workaround I added a sed statement after the grep statement to chop off the unneeded data, and this works for me.

condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`
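
An alternative that keeps the closing tag and needs no external commands is shell parameter expansion. This is a different technique from the grep-and-sed pipeline above, sketched under the assumption that the whole XML string sits in one variable and that each cid value appears only once; the sample data and the helper variable names are made up.

```shell
#!/bin/sh
# Hypothetical sample; in the real script this would be "$indicator".
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'
COND_NUM=1

open="<condition cid=\"$COND_NUM\">"
close='</condition>'
condition=''

case $indicator in
  *"$open"*"$close"*)
    rest=${indicator#*"$open"}      # drop everything through the opening tag
    body=${rest%%"$close"*}         # drop the first closing tag and all after it
    condition=$open$body$close
    ;;
esac

printf '%s\n' "$condition"
```

The # operator removes the shortest matching prefix and %% removes the longest matching suffix, which together give the "stop at the first </condition>" behaviour. If the tag is not found, condition stays empty.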

macker-

Have you tried using -m to match just the first occurrence?

You could combine this in a bash script, with a for loop, to increment $i and loop thru the matches, assigning each to a corresponding numbered variable.
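
A caveat on -m: in GNU grep it limits the number of matching lines, not the number of matches within a line, so on single-line input it would not shorten the match. The loop idea can still be sketched as below, reusing the grep-plus-sed approach from earlier in the thread. The sample data and the condition_N variable names are made up, and the loop simply stops at the first number that has no match.

```shell
#!/bin/sh
# Hypothetical sample standing in for $indicator.
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

i=1
while :; do
  # Greedy grep match, then sed trims from the first closing tag onward.
  match=$(printf '%s\n' "$indicator" |
    grep -o "<condition cid=\"$i\">.*</condition>" |
    sed 's|</condition>.*||')
  [ -n "$match" ] || break            # no such condition: stop looping

  # Store the result in condition_1, condition_2, ... (names chosen here).
  eval "condition_$i=\$match"
  i=$((i + 1))
done

count=$((i - 1))
echo "found $count conditions"
echo "$condition_1"
echo "$condition_2"
```

As with the original workaround, the closing tag is trimmed off each stored value.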

ASKER CERTIFIED SOLUTION

jrram

Hugh Fraser

Sorry for the delay, jrram. The solution you posted is classic Unix shell stuff, and I can't find a way to do better in shell code.