jrram
Help with a 'grep' statement
I need help with a grep statement. Suppose I have the XML snippet below stored in a variable called 'indicator'. I am using the following grep statement in a loop to extract the conditions one at a time.
export condition=`echo $indicator | grep -o "<condition cid=\"\$COND_NUM\">*.*<\/condition>"`
On the first pass of the loop, COND_NUM will equal 2, so I'm expecting to get only this condition, but everything gets returned. I think the problem is that I am using *.*<\/condition> in the grep statement, and it's matching the second <\/condition> at the end of the file instead of the first one it comes to.
How can I modify my grep statement to only get the first condition?
<condition cid="1">
<description>TRN.MERCHANT_NAME1 = substr(VEN.SCRUB_NAME1,1,length(TRN.MERCHANT_NAME1)))</description>
<change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_6A', SCRUB_NAME1='VWXYZabcde' WHERE VENDOR_ID='VENID-6';</change_sql>
<change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='VWXYZ', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-6') WHERE MERCHANT_ID='6';</change_sql>
<change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='VWXYZ';</change_verify_sql>
<change_verify_count>1</change_verify_count>
</condition>
<condition cid="2">
<description>(VEN.SCRUB_NAME1 = substr(TRN.MERCHANT_NAME1,1,length(VEN.SCRUB_NAME1))</description>
<change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_5A', SCRUB_NAME1='ABCDE' WHERE VENDOR_ID='VENID-5';</change_sql>
<change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='ABCDEjihgf', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-5') WHERE MERCHANT_ID='5';</change_sql>
<change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='ABCDEjihgf';</change_verify_sql>
<change_verify_count>1</change_verify_count>
</condition>
ASKER
The XML is stored in a variable, so I don't think (?) it is multi-line input. My thinking is that if it were multi-line input, extracting the 2nd condition wouldn't work either.
I think the question is how to tell it to stop when it finds the first </condition>.
I see. So if this appears as a single line, the following works:
echo $y
This is a test <condition cid="1">more stuff</condition>More junk
echo $x
1
echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"
<condition cid="1">more stuff</condition>
More importantly, this looks suspiciously like your example except for the fact that your XML string prints out as multiple lines. Can you do an
echo $indicator
to see what it looks like.
ASKER
When I run "echo $indicator | wc -l", it returns 1, so this confirms the input is only 1 line.
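One likely reason for the count of 1: when `$indicator` is unquoted, the shell splits its value on whitespace, newlines included, and `echo` rejoins the words with single spaces, so `wc -l` reports 1 even if the variable does hold several lines. A minimal sketch with a made-up multi-line value (the sample `indicator` here is hypothetical):

```bash
# Hypothetical multi-line value standing in for $indicator
indicator='<condition cid="1">
<description>first</description>
</condition>'

# Unquoted: the shell splits the value on whitespace (newlines included)
# and echo rejoins the words with single spaces, so wc -l counts 1 line.
# $(( )) strips the padding some wc implementations print.
unquoted_lines=$(( $(echo $indicator | wc -l) ))

# Quoted: the embedded newlines are passed through, so wc -l counts 3.
quoted_lines=$(( $(echo "$indicator" | wc -l) ))

echo "unquoted: $unquoted_lines, quoted: $quoted_lines"
# prints: unquoted: 1, quoted: 3
```

So `echo "$indicator" | wc -l` is the more reliable check of what the variable actually contains.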
And yes, in the example test condition that you gave, the grep expression does work because you only have one </condition> in variable $y. If you put a second one in there (see the example below) and then run the grep statement, it returns too much.
Data Setup:
x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'
Problem Statement:
echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"
=====
Expected Result:
<condition cid="1">test data 1</condition>
Actual Result:
<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>
Notes:
As it is, the grep statement correctly finds the <condition cid="1">, but I think because of the '*.*', it greedily ignores the first </condition> (expected stopping point) and includes everything up until the last </condition> value.
Does this make sense? Do you know of any parameters or changes that can be made to the grep statement?
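For the toy data above, one change that stays within plain `grep -o` is to replace `.*` with a negated bracket expression. `[^<]*` cannot run past the next `<`, so the match has to end at the first `</condition>`. This is only a sketch: it works when the element body contains no `<`, which is not true of the real XML, where each condition holds nested tags such as `<description>`.

```bash
x="1"
# Single quotes keep the inner double quotes as part of the value
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# [^<]* stops at the next "<", so the match ends at the first </condition>.
# Only valid when the element body contains no "<" characters.
condition=$(printf '%s\n' "$y" | grep -o "<condition cid=\"$x\">[^<]*</condition>")
echo "$condition"
# prints: <condition cid="1">test data 1</condition>
```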
It does make sense. The *.* should be .*? to make it non-greedy, but that doesn't seem to work either. Are you bound to a grep solution, or are you willing to use an alternative?
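A likely reason `.*?` fails: plain `grep` uses POSIX basic (or, with `-E`, extended) regular expressions, which have no non-greedy quantifier. GNU grep's `-P` option switches to Perl-compatible regular expressions, where `.*?` is lazy. A sketch assuming GNU grep built with PCRE support (not every grep offers `-P`):

```bash
x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# -P: Perl-compatible regex (GNU grep). ".*?" is lazy, so the match
# stops at the first </condition> after the opening tag.
condition=$(printf '%s\n' "$y" | grep -oP "<condition cid=\"$x\">.*?</condition>")
echo "$condition"
# prints: <condition cid="1">test data 1</condition>
```

Because conditions are not nested inside one another, stopping at the first `</condition>` should also work for the real XML with its `<description>` and `<change_sql>` children, as long as the whole value is on one line.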
ASKER
I'm open to using an alternative solution. I chose grep because it seemed like a simple thing to do, but it doesn't appear that way anymore. I also looked at sed, but that didn't work for me either (as a standalone solution), and I'm not that familiar with awk, but it seems like it could work.
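Another alternative needs no external tool at all: shell parameter expansion. `${var#pattern}` removes the shortest matching prefix and `${var%%pattern}` the longest matching suffix, so together they cut from the requested opening tag to the first `</condition>` after it. A sketch; the function name `get_condition` is hypothetical:

```bash
# Hypothetical helper: print the <condition> element whose cid is $2,
# taken from the XML text in $1. Returns 1 if that cid is not present.
get_condition() {
  local xml="$1" cid="$2"
  local open="<condition cid=\"$cid\">"
  local close="</condition>"
  case $xml in
    *"$open"*) ;;          # opening tag found, carry on
    *) return 1 ;;         # no such cid
  esac
  local rest body
  rest=${xml#*"$open"}     # shortest prefix: cut through the opening tag
  body=${rest%%"$close"*}  # longest suffix: cut from the first closing tag on
  printf '%s%s%s\n' "$open" "$body" "$close"
}

y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'
get_condition "$y" 1   # prints <condition cid="1">test data 1</condition>
get_condition "$y" 2   # prints <condition cid="2">test data 2</condition>
```

Shell patterns let `*` match newlines too, so this does not depend on whether `$indicator` is one line or several.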
I'm still interested in whatever alternative solution you can provide, but as a workaround I added a sed statement after the grep statement to chop off the unneeded data, and this works for me.
condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`
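The `sed` step above removes the first `</condition>` along with everything after it, so the extracted text ends without its closing tag. If the tag should be kept, a back-reference can put it back. A sketch on the toy data from earlier in the thread (the `>*` from the original pattern is dropped here, since it only matches optional `>` characters):

```bash
x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# grep -o returns the greedy match (up to the last </condition>);
# sed then keeps the first </condition> via \1 and drops the rest.
condition=$(printf '%s\n' "$y" \
  | grep -o "<condition cid=\"$x\">.*</condition>" \
  | sed 's#\(</condition>\).*#\1#')
echo "$condition"
# prints: <condition cid="1">test data 1</condition>
```

Using `#` as the `sed` delimiter avoids having to escape the `/` in `</condition>`.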
Have you tried using -m to match just the first occurrence?
You could combine this in a bash script with a for loop that increments $i, loops through the matches, and assigns each one to a correspondingly numbered variable.
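The loop idea can be sketched in plain POSIX shell. This version walks through the string one `</condition>` at a time and assigns each element to `condition_1`, `condition_2`, and so on via `eval`. Those variable names and the sample data are hypothetical, and in bash an array would be the tidier choice. `while` loops stand in for the `for` loop, since POSIX sh has no counting `for`.

```bash
# Hypothetical sample standing in for $indicator
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

rest=$indicator
i=0
while :; do
  # Stop once no closing tag is left.
  case $rest in
    *"</condition>"*) ;;
    *) break ;;
  esac
  i=$((i + 1))
  chunk=${rest%%"</condition>"*}   # text before the next closing tag
  chunk=${chunk#*"<condition"}     # drop "<condition" and anything before it
  # Only $i is expanded before eval runs, so this executes
  # condition_N="<condition$chunk</condition>"
  eval "condition_$i=\"<condition\$chunk</condition>\""
  rest=${rest#*"</condition>"}     # continue after that closing tag
done

n=1
while [ "$n" -le "$i" ]; do
  eval "value=\$condition_$n"      # read condition_N back
  printf 'condition_%s: %s\n' "$n" "$value"
  n=$((n + 1))
done
```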
ASKER CERTIFIED SOLUTION
Sorry for the delay, jrram. The solution you posted is classic Unix shell stuff, and I can't find a way to do better in shell code.
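# Quote $indicator so its newlines survive; this awk script works line by line
echo "$indicator" | awk 'BEGIN {x=0}
{
# start printing at the opening tag for cid="1"
if ($0~"<condition cid=\"1\">") {x=1}
if (x==1) {print $0}
# stop after the closing tag
if ($0~"</condition>") {x=0}
}'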
I'm not an expert awk programmer, so you may have to play with the substitution for cid="n", or just write a couple of scripts with different values.
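On the `cid="n"` question: awk's `-v` option can pass a shell variable in, and the regular expression can be built by string concatenation, so one script covers every cid. A sketch assuming, like the awk code above, that the XML has one tag per line and that `COND_NUM` holds the wanted cid; the sample data is hypothetical:

```bash
# Hypothetical multi-line sample standing in for $indicator
indicator='<condition cid="1">
<description>first</description>
</condition>
<condition cid="2">
<description>second</description>
</condition>'
COND_NUM=2

# -v passes the shell value into awk; the dynamic regex is built by
# concatenating strings around it.
condition=$(printf '%s\n' "$indicator" | awk -v cid="$COND_NUM" '
  $0 ~ ("<condition cid=\"" cid "\">") { inside = 1 }
  inside { print }
  /<\/condition>/ { inside = 0 }
')
printf '%s\n' "$condition"
# prints the three lines of condition 2, from <condition cid="2"> to </condition>
```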