Help with a 'grep' statement

I need help with a grep statement.  Suppose I have the below XML code snippet stored in a variable called 'indicator'.  I am using the grep statement below in a loop to extract the conditions one at a time.

export condition=`echo $indicator | grep -o "<condition cid=\"\$COND_NUM\">*.*<\/condition>"`

On the first pass of the loop, COND_NUM will equal 2, so I'm expecting to get only that condition, but everything gets returned.  I think the problem is that I am using *.*<\/condition> in the grep statement, and it's matching the second <\/condition> at the end of the file instead of the first one it comes to.

How can I modify my grep statement to only get the first condition?
<condition cid="1">
        <description>TRN.MERCHANT_NAME1 = substr(VEN.SCRUB_NAME1,1,length(TRN.MERCHANT_NAME1)))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_6A', SCRUB_NAME1='VWXYZabcde' WHERE VENDOR_ID='VENID-6';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='VWXYZ', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-6') WHERE MERCHANT_ID='6';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='VWXYZ';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>
<condition cid="2">
        <description>(VEN.SCRUB_NAME1 = substr(TRN.MERCHANT_NAME1,1,length(VEN.SCRUB_NAME1))</description>
 
        <change_sql>UPDATE AP_VENDOR SET NAME1='NAME1_5A', SCRUB_NAME1='ABCDE' WHERE VENDOR_ID='VENID-5';</change_sql>                                
        <change_sql>UPDATE PCD_TRANSACTION SET MERCHANT_NAME1='ABCDEjihgf', TRANSACTION_DATE=(SELECT INVOICE_DATE FROM AP_VOUCHER WHERE VOUCHER_ID='OSTBU-5') WHERE MERCHANT_ID='5';</change_sql>
        <change_verify_sql>SELECT COUNT(*) FROM PCD_TRANSACTION WHERE MERCHANT_NAME1='ABCDEjihgf';</change_verify_sql>
        <change_verify_count>1</change_verify_count>
</condition>

Asked by: jrram

Hugh Fraser (Consultant) commented:
Grep doesn't support multi-line patterns. Try this awk script as a starting point.

echo $indicator | awk 'BEGIN {x=0}
{
if ($0~"<condition cid=\"1\">") {x=1}
if (x==1) {print $0}
if ($0~"</condition>") {x=0}
}'

I'm not an expert awk programmer, so you may have to play with the substitution for cid="n", or just write a couple of scripts with different values.
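
A minimal sketch of how the cid value could be passed into awk instead of being hard-coded. It assumes the XML really is stored with its newlines intact and that the variable is expanded inside double quotes; the sample data and the `inside` flag name are placeholders, not part of the original script. On single-line input this line-based approach would print the whole line.

```shell
# Hypothetical sample data; the real $indicator comes from the asker's script.
indicator='<condition cid="1">
  <description>first</description>
</condition>
<condition cid="2">
  <description>second</description>
</condition>'
COND_NUM=2

# Quote "$indicator" so the newlines survive; -v passes the shell value into awk.
condition=$(printf '%s\n' "$indicator" | awk -v cid="$COND_NUM" '
  index($0, "<condition cid=\"" cid "\">") { inside = 1 }  # opening tag for this cid
  inside { print }                                          # emit lines while inside the block
  inside && /<\/condition>/ { inside = 0 }                  # stop after the closing tag
')

printf '%s\n' "$condition"
```

With COND_NUM=2 this should print only the three lines of the second condition block.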
jrram (Author) commented:
The XML is stored in a variable, so I don't think (?) it is multi-line input.  My thinking is that if it were multi-line input, then extracting the 2nd condition wouldn't work either.

I think the question is how do I tell it to stop when it finds the first </condition>.
Hugh Fraser (Consultant) commented:
I see. So if this appears as a single line, the following works.

echo $y
This is a test <condition cid="1">more stuff</condition>More junk

echo $x
1

echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"
<condition cid="1">more stuff</condition>

More importantly, this looks suspiciously like your example except for the fact that your XML string prints out as multiple lines. Can you do an

echo $indicator

to see what it looks like?
jrram (Author) commented:
When I do an "echo $indicator | wc -l" it returns 1 so this confirms the input is only 1 line.
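
One caveat about that check, sketched below with a hypothetical two-line value: when a variable is expanded without quotes, the shell splits it on whitespace and echo rejoins the words with single spaces, so `wc -l` reports 1 even if the stored value spans several lines. Quoting the expansion shows what is actually stored.

```shell
# Hypothetical two-line value standing in for the real XML.
indicator='<condition cid="1">
</condition>'

# Unquoted: word splitting turns the embedded newline into a space.
unquoted=$(echo $indicator | wc -l)

# Quoted: the embedded newline is preserved.
quoted=$(printf '%s\n' "$indicator" | wc -l)

echo "unquoted line count: $unquoted"
echo "quoted line count: $quoted"
```

Here the unquoted pipeline should count one line and the quoted pipeline two, so a result of 1 from the unquoted form does not by itself prove the variable holds a single line.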

And yes, in the example test condition that you gave, the grep expression does work because you only have one </condition> in variable $y.  If you put a second one in there (see the example below) and then run the grep statement, it returns too much.

Data Setup:

x="1"
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

Problem Statement:
echo $y | grep -o "<condition cid=\"$x\">*.*<\/condition>"

=====

Expected Result:

<condition cid="1">test data 1</condition>

Actual Result:

<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>

Notes:

As it is, the grep statement correctly finds the <condition cid="1">, but I think because of the '*.*', it greedily ignores the first </condition> (expected stopping point) and includes everything up until the last </condition> value.

Does this make sense?  Do you know of any parameters or changes that can be made to the grep statement?
Hugh Fraser (Consultant) commented:
It does make sense. The *.* should be .*? to make it non-greedy, but that doesn't seem to work either. Are you bound to a grep solution, or are you willing to use an alternative?
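
For context, POSIX basic and extended regular expressions (grep's default and -E modes) have no lazy quantifiers, which is why .*? has no effect there. A minimal sketch of one option, assuming GNU grep built with PCRE support: the -P flag switches to Perl-compatible regexes, where .*? stops at the first possible match. The data is the two-condition test string from earlier in the thread.

```shell
x=1
y='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# -P selects Perl-compatible regexes (GNU grep only), where .*? is a lazy match.
# -o prints only the matched part of the line.
condition=$(printf '%s\n' "$y" | grep -oP "<condition cid=\"$x\">.*?</condition>")

printf '%s\n' "$condition"
```

On a grep without -P support this will fail with an error, so it is not a portable answer.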
jrram (Author) commented:
I'm open to using an alternative solution.  I chose grep because it seemed like a simple thing to do, but it doesn't appear that way anymore.  I also looked at sed, but that didn't work for me either (as a standalone solution). I'm not that familiar with awk, but it seems like it could work.

I'm still interested in whatever alternative solution you can provide, but as a workaround I added a sed statement after the grep statement to chop off the unneeded data, and this works for me.

condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`
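
A small sketch of what this pipeline yields on the single-line test data, for anyone reusing it. Note that the sed expression removes the first </condition> itself along with everything after it, so the result has no closing tag. The second variant, which is an assumption about what a caller might want rather than part of the posted workaround, puts the tag back in the replacement. The variable names `trimmed` and `complete` are placeholders.

```shell
COND_NUM=1
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# The workaround as posted: everything from the first closing tag onward is removed.
trimmed=$(printf '%s\n' "$indicator" \
  | grep -o "<condition cid=\"$COND_NUM\">.*</condition>" \
  | sed "s/<\/condition>.*//")

# Variant: restore the first closing tag in the replacement text.
complete=$(printf '%s\n' "$indicator" \
  | grep -o "<condition cid=\"$COND_NUM\">.*</condition>" \
  | sed "s/<\/condition>.*/<\/condition>/")

printf '%s\n' "$trimmed"
printf '%s\n' "$complete"
```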

macker- commented:
Have you tried using -m to match just the first occurrence?

You could combine this in a bash script, with a for loop, to increment $i and loop thru the matches, assigning each to a corresponding numbered variable.
jrram (Author) commented:
macker,

I did try the -m option and it still brings back the entire thing.  The code below, which is a repeat of what I posted just before your first post, is what worked for me.

If there are no objections, I'm going to request a points refund.
condition=`echo $indicator | grep -o "<condition cid=\"$COND_NUM\">*.*<\/condition>" | sed "s/<\/condition>.*//g"`

Hugh Fraser (Consultant) commented:
Sorry for the delay, jrram. The solution you posted is classic Unix shell stuff, and I can't find a way to do better in shell code.
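
For completeness, a hedged sketch of a pure-shell alternative using POSIX parameter expansion, with no grep or sed. It assumes the requested opening tag is present in the data (if it is not, the first expansion leaves the string unchanged and the result is meaningless), and the names `rest` and `body` are placeholders.

```shell
COND_NUM=2
indicator='<condition cid="1">test data 1</condition><condition cid="2">test data 2</condition>'

# Remove the shortest prefix that ends with the requested opening tag.
rest=${indicator#*"<condition cid=\"$COND_NUM\">"}

# Remove the longest suffix that starts with a closing tag,
# which cuts at the first </condition> after the opening tag.
body=${rest%%"</condition>"*}

# Reassemble the element with its tags.
condition="<condition cid=\"$COND_NUM\">${body}</condition>"

printf '%s\n' "$condition"
```

Because it works on the string as a whole, this does not depend on whether the stored XML is one line or many.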