Solved

Regular expression to extract value from between greater than and less than operator on Sun Solaris OS/8

Posted on 2007-07-30
Last Modified: 2013-12-27
I am trying to pull out the value contained between the less-than and greater-than signs, and I must use a regular expression.
Example: <value>

In the example below I need to pull the lines with <8>, <9> and <10>, but my regular expression does not work.
Regular Expression:
.,$ s/\(.*\)<\([8-9]|[10]\)>\(.*\)/changed \1   \2   \3/

line number <7>
line number <8>
line number <9>
line number <10>
line number <11>
Question by:rayskelton
17 Comments
 
LVL 22

Expert Comment

by:DarkoLord
ID: 19596081
What about something like this:

<(.*)>
 
LVL 63

Expert Comment

by:Zvonko
ID: 19596159
Like this:

.,$ s/<(\d+)>/changed \1 /


 
LVL 63

Expert Comment

by:Zvonko
ID: 19596167
And if it can be other chars and not only digits, then use the not-notation:

.,$ s/<([^>]+)>/changed \1 /
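To see why the not-notation is safer than <(.*)> when a line holds more than one <...> pair, here is a minimal C sketch. It assumes a C library with POSIX <regex.h> and extended syntax; the function name first_group is made up for illustration and is not from this thread.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Copy the first capture group of `pattern` (POSIX extended syntax) found
 * in `text` into `out`. Returns 1 on a match, 0 otherwise.
 * `first_group` is a hypothetical helper name. */
int first_group(const char *pattern, const char *text,
                char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= out_size)
            len = out_size - 1;          /* truncate to fit the buffer */
        memcpy(out, text + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Called on the text "a <8> b <9>", the pattern "<(.*)>" captures everything up to the last > on the line, while "<([^>]+)>" stops at the first > and captures just 8.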

 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 19597471
The comments above may be all you need.  I am actually a little confused by what you need from the expression and what the "text" is like.

I do have one thing to point out or ask.  It seems like you want to capture the number between the less-than and greater-than signs (e.g. 8 from <8>).  If that is true then you shouldn't be escaping the parentheses in your expression.  For example instead of ...

<\([8-9]|[10]\)>

you would have ...

<([8-9]|[10])>

That will look for the signs and either 8, 9 or 10 in between.  The number will be placed in a group for you to use later.  Unless your expression engine or language requires the parentheses to be escaped, the escape will cause a problem.

I hope that helps.  Let me know if you have a question about this.  If you need more help then please clarify what the "text" is like and what exactly you need.

bol
 
LVL 63

Expert Comment

by:Zvonko
ID: 19598533
bol, the expression [10] will NOT look for: 10
The reason is that [8-9] will also not look for this string: 8-9
The square braces group is a set where only ONE character from the set does match.
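A quick way to check the point about sets is a small C test harness. This is only a sketch, assuming POSIX <regex.h> with extended syntax; the helper name matches is invented for illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `text` contains a match for the POSIX extended regex
 * `pattern`, else 0. `matches` is a hypothetical helper name. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    assert(pattern != NULL && text != NULL);
    /* REG_NOSUB: we only need yes/no, not the capture positions. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, "<([8-9]|[10])>" matches <8>, <9>, <1> and <0> but not <10>, because [10] is a set holding the single characters 1 and 0. Writing the alternative as plain 10, as in "<([89]|10)>", matches <10> and still rejects <7> and <11>.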


 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 19601514
@Zvonko - THANKS!  I actually copied it from the question and thought something looked off.  I just couldn't identify the problem when I posted my comment last night.  You are right that [10] does not work for the number 10, because it matches 1 or 0.  However, the hyphen inside square brackets is treated as a special character.  I believe the Asker wanted to match the range of numbers from 8 to 9 (i.e. 8 or 9), and [8-9] would do that.  If the intent was to match those three characters literally and in that order, then you are correct, the square brackets wouldn't work.  Thanks for your post to correct the error I overlooked.  I can't believe I missed that. :)

bol
 
LVL 63

Expert Comment

by:Zvonko
ID: 19601598
It seems I also misunderstood the question. Now that you say it I see what he is asking! :)

OK, my proposal to catch lines 8, 9 and 10 is this:

.,$ s/(.*)<([8-9]|10)>(.*)/changed \1   \2   \3/


And if the < > chars can stay in the output, then it can be reduced to this:

.,$ s/(.*<([8-9]|10)>.*)/changed \1  /



 
LVL 63

Expert Comment

by:Zvonko
ID: 19601617
Of course the dash char is not necessary:

.,$ s/(.*)<([89]|10)>(.*)/changed \1   \2   \3/

 

Author Comment

by:rayskelton
ID: 19609766
Thanks for all the responses, and sorry for the delay in replying.

Basically what I need is the lines that have single or double digits between the < and > signs.  The two expressions below work:

   This expression:
         1,$ s/\(.*\)<[89]>\(.*\)/changed \1   \2   \3/
         Matches on
         line number <8>
         line number <9>
   This expression:
       1,$ s/\(.*\)<[1][0]>\(.*\)/changed \1   \2   \3/
       Matches on
      line number <10>
     So I need [89] or [1][01]
     I cannot get all three lines out with one regular expression to match on
        line number <8>
        line number <9>
        line number <10>

 

Author Comment

by:rayskelton
ID: 19609785
Also, I am doing this in vi for testing purposes. Once it is running, I will implement it in a C program with other expressions.
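For the eventual C program, the vi substitution can be approximated with regcomp and regexec. The following is a rough sketch, not a tested Solaris 8 solution: it assumes the C library provides POSIX <regex.h> with extended syntax (REG_EXTENDED), and the function name rewrite_line is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Mimics  s/(.*)<([89]|10)>(.*)/changed \1   \2   \3/  for one line.
 * Writes the rewritten text to `out` and returns 1 if the line matched,
 * otherwise leaves `out` empty and returns 0.
 * `rewrite_line` is a hypothetical name used for illustration. */
int rewrite_line(const char *line, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[4];   /* m[0] = whole match, m[1..3] = the three groups */
    int matched = 0;

    assert(line != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    if (regcomp(&re, "(.*)<([89]|10)>(.*)", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 4, m, 0) == 0) {
        /* %.*s prints a substring of the given length without copying it. */
        snprintf(out, out_size, "changed %.*s   %.*s   %.*s",
                 (int)(m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
                 (int)(m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
                 (int)(m[3].rm_eo - m[3].rm_so), line + m[3].rm_so);
        matched = 1;
    }
    regfree(&re);
    return matched;
}
```

Fed the five sample lines from the question, this returns 1 for <8>, <9> and <10> and 0 for <7> and <11>. In extended syntax the parentheses and the pipe are written unescaped, which differs from what vi expects.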
 
LVL 54

Expert Comment

by:b0lsc0tt
ID: 19610137
Did you try Zvonko's suggestion in http:#19601617?  That will look for 8, 9 or 10.

You are using square brackets too often and incorrectly.  For example, you never need them when there is only one character inside (e.g. [1] should be just 1).  The comments above have some instructions on making an expression that could be useful to you.  Let us know if you have a question on what we said.  Also let us know the result of using Zvonko's expression.

bol
 

Author Comment

by:rayskelton
ID: 19616409
I'll try these options this morning.
 
LVL 17

Expert Comment

by:BogoJoker
ID: 19635204
I can only assume that you want this single regular expression to be expandable to take a list of numbers other than just 8, 9 and 10.

In that case you can literally put the list of numbers that you want in the regular expression!  Here is what mine looked like for 8, 9, and 10:

s/(.*?)<(8|9|10)>(.*?)/changed \1 \2 \3/

Notice that inside <( ... )> is just a pipe-delimited list of numbers.  It's easy to create such a string from a list (an array) of numbers.  Just join each element of the list with a |.  Here are examples in perl and ruby:

perl:
@arr = qw/ 11 2 3 /;
$str = join '|', @arr;
print $str, "\n"

ruby:
puts [11,2,3].join('|');

Each results in:
11|2|3

And each could easily be put into the regular expression in the specified spot.

- Joe P
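Since the asker plans to end up in C, the same join can be sketched there as well. This is only an illustration under the assumption that the numbers arrive as an int array; the function name join_alternation is made up.

```c
#include <assert.h>
#include <stdio.h>

/* Join `count` integers with '|' into `out`, e.g. {8, 9, 10} -> "8|9|10".
 * Returns 1 on success, 0 if `out` is too small.
 * `join_alternation` is a hypothetical name used for illustration. */
int join_alternation(const int *values, size_t count,
                     char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < count; i++) {
        /* Put the separator before every element except the first. */
        int n = snprintf(out + used, out_size - used, "%s%d",
                         i == 0 ? "" : "|", values[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return 0;                     /* output would be truncated */
        used += (size_t)n;
    }
    return 1;
}
```

The resulting string can then be placed between <( and )> before compiling the pattern with an extended-syntax regex engine.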
 

Author Comment

by:rayskelton
ID: 19637783
This is still a problem I have not resolved, and most of the regular expressions provided did not work. I am using them in vi, so I do not know whether awk or ksh would interpret the expression terminator differently, but I will look at the case from BogoJoker. Thanks to all for the feedback.
 
 
LVL 17

Expert Comment

by:BogoJoker
ID: 19642127
This is the regular expression I had to use in vi... I want to point out that vi is not very nice when it comes to regular expressions because you have to escape common characters like (, ), and |.


In VI go into command mode, and type the following (including the colon to get into that mode)...
:g/^\(.*\)<\(8\|9\|10\)>$/ s/^\(.*\)<\(8\|9\|10\)>$/changed \1 \2 \3/g


The first half identifies all lines (g = global) that match the regular expression:
g/^\(.*\)<\(8\|9\|10\)>$/

The second half then runs the s/// search and replace on that line... using the same regex:
s/^\(.*\)<\(8\|9\|10\)>$/changed \1 \2 \3/g

Maybe someone knows a better way for this to work but I couldn't seem to get the search and replace to work on more than one line unless I did the two commands.  Hope this helps... all of the above regular expressions work, just not in VI because regex in VI take extra measures....

- Joe P
 
LVL 17

Expert Comment

by:BogoJoker
ID: 19642152
Well... I looked at the question again and I noticed your example uses a cool trick with ranges:  .,$  So everything can be done like so:

:.,$ s/\(.*\)<\([8-9]\|10\)>\(.*\)/changed \1   \2   \3/

In your example you needed to escape the pipe => \|
And if you wanted 10 then you needed to remove the brackets => 10
And the thing with 8-9 has been mentioned above.

Good thinking,
Joe P
 

Accepted Solution

by:rayskelton
ID: 20659870
No provided solution works for ksh 88 on Sun Solaris. This is a dead issue.
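One possible reason, offered as a guess rather than a diagnosis: POSIX basic regular expressions have no alternation operator, and \| is an extension that some tools (Vim, GNU utilities) add on top. Where alternation is unavailable, the same result can be had by trying two alternation-free patterns in turn. Below is a C sketch of that idea, assuming POSIX <regex.h>; it has not been tried on Solaris 8, and the function name extract_bre is invented.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Extract 8, 9 or 10 from between < and > using only basic regular
 * expressions (no alternation). Each pattern is tried in turn and group 1
 * of the first one that matches is copied to `out`.
 * Returns 1 on a match, 0 otherwise. `extract_bre` is a hypothetical name. */
int extract_bre(const char *line, char *out, size_t out_size)
{
    /* In basic syntax, groups are written \( ... \). */
    static const char *patterns[] = { "<\\([89]\\)>", "<\\(10\\)>" };
    size_t i;

    assert(line != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        regex_t re;
        regmatch_t m[2];
        int hit;

        if (regcomp(&re, patterns[i], 0) != 0)  /* flags 0 = basic syntax */
            continue;
        hit = (regexec(&re, line, 2, m, 0) == 0);
        regfree(&re);
        if (hit) {
            size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
            if (len >= out_size)
                len = out_size - 1;       /* truncate to fit the buffer */
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

The same two-pass idea carries over to vi: running the <[89]> substitution and then the <10> substitution, as in the asker's two working expressions, covers all three lines without needing a pipe.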