rayskelton

Regular expression to extract value from between greater than and less than operator on Sun Solaris OS/8

I am trying to pull out the value contained between the less-than and greater-than signs, and I must use a regular expression.
Example <value>

In the example below, I need to pull the lines with <8>, <9> and <10>, but my regular expression below does not work.
Regular Expression:
.,$ s/\(.*\)<\([8-9]|[10]\)>\(.*\)/changed \1   \2   \3/

line number <7>
line number <8>
line number <9>
line number <10>
line number <11>
DarkoLord

What about something like this:

<(.*)>
Zvonko
Like this:

.,$ s/<(\d+)>/changed \1 /


And if it can be other chars and not only digits, then use the not-notation:

.,$ s/<([^>]+)>/changed \1 /
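A minimal sketch of the difference between the two suggestions above, written in Python rather than vi (Python's `re` syntax and the helper name `extract_value` are assumptions for illustration, not part of the thread). A greedy `<(.*)>` runs to the last `>` on the line, while the negated set `[^>]+` stops at the first one:

```python
import re

greedy_re = re.compile(r"<(.*)>")       # greedy: runs to the LAST ">" on the line
negated_re = re.compile(r"<([^>]+)>")   # stops at the FIRST ">" after "<"

def extract_value(line):
    """Return the text between the first "<" and the next ">", or None."""
    match = negated_re.search(line)
    return match.group(1) if match else None

two_values = "first <a> and second <b>"
greedy_value = greedy_re.search(two_values).group(1)
negated_value = extract_value(two_values)

print(repr(greedy_value))    # 'a> and second <b'
print(repr(negated_value))   # 'a'
print(extract_value("line number <10>"))  # 10
```

On the single-value lines from the question both patterns give the same answer; the difference only shows up when a line holds more than one bracketed value.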

The comments above may be all you need.  I am actually a little confused by what you need from the expression and what the "text" is like.

I do have one thing to point out or ask.  It seems like you want to capture the number used in between the less than and greater than sign (e.g 8 from <8>).  If that is true then you shouldn't be escaping the signs in your expression.  For example instead of ...

<\([8-9]|[10]\)>

you would have ...

<([8-9]|[10])>

That will look for the signs and either 8, 9 or 10 in between.  The number will be placed in a group for you to use later.  Unless your expression engine or language requires the parentheses to be escaped, the escape will cause a problem.

I hope that helps.  Let me know if you have a question about this.  If you need more help then please clarify what the "text" is like and what exactly you need.

bol
bol, the expression [10] will NOT match the string: 10
For the same reason, [8-9] will not match this string: 8-9
A square-bracket group is a set, and it matches only ONE character from that set.
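The point about bracket sets can be checked with a short sketch. It is written in Python as an assumption for illustration (the thread itself is about vi), and the sample strings are made up:

```python
import re

# A bracket expression matches exactly ONE character from its set.
char_class = re.compile(r"<[10]>")         # matches <1> or <0>, never <10>
alternation = re.compile(r"<([89]|10)>")   # matches <8>, <9> or <10>

samples = ["<0>", "<1>", "<8>", "<9>", "<10>", "<11>"]

class_hits = [s for s in samples if char_class.fullmatch(s)]
alt_hits = [s for s in samples if alternation.fullmatch(s)]

print(class_hits)  # ['<0>', '<1>']
print(alt_hits)    # ['<8>', '<9>', '<10>']
```

To match the two-character string 10, it has to be written as a plain literal inside an alternation, not inside square brackets.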


@Zvonko - THANKS!  I actually used it from the question and thought something looked off.  I just couldn't identify the problem when I posted my comment last night.  You are right about [10] not working for the number 10 because it would match 1 or 0.  However the hyphen, even in square brackets, is treated as a special character.  I believe the Asker wanted to match the range of numbers from 8 to 9 (i.e. 8 or 9) and [8-9] would do that.  If the intent was to match those three characters literally and in that order then you are correct, the square brackets wouldn't work.  Thanks for your post to correct the error I overlooked.  I can't believe I missed that. :)

bol
It seems I also misunderstood the question. Now that you say it I see what he is asking! :)

OK, my proposal to catch lines 8, 9 and 10 is this:

.,$ s/(.*)<([8-9]|10)>(.*)/changed \1   \2   \3/


And if the < > chars can stay in the output, then it can be reduced to this:

.,$ s/(.*<([8-9]|10)>.*)/changed \1  /



Of course the dash char is not necessary:

.,$ s/(.*)<([89]|10)>(.*)/changed \1   \2   \3/
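As a rough check of the proposal above, here is the same substitution applied to the five sample lines from the question. This is a sketch in Python, which is an assumption: its `re` module uses unescaped `( )` and `|` like the expressions in this comment, but it is not vi, and the helper name `rewrite` is made up for the example.

```python
import re

# Same pattern as the proposal: text before "<", the number, text after ">".
pattern = re.compile(r"(.*)<([89]|10)>(.*)")

lines = [f"line number <{n}>" for n in (7, 8, 9, 10, 11)]

def rewrite(line):
    # \1, \2 and \3 refer to the three groups, as in the s/// replacement.
    return pattern.sub(r"changed \1   \2   \3", line)

changed = [rewrite(line) for line in lines if pattern.search(line)]

for line in changed:
    print(line)
```

Only the lines containing <8>, <9> and <10> are rewritten; the <7> and <11> lines do not match and are returned unchanged by `rewrite`.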

rayskelton

ASKER

Thanks for all the responses, and sorry for the delay in my reply.

Basically what I need is lines that have single and double digits between the < and > signs.  The two expressions below work:

   This expression:
         1,$ s/\(.*\)<[89]>\(.*\)/changed \1   \2   \3/
         Matches on
         line number <8>
         line number <9>
   This expression:
       1,$ s/\(.*\)<[1][0]>\(.*\)/changed \1   \2   \3/
       Matches on
      line number <10>
     So I need [89] or [1][01]
     I cannot get all three lines out with one regular expression to match on
        line number <8>
        line number <9>
        line number <10>

Also, I am doing this in vi for testing purposes. Once running, I will implement in a C program with other expressions.
Did you try Zvonko's suggestion in http:#19601617?  That will look for 8, 9 or 10.

You are using square brackets too often, and incorrectly.  For example, you never need them when there is only one character inside (e.g. [1] should be just 1).  The comments above have some instructions on making an expression that could be useful to you.  Let us know if you have a question on what we said.  Also let us know the result of using Zvonko's expression.

bol
I'll try these options this morning.
I can only assume that you want this single regular expression to be expandable to take a list of numbers other than just 8, 9 and 10.

In that case you can literally put the list of numbers that you want in the regular expression!  Here is what mine looked like for 8, 9, and 10:

s/(.*?)<(8|9|10)>(.*?)/changed \1 \2 \3/

Notice that inside <( ... )> is just a pipe-delimited list of numbers.  It's easy to create such a string from a list (an array) of numbers.  Just join each element of the list with a |.  Here are examples in perl and ruby:

perl:
@arr = qw/ 11 2 3 /;
$str = join '|', @arr;
print $str, "\n"

ruby:
puts [11,2,3].join('|');

Each results in:
11|2|3

And each could easily be put into the regular expression in the specified spot.

- Joe P
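The join-then-embed idea from the comment above can be sketched end to end. This version is in Python (an assumption, chosen only for illustration alongside the perl and ruby snippets), and the list `numbers` is a made-up example:

```python
import re

numbers = [8, 9, 10]  # hypothetical list of wanted values

# Join the values with "|" and drop the result between "<(" and ")>".
# re.escape is a precaution in case a value ever contains a regex metacharacter.
alternatives = "|".join(re.escape(str(n)) for n in numbers)
pattern = re.compile(rf"<({alternatives})>")

lines = [f"line number <{n}>" for n in (7, 8, 9, 10, 11)]
matched = [m.group(1) for line in lines if (m := pattern.search(line))]

print(alternatives)  # 8|9|10
print(matched)       # ['8', '9', '10']
```

Because the `<` and `>` surround the whole group, a value such as 1 in the list would not accidentally match inside <10> or <11>.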
This is still a problem I have not resolved, and most of the regular expressions provided did not work. I am using them in vi, so I do not know if awk or ksh is looking at the expression terminator differently, but I will look at the case from BogoJoker. Thanks to all for the feedback.
 
This is the regular expression I had to use in vi... I want to point out that vi is not very nice when it comes to regular expressions because you have to escape common characters like (, ), and |.


In VI go into command mode, and type the following (including the colon to get into that mode)...
:g/^\(.*\)<\(8\|9\|10\)>$/ s/^\(.*\)<\(8\|9\|10\)>$/changed \1 \2 \3/g


The first half identifies all lines (g = global) that match the regular expression:
g/^\(.*\)<\(8\|9\|10\)>$/

The second half then runs the s/// search and replace on that line... using the same regex:
s/^\(.*\)<\(8\|9\|10\)>$/changed \1 \2 \3/g

Maybe someone knows a better way for this to work but I couldn't seem to get the search and replace to work on more than one line unless I did the two commands.  Hope this helps... all of the above regular expressions work, just not in VI, because regular expressions in VI need the extra escaping....

- Joe P
Well... I looked at the question again and I noticed your example uses a cool trick with ranges.  .,$!  So everything can be done like so:

:.,$ s/\(.*\)<\([8-9]\|10\)>\(.*\)/changed \1   \2   \3/

In your example you needed to escape the pipe => \|
And if you wanted 10 then you needed to remove the brackets => 10
And the thing with 8-9 has been mentioned above.

Good thinking,
Joe P
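To see why the expression in the question matched nothing, the two fixes listed above can be replayed outside vi. This is a sketch under a stated assumption: both vi patterns are hand-translated into Python `re` syntax, where grouping parentheses are unescaped and a literal pipe is written `\|` (the reverse of the vi convention described in this thread).

```python
import re

# Assumed translation of the asker's vi pattern  <\([8-9]|[10]\)>
# In vi the bare | is a literal pipe, and [10] is a one-character set.
original = re.compile(r"<([8-9]\|[10])>")

# Assumed translation of the corrected vi pattern  <\([8-9]\|10\)>
corrected = re.compile(r"<([8-9]|10)>")

lines = [f"line number <{n}>" for n in (7, 8, 9, 10, 11)]

original_hits = [line for line in lines if original.search(line)]
corrected_hits = [line for line in lines if corrected.search(line)]

# The original only matches an odd string such as "<8|1>".
literal_pipe_hit = bool(original.search("line number <8|1>"))

print(original_hits)     # []
print(corrected_hits)
print(literal_pipe_hit)  # True
```

The translated original pattern looks for a digit, a literal pipe character and another digit between the signs, which none of the sample lines contain.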
ASKER CERTIFIED SOLUTION
rayskelton