Solved

Need help with TCL's regexp and regsub

Posted on 2004-09-17
Last Modified: 2013-12-26
I code tcl scripts for eggdrops on IRC.
I am fairly good at Tcl scripting, such as string matching and most of the list and string commands, and I read pre-written scripts to get ideas from them.

I am having trouble with regexp and regsub for complex matching. I don't really know how to use regexp to match certain kinds of patterns. Could anyone give me examples or tutorials on matching things such as specific numbers, words, or letters in a text, or all numbers/words in a text, with wildcards as well?

--
awyeah
Question by:awwyeah
14 Comments
 

Author Comment

by:awwyeah
ID: 12082728
Also, don't give me lame websites or stuff; I have searched a lot, plus the sucky Tcl manual only shows the syntax and command usage, not detailed examples.
 

Author Comment

by:awwyeah
ID: 12091556
I presume no one's good with TCL on this forum. :o|
 
LVL 24

Accepted Solution

by:
fridom earned 150 total points
ID: 12158111
Well have you checked http://www.tcl.tk/doc/howto/regexp81.html

A very good book about regular expressions is "Mastering Regular Expressions"

I don't know if you have found http://aspn.activestate.com/ASPN/Cookbook/Tcl/
which gives some good tips

You have to be more specific about what kind of data you want to match, otherwise one just has to guess...

The bottom line is: the better you know your data and the more restricted it is, the easier it is to shape a good regular expression.
Here's one example (removing the words that contain a digit from a line):

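proc filter_string {string pattern} {
    # Start with an empty list so the proc also works when every word is filtered out.
    set result {}
    # Split on whitespace and keep only the words that do NOT match the pattern.
    foreach element [split $string] {
        if {![regexp $pattern $element]} {
            lappend result $element
        }
    }
    return [join $result]
}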

set line "A line with so3me words with and with0ut number2s"

filter_string $line {\d}
gives:
A line with words with and
...
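
The same filtering can probably be done in a single pass with regsub instead of a split/foreach loop. The following is only a sketch and not the method shown above: the proc name drop_words_with_digits is made up for illustration, and it assumes that words are separated by whitespace.

```tcl
# Hypothetical helper: remove every whitespace-delimited word that contains a digit.
proc drop_words_with_digits {text} {
    # \S*\d\S* is a run of non-space characters with at least one digit in it;
    # the trailing \s* also swallows the whitespace that followed the word.
    regsub -all {\S*\d\S*\s*} $text "" cleaned
    return [string trim $cleaned]
}

set line "A line with so3me words with and with0ut number2s"
puts [drop_words_with_digits $line]   ;# A line with words with and
```

Unlike the split/join version, this one leaves the spacing between the surviving words as it was in the input.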

Regards
Friedrich




 

Assisted Solution

by:s_federici
s_federici earned 150 total points
ID: 12352605
Well, as fridom said, it is not easy to give a good set of examples on how to handle regexps. I use them practically every day, and by reading the manual page you can see that there is a lot to say about them, so a comprehensive list of examples is hard to give. I'll start with a short list; let me know how they work for you. Just one note: since you say that string matching is well known to you, in these examples I won't go into details that are too similar to common string matching (e.g. glob matching patterns).

A) regexp

1. specific numbers

> set match "NO ANSWER"; regexp "123" "890 123 456" match; set match
123

The advantage over plain string matching is that, if you want to find a complete number (one that is not part of a longer number), you don't have to resort to "tricks" such as putting spaces around all the numbers in the string you search. Let me give an example:

> set match "NO ANSWER"; regexp "123" "890 1237 456" match; set match
123

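That is, even though "123" is only part of the number "1237", it is still found as a match. A workable solution in the style of string matching is to pad both the pattern and the string with spaces:

> set match "NO ANSWER"; regexp " 123 " " 890 1237 456 " match; set match
NO ANSWER

With regexp you can get whole-number matching without any padding, by using word-boundary escapes:
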
> set match "NO ANSWER"; regexp {\m123\M} "890 123 456" match; set match
123
> set match "NO ANSWER"; regexp {\m890\M} "890 123 456" match; set match
890

That is, the escape sequences "\m" and "\M" match only at the beginning and at the end of a word, respectively, so the pattern between them has to be a whole word. Note that I replaced the double quotes around the pattern with braces this time. Inside double quotes Tcl performs backslash substitution first and would replace the escapes with the plain chars "m" and "M". This doesn't happen with braces (i.e. there is no substitution before the pattern reaches regexp). I know this is just "plain tcl", but less experienced readers may check this answer too.
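
To make the quoting point concrete, here is a small sketch that runs the same pattern with braces, with double quotes, and with doubled backslashes. The test strings are just the ones used above plus an invented "xm123Mx".

```tcl
set text "890 1237 456"

# Braces: regexp sees \m and \M, so 123 must be a whole word.
puts [regexp {\m123\M} $text]            ;# 0, "123" is only part of "1237"
puts [regexp {\m1237\M} $text]           ;# 1

# Double quotes: Tcl turns \m into m and \M into M before regexp runs,
# so the pattern actually searched for is the literal text m123M.
puts [regexp "\m123\M" "xm123Mx"]        ;# 1
puts [regexp "\m123\M" "890 123 456"]    ;# 0

# Doubling the backslashes makes the quoted form equivalent to the braced one.
puts [regexp "\\m123\\M" "890 123 456"]  ;# 1
```

Braces are usually the less error-prone choice, unless the pattern needs a variable substituted into it.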


2. specific words

The situation is much the same as for specific numbers: "\m" and "\M" work for words too.
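
A short sketch of the word case, using invented sample sentences. It also uses the -nocase switch of regexp, which is not mentioned above, to ignore letter case.

```tcl
set text "I like my cat, not concatenation"

# Whole-word match: "cat" followed by a comma still counts as a word.
puts [regexp {\mcat\M} $text]              ;# 1
# The same letters buried inside a longer word do not match.
puts [regexp {\mcat\M} "concatenation"]    ;# 0

# -nocase ignores letter case; match holds the text as it appears in the input.
set match "NO ANSWER"
regexp -nocase {\mhello\M} "Oh, Hello there" match
puts $match                                ;# Hello
```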


3. More than one number/word

With regexp you can also match any one of several words/numbers. Here are a few examples:

> set match "NO ANSWER"; regexp {\m(123|890)\M} "890 123 456" match; set match
890
> set match "NO ANSWER"; regexp {\m(123|456)\M} "890 123 456" match; set match
123

Here you can see that the "(...|...)" notation matches either of the two alternatives (the one on the left or the one on the right of the "|" char); regexp reports whichever occurrence comes first in the string.
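
The parentheses do a second job: the text matched by each group can be stored in extra variables given after the match variable. A sketch, assuming an invented bot command line of the form "!kick nick reason" (the command format and variable names are only for illustration):

```tcl
set line "!kick awyeah flooding the channel"

# all gets the whole match; cmd, nick and reason get the three (...) groups.
if {[regexp {^!(kick|ban)\s+(\S+)\s*(.*)$} $line all cmd nick reason]} {
    puts "command=$cmd nick=$nick reason=$reason"
}

# A command that is not one of the alternatives is simply not matched.
puts [regexp {^!(kick|ban)\s+(\S+)} "!op awyeah"]   ;# 0
```

The first puts prints: command=kick nick=awyeah reason=flooding the channel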


B) regsub

With regsub you can ask to replace occurrences of a given pattern


4. All occurrences

By default, regsub replaces only the first occurrence of a pattern in a string:
> regsub {\m(123|456)\M} "890 123 456" "xxx"
890 xxx 456

But with regsub you can also specify that you want to replace ALL occurrences of a pattern. (Plain regexp likewise reports only the first occurrence of the pattern in the string; it accepts -all as well, which makes it return the number of matches, or, together with -inline, the list of all matches.)

> regsub -all {\m(123|456)\M} "890 123 456" "xxx"
890 xxx xxx
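
A few more things regsub can do, sketched on the same sample string: store the result in a variable and return the number of replacements, and reuse the matched text in the replacement with "&" (the whole match) or "\1", "\2", ... (the groups). The last two lines show the -all and -inline switches of regexp for comparison.

```tcl
set text "890 123 456"

# With a variable name, regsub stores the result there and returns the count.
set count [regsub -all {\d+} $text {<&>} wrapped]
puts "$count replacements: $wrapped"     ;# 3 replacements: <890> <123> <456>

# \1, \2, \3 in the replacement stand for the (...) groups of the pattern.
regsub -all {(\d)(\d)(\d)} $text {\3\2\1} reversed
puts $reversed                           ;# 098 321 654

# regexp -all returns the number of matches; with -inline, the matches themselves.
puts [regexp -all {\d+} $text]           ;# 3
puts [regexp -all -inline {\d+} $text]   ;# 890 123 456
```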


5. Wildcards

There are more wildcards than in string matching. With regexp and regsub you have the following:


i) "." matches any single char. Similar to "?" in string matching

> set match "NO ANSWER"; regexp "a.c" "abc" match; set match
abc


ii) "*" matches any number (0 or more) of occurrences of the preceding item (in these examples the preceding item is ".", i.e. any char)

> set match "NO ANSWER"; regexp "a.*c" "abbbbbbc" match; set match
abbbbbbc
> set match "NO ANSWER"; regexp "a.*c" "ac" match; set match
ac


iii) "+" matches any number (1 or more) of occurrences of the preceding item

> set match "NO ANSWER"; regexp "a.+c" "abbbbbbc" match; set match
abbbbbbc

BUT

> set match "NO ANSWER"; regexp "a.+c" "ac" match; set match
NO ANSWER


iv) "?" matches 0 or 1 occurrence of the preceding item

> set match "NO ANSWER"; regexp "a.?c" "abc" match; set match
abc
> set match "NO ANSWER"; regexp "a.?c" "ac" match; set match
ac
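
To compare the four wildcards side by side, this sketch tries each pattern against strings with zero, one and two chars between "a" and "c" and prints 1 (match) or 0 (no match) for each combination.

```tcl
# One output line per pattern, one column per test string.
foreach pattern {a.c a.*c a.+c a.?c} {
    set row {}
    foreach subject {ac abc abbc} {
        lappend row "$subject=[regexp $pattern $subject]"
    }
    puts "$pattern  [join $row {  }]"
}
```

"a.c" matches only "abc", "a.*c" matches all three strings, "a.+c" matches "abc" and "abbc", and "a.?c" matches "ac" and "abc".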


OK, I guess there is a lot more to say (classes, atoms, non-greedy quantifiers, etc.). Let me know if this helped you.
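
As a small taste of two of those further topics, here is a sketch of bracket expressions (character classes) and of a non-greedy quantifier. The sample strings are invented.

```tcl
# Bracket expression: one or more chars from the set 0-9.
puts [regexp -all -inline {[0-9]+} "ab12cd345"]   ;# 12 345

# Negated bracket expression: any char except the listed ones.
regexp {[^aeiou ]+} "eggdrop bot" match
puts $match                                       ;# ggdr

# Greedy .* takes as much as possible, so it runs to the last ">".
regexp {<(.*)>} "<a> <b>" match inner
puts $inner                                       ;# a> <b

# Non-greedy .*? takes as little as possible, so it stops at the first ">".
regexp {<(.*?)>} "<a> <b>" match inner
puts $inner                                       ;# a
```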
 

Expert Comment

by:s_federici
ID: 12352628
Last note. By "(e.g. glob matching patterns)" I just meant bracketed expressions. Sorry for the possible misunderstanding.
 

Expert Comment

by:s_federici
ID: 14362660
Well, I have given a few general examples of the patterns he mentioned, but getting no reply from the person who asked the question made it hard to give him exactly what he wanted.
 
LVL 24

Expert Comment

by:fridom
ID: 14362861
s_federici is right, it's annoying to be asked, to give some answer to what are often very vague questions, and then never to hear anything again. So I vote for sharing the points between me and s_federici, or just taking the points away from the original poster.

Friedrich
 

Expert Comment

by:s_federici
ID: 14366935
I agree with fridom, both alternatives are ok to me.
 
LVL 20

Expert Comment

by:Venabili
ID: 14368554
Points refund is NOT an option at all.. :) So I need to know if there is something valuable here or we should go for delete.
 
LVL 24

Expert Comment

by:fridom
ID: 14369038
Of course there is, in both postings, from s_federici and from me. But there has been no feedback. The OP does not care to say anything.

Friedrich
 

Expert Comment

by:s_federici
ID: 14369082
yes, both answers address the subject of the question. They can be of help for anyone looking for help on this subject.