Solved

Regex: match one number (or character) but not more than one

Posted on 2015-02-07
Last Modified: 2015-07-29
I am trying to write a regex that will match a single digit preceded by a $ sign (for example, $5). I am doing this to learn how to write custom rules to catch spam.

For a specific test example, I would like to match something like "sign up now for only $5" in the subject of an email. The wording is irrelevant and can obviously change. I already know how to match subjects, so I am not asking about SpamAssassin or SpamAssassin rule types, only about regex matching.

To help with testing, I created a test file like this:

$
$5
$55
$555
$5555
5

I am using the Perl one-liner below for my tests:

 perl -ne 'if (/\$\d/) {print "$&\n";}' < testfile

My goal is to match the line that contains "$5" and no other line.

After testing, I found that my regex matches every line containing a $ followed by a 5, no matter how many 5's are on the line. I have tried the following regexes:

\d
[\d]
\d{1}

and similar ones, but nothing works: each one matches every line containing a $ followed by a 5. I did a little research and found that since \d matches a single digit, the [] and {1} are unneeded. My initial thought was that \d would only match the line that contains $5. However, it matches the $5 on every line containing a $ followed by a 5, no matter how many 5's are on the line. Note: it doesn't match the line having only a $ or only a 5, which makes sense after thinking about it.
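A minimal sketch of why all of these patterns behave the same way, written with Python's re module instead of the Perl one-liner (an assumption: for \d, \D, \b and lookahead on this plain ASCII input the two engines agree). Like Perl's m//, re.search succeeds if the pattern matches anywhere in the line, so nothing stops it from matching the $5 at the start of $55:

```python
import re

# The question's test file, one string per line, keeping the newline as perl -n does.
lines = ["$\n", "$5\n", "$55\n", "$555\n", "$5555\n", "5\n"]

# \$\d only asks for "$ followed by a digit" somewhere in the line;
# it says nothing about what comes after that digit.
pattern = re.compile(r"\$\d")

matched = [line.rstrip("\n") for line in lines if pattern.search(line)]
print(matched)  # ['$5', '$55', '$555', '$5555']
```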

But how can I match only the line containing $5 and no other line?

(I do not want to match $55, $555, etc. And, per my subject example above, the $5 can be surrounded by unknown words, characters, etc., since email subject lines vary.)


Thanks in advance.
Question by:camstutz
 
LVL 24

Expert Comment

by:NVIT
ID: 40595761
Does \b\d\b work?
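A quick check of what \b\d\b does against the question's test file, sketched in Python's re (assumed to behave like Perl here). \b is a zero-width boundary between a word character and a non-word character, so it accepts a digit with no other digit on either side, but it does not require a $ sign in front:

```python
import re

lines = ["$\n", "$5\n", "$55\n", "$555\n", "$5555\n", "5\n"]
pattern = re.compile(r"\b\d\b")

found = []
for line in lines:
    m = pattern.search(line)
    if m:
        # The boundaries are zero-width, so the match is only the digit itself.
        found.append(m.group(0))

print(found)  # ['5', '5']  (from "$5" and from the bare "5")
```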
 
LVL 12

Expert Comment

by:Jeff Darling
ID: 40595766
I'm assuming you only care about matches where one dollar sign is followed by exactly one digit.


(\$[0-9]{1}[^0-9])



http://regexr.com/3acf4
 
LVL 12

Expert Comment

by:Jeff Darling
ID: 40595771
You are correct, {1} is not needed.

(\$[0-9][^0-9])
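A sketch of this pattern applied to whole subject lines, in Python's re rather than Perl (the sample subjects are made up for illustration). [^0-9] must consume one real character after the digit, which is what rules out $55 and $15:

```python
import re

pattern = re.compile(r"(\$[0-9][^0-9])")

subjects = [
    "sign up now for only $5 today",   # hypothetical spam subject
    "sign up now for only $55 today",
    "now $15 off",
]
results = [bool(pattern.search(s)) for s in subjects]
print(results)  # [True, False, False]
```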

 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40595821
Or, shorter: \$\d\D
Uppercase classes are negated.
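The claim that uppercase classes negate their lowercase versions can be checked by brute force. This Python sketch (an illustration, not part of the thread) compares \D, [^\d] and [^0-9] on every ASCII character; it stays in ASCII on purpose, because with Unicode strings \d can also match non-ASCII digits, and there [^0-9] would differ:

```python
import re

def matches(pattern: str, ch: str) -> bool:
    """True if the one-character string ch is matched in full by pattern."""
    return re.fullmatch(pattern, ch) is not None

# Collect any ASCII character on which the three spellings disagree.
mismatches = [
    chr(code)
    for code in range(128)
    if not (matches(r"\D", chr(code))
            == matches(r"[^\d]", chr(code))
            == matches(r"[^0-9]", chr(code)))
]
print(mismatches)  # []
```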

HTH,
Dan
 

Author Comment

by:camstutz
ID: 40596029
First, thank you everyone for posting suggestions. NewVillageIT: I did try that, and it didn't seem to work. Jeff: I tried \$\d[^\d], but that didn't seem to work either. However, I didn't use the parentheses (I think they're called a group, if I remember right :). Jeff and Dan, I will try both of your suggestions.
 

Author Comment

by:camstutz
ID: 40596038
Hello everyone, here are the test results.

/(\&\d[^\d])/ returned nothing.

/\&\d\D/ returned nothing.

/\&[0-9][^0-9]/ returned nothing.
 

Author Comment

by:camstutz
ID: 40596041
Oh, and also:
/(\&[0-9][^0-9])/ returned nothing.
 

Author Comment

by:camstutz
ID: 40596046
I wonder whether it has to do with using Perl vs. SpamAssassin searches?
 
LVL 84

Expert Comment

by:ozo
ID: 40596082
A line of text containing exactly one $ sign and one digit, with nothing preceding or following:
/^\$\d$/

If a non-digit (or the end of the line) may follow the one digit:
/^\$\d(?!\d)/
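To see how the two anchored patterns differ, here is a Python sketch with a few made-up lines (assumption: Python's $, like Perl's, also matches just before a trailing newline). Both require the $ to be at the start of the line:

```python
import re

exact = re.compile(r"^\$\d$")         # the whole line is "$" plus one digit
prefix = re.compile(r"^\$\d(?!\d)")   # the line starts with "$" plus one digit, with no second digit

lines = ["$5\n", "$55\n", "$5 off\n", "only $5\n"]   # made-up sample lines
exact_hits = [bool(exact.search(l)) for l in lines]
prefix_hits = [bool(prefix.search(l)) for l in lines]
print(exact_hits)   # [True, False, False, False]
print(prefix_hits)  # [True, False, True, False]
```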
 

Author Comment

by:camstutz
ID: 40596128
Thanks ozo, however that won't work. In my opening post, I mentioned that the $5 is surrounded by random words that make up an email subject. If I use your example, it will return nothing unless the email subject is *only* $5.
 
LVL 84

Expert Comment

by:ozo
ID: 40596132
To allow anything before the $ sign, and anything other than a digit (including the end of the line) after the digit:
/\$\d(?!\d)/
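A sketch of the unanchored lookahead version against the question's test file plus two hypothetical subject lines, in Python's re (assumed equivalent to Perl for this pattern). (?!\d) only checks that no digit follows; it consumes nothing, so it also succeeds at the very end of a line:

```python
import re

pattern = re.compile(r"\$\d(?!\d)")

lines = ["$\n", "$5\n", "$55\n", "$555\n", "$5555\n", "5\n",
         "sign up now for only $5 today\n",   # hypothetical subject
         "only $5"]                            # hypothetical, no trailing newline

hits = [line.rstrip("\n") for line in lines if pattern.search(line)]
print(hits)  # ['$5', 'sign up now for only $5 today', 'only $5']
```

One consequence worth testing before using this in a spam rule: amounts such as $5,000 or $5.99 also match, because a comma or period is not a digit.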
 

Author Comment

by:camstutz
ID: 40596182
Thanks ozo, that worked. However, I owe an apology: I discovered a typo (user error) when trying the other suggestions. I had a & where a $ was supposed to be. No wonder it was printing nothing. I went back and tried the word boundary; unless I did something wrong again, it printed three lines of 5's and stripped off the word boundary.

However, I do have a question: many of the other examples print the correct line, but also seem to add a blank line at the end. Can someone explain this to me, or am I doing something wrong again?

This isn't my first regex, but I am still pretty new to them.
 
LVL 84

Expert Comment

by:ozo
ID: 40596224
A "\n" in your print statement will print a newline character.
 

Author Comment

by:camstutz
ID: 40596341
Ozo, please forgive my ignorance, but the regex you gave me, /\$\d(?!\d)/,

when used this way:

perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2


does not print the second "blank line".
 
LVL 84

Expert Comment

by:ozo
ID: 40596349
What regex, used in what way, seems to add a blank line?
And what regex, used in what way, printed three lines of 5's and stripped off the word boundary?
 

Author Comment

by:camstutz
ID: 40596366
These print a second blank line. Please trust that I copied this exactly, except for changing the username and host name. The empty line is what the command output produced, not me hitting Enter to separate the commands.

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

====================== (separator I added to this post) ======================
This is the word boundary:

user@host:~ # perl -ne 'if (/\b\d\b/) {print "$&\n";}' < file2
5
5
user@host:~ #

This is the second regex you mentioned ozo:

user@host:~ # perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2
$5
user@host:~ #
 
LVL 84

Accepted Solution

by:ozo
ID: 40596375
If file2 contains the line

$5

then the \D in /\$\d\D/ will match the newline following the "5", so $& includes that newline, and print then adds a second one from its own "\n".
/[^\d]/ is equivalent to /\D/

/\b\d\b/ matches only the digit, so $& would not contain whatever non-word characters that may surround it.
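A Python sketch of the blank-line effect (using re as a stand-in for Perl; group(0) plays the role of $&). The \D version consumes the newline, while the lookahead version only inspects it:

```python
import re

line = "$5\n"   # what perl -n puts in $_ for the line "$5"

consumed = re.search(r"\$\d\D", line).group(0)
peeked = re.search(r"\$\d(?!\d)", line).group(0)

print(repr(consumed))  # '$5\n'  (printing this plus "\n" gives the extra blank line)
print(repr(peeked))    # '$5'    (the lookahead checks the newline without including it)
```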
 
LVL 12

Expert Comment

by:Jeff Darling
ID: 40596838
This is why I like to use a group.

try this

perl -ne 'if (/(\$\d)\D/) {print "---\nYes \[$1"."]\n---\n";}else{print "---\nNo \[$1"."]\n---\n";}' < file2
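The idea behind the group, sketched in Python's re (an assumed stand-in for Perl): group(0) corresponds to Perl's $& and group(1) to $1, so the capture holds only $5 even though \D matched the newline:

```python
import re

m = re.search(r"(\$\d)\D", "$5\n")
whole = m.group(0)   # like Perl's $&: includes the character matched by \D
first = m.group(1)   # like Perl's $1: only the parenthesized part
print(repr(whole), repr(first))  # '$5\n' '$5'
```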

 
LVL 84

Expert Comment

by:ozo
ID: 40597096
Failing matches will not set $1, so }else{print "---\nNo \[$1"."]\n---\n" may not be very meaningful.
(also, the \[ seems unnecessary and the "." seems superfluous)

\D requires a non-digit following the digit.
When -n reads from a file, there will usually be a newline at the end of $_, but it is possible for it to be missing from the last line, in which case it may prevent /(\$\d)\D/ from matching.
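A Python sketch of the missing-final-newline case (re used as a stand-in for Perl). The \D version needs a real character after the digit, so it fails on a last line that ends right after the 5, while the lookahead version does not:

```python
import re

results = {}
for text in ("$5\n", "$5"):   # a normal line, and a last line with no final newline
    results[text] = (
        re.search(r"(\$\d)\D", text) is not None,     # needs a real character after the digit
        re.search(r"\$\d(?!\d)", text) is not None,   # only needs "no digit next"
    )
print(results)  # {'$5\n': (True, True), '$5': (False, True)}
```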
 

Author Comment

by:camstutz
ID: 40597213
ozo, while it seems from my tests that \D matches a newline, I read that it doesn't match whitespace (tabs, spaces, etc.). Do you have a link that says exactly what it does match? I usually just see "non-digit", and in my preliminary studying I was only thinking of ASCII letters (A-Z or a-z, for example). But that is my limited experience with regex.
 

Author Comment

by:camstutz
ID: 40597227
I think I found it in a previous post on RexEgg: http://www.rexegg.com/regex-quickstart.html

[\d\D]     One character that is a digit or a non-digit
[\d\D]+    Any characters, including newlines, which the regular dot doesn't match
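A small Python sketch of that table entry (Python's . behaves like Perl's here: without the DOTALL flag, Perl's /s, it does not match a newline):

```python
import re

text = "line one\nline two"

dot = re.fullmatch(r".+", text) is not None           # . stops at the newline
either = re.fullmatch(r"[\d\D]+", text) is not None   # every character is a digit or a non-digit
print(dot, either)  # False True
```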
 
LVL 84

Expert Comment

by:ozo
ID: 40597353
\D matches any character that \d does not.  equivalent to [^\d]
 

Author Comment

by:camstutz
ID: 40597372
And not whitespace? (according to what I read) ... though I should try it for myself.
 
LVL 84

Expert Comment

by:ozo
ID: 40597423
whitespace is not [0-9], so it will not match \d and it will match \D
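Trying it directly, in a Python sketch (re assumed to agree with Perl for ASCII input): \D matches space, tab and newline just like any other non-digit:

```python
import re

samples = [" ", "\t", "\n", "a", "$", "5"]
d_upper = {ch: re.fullmatch(r"\D", ch) is not None for ch in samples}
print(d_upper)
# {' ': True, '\t': True, '\n': True, 'a': True, '$': True, '5': False}
```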
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40597748
and not white space... (according to what I read)
Perhaps if you post what you read? Every engine that I've used which defines \D defines it as anything not a digit, which includes whitespace.
 
LVL 84

Expert Comment

by:ozo
ID: 40597764
Perhaps you were thinking of \S, which matches non-whitespace characters (like [^\s]); \W matches non-word characters (like [^\w]).
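To keep \W and \S apart, a Python sketch (re assumed to agree with Perl on ASCII): \W matches any non-word character, which includes whitespace, while \S matches any non-whitespace character, which includes letters and digits:

```python
import re

samples = [" ", "\t", "a", "_", "5", "$"]
table = {
    ch: (re.fullmatch(r"\W", ch) is not None,   # non-word character
         re.fullmatch(r"\S", ch) is not None)   # non-whitespace character
    for ch in samples
}
print(table)
# {' ': (True, False), '\t': (True, False), 'a': (False, True), '_': (False, True), '5': (False, True), '$': (True, True)}
```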