Solved

Regex: match one number (or character) but not more than one

Posted on 2015-02-07
Last Modified: 2015-07-29
I am trying to write a regex that will match a single digit preceded by a $ sign (for example, $5). I am doing this to learn how to write custom rules to catch spam.

For a specific test example, I would like to match something like "sign up now for only $5" in the subject of an email. The wording is irrelevant and can obviously change. I already know how to match subjects, so I am not asking about SpamAssassin or SpamAssassin rule types, only about regex matching.

To help with testing, I created an example test file like this:

$
$5
$55
$555
$5555
5

I am using the one-line Perl 'script' below to do my tests:

 perl -ne 'if (/\$\d/) {print "$&\n";}' < testfile

My goal is to match the line that contains "$5" and no other line.

After testing, I found that my regex matches every line containing a $ followed by a 5, no matter how many 5s are on the line. I have tried the following regexes:

\d
[\d]
\d{1}

and similar, but nothing works: each one matches every line containing a $ followed by a 5. I did a little research and found that, since \d matches a single digit, the [] and {1} are unneeded. My initial thought was that \d would only match the line that contains $5; however, it matches the $5 on every line containing a $ followed by a 5, no matter how many 5s are on the line. Note: it doesn't match the line containing only a $ or only a 5, which makes sense after thinking about it.
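A regex search succeeds if the pattern matches anywhere in the line, and nothing in `\$\d` says what may come after the digit. A quick sketch, using Python's `re` module as a stand-in for Perl (it treats `\$` and `\d` the same way for plain ASCII text), run against the lines of the test file:

```python
import re

# The lines of the test file from the question.
lines = ["$", "$5", "$55", "$555", "$5555", "5"]

# \$\d only requires "$ followed by one digit" somewhere in the line;
# nothing in the pattern says what may come after that digit.
pattern = re.compile(r"\$\d")

matching = [line for line in lines if pattern.search(line)]
print(matching)  # ['$5', '$55', '$555', '$5555']
```

In every case the matched text itself is just `$5`; the extra digits are simply ignored rather than rejected.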

But how can I match only the line containing $5 and no other line?

(I do not want to match $55, $555, etc. And, as in my subject example, the $5 can be surrounded by unknown words, characters, etc., since email subject lines can vary.)


Thanks in advance.
Question by:camstutz
 
LVL 25

Expert Comment

by:NVIT
ID: 40595761
Does \b\d\b work?
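The word-boundary idea works because `$` is not a word character, so there is a boundary between `$` and `5`, but there is none between two adjacent digits. A Python sketch against the same test lines (Python's `re` handles `\b` and `\d` like Perl for ASCII input) shows that it rejects `$55` and longer, but it also accepts a bare `5` with no dollar sign, and the match contains only the digit:

```python
import re

lines = ["$", "$5", "$55", "$555", "$5555", "5"]

# \b\d\b: one digit with a word boundary on each side. "$" is not a word
# character, so there is a boundary between "$" and "5", but there is
# no boundary between two adjacent digits.
pattern = re.compile(r"\b\d\b")

results = []
for line in lines:
    m = pattern.search(line)
    if m:
        results.append((line, m.group(0)))

print(results)  # [('$5', '5'), ('5', '5')]
```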
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 40595766
I'm assuming you only care about matches where there is one dollar sign followed by exactly one digit.


(\$[0-9]{1}[^0-9])



http://regexr.com/3acf4
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 40595771
You are correct, {1} is not needed.

(\$[0-9][^0-9])
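A quick check of this pattern in Python, whose `re` module reads it the same way Perl does (the subject lines are invented for illustration). Two side effects are worth noticing: the character after the digit becomes part of the match, and a `$5` at the very end of a string, with nothing after it, is not matched at all:

```python
import re

# "$", one digit, then one character that is NOT a digit.
pattern = re.compile(r"\$[0-9][^0-9]")

subjects = [
    "sign up now for only $5 today",
    "sign up now for only $55 today",
    "sign up now for only $5",  # nothing after the digit
]

results = []
for s in subjects:
    m = pattern.search(s)
    results.append(m.group(0) if m else None)

print(results)  # ['$5 ', None, None]
```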

 
LVL 35

Expert Comment

by:Dan Craciun
ID: 40595821
Or, shorter: \$\d\D
Uppercase classes are negated.

HTH,
Dan
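Dan's point that `\D` is the negated form of `\d`, and so the same as `[^\d]`, can be checked mechanically. A small Python sketch (assuming Python's shorthand classes behave like Perl's for ASCII, which they do for these two) compares the two on all 128 ASCII characters:

```python
import re

upper = re.compile(r"\D")       # uppercase shorthand: any non-digit
negated = re.compile(r"[^\d]")  # explicit negated character class

# Compare the two on every ASCII character.
same = all(
    bool(upper.fullmatch(ch)) == bool(negated.fullmatch(ch))
    for ch in map(chr, range(128))
)
print(same)  # True

# So \$\d\D and \$\d[^\d] behave identically.
print(re.search(r"\$\d\D", "only $5!").group(0))  # $5!
```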
 

Author Comment

by:camstutz
ID: 40596029
First, thank you everyone for posting suggestions. NewVillageIT: I did try that, but it didn't seem to work. Jeff: I tried \$\d[^\d], but that didn't seem to work either. However, I didn't use the parentheses (I think that is called a group, if I remember right :) ). Jeff and Dan, I will try your suggestions.
 

Author Comment

by:camstutz
ID: 40596038
Hello everyone, here are the test results:

/(\&\d[^\d])/   returned nothing.

/\&\d\D/  returns nothing.

/\&[0-9][^0-9]/  returns nothing.
 

Author Comment

by:camstutz
ID: 40596041
Oh, and also:
/(\&[0-9][^0-9])/ returned nothing.
 

Author Comment

by:camstutz
ID: 40596046
I am wondering if it has to do with using Perl vs. using SpamAssassin searches?
 
LVL 84

Expert Comment

by:ozo
ID: 40596082
A line of text containing exactly one $ sign and one digit, with nothing preceding or following
/^\$\d$/

If a non-digit can be allowed following the one digit
/^\$\d(?!\d)/
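Both of these patterns start with `^`, so they only match when the `$` is the first character of the string. A Python sketch with made-up subjects (Python's `^`, `$` and `(?!...)` behave like Perl's here) shows the difference between the two, and that neither finds a `$5` in the middle of a subject:

```python
import re

exact = re.compile(r"^\$\d$")        # the whole line is "$" plus one digit
prefix = re.compile(r"^\$\d(?!\d)")  # the line STARTS with "$" plus one digit

subjects = ["$5", "$5 off", "$55", "sign up now for only $5"]
results = [(s, bool(exact.search(s)), bool(prefix.search(s))) for s in subjects]

for row in results:
    print(row)
```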
 

Author Comment

by:camstutz
ID: 40596128
Thanks ozo, however that won't work. In my opening post, I mentioned that the $5 is surrounded by random words that would make up an email subject. If I use your example, it will return nothing unless the email subject is *only* $5.
 
LVL 84

Expert Comment

by:ozo
ID: 40596132
Allowing anything before the $ sign, and anything after the digit except another digit:
/\$\d(?!\d)/
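A sketch of this unanchored lookahead version in Python, which supports the same `(?!...)` syntax as Perl (the subjects are invented for illustration). Because the lookahead is zero-width, the matched text is just `$5`, whatever follows it:

```python
import re

# "$", one digit, and the next character (if there is one) must not be a
# digit. (?!\d) is a zero-width negative lookahead: it checks the next
# character without consuming it, so it never becomes part of the match.
pattern = re.compile(r"\$\d(?!\d)")

subjects = [
    "sign up now for only $5",
    "sign up now for only $5!!!",
    "sign up now for only $55",
    "$5555 deal",
]

results = []
for s in subjects:
    m = pattern.search(s)
    results.append(m.group(0) if m else None)

print(results)  # ['$5', '$5', None, None]
```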
 

Author Comment

by:camstutz
ID: 40596182
Thanks ozo, that worked. However, I owe an apology. I discovered a typo (user error) when trying the other suggestions: I had a & where a $ was supposed to be. No wonder it was printing nothing. I went back and tried the word boundary; unless I did something wrong again, it printed three lines of 5s and stripped off the word boundary.

However, I do have a question: many of the other examples print the correct line, but also seem to add a blank line at the end. Can someone explain this to me, or am I doing something wrong again?

This isn't my first regex, but I am still pretty new to them.
 
LVL 84

Expert Comment

by:ozo
ID: 40596224
A "\n" in your print statement will print a newline character.
 

Author Comment

by:camstutz
ID: 40596341
Ozo, please forgive my ignorance, but when I use the regex you gave me, /\$\d(?!\d)/, this way:

perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2

it doesn't print the second blank line.
 
LVL 84

Expert Comment

by:ozo
ID: 40596349
What regex, used in what way, seems to add a blank line?
And what regex, used in what way, printed three lines of 5's and stripped off the word boundary?
 

Author Comment

by:camstutz
ID: 40596366
These print a second blank line. Please trust that I copied this exactly, except for changing the username and host name. The empty line is what the command output produced, not me hitting Enter to separate the commands.

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

======================Just a line separation I added to this post for separation===========================
This is the word boundary:

user@host:~ # perl -ne 'if (/\b\d\b/) {print "$&\n";}' < file2
5
5
user@host:~ #

This is the second regex you mentioned ozo:

user@host~ # perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2
$5
user@host:~ #
 
LVL 84

Accepted Solution

by:ozo
ID: 40596375
If file2 contains
$5

then the \D in /\$\d\D/ will match the newline following the "5", so $& contains that newline; printing it and then "\n" produces the extra blank line.
/[^\d]/ is equivalent to /\D/

/\b\d\b/ matches only the digit, so $& would not contain whatever non-word characters that may surround it.
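To reproduce this outside Perl, a Python sketch can feed the regex a line the way `perl -n` delivers it, with the trailing newline still attached (assuming file2's `$5` line ends in a newline, as described above). `group(0)` plays the role of Perl's `$&`:

```python
import re

# perl -n leaves the newline on each line, so the line "$5" arrives as "$5\n".
line = "$5\n"

with_class = re.search(r"\$\d\D", line).group(0)
with_lookahead = re.search(r"\$\d(?!\d)", line).group(0)

print(repr(with_class))      # '$5\n'  (\D consumed the newline)
print(repr(with_lookahead))  # '$5'    (the lookahead consumes nothing)

# Equivalent of print "$&\n": the \D version now ends in two newlines,
# which the terminal shows as "$5" followed by an empty line.
print(repr(with_class + "\n"))  # '$5\n\n'
```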
 
LVL 13

Expert Comment

by:Jeff Darling
ID: 40596838
This is why I like to use a group.

Try this:

perl -ne 'if (/(\$\d)\D/) {print "---\nYes \[$1"."]\n---\n";}else{print "---\nNo \[$1"."]\n---\n";}' < file2
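The reason a group helps: `$&` holds the whole match, including the non-digit that `\D` consumed, while `$1` holds only what the parentheses captured. In Python's `re`, used here as a stand-in for Perl, the same distinction is `group(0)` versus `group(1)`:

```python
import re

line = "$5\n"  # a line as read by perl -n, newline included

m = re.search(r"(\$\d)\D", line)
whole = m.group(0)     # the entire match, like Perl's $&
captured = m.group(1)  # only the parenthesised part, like Perl's $1

print(repr(whole))     # '$5\n'
print(repr(captured))  # '$5'
```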

 
LVL 84

Expert Comment

by:ozo
ID: 40597096
Failing matches will not set $1, so }else{print "---\nNo \[$1"."]\n---\n" may not be very meaningful.
(also, the \[ seems unnecessary and the "." seems superfluous)

\D requires a non-digit following the digit.
When -n reads from a file, there will usually be a newline at the end of $_, but it is possible for it to be missing from the last line, in which case it may prevent /(\$\d)\D/ from matching.
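The last-line caveat is easy to demonstrate. In this Python sketch the string stands in for a final line without a trailing newline: `\D` has nothing left to match, while a negative lookahead is also satisfied at the end of the string:

```python
import re

# The last line of a file may not end in a newline.
last_line = "$5"

with_class = re.search(r"(\$\d)\D", last_line)
with_lookahead = re.search(r"\$\d(?!\d)", last_line)

print(with_class)               # None: no character left for \D to match
print(with_lookahead.group(0))  # $5
```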
 

Author Comment

by:camstutz
ID: 40597213
ozo, from my tests it seems that \D matches a newline, but I read that it doesn't match whitespace (tabs, spaces, etc.). Do you have a link that says exactly what it does match? I usually just see "non-digit", and in my preliminary studying I was only thinking of ASCII letters (A-Z or a-z, for example). But that is my limited experience with regex talking.
 

Author Comment

by:camstutz
ID: 40597227
I think I found it in a previous post on RexEgg: http://www.rexegg.com/regex-quickstart.html

[\d\D]     One character that is a digit or a non-digit
[\d\D]+    Any characters, including new lines, which the regular dot doesn't match
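The RexEgg entry can be checked with a short Python sketch (Python's `.` and `[\d\D]` behave like Perl's by default): every character is either a digit or a non-digit, so `[\d\D]+` spans a newline that `.+` cannot:

```python
import re

text = "line one\nline two"

# By default "." does not match a newline, so .+ cannot span both lines.
print(re.fullmatch(r".+", text))  # None

# Every character is either a digit or a non-digit, so [\d\D]+ can.
print(re.fullmatch(r"[\d\D]+", text) is not None)  # True
```

Perl's `/s` modifier (Python's `re.DOTALL`) is the usual way to let `.` match newlines as well.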
 
LVL 84

Expert Comment

by:ozo
ID: 40597353
\D matches any character that \d does not.  equivalent to [^\d]
 

Author Comment

by:camstutz
ID: 40597372
And not whitespace (according to what I read)... though I should try it myself.
 
LVL 84

Expert Comment

by:ozo
ID: 40597423
whitespace is not [0-9], so it will not match \d and it will match \D
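Trying it directly, as suggested above: a Python sketch (its `\D` behaves like Perl's for ASCII input) tests a few sample characters, including space, tab and newline, against `\D`:

```python
import re

non_digit = re.compile(r"\D")

samples = [" ", "\t", "\n", "a", "!", "5"]
matched = [ch for ch in samples if non_digit.fullmatch(ch)]

# Space, tab and newline are all non-digits; only "5" is rejected.
print(matched)  # [' ', '\t', '\n', 'a', '!']
```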
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40597748
and not white space... (according to what I read)
Perhaps if you post what you read? Every engine that I've used which defines \D defines it as anything not a digit, which includes whitespace.
 
LVL 84

Expert Comment

by:ozo
ID: 40597764
Perhaps you were thinking of \W, which matches non-word characters (like [^\w])
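`\W` and `\S` are easy to mix up: `\W` is the complement of `\w` (letters, digits, underscore), while `\S` is the complement of `\s` (whitespace). A short Python sketch on a few sample characters (Python's shorthand classes match Perl's for ASCII) shows the difference; note that whitespace matches `\W` but not `\S`:

```python
import re

samples = ["a", "5", "_", " ", "\n", "$", "!"]

# \W: NOT a word character (word characters are letters, digits and "_").
non_word = [ch for ch in samples if re.fullmatch(r"\W", ch)]

# \S: NOT a whitespace character.
non_space = [ch for ch in samples if re.fullmatch(r"\S", ch)]

print(non_word)   # [' ', '\n', '$', '!']
print(non_space)  # ['a', '5', '_', '$', '!']
```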