Solved

regex  match one number (or character) but not more than one

Posted on 2015-02-07
Last Modified: 2015-07-29
I am trying to write a regex that will match a single digit preceded by a $ sign (for example, $5). I am doing this to learn how to write custom rules to catch spam.

As a specific test case, I would like to match something like "sign up now for only $5" in the subject of an email. The wording is irrelevant and can obviously change. I already know how to match subjects, so I am not asking about SpamAssassin or SpamAssassin rule types, only about regex matching.

To help with testing, I created an example test file like this:

$
$5
$55
$555
$5555
5

I am using the one line perl 'script' listed below to do my tests:

 perl -ne 'if (/\$\d/) {print "$&\n";}' < testfile

My goal is to match the line that contains "$5" and no other line.

After testing, my regex matches every line containing a $ followed by a 5, no matter how many 5s are on the line. I have tried the following regexes:

\d
[\d]
\d{1}

and similar, but nothing works: each one matches every line containing a $ followed by a 5. I did a little research and found that since \d already matches a single digit, the [] and {1} are unneeded. I initially thought \d would match only the line that contains $5; however, it matches the "$5" on every line containing a $ followed by a 5, no matter how many 5s are on the line. Note: it doesn't match the line with only a $ or only a 5, which makes sense after thinking about it.

But how can I match only the line containing $5 and no other line?

(I do not want to match $55 or $555, etc. As in my initial subject example, the $5 can be surrounded by unknown words and characters, since email subject lines vary.)


Thanks in advance.
Question by:camstutz
LVL 23

Expert Comment

by:NVIT
ID: 40595761
Does \b\d\b work?
LVL 12

Expert Comment

by:Jeff Darling
ID: 40595766
I'm assuming you only care about matches as long as it is one dollar sign followed by 1 digit.


(\$[0-9]{1}[^0-9])

http://regexr.com/3acf4
LVL 12

Expert Comment

by:Jeff Darling
ID: 40595771
You are correct, {1} is not needed.

(\$[0-9][^0-9])

LVL 34

Expert Comment

by:Dan Craciun
ID: 40595821
Or, shorter: \$\d\D
Uppercase classes are negated.

HTH,
Dan

Author Comment

by:camstutz
ID: 40596029
First, thank you everyone for posting suggestions. NewVillageIT: I did try that, but it didn't seem to work. Jeff: I tried \$\d[^\d], but that didn't seem to work either. However, I didn't use the parentheses (I think that is called a group, if I remember right :) ). Jeff and Dan, I will try your suggestions.

Author Comment

by:camstutz
ID: 40596038
Hello everyone, here are the test results.

/(\&\d[^\d])/ returned nothing.

/\&\d\D/ returned nothing.

/\&[0-9][^0-9]/ returned nothing.

Author Comment

by:camstutz
ID: 40596041
Oh, and also:
/(\&[0-9][^0-9])/ returned nothing.

Author Comment

by:camstutz
ID: 40596046
I am wondering if it has to do with using perl versus using SpamAssassin searches?
LVL 84

Expert Comment

by:ozo
ID: 40596082
A line of text containing exactly one $ sign and one digit, with nothing preceding or following:
/^\$\d$/

If a non-digit is allowed to follow the one digit:
/^\$\d(?!\d)/

Author Comment

by:camstutz
ID: 40596128
Thanks ozo, however that won't work. In my opening post, I mentioned that the $5 is surrounded by random words that would make up an email subject. If I use your example, it will return nothing unless the email subject is *only* $5.
LVL 84

Expert Comment

by:ozo
ID: 40596132
Anything is allowed before the $ sign, and a non-digit followed by other stuff is allowed after the digit:
/\$\d(?!\d)/

Author Comment

by:camstutz
ID: 40596182
Thanks ozo, that worked. However, I owe an apology: I discovered a typo (user error) when trying the other suggestions. I had a & where a $ was supposed to be, so no wonder it was printing nothing. I went back and tried the word boundary and, unless I did something wrong again, it printed three lines of 5's and stripped off the word boundary.

However, I do have a question: many of the other examples print the correct line but also seem to add a blank line at the end. Can someone explain this to me, or am I doing something wrong again?

This isn't my first regex, but I am still pretty new to them.
LVL 84

Expert Comment

by:ozo
ID: 40596224
A "\n" in your print statement will print a newline character.

Author Comment

by:camstutz
ID: 40596341
Ozo, please forgive my ignorance, but when I take the regex you gave me: /\$\d(?!\d)/

and use it this way:

perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2


it doesn't print the second "blank line".
LVL 84

Expert Comment

by:ozo
ID: 40596349
What regex, used in what way, seems to add a blank line?
And what regex, used in what way, printed three lines of 5's and stripped off the word boundary?

Author Comment

by:camstutz
ID: 40596366
These print a second blank line. Please trust that I copied this exactly, except for changing the username and host name. The empty line is what the command output produced, not me hitting Enter to separate the commands.

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d[^\d]/) {print "$&\n";}' < file2
$5

user@host:~ # perl -ne 'if (/\$\d\D/) {print "$&\n";}' < file2
$5

======================Just a line separation I added to this post for separation===========================
This is the word boundary:

user@host:~ # perl -ne 'if (/\b\d\b/) {print "$&\n";}' < file2
5
5
user@host:~ #

This is the second regex you mentioned ozo:

user@host:~ # perl -ne 'if (/\$\d(?!\d)/) {print "$&\n";}' < file2
$5
user@host:~ #
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 40596375
If file2 contains
$5

then the \D in /\$\d\D/ will match the newline following the "5", so $& contains that newline and print "$&\n" outputs it, followed by a second newline from the \n.
/[^\d]/ is equivalent to /\D/

/\b\d\b/ matches only the digit, so $& would not contain whatever non-word characters that may surround it.
LVL 12

Expert Comment

by:Jeff Darling
ID: 40596838
This is why I like to use a group.

Try this:

perl -ne 'if (/(\$\d)\D/) {print "---\nYes \[$1"."]\n---\n";}else{print "---\nNo \[$1"."]\n---\n";}' < file2

LVL 84

Expert Comment

by:ozo
ID: 40597096
Failing matches will not set $1, so }else{print "---\nNo \[$1"."]\n---\n" may not be very meaningful.
(also, the \[ seems unnecessary and the "." seems superfluous)

\D requires a non-digit following the digit.
When -n reads from a file, there will usually be a newline at the end of $_, but it may be missing from the last line, in which case /(\$\d)\D/ will fail to match there.

Author Comment

by:camstutz
ID: 40597213
ozo, while it seems that \D matches a newline (from my tests), I read that it doesn't match whitespace (tabs, spaces, etc.). Do you have a link that says exactly what it does match? I usually just see "non-digit", and in my preliminary studying I was only thinking of ASCII letters (A-Z or a-z, for example). But that is my limited experience with regex.

Author Comment

by:camstutz
ID: 40597227
I think I found it from a previous post on rexegg: http://www.rexegg.com/regex-quickstart.html

[\d\D]: One character that is a digit or a non-digit. Example: [\d\D]+ matches any characters, including new lines, which the regular dot doesn't match.
LVL 84

Expert Comment

by:ozo
ID: 40597353
\D matches any character that \d does not. It is equivalent to [^\d].

Author Comment

by:camstutz
ID: 40597372
and not white space... (according to what I read) ... though I should try it for myself
LVL 84

Expert Comment

by:ozo
ID: 40597423
whitespace is not [0-9], so it will not match \d and it will match \D
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 40597748
"and not white space... (according to what I read)"
Perhaps you could post what you read? Every engine I've used that defines \D defines it as anything that is not a digit, which includes whitespace.
LVL 84

Expert Comment

by:ozo
ID: 40597764
Perhaps you were thinking of \S, which matches non-whitespace characters (like [^\s])