Regular Expressions

A regular expression ("regex") is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. "find and replace"-like operations. Regular expression processors are found in several search engines, in the search and replace dialogs of several word processors and text editors, and in the command lines of text processing utilities such as sed and AWK. Many programming languages provide regular expression capabilities, some built in (for example Perl, JavaScript, Ruby, AWK, and Tcl) and others via a standard library (for example .NET languages, Java, Python, and C++ since C++11). Most other languages offer regular expressions via a library.

Blocking a period at the end of a whitelisted URL?

I have a RegEx that partly works. It verifies URLs against a whitelist of domains, delimited with '|':

              string regEx3 = @"https?://(" + whitelist + ")(.*)\\?(goto|returnurl)=https?://(" + whitelist + ")";

But I need to BLOCK a domain which uses my white listed domain as a sub-domain.


sso.mydomain.com is okay, but the following is not
sso.mydomain.com.badURL.com


I need to make the RegEx fail when it discovers a period after the white listed domain.

How?

Thanks
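
One possible approach is to require that the whitelisted host be followed only by an optional port and then "/", "?", "#", or the end of the input, so that a further ".badURL.com" can never follow. A minimal sketch in Python rather than C# (the lookahead syntax is the same in .NET); the helper name build_redirect_pattern and the sample URLs are assumptions made for illustration:

```python
import re

def build_redirect_pattern(hosts):
    # Escape each host separately so "." matches only a literal dot.
    alternatives = "|".join(re.escape(h) for h in hosts)
    # After the host allow an optional port, then require "/", "?", "#" or end of input.
    host = r"(?:%s)(?::\d+)?(?=[/?#]|$)" % alternatives
    return re.compile(
        r"^https?://" + host + r"[^?#]*\?(?:goto|returnurl)=https?://" + host,
        re.IGNORECASE,
    )

pattern = build_redirect_pattern(["mydomain.com", "sso.mydomain.com"])
good = "https://sso.mydomain.com/login?goto=https://mydomain.com/home"
bad = "https://sso.mydomain.com/login?goto=https://sso.mydomain.com.badURL.com/"
print(bool(pattern.search(good)), bool(pattern.search(bad)))  # True False
```

The lookahead checks the character after the host without consuming it, so the rest of the pattern is unaffected.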
0
I am having trouble understanding this RegEx:

            string redirectRegex = @"(.*)?\?(.*)?(" + redirectparams + "){1}\\=(.*?[^&]+)?&?(.*)?";

Please break out each element and explain what that element does.

Thanks
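
One way to read the pattern is to lay it out element by element. The sketch below does that in Python's verbose mode. The value of redirectparams is not shown in the question, so "goto|returnurl" is an assumption, as is the sample URL:

```python
import re

redirect_params = "goto|returnurl"  # assumption: the real value is not shown

redirect_regex = re.compile(
    r"""
    (.*)?          # group 1: anything before the question mark (greedy, may be empty)
    \?             # a literal question mark
    (.*)?          # group 2: any earlier query-string text (greedy, may be empty)
    (%s){1}        # group 3: one of the parameter names; the {1} is redundant
    \=             # a literal equals sign
    (.*?[^&]+)?    # group 4: the parameter value, up to the next ampersand
    &?             # an optional ampersand separator
    (.*)?          # group 5: whatever remains
    """ % redirect_params,
    re.VERBOSE,
)

m = redirect_regex.match("https://sso.mydomain.org/login?x=1&goto=https://mydomain.org/home&y=2")
print(m.group(3), m.group(4))  # goto https://mydomain.org/home
```

Each "(...)?" wraps something that can already match nothing, so those outer question marks change nothing about what is matched.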
0
RegEx: Allowing for an optional path in the URL

The underlined portion MAY or MAY NOT contain characters.

            string regEx = @"^https?://(" + Regex.Escape(redirectWhitelist) + ")_______\\?(goto|returnurl)=https?://(" + Regex.Escape(redirectWhitelist) + ")";

This RegEx requires a certain URL, but I do not want to require that:

            string regEx = @"^https?://(" + Regex.Escape(redirectWhitelist) + ")/openup/UI/Login\\?(goto|returnurl)=https?://(" + Regex.Escape(redirectWhitelist) + ")";

What can I place into the RegEx in place of  "/openup/UI/Login"

So that anything is acceptable, including nothing?

Thanks.
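
A possible replacement for the fixed path is (?:/[^?#]*)?, which accepts a slash followed by anything up to the "?" or nothing at all. The sketch below is in Python and uses hypothetical hosts. Note that .NET's Regex.Escape also escapes "|", so escaping the whole whitelist string at once turns the delimiters into literal pipes; here each host is escaped separately.

```python
import re

redirect_whitelist = ["mydomain.org", "sso.mydomain.org"]  # hypothetical hosts
hosts = "|".join(re.escape(h) for h in redirect_whitelist)
# Host, optional port, then "/", "?", "#" or end of input must follow.
host = r"(?:%s)(?::\d+)?(?=[/?#]|$)" % hosts

# (?:/[^?#]*)? is the optional path: a slash plus any characters other than
# "?" or "#", or nothing at all.
optional_path = re.compile(
    r"^https?://" + host + r"(?:/[^?#]*)?\?(?:goto|returnurl)=https?://" + host
)

with_path = "https://sso.mydomain.org/openup/UI/Login?goto=https://mydomain.org"
without_path = "https://sso.mydomain.org?goto=https://mydomain.org"
print(bool(optional_path.search(with_path)), bool(optional_path.search(without_path)))  # True True
```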
0
Hi there, how do I write a regular expression to identify the following match?
Across my entire workbook, a row should be highlighted if its summary column contains a number that is 6 digits or 8 digits long. For example: if the summary column has a 6-digit number, the row should be highlighted; if it has only an 8-digit number, it should still be highlighted; and if it has both a 6-digit and an 8-digit number, it should be highlighted as well.

The sample is attached:
Book1.xlsx
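
A minimal sketch of such a pattern, assuming the requirement means "a run of exactly 6 or exactly 8 digits that is not part of a longer number". It is written in Python for illustration; the function name should_highlight is hypothetical and the workbook handling is left out:

```python
import re

# Exactly 6 or exactly 8 digits, with no digit directly before or after the run.
six_or_eight = re.compile(r"(?<!\d)(?:\d{8}|\d{6})(?!\d)")

def should_highlight(summary):
    return six_or_eight.search(summary) is not None

print(should_highlight("ticket 123456 closed"))  # True
print(should_highlight("ref 1234567"))           # False (7 digits)
```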
0

Making a ReturnURL optional


I need to assert that the URL has one of the following:
  • a goto
  • a returnurl
  • no return url or goto

How do I change this RegEx

            string regEx = @"https?://(" + redirectWhitelist + ")/\\?(goto|returnurl)=https?://(" + redirectWhitelist + ")";

so that the following portion is optional?
            \\?(goto|returnurl)=https?://(" + redirectWhitelist + ")";

Thanks.
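
One option is to wrap the whole query portion in an optional non-capturing group and anchor the end of the pattern, so that a URL either has no goto/returnurl at all or has one that passes the whitelist. A sketch in Python with hypothetical hosts (the group syntax carries over to .NET):

```python
import re

hosts = "|".join(re.escape(h) for h in ["mydomain.org", "sso.mydomain.org"])  # hypothetical

# The "?goto=..." / "?returnurl=..." part sits inside an optional group, and "$"
# stops anything unvetted from following whatever was matched.
optional_return = re.compile(
    r"^https?://(?:" + hosts + r")/"
    r"(?:\?(?:goto|returnurl)=https?://(?:" + hosts + r")(?::\d+)?(?:[/?#]\S*)?)?$",
    re.IGNORECASE,
)

print(bool(optional_return.match("https://sso.mydomain.org/")))                                  # True
print(bool(optional_return.match("https://sso.mydomain.org/?goto=http://mydomain.org/myhome/")))  # True
print(bool(optional_return.match("https://sso.mydomain.org/?goto=http://evil.example/")))         # False
```

Without the "$" anchor the optional group could simply match nothing, and a URL with a non-whitelisted target would still pass.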
0
I need to block all attempts at URL Hijacking. Please review my RegEx and my approach...

I will persist the whitelist in a config file.

            sampleRedirectUrl = "https://sso.mydomain.org/?goto=http://mydomain.org3a80/myhome/";

            redirectWhitelist = "mydomain.org|sso.mydomain.org";


            string regEx = @"https?://(" + redirectWhitelist + ")/\\?(goto|returnurl)=https?://(" + redirectWhitelist + ")";

           bool isMatch = Regex.IsMatch(sampleRedirectUrl, regEx);


I verify that both the base URL and the RedirectURL are in the white list.

Does this block all attempts at URL Hijacking?

I also worry that if I key off of "?goto=" (since that is the URL that is coming back to me while debugging in Visual Studio) I would reject the standard name:
"returnurl"

I think I need my RegEx to allow either "goto" or "returnurl". Is my use of the OR symbol correct to force "goto" or "returnurl"? Is there ever a worry about failing ReturnURL due to case?



Thanks
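
A regex alone may be hard to make airtight here. An alternative technique (not the one in the question) is to parse the URL and compare host names exactly, which also makes the case question easy to handle. The sketch below uses Python's urllib.parse; the whitelist contents and the function name is_safe_redirect are assumptions:

```python
from urllib.parse import urlsplit, parse_qs

ALLOWED_HOSTS = {"mydomain.org", "sso.mydomain.org"}  # hypothetical whitelist

def is_safe_redirect(url):
    outer = urlsplit(url)
    if outer.scheme not in ("http", "https") or outer.hostname not in ALLOWED_HOSTS:
        return False
    # Lower-case the parameter names so "ReturnUrl" is treated like "returnurl".
    params = {k.lower(): v for k, v in parse_qs(outer.query).items()}
    targets = params.get("goto", []) + params.get("returnurl", [])
    for target in targets:
        inner = urlsplit(target)
        if inner.scheme not in ("http", "https") or inner.hostname not in ALLOWED_HOSTS:
            return False
    return True

print(is_safe_redirect("https://sso.mydomain.org/?goto=http://mydomain.org/myhome/"))         # True
print(is_safe_redirect("https://sso.mydomain.org/?goto=http://sso.mydomain.org.badurl.com/"))  # False
```

If the regex is kept, a case-insensitive option (RegexOptions.IgnoreCase in .NET) would cover "ReturnUrl" versus "returnurl".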
0
I have the following regular expression:
(?<=def\s+\w+[\(])(?<arg>[^\),]+)

It does a fine job of capturing the first parameter in a Python function declaration like this:
def GetStatus(Param1, Param2, Param3)

How do I alter the expression so that it can capture each argument? Bear in mind that this needs to work for any Python function declaration so it may have any number of arguments.
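
One possible approach in two steps: capture the whole parenthesised list, then split it on commas. The sketch is in Python, whose re module does not accept the variable-width lookbehind used above, so the pattern is restructured; the function name argument_names is hypothetical. It assumes default values contain no commas or parentheses.

```python
import re

# Step 1: grab everything between the parentheses of a "def" line.
signature = re.compile(r"def\s+\w+\s*\(([^)]*)\)")

def argument_names(line):
    match = signature.search(line)
    if match is None:
        return []
    # Step 2: split the captured list on commas and trim surrounding whitespace.
    return [arg.strip() for arg in match.group(1).split(",") if arg.strip()]

print(argument_names("def GetStatus(Param1, Param2, Param3)"))  # ['Param1', 'Param2', 'Param3']
```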
0
My RegExp object is allowing characters that I think it should not. I have a function that looks like the one below. When I enter a string such as:

 .,dhgf45,/\_-!@#$%^&*()+:";'

Based on my new knowledge (I'm still learning regular expression validation syntax), it would seem that the last two characters in the string above, ; and ', would cause the string to be invalid and therefore fall into the if and set args.IsValid to false. Stated another way: since the characters ; and ' are not in the list of valid characters, shouldn't it fall through and set args.IsValid to false? Please advise.




function ValidateSearchTxt(source, args) {
                var strSrchTxt = new RegExp(/[a-zA-Z0-9.,/\\_\-!@#\$%\^&\*()\+:"]/);

                if (!strSrchTxt.test(val)){
                    args.IsValid = false                
                }
}
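
A likely cause: the pattern is not anchored, so the test succeeds as soon as any one allowed character is found anywhere in the string. Requiring every character to come from the set (in JavaScript, a ^ before the class and +$ after it) should give the expected result. The same idea sketched in Python, where fullmatch does the anchoring; the names allowed and is_valid_search_text are hypothetical:

```python
import re

# One or more characters, all of which must come from the allowed set.
allowed = re.compile(r'[a-zA-Z0-9.,/\\_\-!@#$%^&*()+:"]+')

def is_valid_search_text(text):
    # fullmatch succeeds only if the whole string matches, like ^...$ in JavaScript.
    return allowed.fullmatch(text) is not None

sample = '.,dhgf45,/\\_-!@#$%^&*()+:";\''
print(bool(allowed.search(sample)))   # True: an unanchored search finds an allowed character
print(is_valid_search_text(sample))   # False: ";" and "'" are not in the set
```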

0
I'm learning about regular expressions using jQuery and the RegExp object with its constructor function. I've been given a set of special characters that will be allowed in my search box and, aside from the letters and numbers, have been incrementally adding each special character and then testing my jQuery function (see below). When I add the '!' character and enter a search string in my text box that contains the '!' character, for some reason the function gets bypassed altogether, and based on the rules that I've read (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-negated-character-set), I don't understand why. Can someone help me to understand what is happening?

<script type="text/javascript">

            function ValidateSearchTxt(source, args) {
                args.IsValid = true;
                var val = $('#txtSearchText').val();
                val = val.trim();
                var selectAllClinics = $("#ddlClinics").get(0).selectedIndex;

                var strSrchTxt = new RegExp(/[a-zA-Z0-9.,/\\_-!]/); // <-- WHEN I ADDED THE '!' THE FUNCTION IS COMPLETELY BYPASSED. WHY DOES THIS HAPPEN?

                if (!strSrchTxt.test(val)){
                    args.IsValid = false                
                }

                //for (var i = 0, i < val.length; i++){
                //    args.IsValid = false
                //}

                if (val == "Enter Name or Account #") {
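
A likely explanation: inside a character class, _-! is read as a range from "_" to "!", and because "_" sorts after "!" the range is out of order. JavaScript engines report that as a syntax error in the regex literal, so the surrounding script is never run, which would match the "bypassed" behaviour. The sketch below shows the same rule in Python, used here only for illustration; the helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# "_-!" is a range from "_" (code 95) down to "!" (code 33), which is out of order.
broken = r"[a-zA-Z0-9.,/\\_-!]"
# Escaping the hyphen (or moving it to the end of the class) makes it a literal.
fixed = r"[a-zA-Z0-9.,/\\_\-!]"
print(compiles(broken), compiles(fixed))  # False True
```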
                    

0
I have the following Regular Expression:
(?:^|\s|[\(\)])(and|as|assert|break|class|continue|def|del|elif|else|except|False|finally|for|from|global|if|import|in|is|lambda|None|nonlocal|not|or|pass|raise|return|True|try|while|with|yield)(?:$|\s|[\(\)])

It is used for finding keywords in a Python script. Unfortunately, it also finds them inside quoted strings and after comment characters. (In Python the # character is the start of a comment and nothing after that character should be matched UNLESS that character is inside quotes in which case it is treated as a literal)

What do I need to do to this Regex to force it to not match if there is a non-quoted # character anywhere on the line before the keyword? Also, what do I have to do to make sure the keywords are ignored if they are enclosed in quotes?

In the following example:
# The following is used for iteration
  for row in table.Rows
    myVariable = "What is this text for anyway?"
There should be no matches for the first line, since it is preceded by a '#' character.
The second line should match the words "for" and "in", since they are keywords that are not part of a comment or a quoted string.
There should be no matches for the third line, since the keywords "is" and "for" are enclosed in quotes.
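
One common technique is to let the pattern consume quoted strings and comments first and capture a keyword only in a later alternative. The sketch below is in Python and takes its keyword list from the standard keyword module, which may differ slightly from the list above. Triple-quoted strings are not handled; for those, the standard tokenize module would be a more robust choice. The function name keywords_in is hypothetical.

```python
import keyword
import re

# Alternatives are tried left to right: strings and comments are consumed first,
# so a keyword is captured only when it appears outside both.
token = re.compile(
    r"""
    "(?:\\.|[^"\\])*"     # double-quoted string
  | '(?:\\.|[^'\\])*'     # single-quoted string
  | \#.*                  # comment running to the end of the line
  | \b(?P<kw>%s)\b        # keyword, captured in the group named kw
    """ % "|".join(keyword.kwlist),
    re.VERBOSE,
)

def keywords_in(line):
    return [m.group("kw") for m in token.finditer(line) if m.group("kw")]

print(keywords_in("  for row in table.Rows"))  # ['for', 'in']
```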
0
Techies, Can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this dataflow? My goal is to move the matched versions over to 1 kafka topic and  the unmatched over to another kafka topic.  Attached is the client.csv test file


Here's what the data flow looks like--with the regex used in the ExtractText config. NiFi uses Java's version of regular expressions.  

ExtractFileProcessorNiFiwithRegexclient.csv
0
Techies,
 I want to transform these 2 sample lines in a tab delimited file:

77785027532971      02/05/2017      G7FD20D37B77
77785027533003      02/06/2017      G74420D29220

into 3 separate variable values (@EmpId, @StartDate, @Cube) which will later be written to a database. The end values will need to look like this:

77785027532971
2017-02-05
G7FD20D37B77

How would I transform this using a regular expression (java pattern) ?
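
A minimal sketch of one way to do it, assuming the date column is MM/DD/YYYY (as the expected output 2017-02-05 suggests). It is written in Python for illustration; the constructs used (\d, \s, \w, groups) should also be valid in java.util.regex. The function name parse_line is hypothetical.

```python
import re

# Three whitespace-separated columns; the date is split into month, day and year.
line_format = re.compile(r"^(\d+)\s+(\d{2})/(\d{2})/(\d{4})\s+(\w+)$")

def parse_line(line):
    m = line_format.match(line.strip())
    if m is None:
        return None
    emp_id, month, day, year, cube = m.groups()
    # Reassemble the date as YYYY-MM-DD.
    return emp_id, f"{year}-{month}-{day}", cube

print(parse_line("77785027532971\t02/05/2017\tG7FD20D37B77"))
# ('77785027532971', '2017-02-05', 'G7FD20D37B77')
```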
0
What changes do I have to make to this so that it matches both testcreditcard. and creditcard.?
if (currHost.match(/(accountopening\.|creditcard\.|web\.oflows\.us|blog\.)/i)) { }

I thought it should be something like the below, but that seems to allow anything in front of creditcard:
if (currHost.match(/(accountopening\.|(test)?creditcard\.|web\.oflows\.us|blog\.)/i)) { }

Thanks!
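
A likely reason: the pattern is unanchored, so the match may begin in the middle of the host name and "xcreditcard." still matches from the "c" onward. Anchoring the start with ^ is one fix, assuming currHost is a bare host name that begins with one of these labels. Sketched in Python for illustration; the host names are made up:

```python
import re

# "^" forces the match to start at the beginning of the host name.
host_pattern = re.compile(
    r"^(?:accountopening\.|(?:test)?creditcard\.|web\.oflows\.us|blog\.)",
    re.IGNORECASE,
)

for host in ("creditcard.example.com", "testcreditcard.example.com", "mycreditcard.example.com"):
    print(host, bool(host_pattern.search(host)))
```

In the JavaScript original the equivalent change would be adding ^ right after the opening slash of the regex literal.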
0
Regex matching _app_ and _app[X]_

input files:

898989_app_p99.pdf
353535X_appN_p99.pdf
575779X_appX_p99.pdf
524244X_appK_p99.pdf
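
A minimal sketch, assuming the [X] in the title stands for a single optional uppercase letter after "app". It is shown in Python; the pattern itself is plain enough to work in most regex flavours.

```python
import re

# "_app", then at most one uppercase letter, then "_".
app_marker = re.compile(r"_app[A-Z]?_")

names = ["898989_app_p99.pdf", "353535X_appN_p99.pdf", "575779X_appX_p99.pdf", "524244X_appK_p99.pdf"]
matches = [app_marker.search(n).group(0) for n in names]
print(matches)  # ['_app_', '_appN_', '_appX_', '_appK_']
```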
0
Hi,

How do I write a regular expression for the string below to get the following final output?

"d1a2227e-291f-4d82-8991-b9458b4ad0d3","fb48e632-3c85-483b-86b3-76f7a1c7eb25","93fa5301-29e2-4ef6-9353-686868686888"



input string= @Check:"workspace://SpacesStore/d1a2227e-291f-4d82-8991-b9458b4ad0d3" OR @Check:"workspace://SpacesStore/fb48e632-3c85-483b-86b3-76f7a1c7eb25" OR @Check:"workspace://SpacesStore/93fa5301-29e2-4ef6-9353-686868686888
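
One way to get that output is to match the 8-4-4-4-12 hexadecimal ID shape directly and join the results. A sketch in Python; the variable names are illustrative:

```python
import re

# 8 hex digits, three groups of 4, then 12, separated by hyphens.
guid = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")

query = (
    '@Check:"workspace://SpacesStore/d1a2227e-291f-4d82-8991-b9458b4ad0d3" OR '
    '@Check:"workspace://SpacesStore/fb48e632-3c85-483b-86b3-76f7a1c7eb25" OR '
    '@Check:"workspace://SpacesStore/93fa5301-29e2-4ef6-9353-686868686888'
)

# Wrap each ID in double quotes and join with commas.
result = ",".join('"%s"' % g for g in guid.findall(query))
print(result)
```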
0
Hi Experts,

I'm feeling perplexed with trying to understand regex :(

- SC-EU-T-XXX01
- SC-US-T-XXX01
- SC-EU-S-XXX01
- VW-US-S-XXX01

- VWUPXXX01
- 85020-VWUPXXX01
- SCEPXXX01

From the above, I always want to get the text shown in square brackets:

- [SC-EU-T]-XXX01
- [SC-US-T]-XXX01
- [SC-EU-S]-XXX01
- [VW-US-S]-XXX01

- [VWUP]XXX01
- [85020-VWUP]XXX01
- [SCEP]XXX01
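
A sketch under a loud assumption: every value ends in a three-character code plus two digits ("XXX01" in the samples), optionally preceded by a hyphen. If the real suffix has a different shape, the tail of the pattern needs adjusting. Shown in Python:

```python
import re

# Assumption: the suffix is always 3 word characters + 2 digits, with an optional
# hyphen in front. The lazy group then captures everything before that suffix.
prefix = re.compile(r"^(.+?)-?\w{3}\d{2}$")

samples = ["SC-EU-T-XXX01", "VW-US-S-XXX01", "VWUPXXX01", "85020-VWUPXXX01", "SCEPXXX01"]
prefixes = [prefix.match(s).group(1) for s in samples]
print(prefixes)  # ['SC-EU-T', 'VW-US-S', 'VWUP', '85020-VWUP', 'SCEP']
```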
0
I'm using the createobject("vbscript.regexp") object in a VBA environment.  I'm getting an extra, unexpected, match that I'd like to eliminate or understand.

pattern: (.*?)($|(?:&#\d+;))
string: Now 15 & $42.0 the time;  for all good# &#34;men&#34; -2 pet the dog cat horse 1,234.56

I'm getting the expected matches (submatch tuples):
("Now 15 & $42.0 the time;  for all good# ", "&#34;")
("men", "&#34;")
(" -2 pet the dog cat horse 1,234.56", "")

as well as this unexpected match:
("", "")

Why is this happening and is there a better pattern that will eliminate this extra match?
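
A likely explanation: (.*?) may match nothing and $ consumes nothing, so after the last real match there is still one zero-length match available at the very end of the string. Python's re.findall shows the same extra tuple, so the sketch below uses it for illustration. Requiring at least one character with (.+?) is one possible fix; note that it would treat two adjacent entities differently from the original.

```python
import re

text = 'Now 15 & $42.0 the time;  for all good# &#34;men&#34; -2 pet the dog cat horse 1,234.56'

# Original pattern: a final zero-length match is possible at the end of the string.
original = re.findall(r"(.*?)($|(?:&#\d+;))", text)

# Revised pattern: "+" requires at least one character before the delimiter.
revised = re.findall(r"(.+?)(&#\d+;|$)", text)

print(len(original), len(revised))  # 4 3
```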
0
I have an editor that is putting in many &nbsp;'s in the source code it is creating.  I have been unable to stop the insertion of the strings but I do need them removed.

Here is an example
<tr><td align="left" style="color: #6e6f74; font-family: Arial, Helvetica, sans-serif; font-size: 14px; padding: 0px 0px 0px 10px; text-align: left;"><a href="http://www.theherbsplace.com/onsale" style="color: #5583c7;" target="_blank"><img alt="Nature's Sunshine" src="http://image.exct.net/lib/ff2c1c757166/i/4/096bf034-0.jpg" style="border: 0px; display: block;" /></a></td><td align="right" style="color: #6e6f74; font-family: Arial, Helvetica, sans-serif; font-size: 14px; padding: 0px;" valign="middle">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
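
A minimal sketch of a clean-up step, assuming the runs of &nbsp; can simply be deleted together with the whitespace between them. It is written in Python; the function name strip_nbsp_runs is hypothetical. If some of the entities separate words, replacing each run with a single space may be the safer choice.

```python
import re

def strip_nbsp_runs(html):
    # Remove each run of &nbsp; entities along with any whitespace inside the run.
    return re.sub(r"(?:&nbsp;\s*)+", "", html)

print(repr(strip_nbsp_runs('<td valign="middle">\n&nbsp;&nbsp;&nbsp; \n&nbsp;&nbsp;</td>')))
# '<td valign="middle">\n</td>'
```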

0
I am using Adobe Bridge to batch rename 130 files in a folder. It can handle regular expressions, so I am looking for the correct expression to achieve the following:

Original filename formats:
127155aaaa_58_Road to Cottage Lake Black.jpg
001000_Cottage Map.jpg

Description of revised filename:
Starting from left of entire filename -- Retain first 3 numbers of original filename
Starting from the left of '.jpg' file extension -- retain all text and spaces to left, up to and including the first instance of "_"
Retain ".jpg" file extension

Desired filename after batch renaming:
127_Road to Cottage Lake Black.jpg
001_Cottage Map.jpg

I hope I've provided enough details.

Thanks,
Andrea
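
One pattern that seems to fit both examples captures the first three digits, skips ahead greedily, and captures from the last underscore through ".jpg". The sketch tests the idea in Python; the way Bridge writes the replacement (for example $1$2 instead of \1\2) may differ, and the function name new_name is hypothetical.

```python
import re

def new_name(filename):
    # Group 1: first three digits. Group 2: last "_" through the ".jpg" extension.
    return re.sub(r"^(\d{3}).*(_[^_]*\.jpg)$", r"\1\2", filename)

print(new_name("127155aaaa_58_Road to Cottage Lake Black.jpg"))  # 127_Road to Cottage Lake Black.jpg
print(new_name("001000_Cottage Map.jpg"))                        # 001_Cottage Map.jpg
```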
0
I use this pattern: (.+?@(yahoo|xplorer|link).(?:ca|net|com))

The results are:
asd@yahoo.com
asdsad@xplorer.ca
adas@earhtlink.net
asdsfsd@linkin.com


As you can see, earthlink.net and linkin.com also appear in the results. I want only @link.com:
only those exact words should match (match case), not other extensions.
0
I know how to filter with (.+?@yahoo.(?:ca|fr)):
yahoo.ca
yahoo.fr


But how do I filter more domains at the same time?
yahoo.ca
yahoo.com
xplorer.ca
xplorer.com
jazzera.ya
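
One possible way is to list each full domain as an alternative, escape the dots, and anchor the end so that nothing may follow the domain. A sketch in Python, building the alternation from the list above; the variable names and test addresses are illustrative:

```python
import re

domains = ["yahoo.ca", "yahoo.com", "xplorer.ca", "xplorer.com", "jazzera.ya"]

# re.escape turns each "." into a literal dot. The "@" directly before the group and
# the "$" directly after it stop longer names such as "myyahoo.com" or "yahoo.com.au".
address = re.compile(r"^[^@\s]+@(?:%s)$" % "|".join(re.escape(d) for d in domains))

for candidate in ("asd@yahoo.com", "asd@yahoo.fr", "asd@myyahoo.com"):
    print(candidate, bool(address.match(candidate)))
```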
0
I have the following regular expression:
^\s*<\s*script\s*name\s*=\s*["](?<name>[^"]*)["]\s* platform\s*=\s*["](?<platform>[^"]*)["]\s*(deferred\s*=\s*["](?<deferred>[^"]*)["])?>\s*$

It matches fine against the following:
<script name="FOO" platform="all">
<script name="FOO" platform="all" deferred="yes">
However, I need it to match regardless of the order of the parameters so it should also match:
<script name="FOO" deferred="yes" platform="all">
 -- and --
<script platform="all" deferred="yes" name="FOO">
How do I rewrite the above Regex so that it will match each name-value pair regardless of what order they appear?
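
One order-independent approach is to match the tag first and then collect the name="value" pairs from inside it, instead of encoding every ordering in a single pattern. A sketch in Python, where named groups would be written (?P<name>...) rather than (?<name>...); the function name parse_script_tag is hypothetical. It does not reject unknown attributes.

```python
import re

tag = re.compile(r'^\s*<\s*script\b([^>]*)>\s*$')
attribute = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_script_tag(line):
    m = tag.match(line)
    if m is None:
        return None
    # Collect every attribute pair, whatever order they appear in.
    attrs = dict(attribute.findall(m.group(1)))
    # name and platform are required; deferred stays optional, as in the original pattern.
    if "name" not in attrs or "platform" not in attrs:
        return None
    return attrs

print(parse_script_tag('<script platform="all" deferred="yes" name="FOO">'))
```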
0
Given  the following code segments:
  lookupEvents={
    "/responsive/lead-form.":"event19",
    "checking-offer":"event29",
    "checking-offer-confirmation":"event32"
  };
...
 __.map(lookupEvents,function(value,key) {
            if(pathname.indexOf(key)!==-1) {returnValue.push(value);}
});

Is there a way to use a regex or another approach, possibly _.filter, so that event29 and event32 are not both set for URLs containing "checking-offer-confirmation"? I only want to set event32 for "checking-offer-confirmation". I'm new to Underscore, so I'm not quite sure how to handle this particular situation.

Thanks!
0
Basically, I want to return the FIRST match of this pattern from a valid or malformed input: 1 or 2 digits (as the day), 1 or 2 digits (as the month), and 4 digits (as the year), separated by forward slash characters '/'.

given these hypothetical inputs:

01/01/2017
1/1/2017/01/03/2017
1/1/2017/01/03/2017/a/b/z/0000
a/1/2/b/c/9/1/g/8/99/1/34/9/99/2017/z
ab/cd/efgh

I would like the output to be:
01/01/2017
1/1/2017
1/1/2017
9/99/2017
FALSE
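
A minimal sketch that appears to reproduce the outputs above: look for digit runs of length 1-2, 1-2, and 4 separated by slashes, and use lookarounds so the match cannot start or end in the middle of a longer number. Written in Python; the function name extract_first_date is hypothetical.

```python
import re

# No digit directly before the first run, and none directly after the 4-digit year.
first_date = re.compile(r"(?<!\d)\d{1,2}/\d{1,2}/\d{4}(?!\d)")

def extract_first_date(text):
    m = first_date.search(text)
    return m.group(0) if m else False

for sample in ("01/01/2017", "1/1/2017/01/03/2017", "a/1/2/b/c/9/1/g/8/99/1/34/9/99/2017/z", "ab/cd/efgh"):
    print(extract_first_date(sample))
```

The pattern only checks the shape of the digits; a value such as 9/99/2017 is accepted, as in the expected output.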
0
Hi all.

I have several occurrences of similar text in a very long email:

*1116 1200 ABC_Content_124853_124855 1117 1500
ABC_Content_123456_ABC_124865_Sound 1117 1000 - Documentation - 75% to 84% and 85% to 99%*

The text can change much but I am able to get all the relevant matches using this regex:
(?s).*\s(\*\d+\s+\d+.*?\*)+.*

The problem is that I can have many different occurrences of such a group to extract, and I need to implement it in Python.

Python says that (?s) is not a valid RegEx...

Therefore I've tried:
print re.findall(r'.*\s(\*\d+\s+\d+.*?\*)+.*', my_very_long_text.replace('\n', ' ').replace('\r', ''))

But I only print the LAST match and not ALL the matches.

Can you kindly help?
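
A likely cause is the leading and trailing ".*": they swallow the rest of the text, so the whole input becomes a single match and the repeated group keeps only its last capture. Dropping them and passing re.DOTALL (the flag equivalent of (?s)) lets findall return every block. A sketch with a made-up stand-in for the email body:

```python
import re

# Hypothetical stand-in for the long email body.
my_very_long_text = (
    "intro\n*1116 1200 ABC_Content_124853_124855 1117 1500\n"
    "ABC_Content_123456_ABC_124865_Sound 1117 1000 - Documentation - 75% to 84% and 85% to 99%*\n"
    "more text\n*2001 0900 XYZ_Content_1*\nend"
)

# No ".*" around the block: each "*digits digits ... *" span is its own match.
# re.DOTALL lets "." cross line breaks, so no newline replacement is needed.
blocks = re.findall(r"\*\d+\s+\d+.*?\*", my_very_long_text, flags=re.DOTALL)
print(len(blocks))  # 2
```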
0
