Regular Expressions

A regular expression ("regex") is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, i.e. "find and replace"-like operations. Regular expression processors are found in several search engines, in the search and replace dialogs of word processors and text editors, and in the command lines of text processing utilities such as sed and AWK. Many programming languages provide regular expression capabilities, some built in (for example Perl, JavaScript, Ruby, AWK, and Tcl) and others via a standard library (for example .NET languages, Java, Python, and C++ since C++11). Most other languages offer regular expressions via a library.

I need an SQL regex to capture the ICD-10 code range from F1 through F9.99. I was trying the following: t10."ICD-10 CODE" LIKE 'F[1-9]%', but it is not working for me.
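
A minimal sketch of the intended match, written in Python rather than SQL. It assumes a code is in range when it starts with F followed by a digit from 1 to 9; the sample codes are made up for illustration. Standard SQL LIKE only treats % and _ as wildcards, so many databases read the bracket class in 'F[1-9]%' literally, which may be why the attempt above matches nothing.

```python
import re

# Assumption: "in range" means the code starts with "F" and a digit 1-9.
ICD10_F_RANGE = re.compile(r"^F[1-9]")

# Hypothetical sample codes, not taken from the question.
codes = ["F1", "F10.20", "F9.99", "F0.1", "G40.9", "E11"]

in_range = [code for code in codes if ICD10_F_RANGE.match(code)]
print(in_range)
```

Most databases expose a separate regex operator or function for this kind of pattern; which one applies depends on the engine, which the question does not name.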

https://www.experts-exchange.com/questions/29106263/replacing-hypertext-links-with-simple-text-'URL'.html#a42605453

I need to remove all URLs that start with http:\/
I am able to remove all URLs that start with http://; the regular expression for this is given in the link above.

url_rex = re.compile(r'(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?')

How can I modify this regular expression to remove URLs that start with http:\/

Also, the code below deletes the characters @ and # from words that start with @ or #, which I intend to keep.

import re, string, html
uni_escape = re.compile(r'\\u[0-9a-f]{4}')
url_rex = re.compile(r'(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?')
with open("input.txt", "r") as fin, open("output.txt", "w") as fout:
    for text in fin:
        unesc_html_text = html.unescape(text)
        encoded_text = bytes(unesc_html_text, 'ascii', errors='ignore')
        decoded_ascii_text = encoded_text.decode('ascii')
        unesc_text = re.sub(uni_escape, '', decoded_ascii_text)
        text = url_rex.sub('', unesc_text)
        fout.write(text)

How can I change the regular expression to keep @ and # in words?
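
One possible adjustment, as a sketch only: allow an optional backslash before each slash after the scheme, and allow backslashes inside the path. It assumes the escaped URLs look like http:\/\/host\/path; the sample string and the strip_urls helper are illustrative and not from the question.

```python
import re

# Assumption: escaped URLs use "\/" wherever a plain URL uses "/".
url_rex = re.compile(
    r"(?:https?|ftp):(?:\\?/){2}"                  # scheme, then // or \/\/
    r"[\w\-]+(?:\.[\w\-]+)+"                       # host name with at least one dot
    r"(?:[\w\-.,@?^=%&:/~+#\\]*[\w\-@?^=%&/~+#])?" # optional path, may contain "\"
)

def strip_urls(text):
    """Remove plain and backslash-escaped URLs, leaving other text alone."""
    return url_rex.sub("", text)

sample = r"see http:\/\/example.com\/a\/b and https://example.org/x?y=1 #tag @user"
print(strip_urls(sample))
```

In this sketch the words starting with # and @ survive, because the pattern only consumes those characters when they are part of a URL.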

I need to replace a line in a configuration file and want to make sure it is done safely, since it needs to be done on several systems.

The line in the file is:
*.emerg                                                 *

I want to replace the end-of-line * with :omusrmsg:*, but only on lines that start with *.emerg, so it looks like this:
*.emerg                                                 :omusrmsg:*

This sed command seems to work, but I was curious whether there is a more foolproof way to do it:
sed -i '/^\*.emerg/s/\*$/:omusrmsg:*/' config.conf
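
A stricter variant of the same idea, sketched in Python: the dot in *.emerg is escaped, and the whole line is anchored, so only a line consisting of *.emerg, whitespace, and a lone * is rewritten. The rewrite helper and the sample lines are assumptions for illustration.

```python
import re

# Whole-line match: "*.emerg", spaces or tabs, a single "*", optional trailing blanks.
EMERG_LINE = re.compile(r"^(\*\.emerg[ \t]+)\*[ \t]*$", re.MULTILINE)

def rewrite(config_text):
    """Replace the trailing "*" target on *.emerg lines with ":omusrmsg:*"."""
    return EMERG_LINE.sub(r"\1:omusrmsg:*", config_text)

before = "*.emerg\t\t*\nmail.info\t/var/log/mail\n"
after = rewrite(before)
print(after)
```

Because an already converted line no longer ends in a bare *, running the rewrite a second time leaves the text unchanged, which matters when the same change is rolled out to several systems.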

As far as I can tell, the Multiline property of regexp does not have any effect on anything.  Can anybody give me an example where it actually changes the behavior of regexp?

For instance, in the following program it does not matter.
Sub func2()
' extract 7-digit numbers from a string, and treat everything else as a delimiter
Dim regex As New RegExp
Dim multi, Glob, s1, s2, s3, msg, match As Object
Debug.Print "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
For Each multi In Array(True, False)

Debug.Print "Multiline is " & multi

For Each Glob In Array(True, False)
regex.MultiLine = multi ' these two values do not affect the outcome of this test
regex.Global = Glob ' these two values do not affect the outcome of this test
s1 = " 1234567" & vbLf & "567 2072083 "  ' vbLf DOES NOT match .*
s2 = " 1234567" & vbCr & "567 2072083 "  ' vbCr DOES match .* regardless of the MultiLine value
s3 = " 1234567" & vbCrLf & "567 2072083 "  ' vbCr DOES match, but the vbLf DOES NOT match .*
regex.pattern = "(\b|\D)\d{7,7}(?=\b|\D)" ' use [\s\S]+ instead of .+ if you REALLY want to match anything
msg = ""
If Glob Then msg = msg & "Global "
If Not Glob Then msg = msg & "Not Global "

Set match = regex.Execute(s1) '
msg = match.count & "-" & msg
Set match = regex.Execute(s2) '
msg = match.count & "-" & msg
Set match = regex.Execute(s3) '
msg = match.count & "-" & msg
Debug.Print msg
'Stop
Next
Next
Debug.Print "///////////////////////////////"
End Sub
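
In most regex flavors the multiline option only changes where ^ and $ are allowed to match. The pattern in the test above contains neither anchor, which would explain why toggling the property makes no difference. A small Python sketch of the effect, where re.MULTILINE stands in for the MultiLine property and the sample text is made up:

```python
import re

text = "1234567\n567 2072083"

# Without the flag, ^ matches only at the very start and $ only at the very end.
starts_default = re.findall(r"^\d+", text)
ends_default = re.findall(r"\d+$", text)

# With the flag, ^ and $ also match at the start and end of every line.
starts_multi = re.findall(r"^\d+", text, flags=re.MULTILINE)
ends_multi = re.findall(r"\d+$", text, flags=re.MULTILINE)

# A pattern without anchors gives the same result either way.
plain_default = re.findall(r"\d{7}", text)
plain_multi = re.findall(r"\d{7}", text, flags=re.MULTILINE)

print(starts_default, starts_multi)
print(ends_default, ends_multi)
print(plain_default == plain_multi)
```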

VBA regex supports look ahead, but not look behind.
As a result, I frequently use .submatches which makes my code wordy and harder to understand.

Has anybody ever figured out a way around this?

The following code demonstrates the problem. I want to extract every 7-digit integer from a string and ignore everything else.

Dim regex As Object, match As Object
Set regex = CreateObject("vbscript.regexp")
regex.Global = True
regex.MultiLine = False

' This displays exactly what I want, but it leads to wordy MsgBox syntax.

regex.Pattern = "(\b|\D)(\d{7})(?=\b|\D)"
Set match = regex.Execute("a1234567 1234 a2072083")

MsgBox match.Count & "'first=" & match(0).submatches(1) & "'last=" & match(match.Count - 1).submatches(1)


' I wish I could do it this way because the msgbox syntax is cleaner. Unfortunately, it does not work because VBA does not support lookbehind.

regex.Pattern = "(?<=\b|\D)(\d{7})(?=\b|\D)"
Set match = regex.Execute("a1234567a2072083")

MsgBox match.Count & "'first=" & match(0) & "'last=" & match(match.Count - 1)
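
One way to do without lookbehind, sketched in Python: consume the preceding non-digit in a non-capturing group, keep a single capturing group for the number, and use a negative lookahead for the trailing edge. As far as I know VBScript's RegExp accepts the same (?:...) and (?!...) syntax, but that is an assumption to verify; the number would still have to be read from SubMatches(0) there.

```python
import re

# (?:^|\D)  start of string or one non-digit, consumed but not captured
# (\d{7})   the only capturing group: exactly seven digits
# (?!\d)    the next character must not be another digit
SEVEN_DIGITS = re.compile(r"(?:^|\D)(\d{7})(?!\d)")

def seven_digit_numbers(text):
    """Return every standalone 7-digit number found in text."""
    return SEVEN_DIGITS.findall(text)

print(seven_digit_numbers("a1234567 1234 a2072083"))
print(seven_digit_numbers("a1234567a2072083"))
```

With a single capturing group, findall returns the captured numbers directly, so no per-match group indexing is needed on the Python side.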

In Excel, what is the fastest/most compatible RegEx and/or VBA code needed to download the third table from a website WITHOUT Query Tables?

Hi. I am having some trouble getting the pattern for the following:

Match criteria
1)   word begins with 4 alphabetical characters
2)  immediately followed by "PE"
3)  get the rest of the word

string to be searched  =  "This is the sample string abcdPEy2c4s  where I hope to get that weird word."


I would like the result to be abcdPEy2c4s
P.S. The string may be multiline.

Thank you.
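
A minimal sketch of the three criteria in Python. It assumes "alphabetical" means ASCII letters and that "the rest of the word" means the following letters, digits, and underscores; find_pe_words is a hypothetical helper name.

```python
import re

# \b            the match must begin at the start of a word
# [A-Za-z]{4}   exactly four letters
# PE            the literal marker
# \w*           the rest of the word
PE_WORD = re.compile(r"\b[A-Za-z]{4}PE\w*")

def find_pe_words(text):
    return PE_WORD.findall(text)

sample = "This is the sample string abcdPEy2c4s  where I hope to get that weird word."
print(find_pe_words(sample))
```

The pattern has no anchors tied to line ends, so it behaves the same way on a multiline string.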

What is the best way to pull the second table from a website without using Web Queries?

RegEx and/or VBA seem promising, but I am willing to explore other alternatives.

I'm trying to extract three groups from an HTML text (attached). When I run the expression in a regex debugger on a website, it works, but when I try it in my C# program, it doesn't. Any help would be appreciated.

String input:
"    <div class=\"download\">\n        <a href=\"Movie/שלושה-שלטים-מחוץ-לאבינג-מיזורי-_7685.html\" title=\"שלושה שלטים מחוץ לאבינג, מיזורי  / Three Billboards Outside Ebbing, Missouri\"><img src=\"images/movies/hX8EbtvY61p8.jpg\" alt=\"Three Billboards Outside Ebbing, Missouri\" /></a><!--\n        --><div class=\"content\">\n            <div class=\"title\">\n                <a href=\"Movie/שלושה-שלטים-מחוץ-לאבינג-מיזורי-_7685.html\" title=\"שלושה שלטים מחוץ לאבינג, מיזורי  / Three Billboards Outside Ebbing, Missouri\"><h4>CAPTURE1</h4><span> / </span><h4>CAPTURE2</h4></a>\n            </div>\n            <div class=\"text\">\n                <p>CAPTURE3</p>\n            </div>\n            <div class=\"bottom\">\n                <p>\n                שנת יציאה: <b>2017</b> | \n                תאריך העלאה: <b title=\"21-05-2018 15:08\">לפני 5 ימים</b> | \n                צפיות: <b>7824</b>\n                </p>\n            </div>\n        </div>\n    </div>\n    "


Regex pattern -  <h4>(.*?)</h4>.*?<h4>(.*?)</h4>.*?<p>(.*?)</p>
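
One likely cause, stated as an assumption because the debugger settings are not shown: by default the dot does not match a newline, and the three targets sit on different lines. The Python sketch below uses a shortened, made-up excerpt of the input; re.DOTALL is Python's name for the option that .NET calls RegexOptions.Singleline.

```python
import re

# Shortened stand-in for the HTML in the question.
html_text = (
    '<div class="title">\n'
    '    <a href="#"><h4>CAPTURE1</h4><span> / </span><h4>CAPTURE2</h4></a>\n'
    '</div>\n'
    '<div class="text">\n'
    '    <p>CAPTURE3</p>\n'
    '</div>\n'
)

pattern = r"<h4>(.*?)</h4>.*?<h4>(.*?)</h4>.*?<p>(.*?)</p>"

# Default mode: "." stops at "\n", so the pattern cannot reach the <p> line.
default_match = re.search(pattern, html_text)

# DOTALL mode: "." also matches "\n", so all three groups are found.
dotall_match = re.search(pattern, html_text, flags=re.DOTALL)

print(default_match)
print(dotall_match.groups())
```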

I'm reading a text field from a PDF form, and it is a date field.
Unfortunately the format used was, for example, 4/9/18 instead of the acceptable standard format 04/09/2018 (xx/xx/xxxx), and this is causing me a lot of issues.

What regular expression would correct this? It may need to be conditional, because from October onward the month has two digits, and likewise the day from the 10th onward.
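
A sketch of one approach in Python: capture the three parts with optional second digits, then pad them in a replacement function, so one- and two-digit months and days go through the same code path. The century "20" for two-digit years is an assumption, and normalize_dates is a hypothetical helper name.

```python
import re

# One or two digits, slash, one or two digits, slash, two- or four-digit year.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})\b")

def normalize_dates(text, century="20"):
    """Rewrite dates like 4/9/18 as 04/09/2018 (assumed century for short years)."""
    def pad(match):
        first, second, year = match.groups()
        if len(year) == 2:
            year = century + year
        return f"{int(first):02d}/{int(second):02d}/{year}"
    return DATE.sub(pad, text)

print(normalize_dates("4/9/18"))
print(normalize_dates("Signed 10/3/2018 and 12/25/17"))
```

The padding does not depend on whether the first field is the month or the day, so the same sketch applies to either ordering.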

Hi E's,
I am trying to split a string with preg_split like this:
<?
header('Content-type: text/plain; charset=utf-8');
$str = "this ? é ! a $ test =";
$arr = preg_split('/\\s|(\\W+)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($arr);
?>

The output is:
Array
(
    [0] => this
    [1] =>
    [2] => ? é !
    [3] => a
    [4] =>
    [5] => $
    [6] => test
    [7] =>
    [8] => =
    [9] =>
)

Is there any way to get this instead:
Array
(
    [0] => this
    [1] =>
    [2] => ?
    [3] =>
    [4] => é
    [5] =>
    [6] => !
    [7] =>
    [8] => a
    [9] =>
    [10] => $
    [11] =>
    [12] => test
    [13] =>
    [14] => =
    [15] =>
)

The objective is to separate text, punctuation, and whitespace (preserving words with accents and other diacritics, like é, ã, está, etc.).

Best regards,
JC
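
A sketch of the tokenizing idea in Python: instead of splitting on delimiters, match three kinds of token in turn (a run of word characters, a single whitespace character, or a single other character). Python's \w is Unicode-aware for text strings, so é counts as a word character here. In PHP the pattern would probably need the u modifier to get the same treatment of UTF-8 text. The output is a flat token list and does not reproduce the empty array slots shown above.

```python
import re

# \w+      a whole word, accented letters included
# \s       one whitespace character
# [^\w\s]  one punctuation or symbol character
TOKEN = re.compile(r"\w+|\s|[^\w\s]")

def tokenize(text):
    return TOKEN.findall(text)

tokens = tokenize("this ? é ! a $ test =")
print(tokens)
```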

Hi experts,

I have this working fiddle:
https://jsfiddle.net/ynm498pv/4/

In this example I have a textbox.

I'm using regex to check the input.

Right now if you type the less than symbol < followed by any numbers [0-9] a space is placed between the < and the number that was typed like this:

[Screenshot: R4.PNG]
Notice that when I type < followed by letters, like <m, it does not put a space in between.

How do I revise my example to also put a space in between when I type < followed by a lowercase or uppercase letter or a symbol?
I'm using this regex for that: <[^0-9]

The only time a space should not be inserted is when I type the less-than symbol followed by the equals sign; <= should not get a space in between.

So sample textbox output would look like this. Notice that the only time there is no space after the < symbol is when I typed = after it.

< 4dsmmdfs < tyfsd < 9s < Twe <= fsddskj
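
A sketch of the rule in Python: match a < only when the next character is neither = nor whitespace, and replace it with < plus a space. The lookahead syntax used here is also available in JavaScript regular expressions; space_after_lt is a hypothetical helper name.

```python
import re

# "<" followed by any character except "=" or whitespace (the lookahead consumes nothing).
LT_NEEDS_SPACE = re.compile(r"<(?=[^=\s])")

def space_after_lt(text):
    return LT_NEEDS_SPACE.sub("< ", text)

typed = "<4dsmmdfs <tyfsd <9s <Twe <= fsddskj"
print(space_after_lt(typed))
```

Because a < that is already followed by a space no longer matches, applying the function on every keystroke does not keep adding spaces.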

Here is my fiddle:
https://jsfiddle.net/r2d22k18/7gc5pwaj/2/

I'm trying to prevent typing of </ or />

In the above fiddle, when I type </ or />, the second character is deleted.

Instead of deleting the second character, how do I just put a space in between?

So if I type </, it will put a space in between and show < /
So if I type />, it will put a space in between and show / >
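
A sketch in Python of inserting the space instead of deleting a character: match the first character of each forbidden pair with a lookahead for the second, and append a space to whatever was matched. The helper name split_tag_slashes is an assumption.

```python
import re

# "<" directly before "/", or "/" directly before ">".
FIRST_OF_PAIR = re.compile(r"<(?=/)|/(?=>)")

def split_tag_slashes(text):
    """Turn "</" into "< /" and "/>" into "/ >", leaving other text alone."""
    return FIRST_OF_PAIR.sub(lambda match: match.group(0) + " ", text)

print(split_tag_slashes("</"))
print(split_tag_slashes("/>"))
```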

Hi experts,

I have this example fiddle
https://jsfiddle.net/r2d22k18/oyhg1je0/1/

In this example I prevent users from typing the symbol > in a textbox.

How do I do the same thing, but using a regular expression?

Hi experts,

I'm using JavaScript and a regular expression to validate input in a TextArea.

Here is my working fiddle:
https://jsfiddle.net/r2d22k18/jj27pLbm/1/

[Screenshot: p1.PNG]
When a user types /> inside the text area, they get the message "Sorry that input is invalid".

[Screenshot: tc1_FSGT_Message1.PNG]
When a user types </ inside the text area, they get the message "Sorry that input is invalid".

[Screenshot: tc2_LTFS_Message1.PNG]
These two cases are correct, and the validation message should show up. Any other input is valid.

But I noticed that pressing the Enter key on my keyboard also fails the validation and causes the message "Sorry that input is invalid" to display.

[Screenshot: tc3_EnterKey.PNG]
How do I fix my example so that pressing the Enter key on my keyboard doesn't cause the validation to fail?

I have this example fiddle:
https://jsfiddle.net/h6vkt3jg/2/

In this example I'm using this regex pattern:

/<\/|\/>/

With this pattern, validation passes and the button becomes enabled only when you type </ together or /> together.

When you run it, it looks like this. The button is disabled.

[Screenshot: Run1.PNG]
So if you type the symbols separately, that doesn't pass.

[Screenshot: Run2ErrorMessage.PNG]

If I type the </ symbols together, regardless of whether there is text before or after them, validation passes.

[Screenshot: Run3Pass1.PNG]
If I type the /> symbols together, regardless of whether there is text before or after them, validation passes.

[Screenshot: Run3Pass2.PNG]

How do I revise my regex example to do the opposite?
I want to add a NOT to this regex pattern: /<\/|\/>/

So in my example, validation should pass for everything except </ or />

The tooltip should only pop up if I type </ or />
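
Regular expressions have no general NOT operator, but a negative lookahead anchored at the start can express "the text contains neither pair". A Python sketch under that reading, with is_valid as a hypothetical helper name:

```python
import re

# ^                      only try at the very start of the text
# (?! ... )              succeed only if what follows does NOT match
# [\s\S]*(?:</|/>)       any characters (newlines included), then "</" or "/>"
NO_TAG_SLASH = re.compile(r"^(?![\s\S]*(?:</|/>))")

def is_valid(text):
    """True when the text contains neither "</" nor "/>"."""
    return NO_TAG_SLASH.search(text) is not None

for candidate in ["hello", "a < b > c / d", "x </ y", "/>", "line1\nline2"]:
    print(repr(candidate), is_valid(candidate))
```

A simpler alternative is to keep the original pattern and negate the result of the test in code; the lookahead form is mainly useful where only a pattern can be supplied.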

I'm using these two articles as reference:
https://autohotkey.com/docs/misc/RegEx-QuickRef.htm
https://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149

What is the regex pattern for the less-than symbol followed by a forward slash (</), or the forward slash followed by the greater-than symbol (/>)?

Basically, I want to allow the user to type everything into the text box except when those two cases occur.

But if > or < is by itself, then that's OK.

This is a follow-up to the example in this ticket:
https://www.experts-exchange.com/questions/29098002/checking-input-of-textbox.html?anchor=a42554823

I created the example in this fiddle.
https://jsfiddle.net/2edzjg0x/1/

In this fiddle example this is what it looks like when you run it.
Notice the button is disabled. The textbox gets validated to see if user typed a less than sign < or greater than sign >

[Screenshot: on run]

I'm using a regular expression to check for a less-than sign < or greater-than sign > in the textbox.

If the user types <, they get this message:

[Screenshot: S2p2.PNG]
If the user types >, they get this message:

[Screenshot: S2p4.PNG]
If they type in regular text, validation passes and the button becomes enabled.

[Screenshot: S2p3.PNG]

In this example fiddle I'm using a regular expression; I'm learning to work with regular expressions.

How do I revise my fiddle example to check for what the example in this ticket does:

https://www.experts-exchange.com/questions/29098002/checking-input-of-textbox.html?anchor=a42554823

That ticket's example tests this:

  $('#tester').keyup(function() {
    var disabled = $(this).val().indexOf('/>') > 0 || $(this).val().indexOf('</') > 0;
    $('#thebutton').prop({disabled: disabled});
  });

What is the equivalent of this, using a regular expression?

var disabled = $(this).val().indexOf('/>') > 0 || $(this).val().indexOf('</') > 0;
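
A Python sketch comparing the two checks. One detail worth noting: an index-based test written with > 0 does not count a hit at position 0, whereas a regex search reports a match anywhere. Both function names are assumptions. In JavaScript the same pattern could be written as /\/>|<\//.test(value).

```python
import re

TAG_SLASH = re.compile(r"/>|</")

def disabled_index_style(value):
    # Mirrors the "> 0" comparison from the snippet: a hit at index 0 is NOT counted.
    return value.find("/>") > 0 or value.find("</") > 0

def disabled_regex_style(value):
    # True when either pair occurs anywhere in the value, including at index 0.
    return TAG_SLASH.search(value) is not None

for value in ["plain text", "a/>", "</b"]:
    print(repr(value), disabled_index_style(value), disabled_regex_style(value))
```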

I have a text field and want to implement a regular expression to validate the mm/dd format. I currently have a regex that works for this format, but it is not able to validate against month lengths, e.g. 29 days in February, 30 days in April, etc. How do I modify the regex to include this month validation?

The regex I'm currently using is: (0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])
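
One way to fold month lengths into the pattern, sketched in Python: split the months into three groups by length and give each group its own day range. Without a year there is no way to decide leap years, so this sketch assumes February 29 should be accepted. The name is_valid_mm_dd is hypothetical.

```python
import re

SEP = r"[- /.]"  # same separators as the original pattern

# Months with 31 days, months with 30 days, and February (up to 29 by assumption).
DAYS_31 = r"(?:0[13578]|1[02])" + SEP + r"(?:0[1-9]|[12][0-9]|3[01])"
DAYS_30 = r"(?:0[469]|11)" + SEP + r"(?:0[1-9]|[12][0-9]|30)"
FEBRUARY = r"02" + SEP + r"(?:0[1-9]|[12][0-9])"

MM_DD = re.compile(f"(?:{DAYS_31}|{DAYS_30}|{FEBRUARY})")

def is_valid_mm_dd(value):
    # fullmatch requires the whole value to be a date, with nothing before or after.
    return MM_DD.fullmatch(value) is not None

for value in ["02/29", "02/30", "04/30", "04/31", "12/31", "13/01"]:
    print(value, is_valid_mm_dd(value))
```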

I am building a C# application and one of the requirements is that I find values that match the pattern below...

Starts with AB or lowercase ab
Has 6-9 digits after the AB
Ends with YZ or lowercase yz

I was expecting something like the following to work, but I know it's not right because it isn't working:
^[abAB][0-9][yzYZ]$

An example of the value I am trying to match is ab665897yz or ab6658975yz

Any help is appreciated!
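
The attempt above matches only three characters in total, because each bracket class stands for a single character and [0-9] has no repeat count. A Python sketch of the stated rules follows; it assumes mixed case such as Ab is not wanted, and is_match is a hypothetical helper name. The pattern uses only syntax that .NET regular expressions also support.

```python
import re

# (?:AB|ab)   the two-letter prefix, all upper or all lower case
# [0-9]{6,9}  between six and nine digits
# (?:YZ|yz)   the two-letter suffix, all upper or all lower case
CODE = re.compile(r"(?:AB|ab)[0-9]{6,9}(?:YZ|yz)")

def is_match(value):
    # fullmatch plays the role of the ^ and $ anchors in the attempt above.
    return CODE.fullmatch(value) is not None

for value in ["ab665897yz", "ab6658975yz", "AB123456YZ", "ab12345yz"]:
    print(value, is_match(value))
```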

DB = PostgreSQL

I want to select only values that are prefixed with "d_mc". Examples of field values are "d_mc###", "d_mm###", "d_ma###".

How do I do that from a SELECT statement? All of the following just seem to return everything.

select relname, substring(relname, '\d+$') as my_Number 
     from pg_stat_user_tables
     -- where substring(relname, 'd_mc\d+$')
     -- where relname like "d_mc\D%"
    where relname like 'd_mc[0-9]+'

Thanks

I'm trying to set up a 301 redirect in my .htaccess file where a certain subfolder and ANYTHING after it will redirect to a specific page.

For example:
I want http://website.com/service-guide/ or http://website.com/service-guide/folder1/  or http://website.com/service-guide/folder1/folder2 to redirect to http://website.com/static-page.

I have been trying to make this work with RegEx but can't figure it out. I'm totally new to RegEx so any help would be greatly appreciated.

I have a problem with regex in Oracle.
I am trying to detect the pattern in the text below:
Suspect(s) detected by OFAC-Agent:25 
SystemId: 
Associate: 
=============================
Suspect detected #1

OFAC ID:AS06762733
MATCH: 0.00
TAG: NAM
MATCHINGTEXT: Masked,**,Masked, 
RESULT: (0)

BATCH: 2018/03/24_0001_IN_ONDM_PRI3_2
NAME: Masked,Masked
  Synonyms: none
ADDRESS: Masked
  Synonyms: none
CITY: Masked
  Synonyms: none
COUNTRY: Masked
  Synonyms:
   - REPUBLIC OF Masked
   - Masked
   - Masked
   - Masked
   - Masked
STATE: Masked
  Synonyms: none
ORIGIN: 
EDA
DESIGNATION: 
GWL
TYPE: 
I
SEARCH CODES: 
none
USER DATA 1: 
none
USER DATA 2: 
none
OFFICIAL REF: 
2017-09-05 20:28:59 EDA
PASSPORT: 
none
BIC CODES: 
none
NATID: 
none
PLACE OF BIRTH: 
none
DATE OF BIRTH: 
none
NATIONALITY: 
none
ADDITIONAL INFOS: 
List ID: 1106 / Create Date: 09/05/2017 20:28:59 / Last Update Date: 09/05/2017 20:28:59 / Org_PID: 8388550 / Title: ARRESTED FOR BURGLARY - AUGUST, 2017 / Gender: MALE / OtherInformation: NickName: Masked; According to the timesofindia.indiatimes.com; August 25, 2017: In August, 2017, Masked was arrested for burglary. blah blha  blah blah blah blah. They / Relationship: Co-Defendant / OriginalID: 8388619
FML TYPE: 
1
FML PRIORITY: 
0
FML CONFIDENTIALITY: 
0
FML INFO: 
none
PEP-FEP: 
0 0
KEYWORDS: 
OS:ADVERSE_MEDIA NS:NAMESOURCE_WEBSITE ENTITYLEVEL:LEVEL_NA SC:ORGANIZED_CRIME
HYPERLINKS: 

I need a regular expression to find something in a string so that it can be removed.

I need to find strings like V100, v1, V1.0, or v10.0123: the letter 'v' in upper or lower case, followed by any number of digits that may contain a '.'. If it is sandwiched between other characters, it should not be found.

Example:

words v10 - it should find v10
wordsv1 - it should not find v1
v1words - it should not find v1

I'm new to regular expressions, so if you could please explain what each part of the expression means, that would be very helpful. Thanks!
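
A commented sketch in Python. It assumes "sandwiched between other characters" means touching letters, digits, or underscores, which is what the word-boundary token \b checks; find_versions and remove_versions are hypothetical helper names.

```python
import re

VERSION = re.compile(r"""
    \b           # word boundary: no letter, digit, or underscore right before the v
    [vV]         # the letter v in either case
    \d+          # one or more digits
    (?:\.\d+)?   # optionally a dot followed by more digits
    \b           # word boundary: no letter, digit, or underscore right after
""", re.VERBOSE)

def find_versions(text):
    return VERSION.findall(text)

def remove_versions(text):
    return VERSION.sub("", text)

for sample in ["words v10", "wordsv1", "v1words", "V100 v1 V1.0 v10.0123"]:
    print(repr(sample), find_versions(sample))
```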

I need to use a regular expression to transform text.
This is the test input:
201844722350 3/21/2018 8:36:00 AM
I need the result to be the full date, but I don't understand how to parse it out.
Does anyone have any suggestions?
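
A Python sketch that pulls the date and time out of the test line with named groups. It assumes "full date" means the m/d/yyyy part, optionally together with the time that follows it; the group names are made up for illustration.

```python
import re

STAMP = re.compile(
    r"(?<!\d)(?P<date>\d{1,2}/\d{1,2}/\d{4})"   # m/d/yyyy, not glued to earlier digits
    r"\s+"
    r"(?P<time>\d{1,2}:\d{2}:\d{2}\s*[AP]M)"    # h:mm:ss AM or PM
)

line = "201844722350 3/21/2018 8:36:00 AM"
match = STAMP.search(line)

print(match.group("date"))
print(match.group("time"))
print(match.group(0))
```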