Regular Expressions

A regular expression ("regex") is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. Regular expression processors are found in several search engines, in the search and replace dialogs of several word processors and text editors, and in the command lines of text processing utilities, such as sed and AWK. Many programming languages provide regular expression capabilities, some built-in, for example Perl, JavaScript, Ruby, AWK, and Tcl, and others via a standard library, for example .NET languages, Java, Python and C++ (since C++11). Most other languages offer regular expressions via a library.

Looking for a regex to extract the last field delimited by the pipe symbol (myContractNo) in this example:

FAS HC508C016P |116006|116006|myContractNo| 1  QR_CODE 2018-08-15T11:32:46-04:00
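A minimal Python sketch, assuming the wanted value is always the text between the last two pipe characters on the line. The pattern itself is portable to most regex flavors:

```python
import re

# Capture the text between two pipes; "[^|]*$" then requires that no further
# pipe appears before the end of the line, so this is the last delimited field.
LAST_FIELD_RE = re.compile(r"\|([^|]*)\|[^|]*$")


def last_pipe_field(line):
    """Return the last pipe-delimited field, or None if the line has fewer than two pipes."""
    match = LAST_FIELD_RE.search(line)
    return match.group(1) if match else None


sample = "FAS HC508C016P |116006|116006|myContractNo| 1  QR_CODE 2018-08-15T11:32:46-04:00"
print(last_pipe_field(sample))  # myContractNo
```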

Hi Everyone,

I am trying to use regular expressions to parse the date from roughly 40 different file name strings in an automated environment. I have one solution, but it does not return the correct string in all the cases I need it to. I think it should be fairly straightforward; the problem is that the names have different formats for the date piece, and in some file names the numeric string is longer than 16 digits, so both parts are highlighted as if they were two dates when they aren't. Also, I have not come across any abbreviated year formats, so checking for 20** should be acceptable. Finally, I have never seen a file with a date pattern of mmddyyyy or any combination where the year is at the end of the string, so that format type does not need to be considered.

This is the regex I'm trying to build at this point:
"\d{4}\d{4}|\d{4}\d{1,2}|\d{4}-\d{1,2}-\d{1,2}"


I just don't have enough experience with more complex expressions.

Below are the main date formats that I am trying to parse. Unfortunately, I don't have any control over the naming convention of the files themselves, so I must be prepared for any and all of the following:

File_type_OneA_2018-8-09.csv <-Month and day sections are not always consistent
FileTypeOneB-2018-6-29.csv

File Type Two 201807.xls <-No day value

201310140703_FileTypeThreeA.csv  <-where the date is the first 8 chars.
20180531_FileTypeThreeB.csv

FileTypeFour-20180713090107228.xml <-I cannot say how the next 9 digits are …
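One possible approach, sketched in Python under the assumptions stated in the question (the year always starts with 20 and comes first). The pattern takes the first run of digits that starts with 20xx and is not preceded by another digit, then reads either a dashed yyyy-m-d date or a compact yyyymm[dd] date. Any extra trailing digits, such as the time in 201310140703, are simply ignored:

```python
import re

FILE_DATE_RE = re.compile(
    r"(?<!\d)(20\d{2})"          # year 20xx, not preceded by another digit
    r"(?:-(\d{1,2})-(\d{1,2})"   # dashed form: 2018-8-09, 2018-6-29
    r"|(\d{2})(\d{2})?)"         # compact form: 201807 or 20180531 (extra digits ignored)
)


def extract_date(filename):
    """Return (year, month, day) as ints, with day None if absent; None if no date."""
    match = FILE_DATE_RE.search(filename)
    if match is None:
        return None
    year = int(match.group(1))
    if match.group(2):  # the dashed alternative matched
        return year, int(match.group(2)), int(match.group(3))
    day = int(match.group(5)) if match.group(5) else None
    return year, int(match.group(4)), day


for name in ["File_type_OneA_2018-8-09.csv", "File Type Two 201807.xls",
             "201310140703_FileTypeThreeA.csv", "FileTypeFour-20180713090107228.xml"]:
    print(name, extract_date(name))
```

Because search() returns the leftmost match, a long numeric string only ever yields one date, which avoids the "two dates" problem described above.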
What would be the JavaScript regex (or another approach) to detect whether a string (a path from a website) either starts with:

/small-business
/wealth-management
/commercial-banking

or contains:

campaigns
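The anchor ^ only needs to apply to the three prefixes, while campaigns may appear anywhere. In JavaScript this could be written as /^\/(?:small-business|wealth-management|commercial-banking)|campaigns/.test(path). The same pattern in a small Python sketch:

```python
import re

# Alternation has the lowest precedence, so this reads as
# (^/small-business | ^/wealth-management | ^/commercial-banking) OR campaigns anywhere.
TARGET_PATH_RE = re.compile(r"^/(?:small-business|wealth-management|commercial-banking)|campaigns")


def is_target_path(path):
    return TARGET_PATH_RE.search(path) is not None


print(is_target_path("/small-business/loans"))       # True
print(is_target_path("/personal/campaigns/summer"))  # True
print(is_target_path("/about/small-business"))       # False
```

Note that a plain prefix check also accepts paths such as /small-businesses; add (?:/|$) after the group if only whole path segments should count.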
I need help with a regular expression. I just need an expression that will find a string of words. For example:

Before:                                            Expected Result
37"  (side to side)                           37"
1-1/2'  (yellow)                                1-1/2'

I just need an expression that will find any non-numeric character except ", ', / and -.
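A sketch in Python, assuming that deleting every other character is what produces the expected result. The negated class [^\d"'/-] matches any character that is not a digit, ", ', / or -; the hyphen is placed last so it is taken literally. Replacing those matches with nothing leaves the measurement:

```python
import re

# Any character that is NOT a digit, double quote, single quote, slash or hyphen.
STRIP_RE = re.compile(r"""[^\d"'/-]""")


def keep_measurement(text):
    return STRIP_RE.sub("", text)


print(keep_measurement('37"  (side to side)'))  # 37"
print(keep_measurement("1-1/2'  (yellow)"))     # 1-1/2'
```

Any ", ', / or - in the trailing text would be kept as well, so this relies on the descriptions not containing those characters.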
Hi team,

How can I check for square brackets ([ and ]) in JavaScript with a regular expression?

I have a textbox with a value like the one below, so I need a regular expression that tells me whether a square bracket is present in the textbox.

If a square bracket is present in the textbox, it should give me an error message.

[ABC  ESS080820183171][CTK  ESS080820189505][TROUS  ESS080820183485][SIMINESS  ESS080820184038][RAMMMM  ESS080820185998]
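A character class that contains an escaped [ and an escaped ] is enough; in JavaScript the check would be /[\[\]]/.test(value). The same idea as a Python sketch:

```python
import re

# A character class containing "[" and "]"; both are escaped so they are literal.
BRACKET_RE = re.compile(r"[\[\]]")


def has_square_bracket(value):
    return BRACKET_RE.search(value) is not None


print(has_square_bracket("[ABC  ESS080820183171][CTK  ESS080820189505]"))  # True
print(has_square_bracket("ABC ESS080820183171"))                           # False
```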
I have noticed a bot that is using several IP addresses at the same time, and I want to block it. This will take a custom filter, but all of the tutorials on creating a filter only go up to the status code column. I want the regex to catch HeadlessChrome.

The problem is that the browser name comes much later in the line than the status code, as you can see below:


1.1.1.1 - - [05/Aug/2018:17:23:51 -0400] "POST /?wc-ajax=get_refreshed_fragments HTTP/1.1" 200 273 "https://www.theherbsplace.com/Fat_Grabbers_120_Capsules_p_189.html" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3282.119 Safari/537.36"

Please tell me how to build a regex that goes past the status code and picks up the browser name.
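A sketch in Python, assuming the log uses the Apache/Nginx "combined" format shown above. The pattern walks the fields in order (IP, ident, user, [timestamp], "request", status, bytes, "referrer") and then requires HeadlessChrome somewhere inside the quoted user-agent field. The syntax expected by the custom filter tool is not stated in the question, so the pattern may need adapting to it:

```python
import re

HEADLESS_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] '     # client IP, ident, user, [timestamp]
    r'"[^"]*" (\d{3}) \S+ '           # "request line", status code, bytes
    r'"[^"]*" '                       # "referrer"
    r'"([^"]*HeadlessChrome[^"]*)"$'  # "user agent" containing HeadlessChrome
)


def headless_client_ip(log_line):
    """Return the client IP if the user agent contains HeadlessChrome, else None."""
    match = HEADLESS_RE.match(log_line.rstrip("\n"))
    return match.group(1) if match else None


sample = ('1.1.1.1 - - [05/Aug/2018:17:23:51 -0400] "POST /?wc-ajax=get_refreshed_fragments HTTP/1.1" 200 273 '
          '"https://www.theherbsplace.com/Fat_Grabbers_120_Capsules_p_189.html" '
          '"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3282.119 Safari/537.36"')
print(headless_client_ip(sample))  # 1.1.1.1
```

Matching each field explicitly, rather than using a single .* to jump past the status code, keeps HeadlessChrome from being found in the referrer or request line by accident.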
var request = '{ "merchantRef":"ffd031516002" }';

function camelToUnderscore(str) {

      return str.replace(/([a-z])([A-Z])/g, '$1_$2').toLowerCase(); //merchant_ref

}

I want to return merchant_ref, i.e. convert the string that comes before the ":".
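One way is to capture the quoted name that is directly followed by a colon and then apply the same lowercase/uppercase substitution as camelToUnderscore. A Python sketch of that idea; parsing the string as JSON and taking its keys would be the more robust alternative:

```python
import re

request = '{ "merchantRef":"ffd031516002" }'


def first_key(json_text):
    """Return the first quoted name that is directly followed by a colon."""
    match = re.search(r'"(\w+)"\s*:', json_text)
    return match.group(1) if match else None


def camel_to_underscore(name):
    # Same substitution as the JavaScript function: lowercase letter + uppercase letter
    return re.sub(r"([a-z])([A-Z])", r"\1_\2", name).lower()


print(camel_to_underscore(first_key(request)))  # merchant_ref
```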
How do I use find and replace to remove the inner div elements while keeping everything else (all the <p>*</p> elements)? Sometimes there is just one p element, and sometimes a lot.
<div class="expand_collapse section_box">
     <h2>Animal </h2>
     <div class="box_content">                        
          <div class="expand_collapse section_box">
              <h2>Agent Facing</h2>
               <div class="box_content">
                  <p><a href="/Brochure.pdf" target="_blank">Animal<br>(bully breeds &amp; bite history)</a></p>                                                   
              </div>
          </div>
          <div class="expand_collapse  section_box">
              <h2>Consumer Facing</h2>
              <div class="box_content">
                 <p>content</p>
              </div>
          </div>
       </div>
   </div>
   <!-- Next Product -->



I just want to remove:
 <div class="expand_collapse section_box">
      <h2>Agent Facing</h2>
       <div class="box_content">


and
                           </div>
                        </div>
                        <div class="expand_collapse  section_box">
                          <h2>Consumer Facing</h2>
                          <div class="box_content">
                            <p>content</p>
                          </div>
                        </div>


That way, one or more p elements inside the second div.box_content element are kept.
I think I need to create a group for one or more p elements, (<p>.*</p>), and then replace with $1 so that I keep the p elements, but this does not work.
Can you also provide a link to a resource to help me with regex? I do a lot of replacements and removals.

When done, it should look like this:
<div class="expand_collapse section_box">
    <h2>Animal </h2>
     <div class="box_content">                        
         <p><a href="/Brochure.pdf" target="_blank">Animal<br>(bully breeds &amp; bite history)</a></p>
      </div>
</div>
<!-- Next Product -->



Thanks in advance. I am not good with regex.
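A sketch in Python, assuming the markup always has the shape shown in the question (an "Agent Facing" box followed by a "Consumer Facing" box). Group 1 captures one or more p elements with (?:<p>.*?</p>\s*)+, the lazy .*? stops at the first </p>, and re.DOTALL lets . cross line breaks; the replacement keeps only that group, which is what the $1 idea above is after. Regexes on HTML are brittle, so an HTML parser is the safer choice when the markup varies:

```python
import re

NESTED_BOX_RE = re.compile(
    r'\s*<div class="expand_collapse\s+section_box">\s*<h2>Agent Facing</h2>\s*'
    r'<div class="box_content">\s*'
    r"((?:<p>.*?</p>\s*)+)"  # group 1: the <p> elements to keep
    r"</div>\s*</div>\s*"    # closes the Agent Facing box
    r'<div class="expand_collapse\s+section_box">\s*<h2>Consumer Facing</h2>'
    r".*?</div>\s*</div>",   # the whole Consumer Facing box is dropped
    re.DOTALL,
)


def flatten_boxes(html_text):
    """Replace the nested boxes with the <p> elements captured in group 1."""
    return NESTED_BOX_RE.sub(lambda m: "\n" + m.group(1).strip() + "\n", html_text)


SAMPLE = """<div class="expand_collapse section_box">
  <h2>Animal </h2>
  <div class="box_content">
    <div class="expand_collapse section_box">
      <h2>Agent Facing</h2>
      <div class="box_content">
        <p><a href="/Brochure.pdf" target="_blank">Animal<br>(bully breeds &amp; bite history)</a></p>
      </div>
    </div>
    <div class="expand_collapse  section_box">
      <h2>Consumer Facing</h2>
      <div class="box_content">
        <p>content</p>
      </div>
    </div>
  </div>
</div>
<!-- Next Product -->"""

print(flatten_boxes(SAMPLE))
```

The output keeps the outer section_box and box_content divs with only the Brochure paragraph inside; indentation is not preserved.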
Help me fetch the first 3 lines of my email body with a regular expression.
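A Python sketch, assuming the body is available as a plain string with \n line endings (with \r\n, each returned line keeps its \r). Because . does not match \n unless DOTALL is set, each .*\n consumes exactly one line. str.splitlines()[:3] would do the same without a regex:

```python
import re

# Up to two complete lines, followed by whatever is on the third line.
FIRST_THREE_RE = re.compile(r"(?:.*\n){0,2}.*")


def first_three_lines(body):
    """Return the first three lines of body (fewer if body is shorter)."""
    return FIRST_THREE_RE.match(body).group(0)


print(first_three_lines("Hello,\nline two\nline three\nline four"))
```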
I have a text file that contains many concatenated words, such as RainyDay and PlayingInTheCold. These words can be split into their normal forms with a regex so that they become meaningful words.
import re, string, html
with open("1.txt", "r") as fin, open("2.txt", "w") as fout:
    for text in fin:
        words = text.split()
        cleaned = " ".join(re.findall('[A-Z][^A-Z]*', words))
        fout.write(cleaned)


The error is: error1
Also, there are many slang words, such as helo and luv, which should be converted to hello and love. I am trying it like this:
with open("1.txt", "r") as fin, open("2.txt", "w") as fout:
    for text in fin:
        words = text.split()
        words = slang_loopup(words)
        text = ' '.join(words)
        fout.write(cleaned)


I also tried _slang_loopup(), but I get the same NameError.
The error is: error2
Can someone please help me?
Thanks
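Both problems can be sketched in Python as follows. re.findall expects a single string, so it is applied to each word rather than to the list that split() returns, and a NameError means the function being called was never defined, so slang_lookup is defined here. The SLANG table is hypothetical and only contains the two examples from the question:

```python
import re

# Hypothetical slang table: only the two examples from the question; extend as needed.
SLANG = {"helo": "hello", "luv": "love"}


def split_camel_case(word):
    """'RainyDay' -> 'Rainy Day'; words without a capital letter are returned unchanged."""
    parts = re.findall(r"[A-Z][^A-Z]*", word)  # one piece per capitalized part
    return " ".join(parts) if parts else word


def slang_lookup(words):
    """Replace known slang words (case-insensitive lookup), leave other words alone."""
    return [SLANG.get(w.lower(), w) for w in words]


def clean_line(line):
    words = " ".join(split_camel_case(w) for w in line.split()).split()
    return " ".join(slang_lookup(words))


def process_file(src="1.txt", dst="2.txt"):
    with open(src, "r") as fin, open(dst, "w") as fout:
        for text in fin:
            fout.write(clean_line(text) + "\n")  # write the cleaned text of each line


print(clean_line("RainyDay helo PlayingInTheCold luv"))
```

Note that the [A-Z][^A-Z]* pattern drops a lowercase prefix, for example the i in iPhone.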

I need an SQL regex to capture the ICD-10 code range F1 through F9.99. I was trying the following, but it is not working for me: t10."ICD-10 CODE" LIKE 'F[1-9]%'
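Depending on the database, LIKE may not understand [1-9]: SQL Server's LIKE supports bracketed character ranges, while in Oracle, MySQL or PostgreSQL the brackets are taken literally and a regex predicate (for example REGEXP_LIKE in Oracle and MySQL, or the ~ operator in PostgreSQL) is needed instead. The pattern itself is sketched below in Python, assuming codes like F1, F10, F32.9 or F9.99 with at most two decimal digits:

```python
import re

# F, a first digit 1-9, an optional second digit, and an optional ".d" or ".dd" part.
# Assumption: the "F1 through F9.99" range means codes shaped like F1 .. F99.99.
ICD_F_RANGE_RE = re.compile(r"^F[1-9]\d?(?:\.\d{1,2})?$")


def in_f1_to_f9_range(code):
    return ICD_F_RANGE_RE.match(code.strip()) is not None


print(in_f1_to_f9_range("F32.9"))  # True
print(in_f1_to_f9_range("G10"))    # False
```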
https://www.experts-exchange.com/questions/29106263/replacing-hypertext-links-with-simple-text-'URL'.html#a42605453

I need to remove all URLs starting with http:\/ (escaped slashes).
I am already able to remove all URLs starting with http://, and the regular expression for this is given in the link above:

url_rex = re.compile(r'(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?')

How can I modify this regular expression to also remove URLs starting with http:\/?

Also, the code below deletes the @ and # characters from words starting with @ or #, which I intend to keep.

import re, html

uni_escape = re.compile(r'\\u[0-9a-f]{4}')
url_rex = re.compile(r'(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?')
with open("input.txt", "r") as fin, open("output.txt", "w") as fout:
    for text in fin:
        unesc_html_text = html.unescape(text)
        # drop any non-ASCII characters
        encoded_text = bytes(unesc_html_text, 'ascii', errors='ignore')
        decoded_ascii_text = encoded_text.decode('ascii')
        # remove literal \uXXXX escape sequences
        unesc_text = re.sub(uni_escape, '', decoded_ascii_text)
        text = url_rex.sub('', unesc_text)
        fout.write(text)



How can I change the regular expression to keep @ and # in words?
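A simplified alternative, sketched in Python, rather than a modification of the exact pattern above: (?:\\?/){2} makes the backslash before each slash optional, so both http:// and the JSON-escaped http:\/\/ are matched, and the URL is assumed to end at the first whitespace character or quote. Because nothing outside the URL is touched, # and @ in ordinary words survive:

```python
import re

# (?:\\?/){2} accepts "//" as well as the JSON-escaped "\/\/" (and mixed forms).
# Assumption: a URL ends at the first whitespace character or quote.
URL_RE = re.compile(r"""(?:https?|ftp):(?:\\?/){2}[^\s"']+""")


def strip_urls(text):
    """Remove URLs; characters such as # and @ outside a URL are left alone."""
    return URL_RE.sub("", text)


print(strip_urls(r"see http:\/\/example.com\/a and #tag @user"))  # see  and #tag @user
```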
I need to replace a line in a configuration file and want to make sure it's done safely since it needs to be done on several systems.

The line in the file is:
*.emerg                                                 *


I want to replace the * at the end of the line, only on lines that start with *.emerg, with :omusrmsg:* so that it looks like this:
*.emerg                                                 :omusrmsg:*


This sed command seems to work, but I was curious whether there is a more foolproof way to do it:
sed -i '/^\*.emerg/s/\*$/:omusrmsg:*/' config.conf
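One way to make the edit safer, sketched in Python: escape the dot in *.emerg (unescaped, . matches any character) and require whitespace directly before the final *. The sed command above is not idempotent: on a second run the line still ends in *, so another :omusrmsg: would be inserted, whereas the pattern below no longer matches once the line has been changed. The two-line config in the example is made up:

```python
import re

# "^" and "$" apply per line because of re.MULTILINE; "[ \t]" keeps the match on one line.
EMERG_RE = re.compile(r"^(\*\.emerg[ \t]+)\*[ \t]*$", re.MULTILINE)


def fix_emerg(config_text):
    """Replace the trailing * on *.emerg lines with :omusrmsg:*."""
    return EMERG_RE.sub(r"\1:omusrmsg:*", config_text)


original = "*.info    /var/log/messages\n*.emerg" + " " * 49 + "*\n"
updated = fix_emerg(original)
print(updated)
```

The same two changes (escaped dot, whitespace before the final *) can be carried over to the sed expression.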


Using Oracle regular expressions, I want to extract the characters after the \

'OPS$BCTGTWDOM\SMANAVI'     Output => SMANAVI

How can I do this?

Thanks
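The idea is "one or more characters that are not a backslash, anchored at the end of the string". In Oracle this kind of pattern would typically go into REGEXP_SUBSTR; how the backslash must be written inside the bracket expression there should be checked against the Oracle documentation. The pattern idea as a Python sketch (if the value contains no backslash, the whole value is returned):

```python
import re

# Everything after the last backslash: non-backslash characters up to the end.
AFTER_BACKSLASH_RE = re.compile(r"[^\\]+$")


def after_last_backslash(value):
    match = AFTER_BACKSLASH_RE.search(value)
    return match.group(0) if match else None


print(after_last_backslash("OPS$BCTGTWDOM\\SMANAVI"))  # SMANAVI
```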
Is there a machine learning algorithm we can use to detect/identify the articles across multiple news pages of any website? I tried fetching all the links with BeautifulSoup/Python and processing them based on regular expressions, but it is taking a long time.

Any help or suggestion will be much appreciated.
As far as I can tell, the MultiLine property of RegExp does not have any effect on anything. Can anybody give me an example where it actually changes the behavior of RegExp?

For instance, in the following program it does not matter.
Sub func2()
' extract 7-digit numbers from a string, and treat everything else as a delimiter
Dim regex As New regexp
Dim multi, Glob, s1, s2, s3, msg, match As Object
Debug.Print "\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
For Each multi In Array(True, False)

Debug.Print "Multiline is " & multi

For Each Glob In Array(True, False)
regex.MultiLine = multi ' these two values do not affect the outcome of this test
regex.Global = Glob ' these two values do not affect the outcome of this test
s1 = " 1234567" & vbLf & "567 2072083 "  ' vbLf DOES NOT match to .*
s2 = " 1234567" & vbCr & "567 2072083 "  ' vbcr DOES match to .*  regardless of the multiline value.
s3 = " 1234567" & vbCrLf & "567 2072083 "  ' vbcr DOES match, but the lf DOES NOT match to .*
regex.pattern = "(\b|\D)\d{7,7}(?=\b|\D)" ' use [\s\S]+ instead of .+ if you REALLY want to match anything
msg = ""
If Glob Then msg = msg & "Global "
If Not Glob Then msg = msg & "Not Global "

Set match = regex.Execute(s1) '
msg = match.count & "-" & msg
Set match = regex.Execute(s2) '
msg = match.count & "-" & msg
Set match = regex.Execute(s3) '
msg = match.count & "-" & msg
Debug.Print msg
'Stop
Next
Next
Debug.Print "///////////////////////////////"
End Sub
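In both VBScript's RegExp and Python's re, the multiline setting only changes what ^ and $ match: only the start and end of the whole string, or also the start and end of every line. A pattern without ^ or $, like the one in the program above, behaves the same either way. A small Python illustration:

```python
import re

s = "1234567\n2072083"  # two lines, each a 7-digit number

# Without MULTILINE, ^ and $ match only at the start and end of the whole string.
print(re.findall(r"^\d{7}$", s))                # []
# With MULTILINE, they also match at the start and end of every line.
print(re.findall(r"^\d{7}$", s, re.MULTILINE))  # ['1234567', '2072083']
# A pattern without ^ or $ gives the same result with or without the flag.
print(re.findall(r"\d{7}", s) == re.findall(r"\d{7}", s, re.MULTILINE))  # True
```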


VBA regex supports lookahead, but not lookbehind.
As a result, I frequently use .SubMatches, which makes my code wordy and harder to understand.

Has anybody ever figured out a way around this?

The following code demonstrates the problem. I want to extract every integer from a string and ignore everything else.

Dim regex As Object, match As Object
Set regex = CreateObject("vbscript.regexp")
regex.Global = True
regex.MultiLine = False

' This displays exactly what I want, but it leads to a wordy MsgBox syntax.

regex.Pattern = "(\b|\D)(\d{7})(?=\b|\D)"
Set match = regex.Execute("a1234567 1234 a2072083")

MsgBox match.Count & "'first=" & match(0).submatches(1) & "'last=" & match(match.Count - 1).submatches(1)


' I wish I could do it this way because the msgbox syntax is cleaner. Unfortunately, it does not work because VBA does not support lookbehind.

regex.Pattern = "(?<=\b|\D)(\d{7})(?=\b|\D)"
Set match = regex.Execute("a1234567a2072083")

MsgBox match.Count & "'first=" & match(0) & "'last=" & match(match.Count - 1)
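For comparison, Python's re module does support lookbehind (fixed-width only), so there the neighbouring character never has to be captured. A negative lookbehind (?<!\d) combined with a negative lookahead (?!\d) expresses "exactly seven digits, not part of a longer number" without any submatch. A sketch:

```python
import re

# (?<!\d): not preceded by a digit; (?!\d): not followed by a digit.
SEVEN_DIGITS_RE = re.compile(r"(?<!\d)\d{7}(?!\d)")

print(SEVEN_DIGITS_RE.findall("a1234567 1234 a2072083"))  # ['1234567', '2072083']
print(SEVEN_DIGITS_RE.findall("a1234567a2072083"))        # ['1234567', '2072083']
```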


Hi there,
Is there a way/tool/ search engine that would allow me to search for multiple variations of the same phrase at the same time?
For example, I want to search for "revealing his true belief". But I also want to search for all the phrases that have the same or a similar meaning, like "betraying their real opinion".
The total number of words here is 8, but there are many different ways to combine them, so there are many potential alternative phrases: "revealing his true opinion", "revealing his real opinion", "betraying our real opinion", etc.
Is there a way to accomplish this under two conditions?
1- I determine the alternatives. For example, I specify that for "word1 word2 word3 word4", the only alternatives for word1 are revealing or showing.
2- The search uses all possible synonyms, or even antonyms, for "revealing" in word1.
The tips I have received so far on Reddit:
"
Word2Vec or Doc2Vec is something that can be used for this, depending on whether you just wish to substitute synonyms or match the distributional semantics of arbitrary phrases.

---
If you don’t know your alternatives (e.g., you don’t know that revealing is similar to showing), then you need something that can do synonyms. Most search engines (e.g., Google) can do this, as can some natural language processing programs.

If you do know the alternatives, you can describe your pattern to a search engine or most computer programs using regular expressions. A regex that would match your example would be …
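For condition 1, where you decide the alternatives for each word slot, one alternation per slot is enough. A Python sketch; the word lists are only illustrations drawn from the examples in the question, and condition 2 (automatic synonyms) needs a thesaurus or word-embedding model rather than a regex:

```python
import re

# Illustrative alternatives per word slot, taken from the question's examples.
SLOTS = [
    ["revealing", "showing", "betraying"],
    ["his", "their", "our"],
    ["true", "real"],
    ["belief", "opinion"],
]


def build_phrase_regex(slots):
    # one non-capturing alternation per slot, separated by runs of whitespace
    parts = ["(?:" + "|".join(map(re.escape, words)) + ")" for words in slots]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


PHRASE_RE = build_phrase_regex(SLOTS)
print(PHRASE_RE.findall("He kept revealing his true belief, then betraying our real opinion."))
```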
What would the JS regex be for a path that has to be either "/responsive/appointment.html" or "/appointment.html"? These always need to be at the top level, with no other directories preceding them. Example: ^(/appointment.html)

Valid: "/responsive/appointment.html"
Valid: "/appointment.html"
Invalid: "/dir2/appointment.html"
Invalid: "/somedir/responsive/appointment.html"

Thanks!
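Making /responsive an optional group and anchoring both ends is enough; in JavaScript this could be written as /^(?:\/responsive)?\/appointment\.html$/.test(path). The same pattern in a small Python sketch, using fullmatch so the whole path must match:

```python
import re

# optional "/responsive", then "/appointment.html" with the dot escaped
APPOINTMENT_RE = re.compile(r"(?:/responsive)?/appointment\.html")


def is_appointment_path(path):
    # fullmatch anchors both ends, like ^...$ in the JavaScript version
    return APPOINTMENT_RE.fullmatch(path) is not None


for p in ["/responsive/appointment.html", "/appointment.html",
          "/dir2/appointment.html", "/somedir/responsive/appointment.html"]:
    print(p, is_appointment_path(p))
```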

In Excel, what is the fastest/most compatible RegEx and/or VBA code needed to download the third table from a website WITHOUT Query Tables?
Hi. I am having some trouble getting the pattern for the following.

Match criteria:
1) word begins with 4 alphabetic characters
2) immediately followed by "PE"
3) get the rest of the word

string to be searched  =  "This is the sample string abcdPEy2c4s  where I hope to get that weird word."


I would like the result to be abcdPEy2c4s
P.S. The string may be multiline.

Thank you.
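A Python sketch of the three criteria. Because the pattern uses no ^ or $, it finds such words on any line of a multiline string without special flags:

```python
import re

# \b: start of a word; [A-Za-z]{4}: four letters; PE: literal; \w*: rest of the word
PE_WORD_RE = re.compile(r"\b[A-Za-z]{4}PE\w*")

text = "This is the sample string abcdPEy2c4s  where I hope to get that weird word."
print(PE_WORD_RE.findall(text))  # ['abcdPEy2c4s']
```

\w also matches digits and underscores, which covers the mixed tail y2c4s; use [A-Za-z0-9]* instead if underscores should end the word.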
What is the best way to pull the second table from a website without using Web Queries?

RegEx and/or VBA seem promising, but I am willing to explore other alternatives.
How would I get all child elements of a root element by class name using a regex pattern?

Something like this:

var items = $("#root_element_id").children("div:regex(class,\_[a-z0-9]{4})");
I'm trying to extract three groups from an HTML text (attached). When I run the expression in a regex debugger on a website, it works, but when I try it in my C# program, it doesn't. Any help would be appreciated.

String input:
"    <div class=\"download\">\n        <a href=\"Movie/שלושה-שלטים-מחוץ-לאבינג-מיזורי-_7685.html\" title=\"שלושה שלטים מחוץ לאבינג, מיזורי  / Three Billboards Outside Ebbing, Missouri\"><img src=\"images/movies/hX8EbtvY61p8.jpg\" alt=\"Three Billboards Outside Ebbing, Missouri\" /></a><!--\n        --><div class=\"content\">\n            <div class=\"title\">\n                <a href=\"Movie/שלושה-שלטים-מחוץ-לאבינג-מיזורי-_7685.html\" title=\"שלושה שלטים מחוץ לאבינג, מיזורי  / Three Billboards Outside Ebbing, Missouri\"><h4>CAPTURE1</h4><span> / </span><h4>CAPTURE2</h4></a>\n            </div>\n            <div class=\"text\">\n                <p>CAPTURE3</p>\n            </div>\n            <div class=\"bottom\">\n                <p>\n                שנת יציאה: <b>2017</b> | \n                תאריך העלאה: <b title=\"21-05-2018 15:08\">לפני 5 ימים</b> | \n                צפיות: <b>7824</b>\n                </p>\n            </div>\n        </div>\n    </div>\n    "


Regex pattern: <h4>(.*?)</h4>.*?<h4>(.*?)</h4>.*?<p>(.*?)</p>
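A likely cause, not confirmed by the question, is that . does not match a newline by default, while the input contains \n between the second </h4> and the <p>; an online tester may have its "single line" (dotall) flag switched on. In .NET the corresponding option is RegexOptions.Singleline. The Python sketch below shows the effect with re.DOTALL, the equivalent flag, on a shortened copy of the input:

```python
import re

PATTERN = r"<h4>(.*?)</h4>.*?<h4>(.*?)</h4>.*?<p>(.*?)</p>"

# Shortened version of the input: the third capture is on a later line.
html_text = ('<h4>CAPTURE1</h4><span> / </span><h4>CAPTURE2</h4></a>\n'
             '            </div>\n'
             '            <div class="text">\n'
             '                <p>CAPTURE3</p>\n')

# Without DOTALL, "." stops at "\n", so ".*?" cannot reach the <p> on a later line.
print(re.search(PATTERN, html_text))  # None

# With DOTALL (RegexOptions.Singleline in .NET), "." also matches "\n".
print(re.search(PATTERN, html_text, re.DOTALL).groups())  # ('CAPTURE1', 'CAPTURE2', 'CAPTURE3')
```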
I'm reading a text field from a PDF form, and it is a date field.
Unfortunately, the format used is, for example, 4/9/18 instead of the accepted standard format 04/09/2018 (xx/xx/xxxx), and this is causing me a lot of issues.

What regular expression would correct this? It may need to be conditional, because from October on the month has 2 digits, and likewise the day once it reaches 10.
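No conditional is needed: \d{1,2} accepts both one- and two-digit values, and the replacement function pads them. A Python sketch, assuming a two-digit year means 20xx; the order of the first two fields is kept as it is:

```python
import re

# one- or two-digit first and second fields, then a four- or two-digit year
SHORT_DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b")


def _pad(match):
    first, second, year = match.group(1), match.group(2), match.group(3)
    if len(year) == 2:
        year = "20" + year  # assumption: two-digit years are in 2000-2099
    return f"{int(first):02d}/{int(second):02d}/{year}"


def normalize_dates(text):
    """Rewrite m/d/yy style dates as mm/dd/yyyy, keeping the field order."""
    return SHORT_DATE_RE.sub(_pad, text)


print(normalize_dates("4/9/18"))    # 04/09/2018
print(normalize_dates("10/15/18"))  # 10/15/2018
```

Dates already in the 04/09/2018 form pass through unchanged, so the substitution is safe to run on mixed input.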