Solved

How do I use regex to find 2 groups in a URL?

Posted on 2014-04-18
Last Modified: 2014-08-06
I am not a programmer by trade, so I did my best, but I need help. I need to create a rule for Google Tag Manager using regex. My goal is to look at a URL and find two separate group matches in the string. Here is a sample URL:

http://123.website.com/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}

I originally had this, which worked great except for the "&guid=blahblahblah&page=something&" in between the two groups. How do I check for those two groups in one expression? Here is what I originally had:

(http://)(([0-9])|([0-9][0-9])|([0-9][0-9][0-9])).website.com\?(type\=abc)

Bonus: How can I make it check for https as well as http?

Thx!
Question by:futr_vision
LVL 34

Expert Comment

by:Dan Craciun
Assuming you want to capture "123" and "type=abc" from your sample link, this:
https?://(\w+)\.website\.com/.*(type=\w+)

will give you "123" in $1 and "type=abc" in $2.

Bonus: it allows both http and https :)

HTH,
Dan
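For anyone who wants to check the capture groups outside Google Tag Manager, here is a minimal Python sketch. It assumes Python's `re` module behaves like the regex engine in your tool for this simple pattern; the variable names are only illustrative.

```python
import re

# Pattern from the answer above, applied to the sample URL from the question.
PATTERN = re.compile(r"https?://(\w+)\.website\.com/.*(type=\w+)")
URL = ("http://123.website.com/?&guid=blahblahblah&page=something"
       "&type=abc&adv=abc1234&site={siteID}")

match = PATTERN.search(URL)
subdomain = match.group(1)   # what the thread calls $1
type_param = match.group(2)  # what the thread calls $2
print(subdomain, type_param)

# The optional "s" means the https form of the same URL is accepted too.
https_match = PATTERN.search(URL.replace("http://", "https://"))
print(https_match is not None)
```

Note that `\w+` in the second group stops at the `&`, so only `type=abc` is captured.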

Author Comment

by:futr_vision
Cool. I got this answer in another resource.

https?://\d{1,3}\.website\.com/.*type=abc.* (Perl, which I am not sure works with Google Tag Manager)

and this one

https?:\/\/([\d]{1,4})\.website\.com\/.*?&type=(.*?)&.*?

The second one works, but type needs to be an exact match. I also found, after testing, that I need to revise my request. I need the statement to find an exact match for both type= and adv=. So in my example I need to make sure that "type=abc" and "adv=abc1234" match exactly.
LVL 34

Expert Comment

by:Dan Craciun
OK. This:
https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234

will give you "123" in $1, only if type=abc and adv=abc1234
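One caveat on "exact": nothing in this pattern marks where the adv value ends, so a longer value such as adv=abc12345 would also match. The sketch below shows that in Python and tries a stricter variant. The `STRICT` pattern is my own assumption, not something from the thread, and it has not been tested in Google Tag Manager.

```python
import re

# Pattern from the answer above.
ORIGINAL = re.compile(r"https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234")

# Hypothetical tightening: require "?" or "&" before "type", and "&" or the
# end of the string right after the adv value.
STRICT = re.compile(
    r"https?:\/\/(\w+)\.website\.com\/.*[?&]type=abc&adv=abc1234(?:&|$)"
)

good = ("http://123.website.com/?&guid=blahblahblah&page=something"
        "&type=abc&adv=abc1234&site={siteID}")
longer = "http://123.website.com/?&guid=x&type=abc&adv=abc12345&site={siteID}"

original_hits = [bool(ORIGINAL.search(u)) for u in (good, longer)]
strict_hits = [bool(STRICT.search(u)) for u in (good, longer)]
print(original_hits)  # [True, True]
print(strict_hits)    # [True, False]
```

If adv values never share a prefix in practice, the original pattern is probably enough.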

Author Comment

by:futr_vision
Hmm. So maybe I am not being completely clear or maybe I am misreading your response.

A number from 1-9999 needs to be in that first spot after the http(s)://
If a number is present and only a number then it has to match the type= and the adv= next.

Is that what your regex does?
LVL 34

Expert Comment

by:Dan Craciun
My regexp will capture any run of letters, digits or _ between "http://" and ".website.com". If you need to restrict it to a number between 1 and 9999, change \w+ to
[1-9]\d{0,3}
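A quick way to convince yourself of the range this fragment accepts is to run it over every candidate number. This is a minimal Python sketch; `is_valid` is a hypothetical helper name, and `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

NUMBER = re.compile(r"[1-9]\d{0,3}")

def is_valid(label: str) -> bool:
    """True when the whole label is a number from 1 to 9999 with no leading zero."""
    return NUMBER.fullmatch(label) is not None

# Try every integer from 0 to 10000 and keep the ones the pattern accepts.
accepted = [n for n in range(0, 10001) if is_valid(str(n))]
print(accepted[0], accepted[-1], len(accepted))  # 1 9999 9999

# Leading zeros and five-digit numbers are rejected.
print(is_valid("0"), is_valid("0123"), is_valid("10000"))  # False False False
```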

Author Comment

by:futr_vision
Would this work?

https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234

Actually, I think it will fail if it starts with a "0". I don't foresee that happening, but maybe I should be loose in my definition in case that does happen.
LVL 34

Accepted Solution

by:Dan Craciun
You need this to make sure the number does not start with 0:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
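To see how the two number fragments differ on a leading zero, here is a small Python comparison. It is only a sketch: the names `LOOSE`, `STRICT` and `QUERY` are illustrative, and the hosts 0123 and 12345 are made-up test cases.

```python
import re

# \d{1,4} accepts any 1 to 4 digits, including a leading zero.
LOOSE = re.compile(r"https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234")
# [1-9]\d{0,3} requires the first digit to be 1-9.
STRICT = re.compile(r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")

QUERY = "/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}"

results = {}
for host in ("123", "0123", "12345"):
    url = f"http://{host}.website.com{QUERY}"
    results[host] = (bool(LOOSE.search(url)), bool(STRICT.search(url)))

print(results)
# "123" matches both, "0123" matches only the loose pattern,
# and "12345" matches neither because at most four digits are allowed.
```

So if landing pages might start the number with a 0, the `\d{1,4}` version is the more forgiving choice.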


Author Comment

by:futr_vision
What I am saying is that the person creating the landing pages may start the number with a "0". I don't have much control over that. I haven't seen it happen, but who knows. I guess I could always alter it if I need to in those instances. Otherwise I think this will work. Now to make sure that Google Tag Manager accepts it :) Looks like pretty standard RegEx so I don't see why not.

Author Comment

by:futr_vision
Hmm. Just tested. I think the ampersand might be breaking it. This is how Google sees it:

{{url}} matches RegEx https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
LVL 34

Expert Comment

by:Dan Craciun
Then use an omnichar:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc\.adv=abc1234

Author Comment

by:futr_vision
It doesn't like me escaping the ampersand either. I'll try your newest method. Thanks!
LVL 34

Expert Comment

by:Dan Craciun
As I said, use an omnichar. The only risk is matching stuff like type=abc,adv=abc1234, which are illegal anyway.

Author Comment

by:futr_vision
Hmm. That doesn't validate in any of the tools I've tested it with or in Google. Looks like all you did was escape a period, but then again I don't know much about RegEx so I am probably missing a nuance. I need to be fairly strict about things but it also needs to work :)
What a trial this has been. You'd think this would be easier.
LVL 34

Assisted Solution

by:Dan Craciun
Yup, you're right :) The omnichar is a dot. No escape needed.

https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc.adv=abc1234
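The difference between the escaped dot, the bare dot and the literal ampersand can be checked side by side. This is a minimal Python sketch; the constant names are illustrative, and the comma URL is a made-up variant of the sample.

```python
import re

PREFIX = r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc"

ESCAPED_DOT = re.compile(PREFIX + r"\.adv=abc1234")  # \. matches only a literal "."
ANY_CHAR = re.compile(PREFIX + r".adv=abc1234")      # .  matches any single character
AMPERSAND = re.compile(PREFIX + r"&adv=abc1234")     # &  matches only a literal "&"

url = ("http://123.website.com/?&guid=blahblahblah&page=something"
       "&type=abc&adv=abc1234&site={siteID}")
comma_url = url.replace("type=abc&adv", "type=abc,adv")

print(bool(ESCAPED_DOT.search(url)))      # False: there is no "." before adv
print(bool(ANY_CHAR.search(url)))         # True
print(bool(AMPERSAND.search(url)))        # True
print(bool(ANY_CHAR.search(comma_url)))   # True: the looseness mentioned earlier
print(bool(AMPERSAND.search(comma_url)))  # False
```

The literal ampersand is the stricter of the two working forms, so it is the better pick wherever the tool accepts it.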

Author Comment

by:futr_vision
Cool. Looks like that works, and I found out that the other one works as well (the one with the ampersand). One quick question: if I do not care whether the URL starts with http:// or https://, do I just leave the https?:\/\/ off? I'm guessing that is not exactly the solution, is it?

Author Comment

by:futr_vision
Hmmm. I don't think the expression fails if I remove https?:\/\/
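One side effect of dropping the scheme is worth checking. If the tool searches anywhere in the URL (an assumption here, modelled with Python's `re.search`), the number group is no longer pinned to the start of the host, so it can match the tail of a longer label. In the sketch below, `OPTIONAL_SCHEME` is a hypothetical alternative that I have not tested in Google Tag Manager.

```python
import re

TAIL = r"([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234"
WITH_SCHEME = re.compile(r"https?:\/\/" + TAIL)
WITHOUT_SCHEME = re.compile(TAIL)
# Hypothetical alternative: scheme is optional, but the match must start
# at the beginning of the string.
OPTIONAL_SCHEME = re.compile(r"^(?:https?:\/\/)?" + TAIL)

QUERY = "/?&page=something&type=abc&adv=abc1234"

def captured(pattern, url):
    """Return the captured number, or None when the pattern does not match."""
    m = pattern.search(url)
    return m.group(1) if m else None

mixed = "http://ab12.website.com" + QUERY
print(captured(WITH_SCHEME, mixed))      # None
print(captured(WITHOUT_SCHEME, mixed))   # 12
print(captured(OPTIONAL_SCHEME, mixed))  # None
print(captured(OPTIONAL_SCHEME, "123.website.com" + QUERY))  # 123
```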

Author Comment

by:futr_vision
As a quick add-on to this, will this find pages that start with a combination of letters and numbers, such as in this URL?

http://ab12.website.com

https?:\/\/(\w+)
LVL 34

Expert Comment

by:Dan Craciun
Quick answer: it will find ab12
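The same check in Python, as a minimal sketch. The hyphenated host is a made-up example to show where `\w+` stops.

```python
import re

HOST_LABEL = re.compile(r"https?:\/\/(\w+)")

label = HOST_LABEL.search("http://ab12.website.com").group(1)
print(label)  # ab12

# \w covers letters, digits and underscore only, so a hyphen ends the capture.
hyphenated = HOST_LABEL.search("http://ab-12.website.com").group(1)
print(hyphenated)  # ab
```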

Author Comment

by:futr_vision
perfect