Solved

How do I use regex to find 2 groups in a URL?

Posted on 2014-04-18
Last Modified: 2014-08-06
I am not a programmer by trade, so I did my best, but I need help. I need to create a rule for Google Tag Manager using regex. My goal is to look at a URL and find two separate group matches in the string. Here is a sample URL:

http://123.website.com/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}

I originally had this, which would have worked great if it weren't for the "&guid=blahblahblah&page=something&" in between the two groups. How do I check for those two groups in one expression? Here is what I originally had:

(http://)(([0-9])|([0-9][0-9])|([0-9][0-9][0-9])).website.com\?(type\=abc)

Bonus: How can I make it check for https as well as http?

Thx!
Question by:futr_vision
LVL 34

Expert Comment

by:Dan Craciun
ID: 40009369
Assuming you want to capture "123" and "type=abc" from your sample link, this:
https?://(\w+)\.website\.com/.*(type=\w+)

will give you "123" in $1 and "type=abc" in $2.

Bonus: it allows both http and https :)

HTH,
Dan
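A minimal Python sketch of how the two capture groups behave on the sample URL from the question. It uses Python's `re` module, which is an assumption for illustration; Google Tag Manager's own regex engine may differ in details.

```python
import re

# Pattern from the answer above; the URL is the sample from the question.
pattern = re.compile(r"https?://(\w+)\.website\.com/.*(type=\w+)")
url = ("http://123.website.com/?&guid=blahblahblah&page=something"
       "&type=abc&adv=abc1234&site={siteID}")

match = pattern.search(url)
subdomain, type_param = match.group(1), match.group(2)
print(subdomain, type_param)

# The "s?" makes the "s" optional, so the https form of the same URL matches too.
https_match = pattern.search(url.replace("http://", "https://"))
```

Group 1 holds the part between the scheme and ".website.com", and group 2 holds the "type=..." parameter, because `\w+` stops at the next "&".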
 

Author Comment

by:futr_vision
ID: 40010934
Cool. I got this answer in another resource.

https?://\d{1,3}\.website\.com/.*type=abc.* (Perl, which I am not sure works with Google Tag Manager)

and this one

https?:\/\/([\d]{1,4})\.website\.com\/.*?&type=(.*?)&.*?

The second one works, but type needs to be an exact match. After testing, I also found that I need to revise my request. I need the statement to find an exact match for both type= and adv=. So in my example I need to make sure that "type=abc" and "adv=abc1234" match exactly.
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40010944
OK. This:
https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234

will give you "123" in $1, only if type=abc and adv=abc1234

 

Author Comment

by:futr_vision
ID: 40011513
Hmm. So maybe I am not being completely clear, or maybe I am misreading your response.

A number from 1-9999 needs to be in that first spot after the http(s)://
If a number is present and only a number then it has to match the type= and the adv= next.

Is that what your regex does?
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40011553
My regexp will return any letter, digit or _ between "http://" and ".website.com". If you need to restrict it to a number between 1 and 9999, change \w+ to
[1-9]\d{0,3}
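A quick way to check which numbers `[1-9]\d{0,3}` accepts is to test it against a range of integers. This is a sketch in Python; `fullmatch` is used only to isolate the number rule from the rest of the URL pattern.

```python
import re

number = re.compile(r"[1-9]\d{0,3}")

# fullmatch requires the whole string to match the pattern.
accepted = [n for n in range(0, 10001) if number.fullmatch(str(n))]
print(accepted[0], accepted[-1], len(accepted))

# A leading zero is rejected because the first character must be 1-9.
leading_zero_ok = number.fullmatch("0123") is not None
print(leading_zero_ok)
```

The first character must be 1 through 9 and up to three more digits may follow, so the accepted values run from 1 to 9999.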
 

Author Comment

by:futr_vision
ID: 40011603
Would this work?

https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234

Actually, I think it will fail if it starts with a "0". I don't foresee that happening, but maybe I should be loose in my definition in case that does happen.
 
LVL 34

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 40011616
You need this to make sure the number does not start with 0:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
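The behavior of this pattern can be sketched in Python against a few test URLs. The URLs other than the sample from the question are hypothetical, and the `strict` variant is an illustrative assumption, not part of the answer.

```python
import re

# Pattern from the accepted answer, unchanged.
pattern = re.compile(r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")

# Test URLs mapped to whether the pattern is expected to match.
cases = {
    "http://123.website.com/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}": True,
    "https://7.website.com/?type=abc&adv=abc1234": True,
    "http://0123.website.com/?type=abc&adv=abc1234": False,  # leading zero
    "http://123.website.com/?type=xyz&adv=abc1234": False,   # wrong type value
    "http://123.website.com/?adv=abc1234&type=abc": False,   # parameters in the other order
    "http://123.website.com/?type=abc&adv=abc12345": True,   # characters after abc1234 are not checked
}
results = {url: pattern.search(url) is not None for url in cases}

# Hypothetical stricter variant: the adv value must end at "&" or at the end of the URL.
strict = re.compile(pattern.pattern + r"(?:&|$)")
strict_rejects_longer_value = strict.search("http://123.website.com/?type=abc&adv=abc12345") is None

for url, matched in results.items():
    print(matched, url)
```

Note that the pattern requires type= to be immediately followed by &adv=, in that order, and that it does not check what comes after "abc1234". If a longer adv value must be rejected, something like the `strict` variant would be needed.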
 

Author Comment

by:futr_vision
ID: 40011636
What I am saying is that the person creating the landing pages may start the number with a "0". I don't have much control over that. I haven't seen it happen, but who knows. I guess I could always alter it if I need to in those instances. Otherwise I think this will work. Now to make sure that Google Tag Manager accepts it :) Looks like pretty standard RegEx, so I don't see why not.
 

Author Comment

by:futr_vision
ID: 40012580
Hmm. Just tested. I think the ampersand might be breaking it. This is how Google sees it:

{{url}} matches RegEx https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40012637
Then use an omnichar:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc\.adv=abc1234
 

Author Comment

by:futr_vision
ID: 40012646
It doesn't like me escaping the ampersand either. I'll try your newest method. Thanks!
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40012651
As I said, use an omnichar. The only risk is matching stuff like type=abc,adv=abc1234, which are illegal anyway.
 

Author Comment

by:futr_vision
ID: 40012669
Hmm. That doesn't validate in any of the tools I've tested it with, or in Google. Looks like all you did was escape a period, but then again I don't know much about RegEx, so I am probably missing a nuance. I need to be fairly strict about things, but it also needs to work :)
What a trial this has been. You'd think this would be easier.
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 500 total points
ID: 40012677
Yup, you're right :) The omnichar is a dot. No escape needed.

https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc.adv=abc1234
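The difference between the escaped dot and the plain dot can be sketched in Python. The two short test URLs are hypothetical.

```python
import re

prefix = r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc"
escaped_dot = re.compile(prefix + r"\.adv=abc1234")  # "\." matches only a literal period
any_char = re.compile(prefix + r".adv=abc1234")      # "." matches any one character except a newline

with_ampersand = "http://123.website.com/?type=abc&adv=abc1234"
with_comma = "http://123.website.com/?type=abc,adv=abc1234"

escaped_matches_ampersand = escaped_dot.search(with_ampersand) is not None
dot_matches_ampersand = any_char.search(with_ampersand) is not None
dot_matches_comma = any_char.search(with_comma) is not None

print(escaped_matches_ampersand, dot_matches_ampersand, dot_matches_comma)
```

The escaped form fails on the real URL because the character between the two parameters is "&", not ".". The unescaped dot accepts "&", and also accepts any other single separator such as the comma mentioned earlier.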
 

Author Comment

by:futr_vision
ID: 40013415
Cool. Looks like that works, and I found out that the other one works as well. The one with the ampersand. One quick question: if I do not care whether or not the URL starts with http:// or https://, do I just leave the https?:\/\/ off? I'm guessing that is not exactly the solution, is it?
 

Author Comment

by:futr_vision
ID: 40013663
Hmmm. I don't think the expression fails if I remove https?:\/\/
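One side effect of dropping the scheme is worth checking. The sketch below assumes the tool does an unanchored search, as Python's `re.search` does; whether Google Tag Manager behaves the same way is an assumption. The test URL and the `anchored` variant are hypothetical.

```python
import re

with_scheme = re.compile(r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")
without_scheme = re.compile(r"([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")

url = "http://0123.website.com/?type=abc&adv=abc1234"

# With the scheme, the number must start right after "//", so the leading zero is rejected.
scheme_match = with_scheme.search(url)

# Without it, the search can start in the middle of the number and skip the zero.
loose_match = without_scheme.search(url)
print(scheme_match, loose_match.group(1))

# Hypothetical variant: scheme optional, but the match must begin at the start of the string.
anchored = re.compile(r"^(?:https?:\/\/)?([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")
anchored_rejects_zero = anchored.search(url) is None
anchored_accepts_bare = anchored.search("123.website.com/?type=abc&adv=abc1234") is not None
```

So the expression still matches without https?:\/\/, but the scheme was also acting as a left-hand boundary for the number. Under the assumptions above, anchoring with `^` and making the scheme optional keeps that boundary.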
 

Author Comment

by:futr_vision
ID: 40244701
As a quick add-on to this, will this find pages that start with a combination of letters and numbers, such as in this URL?

http://ab12.website.com

https?:\/\/(\w+)
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 40244737
Quick answer: it will find ab12
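A short Python sketch confirming this, with one extra hypothetical host name to show where `\w+` stops:

```python
import re

pattern = re.compile(r"https?:\/\/(\w+)")

label = pattern.search("http://ab12.website.com").group(1)

# \w covers letters, digits and underscore, but not the hyphen,
# so a hyphenated label is only captured up to the hyphen.
hyphen_label = pattern.search("http://my-site.website.com").group(1)

print(label, hyphen_label)
```

The capture ends at the first character that is not a letter, digit or underscore, which for "ab12.website.com" is the dot.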
 

Author Comment

by:futr_vision
ID: 40244749
perfect