futr_vision

How do I use regex to find 2 groups in a URL?

I am not a programmer by trade, so I did my best, but I need help. I need to create a rule for Google Tag Manager using regex. My goal is to look at a URL and find two separate group matches in the string. Here is a sample URL:

http://123.website.com/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}

I originally had the expression below, which worked great except for the "&guid=blahblahblah&page=something&" between the two groups. How do I check for those two groups in one expression? Here is what I originally had:

(http://)(([0-9])|([0-9][0-9])|([0-9][0-9][0-9])).website.com\?(type\=abc)

Bonus: How can I make it check for https as well as http?

Thx!
Dan Craciun

Assuming you want to capture "123" and "type=abc" from your sample link, this:
https?://(\w+)\.website\.com/.*(type=\w+)

will give you "123" in $1 and "type=abc" in $2.

Bonus: it allows both http and https :)

HTH,
Dan
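
A quick way to check this pattern outside Google Tag Manager is a short Python script. This is only a sketch: Python's `re` module is not the engine GTM uses, so it confirms the logic of the pattern rather than GTM's behavior. The variable names are illustrative.

```python
import re

# Sample URL from the question.
url = ("http://123.website.com/?&guid=blahblahblah&page=something"
       "&type=abc&adv=abc1234&site={siteID}")

# Pattern from the answer: group 1 is the subdomain, group 2 is the type parameter.
pattern = re.compile(r"https?://(\w+)\.website\.com/.*(type=\w+)")

m = pattern.search(url)
if m:
    print(m.group(1))  # 123
    print(m.group(2))  # type=abc

# The "s?" makes the "s" optional, so the https form matches as well.
m_https = pattern.search(url.replace("http://", "https://", 1))
print(m_https is not None)  # True
```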
futr_vision

ASKER

Cool. I got this answer in another resource.

https?://\d{1,3}\.website\.com/.*type=abc.* (Perl, which I am not sure works with Google Tag Manager)

and this one

https?:\/\/([\d]{1,4})\.website\.com\/.*?&type=(.*?)&.*?

The second one works, but type needs to be an exact match. Also, after testing, I found that I need to revise my request. I need the expression to find an exact match for both type= and adv=. So in my example I need to make sure that "type=abc" and "adv=abc1234" match exactly.
OK. This:
https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234


will give you "123" in $1, only if type=abc and adv=abc1234
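
The exact-match behavior can be sketched in Python as follows. The helper name `subdomain_if_match` and the test URLs are made up for illustration, and Python's `re` stands in for GTM's engine, which may differ.

```python
import re

# Pattern from the reply above: literal type=abc followed directly by &adv=abc1234.
pattern = re.compile(r"https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234")

def subdomain_if_match(url):
    """Return the captured subdomain, or None when the URL does not match."""
    m = pattern.search(url)
    return m.group(1) if m else None

base = "http://123.website.com/?&guid=blahblahblah&page=something"

print(subdomain_if_match(base + "&type=abc&adv=abc1234&site={siteID}"))  # 123
print(subdomain_if_match(base + "&type=xyz&adv=abc1234&site={siteID}"))  # None

# Caveat: nothing marks the end of the adv value, so a longer value still matches.
print(subdomain_if_match(base + "&type=abc&adv=abc12345"))  # 123
```

If a longer value such as adv=abc12345 must be rejected, one option is to end the pattern with `(&|$)` so the value has to be followed by another parameter or the end of the URL.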
Hmm. So maybe I am not being completely clear, or maybe I am misreading your response.

A number from 1-9999 needs to be in that first spot after the http(s)://.
If a number, and only a number, is present, then the URL also has to match the type= and adv= values that follow.

Is that what your regex does?
My regexp will capture any run of letters, digits, or _ between "http://" and ".website.com". If you need to restrict it to a number between 1 and 9999, change \w+ to
[1-9]\d{0,3}
Would this work?

https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234

Actually, I think it will fail if it starts with a "0". I don't foresee that happening, but maybe I should be loose in my definition in case that does happen.
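
The difference between the two number patterns can be checked in isolation. In this sketch the fragments are tested with `fullmatch` against the subdomain alone (in the full expression they sit between `//` and `\.website`), and the helper names `is_strict` and `is_loose` are made up for illustration.

```python
import re

strict = re.compile(r"[1-9]\d{0,3}")  # 1 to 9999, no leading zero
loose = re.compile(r"\d{1,4}")        # any 1 to 4 digits, leading zeros allowed

def is_strict(label):
    return strict.fullmatch(label) is not None

def is_loose(label):
    return loose.fullmatch(label) is not None

# Print how each candidate subdomain fares under both patterns.
for candidate in ["1", "123", "9999", "0123", "12345", "ab12"]:
    print(candidate, is_strict(candidate), is_loose(candidate))
```

Under these assumptions, "0123" is the only candidate where the two disagree: the strict form rejects it and the loose form accepts it.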
ASKER CERTIFIED SOLUTION
Dan Craciun
What I am saying is that the person creating the landing pages may start the number with a "0". I don't have much control over that. I haven't seen it happen, but who knows. I guess I could always alter it if I need to in those instances. Otherwise I think this will work. Now to make sure that Google Tag Manager accepts it :) Looks like pretty standard RegEx, so I don't see why not.
Hmm. Just tested. I think the ampersand might be breaking it. This is how Google sees it:

{{url}} matches RegEx https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
Then use an omnichar:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc\.adv=abc1234
It doesn't like me escaping the ampersand either. I'll try your newest method. Thanks!
As I said, use an omnichar. The only risk is matching strings like type=abc,adv=abc1234, which are invalid anyway.
Hmm. That doesn't validate in any of the tools I've tested it with or in Google. Looks like all you did was escape a period, but then again I don't know much about RegEx, so I am probably missing a nuance. I need to be fairly strict about things, but it also needs to work :)
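
The observation about the escaped period can be checked with a short sketch. It assumes the "omnichar" suggestion meant an unescaped `.` (which matches any single character), whereas `\.` matches only a literal period. Python's `re` is used here and is not the same engine as GTM's.

```python
import re

query = "type=abc&adv=abc1234"

escaped_dot = re.compile(r"type=abc\.adv=abc1234")  # \. matches only a literal "."
any_char = re.compile(r"type=abc.adv=abc1234")      # .  matches any single character

print(bool(escaped_dot.search(query)))  # False: "&" is not a literal period
print(bool(any_char.search(query)))     # True:  "." accepts the "&"

# The risk mentioned in the thread: the unescaped dot also accepts other separators.
print(bool(any_char.search("type=abc,adv=abc1234")))  # True
```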
What a trial this has been. You'd think this would be easier.
SOLUTION
Cool. Looks like that works, and I found out that the other one works as well (the one with the ampersand). One quick question: if I do not care whether the URL starts with http:// or https://, do I just leave the https?:\/\/ off? I'm guessing that is not exactly the solution, is it?
Hmmm. I don't think the expression fails if I remove https?:\/\/
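
A sketch of what dropping the prefix changes, assuming the rule behaves like an unanchored (partial) match, which is how Python's `re.search` works and what the observation above suggests. The test URLs are made up for illustration.

```python
import re

with_scheme = re.compile(
    r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")
without_scheme = re.compile(
    r"([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234")

url = "https://123.website.com/?&guid=blahblahblah&type=abc&adv=abc1234"
print(bool(with_scheme.search(url)))     # True
print(bool(without_scheme.search(url)))  # True

# Side effect: without the prefix nothing pins the number to the start of the host,
# so a subdomain such as "ab123" now matches on its trailing digits.
mixed = "http://ab123.website.com/?type=abc&adv=abc1234"
print(bool(with_scheme.search(mixed)))        # False
print(without_scheme.search(mixed).group(1))  # 123
```

So the expression does not fail without the prefix, but it also stops guaranteeing that the subdomain is only a number.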
As a quick add-on to this: will the following find pages that start with a combination of letters and numbers, such as in this URL?

http://ab12.website.com

https?:\/\/(\w+)
Quick answer: it will find ab12
perfect
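
A minimal check of that answer in Python (again only a stand-in for GTM's engine). The helper name `first_label` and the hyphenated host are hypothetical, added to show where `\w+` stops.

```python
import re

pattern = re.compile(r"https?:\/\/(\w+)")

def first_label(url):
    """Return what group 1 captures right after the scheme, or None."""
    m = pattern.search(url)
    return m.group(1) if m else None

print(first_label("http://ab12.website.com"))   # ab12
# \w covers letters, digits and "_", so the capture stops at a hyphen.
print(first_label("http://ab-12.website.com"))  # ab
```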