Solved

How do I use regex to find 2 groups in a URL?

Posted on 2014-04-18
Last Modified: 2014-08-06
I am not a programmer by trade so I did my best but I need help. I have a need to create a rule for Google Tag Manager using regex. My goal is to look at a URL and find two separate group matches in the string. Here is a sample URL

http://123.website.com/?&guid=blahblahblah&page=something&type=abc&adv=abc1234&site={siteID}

I originally had this, which would have worked great if it weren't for the "&guid=blahblahblah&page=something&" in between the two groups. How do I check for those two groups in one expression? Here is what I originally had:

(http://)(([0-9])|([0-9][0-9])|([0-9][0-9][0-9])).website.com\?(type\=abc)

Bonus: How can I make it check for https as well as http?

Thx!
Question by:futr_vision
LVL 35

Expert Comment

by:Dan Craciun
ID: 40009369
Assuming you want to capture "123" and "type=abc" from your sample link, this:
https?://(\w+)\.website\.com/.*(type=\w+)

will give you "123" in $1 and "type=abc" in $2.

Bonus: it allows both http and https :)
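
The capture behaviour described above can be checked with a minimal Python sketch. It uses the sample URL from the question and Python's `re.search`; the helper name `extract_groups` is made up for illustration, and whether Google Tag Manager's regex engine behaves identically is an assumption.

```python
import re

# Sample URL from the question.
URL = ("http://123.website.com/?&guid=blahblahblah&page=something"
       "&type=abc&adv=abc1234&site={siteID}")

# Pattern from the comment above: "s" is optional, the subdomain lands in
# group 1 and the "type=..." pair in group 2.
PATTERN = re.compile(r"https?://(\w+)\.website\.com/.*(type=\w+)")


def extract_groups(url):
    """Hypothetical helper: return (subdomain, type_pair) or None."""
    match = PATTERN.search(url)
    return match.groups() if match else None


print(extract_groups(URL))
print(extract_groups(URL.replace("http://", "https://")))
```

Both calls should yield the same pair of groups, since `s?` makes the "s" optional.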

HTH,
Dan

Author Comment

by:futr_vision
ID: 40010934
Cool. I got this answer in another resource.

https?://\d{1,3}\.website\.com/.*type=abc.* (Perl, which I am not sure works with Google Tag Manager)

and this one

https?:\/\/([\d]{1,4})\.website\.com\/.*?&type=(.*?)&.*?

The second one works, but type needs to be an exact match. After testing, I also found that I need to revise my request. I need the statement to find an exact match for both type= and adv=. So in my example I need to make sure that "type=abc" and "adv=abc1234" match exactly.
LVL 35

Expert Comment

by:Dan Craciun
ID: 40010944
OK. This:
https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234

will give you "123" in $1, only if type=abc and adv=abc1234
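
A rough Python sketch of how this pattern behaves, assuming unanchored (search-style) matching. Note that nothing in the pattern marks the end of the adv value, so a longer value such as `adv=abc12345` still matches. The `STRICT` variant is a hypothetical tightening that is not from this thread: it requires `&` or the end of the string after the adv value.

```python
import re

# Pattern from the comment above.
PATTERN = re.compile(r"https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234")

# Hypothetical stricter variant (an assumption, not from the thread):
# the adv value must be followed by "&" or by the end of the string.
STRICT = re.compile(
    r"https?:\/\/(\w+)\.website\.com\/.*type=abc&adv=abc1234(?:&|$)"
)

base = "http://123.website.com/?&guid=x&page=y&type=abc&adv=abc1234&site=1"
wrong_type = base.replace("type=abc", "type=xyz")
longer_adv = base.replace("adv=abc1234", "adv=abc12345")

for label, url in [("base", base), ("wrong type", wrong_type),
                   ("longer adv", longer_adv)]:
    print(label, bool(PATTERN.search(url)), bool(STRICT.search(url)))
```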

Author Comment

by:futr_vision
ID: 40011513
Hmm. So maybe I am not being completely clear, or maybe I am misreading your response.

A number from 1-9999 needs to be in that first spot after the http(s)://
If a number is present and only a number then it has to match the type= and the adv= next.

Is that what your regex does?
LVL 35

Expert Comment

by:Dan Craciun
ID: 40011553
My regexp will return any letter, digit or _ between "http://" and ".website.com". If you need to restrict it to a number between 1 and 9999, change \w+ to
[1-9]\d{0,3}
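
As a quick sanity check on the range, this short Python sketch tests the fragment on its own with `re.fullmatch` against every integer from 0 to 19999 and reports which ones it accepts. The variable names are illustrative only.

```python
import re

# The number fragment on its own: one non-zero digit, then up to three digits.
NUMBER = re.compile(r"[1-9]\d{0,3}")

# Collect every integer in 0..19999 whose decimal form matches completely.
accepted = [n for n in range(0, 20000) if NUMBER.fullmatch(str(n))]

print(accepted[0], accepted[-1], len(accepted))  # 1 9999 9999
```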

Author Comment

by:futr_vision
ID: 40011603
Would this work?

https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234

Actually, I think it will fail if it starts with a "0". I don't foresee that happening, but maybe I should be loose in my definition in case that does happen.
LVL 35

Accepted Solution

by:
Dan Craciun earned 500 total points
ID: 40011616
You need this to make sure the number does not start with 0:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
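
To compare the two number fragments discussed here, a small Python sketch follows. It assumes search-style matching; the test URLs and the helper `subdomain` are invented for illustration. The looser `\d{1,4}` accepts a leading zero, while `[1-9]\d{0,3}` rejects it.

```python
import re

# Looser variant from the earlier comment: any one to four digits.
LOOSE = re.compile(
    r"https?:\/\/(\d{1,4})\.website\.com\/.*type=abc&adv=abc1234"
)
# Variant from this answer: the first digit must not be zero.
NO_LEADING_ZERO = re.compile(
    r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234"
)


def subdomain(pattern, url):
    """Hypothetical helper: return the captured number or None."""
    match = pattern.search(url)
    return match.group(1) if match else None


plain = "http://123.website.com/?type=abc&adv=abc1234"
zero = "http://0123.website.com/?type=abc&adv=abc1234"

print(subdomain(LOOSE, plain), subdomain(NO_LEADING_ZERO, plain))
print(subdomain(LOOSE, zero), subdomain(NO_LEADING_ZERO, zero))
```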


Author Comment

by:futr_vision
ID: 40011636
What I am saying is that the person creating the landing pages may start the number with a "0". I don't have much control over that. I haven't seen it happen, but who knows. I guess I could always alter it if I need to in those instances. Otherwise I think this will work. Now to make sure that Google Tag Manager accepts it :) Looks like pretty standard RegEx so I don't see why not.

Author Comment

by:futr_vision
ID: 40012580
Hmm. Just tested. I think the ampersand might be breaking it. This is how Google sees it:

{{url}} matches RegEx https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc&adv=abc1234
LVL 35

Expert Comment

by:Dan Craciun
ID: 40012637
Then use an omnichar:
https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc\.adv=abc1234

Author Comment

by:futr_vision
ID: 40012646
It doesn't like me escaping the ampersand either. I'll try your newest method. Thanks!
LVL 35

Expert Comment

by:Dan Craciun
ID: 40012651
As I said, use an omnichar. The only risk is matching stuff like type=abc,adv=abc1234, which is illegal anyway.

Author Comment

by:futr_vision
ID: 40012669
Hmm. That doesn't validate in any of the tools I've tested it with or in Google. Looks like all you did was escape a period, but then again I don't know much about RegEx so I am probably missing a nuance. I need to be fairly strict about things but it also needs to work :)
What a trial this has been. You'd think this would be easier.
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 500 total points
ID: 40012677
Yup, you're right :) The omnichar is a dot. No escape needed.

https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc.adv=abc1234
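
The difference between the escaped and unescaped dot can be seen in a minimal Python sketch. It assumes search-style matching; the helper `matches` and the test query strings are made up for illustration. An escaped `\.` matches only a literal period, while a bare `.` matches any single character, including `&` and `,`.

```python
import re

# Earlier attempt: "\." only matches a literal period between the two pairs.
ESCAPED = re.compile(
    r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc\.adv=abc1234"
)
# Corrected version: a bare "." matches any single character there.
WILDCARD = re.compile(
    r"https?:\/\/([1-9]\d{0,3})\.website\.com\/.*type=abc.adv=abc1234"
)


def matches(pattern, query):
    """Hypothetical helper: test a query string on a fixed sample host."""
    return pattern.search("http://123.website.com/?" + query) is not None


for query in ["type=abc&adv=abc1234", "type=abc.adv=abc1234",
              "type=abc,adv=abc1234", "type=abcadv=abc1234"]:
    print(query, matches(ESCAPED, query), matches(WILDCARD, query))
```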

Author Comment

by:futr_vision
ID: 40013415
Cool. Looks like that works, and I found out that the other one works as well. The one with the ampersand. One quick question. If I do not care whether or not the URL starts with http:// or https://, do I just leave the https?:\/\/ off? I'm guessing that is not exactly the solution, is it?

Author Comment

by:futr_vision
ID: 40013663
Hmmm. I don't think the expression fails if I remove https?:\/\/
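
A rough Python sketch of what dropping the prefix changes, assuming unanchored, search-style matching (whether Google Tag Manager behaves the same way is an assumption). Since `https?` already covers both schemes, removing it is not needed for that purpose. Without the prefix, the number is no longer pinned to the start of the host, so hosts like `12345` or `ab12` can match on their trailing digits. The helper `captured` and the test URLs are illustrative.

```python
import re

TAIL = r"([1-9]\d{0,3})\.website\.com\/.*type=abc.adv=abc1234"

WITH_SCHEME = re.compile(r"https?:\/\/" + TAIL)
WITHOUT_SCHEME = re.compile(TAIL)


def captured(pattern, url):
    """Hypothetical helper: return the captured number or None."""
    match = pattern.search(url)
    return match.group(1) if match else None


urls = [
    "https://123.website.com/?type=abc&adv=abc1234",
    "http://12345.website.com/?type=abc&adv=abc1234",
    "http://ab12.website.com/?type=abc&adv=abc1234",
]
for url in urls:
    print(url, captured(WITH_SCHEME, url), captured(WITHOUT_SCHEME, url))
```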

Author Comment

by:futr_vision
ID: 40244701
As a quick add-on to this, will this find pages that start with a combination of letters and numbers, such as in this URL?

http://ab12.website.com

https?:\/\/(\w+)
LVL 35

Expert Comment

by:Dan Craciun
ID: 40244737
Quick answer: it will find ab12
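
A short Python sketch of that answer, with an illustrative helper named `first_label`. One caveat worth knowing: `\w` covers letters, digits and underscore only, so a hyphenated host label would be captured only up to the hyphen.

```python
import re

PATTERN = re.compile(r"https?:\/\/(\w+)")


def first_label(url):
    """Hypothetical helper: return what group 1 captures, or None."""
    match = PATTERN.search(url)
    return match.group(1) if match else None


for url in ["http://ab12.website.com", "https://123.website.com/",
            "http://my-site.website.com"]:
    print(url, first_label(url))
```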

Author Comment

by:futr_vision
ID: 40244749
perfect
Question has a verified solution.
