Tolgar
asked on
How to extract part of a string using regular expressions in JS?
Hi,
I want to capture a part of a string using regular expressions in JavaScript. The example case is below.
Example:
http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html
In this string I want to capture the following:
/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/
I showed 4 directories, but this number may vary. The only thing I am sure of is that I want to capture the part up to sbtest. I don't want to include sbtest in the part that I capture.
Thanks,
What is different about this question than the one you previously asked?
kaufmed - difference is he doesn't want the "sbtest" final subdirectory included anymore.
I'll let you take this one. :)
ASKER
Oh yes, I don't want the "sbtest" final subdirectory included.
Thanks,
@sjklein42
>> I'll let you take this one
I bow to you, sir.
@Tolgar
Then you should be able to modify sjklein42's previous solution to accommodate:
sandbox\/(.*?)[^\/]+/[^\/]+$
kaufmed, you are my hero.
ASKER
This captures everything to the end of the string, including .html. I only want the part up to the beginning of sbtest.
Thanks,
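A possible reason for this, offered only as a guess: in JavaScript, index 0 of a match result is always the whole matched text, which here runs to the end of the string, while the parenthesized group sits at index 1. The sketch below uses the pattern posted above, with its middle slash escaped (an assumption, so that it parses as a JavaScript regex literal).

```javascript
// Assumption: the unescaped "/" in the posted pattern is escaped here
// so that the regex literal is valid JavaScript.
const url = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';
const m = /sandbox\/(.*?)[^\/]+\/[^\/]+$/.exec(url);

// m[0] is the entire matched text, so it includes sbtest and the file name.
const wholeMatch = m[0];
// m[1] is only what the parenthesized group captured.
const captured = m[1];

console.log(wholeMatch); // sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html
console.log(captured);   // tolgar/Dir1/Dir2/Dir3/Dir4/
```

Note that the group in this pattern starts after sandbox/, so the leading /sandbox/ would still have to be moved inside the parentheses to get the requested output.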
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
OK, but in my code it does not work the way it does in your case:
var sandboxUsernameSBLocglnx = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);
sbcheckLink is the actual string, and I expect "sandboxUsernameSBLocglnx" to be exactly what you got.
Could there be a difference between Perl and JavaScript?
Thanks,
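One JavaScript-specific detail that may explain the difference: when String.prototype.match is called with a pattern carrying the g flag, it returns an array of whole matches and drops the capture groups; without g, the result holds the whole match at index 0 and the groups after it. A minimal sketch using the example URL from the question (the variable names withG and withoutG are made up for illustration):

```javascript
const sbcheckLink = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';

// With /g: an array of whole matches only, no capture groups.
const withG = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);

// Without /g: index 0 is the whole match, index 1 is the first capture group.
const withoutG = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/);

console.log(withG[0]);    // /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html
console.log(withoutG[1]); // /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/
```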
Tiny test HTML page with JavaScript:
<script>
var str = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';
var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
// mymatch[1] is the text captured by the parenthesized group
alert( mymatch[1] );
</script>
I get an alert box showing /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/
What do you get?
ASKER
Hi again,
OK, it works in your example. However, in my case I have multiple matches and this one only changes the first match. How can I make it greedy?
Thanks,
So if I understand the problem now, your input variable (sbcheckLink) contains more than one of these URLs, and you want to find them all? If that is the case, please post a more complete input example so I can see what you are dealing with and how to separate the URLs from the rest of the string.
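If the URLs are available one at a time (for example, one per link on a page), one way to handle several of them is to apply the same non-global pattern to each URL separately instead of changing the pattern. A minimal sketch, in which the helper name sandboxPath and the sample urls array are made up for illustration:

```javascript
// Hypothetical helper: returns the captured directory path,
// or null when the URL does not match the pattern.
function sandboxPath(url) {
  const m = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/.exec(url);
  return m ? m[1] : null;
}

// Hypothetical sample input: one URL per element.
const urls = [
  'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/sbtest/a.html',
  'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/sbtest/b.html',
  'http://www-internal.mywork.com/mywork/devel/index.html',
];

const paths = urls.map(sandboxPath);
console.log(paths); // [ '/sandbox/tolgar/Dir1/Dir2/', '/sandbox/tolgar/Dir1/', null ]
```

Returning null for a non-matching URL lets the caller skip that link instead of failing on an index into a null result.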
@sjklein42
>> kaufmed, you are my hero.
As long as I'm not "the wind beneath your wings," then I'm cool = )
@Tolgar
It's important, especially with regex, to clearly define your input and expected output. Your initial example only described a single occurrence, but now the impression is that multiple occurrences can exist. With regex, whereas one particular pattern may match one set of data, it may be meaningless to another set of data, as you should now be becoming familiar with. Just as changing one character in a pattern can drastically change the meaning of the pattern, so too can changing the input data drastically change the result of a pattern matching or not matching : )
ASKER
As in your example, I tried exactly my case on a separate simple web page:
<script>
var str = 'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';
var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
alert( mymatch[1] );
</script>
This returns: /sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/
which is totally what I want.
However, in my Greasemonkey script there is one case where this returns null. It happens regardless of how many matches are on the page, because on other pages with multiple matches it works fine. Only on one page does it do what I want for the first 3 matches and then fail for the rest of the matches.
I checked the error console and I see that the following line returns null:
var sandboxUsernameSBLocglnx = pattern.exec(thisAnchor.href);
Since "sandboxUsernameSBLocglnx" is null, it stops, I guess, because this is the last thing I see in the Messages tab of the error console even though I put GM log commands on the following lines. Then I copied "thisAnchor.href" into a string (the str variable) on the simple web page I showed above, ran the same pattern.exec, and it worked.
But it does not work in my GM script. I don't understand why it returns null for this match even though it works fine in this separate example.
My entire code is in the attachment. Null is returned at line 214.
expert.user.txt
Yes, that is weird. Line 214, where it is failing, can only be reached if line 197's match for .html fails:
line 197 -> if (thisAnchor.href.match(/\.html/g)){
but the string you showed at the top of your previous message clearly has a ".html" in it.
ASKER
Yes, that's right. I still couldn't figure out why this happens. And it works fine on other pages similar to this one. I don't understand why it fails at that point even if it satisfies the if condition as you stated.
Thanks,
Gremlins.
It may not make a difference here, but I don't think you need the /g on most of the pattern match calls.
/g means "global", which makes it find all the matches if there is more than one and put the results in an array.
I think in most of your regexp matches you want to find only one match each time through the loop.
Specifically I think you can get rid of the "g" flag on lines 172, 174, 179, 197, 199, 204, and 214.
I notice you debug log the value of thisAnchor.href in two places - lines 165 and 196. When you look at the output log, do you see both these log lines next to each other as we would expect?
We'll get to the bottom of this eventually.
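One further effect of the g flag, which may or may not be related to the null seen in this thread: a regular expression object with g remembers where its last match ended (its lastIndex property), and exec() starts the next search from that position, even when it is given a different string. A minimal sketch with made-up URLs and variable names:

```javascript
// Hypothetical sample data: two URLs that both fit the pattern.
const hrefs = [
  'http://example.invalid/dev/sandbox/a/x/sbtest/one.html',
  'http://example.invalid/dev/sandbox/b/y/sbtest/two.html',
];

const globalPat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g; // keeps lastIndex between calls
const plainPat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;   // always searches from index 0

// Reusing the same global regex: the second exec() starts at the end of the
// previous match, finds nothing, returns null, and resets lastIndex to 0.
const withG = hrefs.map(function (h) {
  const m = globalPat.exec(h);
  return m ? m[1] : null;
});

// The non-global regex matches every URL.
const withoutG = hrefs.map(function (h) {
  const m = plainPat.exec(h);
  return m ? m[1] : null;
});

console.log(withG);    // [ '/sandbox/a/x/', null ]
console.log(withoutG); // [ '/sandbox/a/x/', '/sandbox/b/y/' ]
```

Dropping the g flag, or setting lastIndex back to 0 before each exec() call, avoids the carried-over state.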
ASKER
I don't know what you meant by next to each other, but when I look at the error console in FF, I can see the output of lines 165 and 196 as shown below:
=== PART of THE LOG TILL THE END ===
i = 41
All anchors before match ==>http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html
MATCHED regexp
Set SPAN NODE
http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html
APPEND IMAGE
thisAnchor.href===>'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';
pattern===>/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g
=== END of the LOG ===
Yes, thanks. That's what I was wondering about - whether the two places you show the value of thisAnchor.href showed the same thing - and they do, as expected. I don't see anything unusual about the string.
This is the end of the log file, just before it crashes on line 214, right? So we know pretty much for sure this has to be the href that is causing it to crash?
Here's a different idea: Are you using Firefox, or IE? If Firefox, try upgrading. We may be hitting the regexp bug they just fixed in the Firefox4 Beta release:
http://www.mozilla.com/en-US/firefox/beta/
ASKER
>>This is the end of the log file, just before it crashes on line 214, right? So we know pretty much for sure this has to be the href that is causing it to crash?
Yes.
And this is a Greasemonkey script, so I have to use Firefox. I use Firefox 3.6.13. I checked for Firefox updates and it didn't find any.
Thanks,
FF4 Beta is not an update, it has to be downloaded specifically. Here's the link:
http://www.mozilla.com/en-US/firefox/beta/
ASKER
Great catch !!! It worked after installing Firefox4 Beta
Thanks for the help,
WOW! WHAT A RELIEF!!!
Thanks!