Tolgar

How to extract part of a string using regular expressions in JS?

Hi,
I want to capture a part of a string using regular expressions in JavaScript. The example case is below.

Example:

http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html

In this string I want to capture the following:

/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/

I showed 4 directories, but this number may vary. The only thing I am sure of is that I want to capture the part up to sbtest. I don't want to include sbtest in the part that I capture.

Thanks,
kaufmed

What is different about this question than the one you previously asked?
kaufmed - difference is he doesn't want the "sbtest" final subdirectory included anymore.

I'll let you take this one.  :)
Tolgar

ASKER

Oh yes, I don't want the "sbtest" final subdirectory included.

Thanks,
@sjklein42

>>  I'll let you take this one

I bow to you, sir.


@Tolgar

Then you should be able to modify sjklein42's previous solution to accommodate:
sandbox\/(.*?)[^\/]+/[^\/]+$


kaufmed, you are my hero.
Tolgar

ASKER

This captures everything through the end, including .html. I only want the part up to the beginning of sbtest.

Thanks,
ASKER CERTIFIED SOLUTION
sjklein42
Tolgar

ASKER

OK, but in my code it does not work the way it does in your case:

var sandboxUsernameSBLocglnx = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);



sbcheckLink is the actual string. And I expect "sandboxUsernameSBLocglnx" to be exactly what you got.

Could there be a difference between Perl and JavaScript?


Thanks,
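For readers following along, one JavaScript-specific detail may matter for the line above: when `String.prototype.match` is called with a pattern that has the `g` flag, it returns an array of whole matches and drops the capture groups. The sketch below is not taken from the hidden solution; it reuses the example URL from the question, and the variable names `withG`, `withoutG`, and `captured` are made up for illustration.

```javascript
const sbcheckLink =
  'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';

// With /g, match() returns only the whole matched substrings.
// The parenthesized group is not available in the result.
const withG = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);

// Without /g, match() returns the whole match at index 0
// and the first capture group at index 1 (or null if nothing matched).
const withoutG = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/);
const captured = withoutG ? withoutG[1] : null;

console.log(withG[0]);
console.log(captured);
```

So the difference from Perl here is likely the `g` flag on `match`, not the pattern itself.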
Tiny test HTML page with Javascript

<script>
var str = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';

var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
alert( mymatch[1] )
</script>



I get an alert box showing /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/

What do you get?
Tolgar

ASKER

Hi again,
OK, it works in your example. However, in my case I have multiple matches, and this one only changes the first match. How can I make it greedy?

Thanks,
So if I understand the problem now,  your input variable ( sbcheckLink ) contains more than one of these URLs, and you want to find them all?  Please post a more complete input example so I can see what you are dealing with and how to separate out the URLs from the rest of the string.  If that is the case.
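If the input really did contain several URLs in one string, one way to collect every capture is to loop over `exec` with a global pattern. This is only a sketch under assumptions: the URLs are taken to be separated by whitespace, the `$` anchor is dropped so the pattern can match in the middle of the string, and the function name `allSandboxPrefixes` and the sample hosts are hypothetical.

```javascript
// Hypothetical helper: collect the "/sandbox/.../" part of every URL in a
// whitespace-separated list. \S keeps each match inside a single URL.
function allSandboxPrefixes(text) {
  const re = /(\/sandbox\/\S*?\/)sbtest\/[^\/\s]+/g;
  const results = [];
  let m;
  // With /g, each exec() call resumes where the previous match ended.
  while ((m = re.exec(text)) !== null) {
    results.push(m[1]);
  }
  return results;
}

const sample =
  'http://host.example/dev/sandbox/u1/Dir1/sbtest/a.html ' +
  'http://host.example/dev/sandbox/u2/Dir1/Dir2/sbtest/b.html';
const prefixes = allSandboxPrefixes(sample);
console.log(prefixes);
```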
@sjklein42

>>  kaufmed, you are my hero.

As long as I'm not "the wind beneath your wings," then I'm cool   = )


@Tolgar

It's important, especially with regex, to clearly define your input and expected output. Your initial example only described a single occurrence, but now the impression is that multiple occurrences can exist. With regex, a pattern that matches one set of data may be meaningless for another set of data, as you should now be becoming familiar with. Just as changing one character in a pattern can drastically change the meaning of the pattern, so too can changing the input data drastically change whether a pattern matches or not  :  )
Tolgar

ASKER

As in your example, I tried exactly my case on a separate simple webpage:

<script>
var str = 'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';

var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
alert( mymatch[1] )
</script>


This returns:

/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/

which is totally what I want.

However, in my Greasemonkey script there is one case where this returns null. It does not depend on how many matches are on the page, because on other pages with multiple matches it works fine. Only on one page does it do what I want for the first 3 matches and then fail for the rest.

I checked the error console and I see that the following line returns null:

var sandboxUsernameSBLocglnx  = pattern.exec(thisAnchor.href);



Since "sandboxUsernameSBLocglnx" is null, I guess it stops there, because this is the last thing I see in the error console's messages tab even though I put GM log commands on the following lines. Then I copied "thisAnchor.href" into the str variable on the simple web page I showed above, ran the same pattern.exec, and it worked.

But it does not work in my GM script. I don't understand why it returns null for this match even though it works fine in this separate example.

My entire code is in the attachment. Null is returned at line 214.

expert.user.txt
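Whatever the cause of the null turns out to be, the script stops because the result of `exec` is used without checking it. A minimal sketch of a guard is shown below. The attached script is not reproduced here, so the helper name `sandboxPrefix` and the loop are hypothetical, and the pattern is the one from the test page without the `g` flag.

```javascript
// Hypothetical helper: return the "/sandbox/.../" prefix of an href,
// or null when the href does not match, instead of throwing later on m[1].
function sandboxPrefix(href) {
  const m = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/.exec(href);
  return m === null ? null : m[1];
}

const hrefs = [
  'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html',
  'http://www-internal.mywork.com/index.html' // deliberately non-matching
];

for (const href of hrefs) {
  const prefix = sandboxPrefix(href);
  if (prefix === null) {
    // Log and move on so one bad anchor does not stop the whole loop.
    console.log('no match for ' + href);
    continue;
  }
  console.log(prefix);
}
```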
Yes, that is weird.  Line 214 where it is failing can only be reached if line 197's match for .html fails:

line 197 ->  if (thisAnchor.href.match(/\.html/g)){

but the string you showed at the top of your previous message clearly has a ".html" in it.
Tolgar

ASKER

Yes, that's right. I still couldn't figure out why this happens. And it works fine on other pages similar to this one. I don't understand why it fails at that point even though it satisfies the if condition, as you stated.

Thanks,

Gremlins.

It may not make a difference here, but I don't think you need the /g on most of the pattern match calls.

/g means "global", which makes it find all the matches if there is more than one and put the results in an array.

I think in most of your regexp matches you want to find only one match each time through the loop.

Specifically I think you can get rid of the "g" flag on lines 172, 174, 179, 197, 199, 204, and 214.
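One more reason to be wary of `g` with `exec`: a global RegExp object remembers where its last match ended in its `lastIndex` property, and the next `exec` call starts searching from there, even on a different string. The sketch below shows that behavior with made-up URLs and variable names. Whether this is what happens in the attached script is only a guess.

```javascript
// One RegExp object with /g, reused across calls as it would be in a loop.
const pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g;

const a = 'http://host.example/dev/sandbox/tolgar/Dir1/sbtest/a.html';
const b = 'http://host.example/dev/sandbox/tolgar/Dir1/sbtest/b.html';

const first = pat.exec(a);        // matches; lastIndex is now a.length
const indexAfterFirst = pat.lastIndex;

const second = pat.exec(b);       // search starts at lastIndex, so it finds nothing
const indexAfterSecond = pat.lastIndex; // a failed match resets lastIndex to 0

const third = pat.exec(b);        // starts from 0 again and matches

console.log(first !== null, second, third !== null);
```

Dropping the `g` flag, or setting `pat.lastIndex = 0` before each call, avoids the carried-over state.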

I notice you debug log the value of thisAnchor.href in two places - lines 165 and 196.  When you look at the output log, do you see both these log lines next to each other as we would expect?

We'll get to the bottom of this eventually.
Tolgar

ASKER

I don't know what you meant by next to each other, but when I see the error console in FF,  I can see the output of lines 165 and 196 as shown below:

=== PART of THE LOG TILL THE END ===

i = 41

All anchors before match ==>http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html

MATCHED regexp

Set SPAN NODE

http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html

APPEND IMAGE

thisAnchor.href===>'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';

pattern===>/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g



=== END of the LOG ===
Yes, thanks.  That's what I was wondering about - whether the two places you show the value of thisAnchor.href showed the same thing - and they do, as expected.  I don't see anything unusual about the string.

This is the end of the log file, just before it crashes on line 214, right?  So we know pretty  much for sure this has to be the href that is causing it to crash?

Here's a different idea:  Are you using Firefox, or IE?  If Firefox, try upgrading. We may be hitting the regexp bug they just fixed in the Firefox4 Beta release:

http://www.mozilla.com/en-US/firefox/beta/
Tolgar

ASKER

>>This is the end of the log file, just before it crashes on line 214, right?  So we know pretty  much for sure this has to be the href that is causing it to crash?

Yes.

And this script is in greasemonkey script. So I have to use Firefox. I use Firefox 3.6.13. I checked the updates for Firefox and it didn't find any.

Thanks,
FF4 Beta is not an update, it has to be downloaded specifically.  Here's the link:

http://www.mozilla.com/en-US/firefox/beta/ 
Tolgar

ASKER

Great catch !!! It worked after installing Firefox4 Beta

Thanks for the help,

WOW!  WHAT A RELIEF!!!

Thanks!