Solved

How to extract part of a string using regular expressions in JS?

Posted on 2011-02-25
Last Modified: 2012-05-11
Hi,
I want to capture a part of a string using regular expressions in JavaScript. The example case is below.

Example:

http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html

In this string I want to capture the following:

/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/

I showed 4 directories, but this number may vary. The only thing I am sure of is that I want to capture the part up to sbtest. I don't want to include sbtest in the part that I capture.

Thanks,
Question by:Tolgar
22 Comments
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34983631
What is different about this question than the one you previously asked?
 
LVL 16

Expert Comment

by:sjklein42
ID: 34983674
kaufmed - difference is he doesn't want the "sbtest" final subdirectory included anymore.

I'll let you take this one.  :)
 

Author Comment

by:Tolgar
ID: 34983808
Oh yes, I don't want the "sbtest" final subdirectory included.

Thanks,
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34983961
@sjklein42

>>  I'll let you take this one

I bow to you, sir.


@Tolgar

Then you should be able to modify sjklein42's previous solution to accommodate:
sandbox\/(.*?)[^\/]+\/[^\/]+$

 
LVL 16

Expert Comment

by:sjklein42
ID: 34984068
kaufmed, you are my hero.
 

Author Comment

by:Tolgar
ID: 34984083
This captures everything to the end, including .html. I only want the part up to the beginning of sbtest.

Thanks,
 
LVL 16

Accepted Solution

by:
sjklein42 earned 250 total points
ID: 34984136
The expression is this:

(\/sandbox\/.*?\/)sbtest\/[^\/]+$


$x = q{http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html};

$x =~ /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;

print $1;


c:\temp>perl foo.pl
/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/

 

Author Comment

by:Tolgar
ID: 34984234
OK, but in my code it does not work the way it does in your case:

var sandboxUsernameSBLocglnx = sbcheckLink.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);


sbcheckLink is the actual string, and I expect "sandboxUsernameSBLocglnx" to be exactly what you got.

Could there be a difference between Perl and JavaScript?


Thanks,
 
LVL 16

Expert Comment

by:sjklein42
ID: 34984343
Here is a tiny test HTML page with JavaScript:

<script>
var str = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';

var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
alert( mymatch[1] )
</script>


I get an alert box showing /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/

What do you get?
 

Author Comment

by:Tolgar
ID: 34985195
Hi again,
OK, it works in your example. However, in my case I have multiple matches, and this only changes the first match. How can I make it greedy?

Thanks,
 
LVL 16

Expert Comment

by:sjklein42
ID: 34985242
So if I understand the problem now,  your input variable ( sbcheckLink ) contains more than one of these URLs, and you want to find them all?  Please post a more complete input example so I can see what you are dealing with and how to separate out the URLs from the rest of the string.  If that is the case.
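
If the input really is one string holding several URLs (an assumption, since the thread never shows such input), a minimal sketch could loop with exec() and the g flag. The sample text, the \S tweak to the pattern, and the variable names below are illustrative and not taken from the original script.

```javascript
// Hypothetical input: two URLs in one whitespace-separated string.
var text = [
  'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/sbtest/a.html',
  'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir3/sbtest/b.html'
].join(' ');

// Same idea as the accepted pattern, but without the $ anchor and with \S,
// so that a single match cannot run across the whitespace between URLs.
var pat = /(\/sandbox\/\S*?\/)sbtest\/[^\/\s]+/g;

// With the g flag, each exec() call resumes where the previous match ended.
var captures = [];
var m;
while ((m = pat.exec(text)) !== null) {
  captures.push(m[1]); // capture group 1: the path up to, but not including, sbtest
}
console.log(captures);
```

Each pass of the loop yields one capture, so captures ends up with one entry per URL found in the string.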
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 34985582
@sjklein42

>>  kaufmed, you are my hero.

As long as I'm not "the wind beneath your wings," then I'm cool   = )


@Tolgar

It's important, especially with regex, to clearly define your input and expected output. Your initial example described only a single occurrence, but now the impression is that multiple occurrences can exist. A pattern that matches one set of data may be meaningless for another, as you are now finding out. Just as changing one character in a pattern can drastically change its meaning, changing the input data can drastically change whether the pattern matches  :  )
 

Author Comment

by:Tolgar
ID: 34985696
As in your example, I tried exactly my case on a separate simple web page:

<script>
var str = 'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';

var pat = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
var mymatch = pat.exec(str);
alert( mymatch[1] )
</script>

This returns:

/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/

which is totally what I want.

However, in my Greasemonkey script there is one case where this returns null. It does not depend on how many matches are on the page, because on other pages with multiple matches it works fine. Only on one page does it do what I want for the first 3 matches and then fail for the rest.

I checked the error console and I see that the following line returns null:

var sandboxUsernameSBLocglnx  = pattern.exec(thisAnchor.href);


Since "sandboxUsernameSBLocglnx" is null, it stops, I guess, because this is the last thing I see in the messages tab of the error console, even though I put GM log commands on the following lines. Then I copied "thisAnchor.href" into a string (the str variable) on the simple web page I showed you above, ran the same pattern.exec, and it worked.

But it does not work in my GM script. I don't understand why it returns null for this match even though it works fine in this separate example.

My entire code is in the attachment. Null is returned at line 214.

expert.user.txt
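
Whatever the cause of the null, a guard would keep a script from stopping at the first link that fails to match. This is a minimal sketch, assuming a hypothetical helper named extractSandboxPath and made-up sample links; none of it comes from the attached script.

```javascript
// Hypothetical helper: returns the captured path, or null when href does not match.
function extractSandboxPath(href) {
  // No g flag, so every call starts matching at the beginning of href.
  var pattern = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/;
  var m = pattern.exec(href);
  return m ? m[1] : null;
}

// Made-up sample links; the second one has no sbtest directory.
var hrefs = [
  'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/glnxa64/sbtest/myscanlog_results.html',
  'http://www-internal.mywork.com/mywork/dev/index.html'
];

var results = [];
for (var i = 0; i < hrefs.length; i++) {
  var path = extractSandboxPath(hrefs[i]);
  if (path === null) {
    // Reading m[1] on a null result would throw; skip the link instead.
    console.log('no match, skipping: ' + hrefs[i]);
    continue;
  }
  results.push(path);
}
console.log(results);
```

Checking for null before reading index 1 means one unexpected link is reported and skipped rather than ending the loop with an exception.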
 
LVL 16

Expert Comment

by:sjklein42
ID: 34985787
Yes, that is weird.  Line 214, where it is failing, can only be reached if line 197's match for .html fails:

line 197 ->  if (thisAnchor.href.match(/\.html/g)){

but the string you showed at the top of your previous message clearly has a ".html" in it.
 

Author Comment

by:Tolgar
ID: 34988622
Yes, that's right. I still couldn't figure out why this happens. It works fine on other pages similar to this one. I don't understand why it fails at that point even though it satisfies the if condition, as you stated.

Thanks,

 
LVL 16

Expert Comment

by:sjklein42
ID: 34989033
Gremlins.

It may not make a difference here, but I don't think you need the /g on most of the pattern match calls.

/g means "global", which makes it find all the matches if there is more than one and put the results in an array.

I think in most of your regexp matches you want to find only one match each time through the loop.

Specifically I think you can get rid of the "g" flag on lines 172, 174, 179, 197, 199, 204, and 214.
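
The g flag also changes what match() returns, which may be why the one-line match() call earlier in the thread did not give the captured part. Here is a small sketch using the example URL from the question; the variable names withG and withoutG are illustrative.

```javascript
var str = 'http://www-internal.mywork.com/mywork/devel/sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html';

// With g: match() returns an array of whole matches only; capture groups are dropped.
var withG = str.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g);

// Without g: match() returns the whole match at index 0 and capture groups from index 1.
var withoutG = str.match(/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/);

console.log(withG.length); // 1
console.log(withG[0]);     // /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/sbtest/myweb.html
console.log(withG[1]);     // undefined
console.log(withoutG[1]);  // /sandbox/tolgar/Dir1/Dir2/Dir3/Dir4/
```

So when only the captured part is needed, the non-global form, or exec() as in the test page above, is the one that exposes group 1.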

I notice you debug log the value of thisAnchor.href in two places - lines 165 and 196.  When you look at the output log, do you see both these log lines next to each other as we would expect?

We'll get to the bottom of this eventually.
 

Author Comment

by:Tolgar
ID: 34989101
I don't know what you meant by next to each other, but when I look at the error console in FF, I can see the output of lines 165 and 196 as shown below:

=== PART of THE LOG TILL THE END ===

i = 41

All anchors before match ==>http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html

MATCHED regexp

Set SPAN NODE

http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html

APPEND IMAGE

thisAnchor.href===>'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/prequal_testlog/Abcde_j146899_mytests/glnxa64/sbtest/myscanlog_results.html';

pattern===>/(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g


=== END of the LOG ===
 
LVL 16

Expert Comment

by:sjklein42
ID: 34989203
Yes, thanks.  That's what I was wondering about - whether the two places you show the value of thisAnchor.href showed the same thing - and they do, as expected.  I don't see anything unusual about the string.

This is the end of the log file, just before it crashes on line 214, right?  So we know pretty  much for sure this has to be the href that is causing it to crash?

Here's a different idea:  Are you using Firefox, or IE?  If Firefox, try upgrading. We may be hitting the regexp bug they just fixed in the Firefox4 Beta release:

http://www.mozilla.com/en-US/firefox/beta/
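
One plausible explanation, offered as an assumption because the thread never confirms the cause: a regex with the g flag remembers where its last match ended in its lastIndex property, and exec() starts the next search from there even when it is given a different string. Older engines that followed ECMAScript 3 could reuse the same object for a regex literal on every pass of a loop, so this state carried over between links. The sketch below reproduces the effect with one shared regex object; the two sample URLs are made up.

```javascript
// One regex object with the g flag, shared across calls.
var pattern = /(\/sandbox\/.*?\/)sbtest\/[^\/]+$/g;

var a = 'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/glnxa64/sbtest/first.html';
var b = 'http://www-internal.mywork.com/mywork/dev/sandbox/tolgar/Abcde/glnxa64/sbtest/second.html';

var first = pattern.exec(a);
console.log(first[1]);                       // /sandbox/tolgar/Abcde/glnxa64/
console.log(pattern.lastIndex === a.length); // true: the match ran to the end of a

// The next search starts at lastIndex, which is almost at the end of b,
// so nothing is found and exec() returns null.
var second = pattern.exec(b);
console.log(second);            // null
console.log(pattern.lastIndex); // 0: a failed match resets lastIndex

// After the reset, the same call succeeds.
var third = pattern.exec(b);
console.log(third[1]);          // /sandbox/tolgar/Abcde/glnxa64/
```

Dropping the g flag, or setting pattern.lastIndex = 0 before each exec() call, avoids the carried-over state regardless of browser version.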
 

Author Comment

by:Tolgar
ID: 34989296
>>This is the end of the log file, just before it crashes on line 214, right?  So we know pretty  much for sure this has to be the href that is causing it to crash?

Yes.

And this is a Greasemonkey script, so I have to use Firefox. I use Firefox 3.6.13. I checked for Firefox updates and it didn't find any.

Thanks,
 
LVL 16

Expert Comment

by:sjklein42
ID: 34989319
FF4 Beta is not an update, it has to be downloaded specifically.  Here's the link:

http://www.mozilla.com/en-US/firefox/beta/
 

Author Comment

by:Tolgar
ID: 34989393
Great catch !!! It worked after installing Firefox4 Beta

Thanks for the help,

 
LVL 16

Expert Comment

by:sjklein42
ID: 34989412
WOW!  WHAT A RELIEF!!!

Thanks!