Solved

String Matching

Posted on 2003-04-01
Last Modified: 2010-03-05
OK, I want to parse a URL. I want to strip out an optional "http" and I'm not quite sure how to do it. Is there any way to group the "http" so that you can add a "?" after it (making it optional)?

Also, I'm having a problem using ".*" with matching. Say I have a string which I get from a text datafile, "name=Me&password=mypass&address=myaddress&status=active&hobby=computer design", and I want to change just the address field, but I don't know whether the fields are always going to be the same or in the same order. So I try:

$string =~ s/&address=.*&/&address=newaddress&/i;

But the .* will make it continue to the last "&". Any suggestions?

Thanks,
OKSD
Question by:OKSD
7 Comments
 
LVL 10

Expert Comment

by:rj2
ID: 8246791
#!/usr/bin/perl
$url='http://www.myhost.com/cgi-bin/test.pl';
$url =~ m!(?:http://)?([^/]*)!;
print "Host is $1\n";

$string='name=Me&password=mypass&address=myaddress&status=active&hobby=computer design';
$string =~ s/address=[^&]*/address=newaddress/;
print "$string\n";
 
LVL 11

Accepted Solution

by:
bcladd earned 140 total points
ID: 8246807
(1) Yes, you can make http optional with a question mark. Group the part you want to make optional in parentheses, or in non-capturing parentheses, (?:...), if you do not need the captured text or are worried about speed. So the following matches the Experts Exchange URL with or without the leading access specifier:

    if ($url =~ m{(http://)?www\.experts-exchange\.com}) {
      print "We have a match!\n";
    } else {
      print "No match!\n";
    }
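
As a minimal sketch of the non-capturing variant mentioned above (the sample URLs and the @hosts array are made up for illustration), (?:http://)? makes the prefix optional without using up a capture group, so the host always lands in $1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample URLs, with and without the leading "http://".
my @urls = (
    'http://www.experts-exchange.com/',
    'www.experts-exchange.com/',
);

my @hosts;
for my $url (@urls) {
    # (?:...) groups without capturing; the ? after the group makes the
    # whole "http://" prefix optional. The host is $1 in both cases.
    if ($url =~ m{^(?:http://)?([^/]+)}) {
        push @hosts, $1;
        print "Host: $1\n";
    }
}
```

With plain capturing parentheses the same pattern works, but the host would move to $2.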

(2) Use non-greedy quantification. *? is a non-greedy star (match zero or more, but as few as possible). So your expression can be rewritten as

    $string =~ s/&address=.*?&/&address=$newaddress&/i;

Note that there is a problem with your code: the pattern requires a terminating &, so it can only replace the address when it is not the last field in the list. It would make more sense to use

    $string =~ s/&address=[^&]*/&address=$newaddress/i;

(this takes care of the overly greedy feature of * so it is safe to use the greedy quantifier again).
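
To see the three variants side by side, here is a small sketch run against the sample string from the question (the variable names $record, $greedy, $lazy, $negated and $tail are only for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $record = 'name=Me&password=mypass&address=myaddress&status=active&hobby=computer design';
my $newaddress = 'newaddress';

# Greedy: .* runs on to the LAST "&", so the status field is swallowed too.
(my $greedy = $record) =~ s/&address=.*&/&address=$newaddress&/i;

# Non-greedy: .*? stops at the first "&" after the address value.
(my $lazy = $record) =~ s/&address=.*?&/&address=$newaddress&/i;

# Negated class: [^&]* cannot cross an "&" and needs no trailing "&".
(my $negated = $record) =~ s/&address=[^&]*/&address=$newaddress/i;

# The negated class also works when address is the last field.
(my $tail = 'name=Me&address=myaddress') =~ s/&address=[^&]*/&address=$newaddress/i;

print "$greedy\n$lazy\n$negated\n$tail\n";
```

Only the greedy version loses data; the other two give the same result here, and the negated class is the one that still works without a trailing &.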

Hope this helps, -bcl
 
LVL 1

Author Comment

by:OKSD
ID: 8249210
Well, it looks as if I've got two working answers. I think I'm going to have to give the points to bcladd because of the more descriptive answer. Thanks for the help though, rj2.

And bcladd, I am aware that there will need to be an ending ampersand (&). When I put the full thing together, I always put some nonsense like null=null at both ends of the entry in the datafile for various other reasons (like line breaks). But could you explain what [^&]* does? I thought that "^" limited a match to the beginning of a string....

Thanks,
OKSD

 
LVL 11

Expert Comment

by:bcladd
ID: 8250096
^ has two different meanings in Perl R.E.:

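(1) As you say, it is used to anchor a match to the beginning of the string. With the /m modifier on the regular expression it also matches at the beginning of each line, that is, just after any embedded newline. (The /s modifier does not affect ^; it only changes whether . matches a newline.)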

(2) When ^ appears as the FIRST character in a character class (between []), it means NOT. Thus [^&] is any character that is not &, just as [^aeiou] is any character but a lowercase vowel (note: not any LETTER, so digits, spaces and punctuation match too). Adding the star matches 0 or more of the preceding regular expression component, so the character class of anything but an & is matched 0 or more times.
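
A short sketch of both meanings, using made-up sample strings (the variable names are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Outside a character class, ^ is an anchor. Without /m it matches
# only at the very start of the string.
my @default = $text =~ /^(\w+)/g;

# With /m it also matches just after each newline.
my @multiline = $text =~ /^(\w+)/mg;

# As the first character inside [], ^ negates the class:
# [^&]* matches everything up to, but not including, the next "&".
my ($value) = 'address=myaddress&status=active' =~ /address=([^&]*)/;

# [^aeiou] is "anything but a lowercase vowel", so deleting every such
# character removes the digits, space and punctuation as well.
(my $only_vowels = 'regex 101!') =~ s/[^aeiou]//g;

print "default: @default\nmultiline: @multiline\n";
print "value: $value\nonly vowels: $only_vowels\n";
```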

-bcl
 
LVL 1

Author Comment

by:OKSD
ID: 8262101
All right, I've got one more problem. I'm parsing the URL of a variable address that I get from $ENV{'HTTP_REFERER'} and I want to separate out the domain name and the page. So I type:

#!/usr/bin/perl
$ENV{'HTTP_REFERER'} = "http://somedomain.com/page.html";
#I just set that for testing purposes
$url = $ENV{'HTTP_REFERER'};
$url =~ m{(http://)?([^/])(.*)}i;
$domain = $2;
$page = $3;
print "Content-type: text/html\n\n";
print "Domain is: $domain<p>\nPage is: $page";

and I get:

Domain is: s
Page is: omedomain.com/page.html

Could you help me with that please?

Thanks!
OKSD
 
LVL 1

Author Comment

by:OKSD
ID: 8262142
Never mind, I got it. I forgot to add the "*" after the brackets, like so:

$url =~ m{(http://)?([^/]*)(.*)}i;
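
For reference, a minimal sketch of why the star matters, using the test URL from the post above. It uses a non-capturing group for the prefix, so the domain and page are $1 and $2 here rather than $2 and $3; $one_char is a hypothetical name for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Test value from the post; a CGI script would read $ENV{'HTTP_REFERER'}.
my $url = 'http://somedomain.com/page.html';

# Without the star, [^/] matches exactly ONE non-slash character.
my ($one_char) = $url =~ m{^(?:http://)?([^/])};

# With the star, [^/]* takes everything up to the first "/" after the
# optional prefix, and (.*) takes the rest.
my ($domain, $page) = ('', '');
if ($url =~ m{^(?:http://)?([^/]*)(.*)}i) {
    ($domain, $page) = ($1, $2);
}

print "One character: $one_char\n";
print "Domain is: $domain\nPage is: $page\n";
```

This sketch only handles the "http://" prefix; an "https://" URL would not be split correctly by this pattern.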

OKSD