• Status: Solved

Splitting a URL string

Hello,

As a novice in regular expressions, I am struggling to find a short (and easy?) way to split any given URL into its subparts.

As far as I know, a URL can consist of the following parts:

protocol (http, https, ...) (required)
host (e.g. www.host.org) (required)
port (e.g. 80) (optional)
username (user) (optional)
password (pass) (optional)
webpage (e.g. /home/index.html) (optional)
data (e.g. val1=test&val2=name) (optional)

Most of what I find on the Internet is about how to get the data (and split it), but not how to get all the rest as well. So, what Perl regex or code will return these parts to my program?

Regards

Marc




1 Solution
 
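ozo Commented:
use Regexp::Common qw(URI);
# Matches against $_; {-keep} returns the whole URI, scheme, host, port, two path-plus-query forms (skipped), path and query.
my ($uri, $scheme, $host, $port, undef, undef, $path, $query) = /$RE{URI}{HTTP}{-keep}/;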
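
Note that the HTTP pattern above does not cover the username and password parts listed in the question. As a rough sketch only, assuming core Perl without extra modules, the generic URI regex from RFC 3986 (Appendix B) can be combined with a second regex for the authority part. The helper name `split_url` and the hash keys are hypothetical, not from this thread, and the sketch does no validation or percent-decoding.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a URL using core Perl only.
# Returns a hash reference, or nothing if scheme or host is missing.
sub split_url {
    my ($url) = @_;

    # Generic URI pattern from RFC 3986, Appendix B (non-capturing groups added).
    my ($scheme, $authority, $path, $query, $fragment) =
        $url =~ m{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$}s;
    return unless defined($scheme) && defined($authority);

    # Authority is [userinfo@]host[:port]; a bracketed host covers IPv6 literals.
    my ($userinfo, $host, $port) =
        $authority =~ m{^(?:([^\@]*)\@)?(\[[^\]]*\]|[^:]*)(?::(\d*))?$};
    return unless defined($host) && length($host);

    # Userinfo is user[:password]; both stay undef when absent.
    my ($user, $password) = defined($userinfo) ? split(/:/, $userinfo, 2) : ();

    return {
        scheme => $scheme, user => $user,  password => $password,
        host   => $host,   port => $port,  path     => $path,
        query  => $query,  fragment => $fragment,
    };
}

my $parts = split_url('https://user:pass@www.host.org:80/home/index.html?val1=test&val2=name');
for my $key (qw(scheme user password host port path query)) {
    my $value = defined($parts->{$key}) ? $parts->{$key} : '(none)';
    printf "%-8s %s\n", $key, $value;
}
```

Optional parts that are missing from the URL come back as undef, so callers can tell "no port given" apart from an empty value.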
 
Marc_Engrie (Author) Commented:
It looks like that could solve my problem.
However, I have two more questions:

I guess this statement takes $_ as input? If so, what if the URL has to come from a variable, e.g. $url_string?

I am trying your code using ActivePerl (on Windows), but Perl complains that it cannot find the Regexp module. I know how to use ppm to install an extra module, but I cannot locate this one. Do you happen to know the name of the module to install?

Thx in advance

Marc
 
ozo Commented:
my $url_string = 'http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm';
$url_string =~ /$RE{URI}{HTTP}{-keep}/;
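
Put together, the two snippets above could look like the following sketch. The helper name `split_http_url` and the hash keys are made up for illustration, and the code assumes Regexp::Common is installed; the capture order follows the list assignment in the first answer.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);

# Hypothetical helper: returns a hash reference of URL parts,
# or nothing when the string is not an HTTP URI.
sub split_http_url {
    my ($url_string) = @_;

    # Bind the pattern to a variable with =~ instead of matching against $_.
    # Captures 5 and 6 (path plus query, in two forms) are skipped with undef.
    my ($uri, $scheme, $host, $port, undef, undef, $path, $query) =
        $url_string =~ /\A$RE{URI}{HTTP}{-keep}\z/;

    return unless defined $uri;
    return {
        uri  => $uri,  scheme => $scheme, host  => $host,
        port => $port, path   => $path,   query => $query,
    };
}

my $parts = split_http_url(
    'http://search.cpan.org:80/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm?test=1'
);
print "host: $parts->{host}, port: $parts->{port}\n" if $parts;
```

The \A and \z anchors are an addition here; without them the pattern may match an HTTP URI anywhere inside a longer string.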
 
ozo Commented:
ppm install Regexp-Common
 
Marc_Engrie (Author) Commented:
Got the module -> Thx

Still one more issue

#my $url_string  = 'http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm?test=1';
my $url_string  = 'http://search.cpan.org:80/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm?test=1';
$url_string =~ /$RE{URI}{HTTP}{-keep}/;
printf("url: %s\n,scheme: %s\n,host2: %s\n,port2: %s\n,path: %s\n\n",$1,$2,$3,$4,$5);

Running the above works.
But running it with the commented-out line instead gives an "uninitialized value" warning in the printf, because there is no port in the URL. Is there a trick to prevent or catch the uninitialized value?

(sorry for this probably basic Perl question :-( )
 
ozo Commented:
printf("url: %s\n,scheme: %s\n,host2: %s\n,port2: %s\n,path: %s\n\n",$1,$2,$3,$4||'',$5);
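
The same defaulting trick can be applied to all captures at once. The sketch below is only an illustration: the pattern `$re` is a simplified, hypothetical stand-in so the example runs without Regexp::Common, and `describe` is a made-up helper name. It uses an explicit `defined` test rather than `||`, so a value such as `0` would be kept; on Perl 5.10 or later the defined-or operator `//` does the same job.

```perl
use strict;
use warnings;

# Simplified stand-in pattern (an assumption, not the Regexp::Common one):
# the port group is optional, so it yields undef when the URL has no port.
my $re = qr{^(http)://([^/:?]+)(?::(\d+))?(/[^?]*)?};

# Hypothetical helper: formats the parts of a URL, or returns nothing on no match.
sub describe {
    my ($url) = @_;

    my @parts = $url =~ $re;
    return unless @parts;

    # Replace every undefined capture with an empty string before formatting.
    my ($scheme, $host, $port, $path) = map { defined($_) ? $_ : '' } @parts;

    return sprintf "scheme: %s, host: %s, port: %s, path: %s",
        $scheme, $host, $port, $path;
}

print describe('http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm?test=1'), "\n";
```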
 
Marc_Engrie (Author) Commented:
Yep.
From here on I can manage on my own again :-)

Thanks a lot for helping me out and for 'teaching' me some extra Perl tricks.

Have a great weekend

Marc