
  • Status: Solved
  • Priority: Medium
  • Security: Public

HTML::Parser CGI script (part 2)

My previous question asked for:

1. The first 8 characters of the URL, e.g. 'awebsite', or everything between www. and .com (or whatever follows the dot, e.g. .net, .co.uk, .org) if the address has fewer than 8 characters. If the address is a repeat, add an incremented number, e.g. 'awebsite_1'.

Accepted answer (snippet):

while( <L> ){
   chomp;
   ($u) = m'//(?:www\.)?([^.]{0,8})';
   if( $n = $n{$u}++ ){ $u .= "_$n"; }


I need the code to return only letters of the alphabet and digits (0-9), so that any other characters in the web address, such as a forward slash (/), are ignored.

Asked by: malkie

1 Solution

trevorw commented:
Hi,

You can remove all non-alphanumeric characters from the URLs as follows:

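while (<L>) {
  # Capture up to 8 characters after "//" and an optional "www.", stopping at the first dot.
  ($u) = m'//(?:www\.)?([^.]{0,8})';
  # Strip everything except letters and digits (\W alone would leave underscores in place).
  $u =~ s/[^A-Za-z0-9]//g;
  # Append _1, _2, ... to names that have been seen before.
  if( $n = $n{$u}++ ){ $u .= "_$n"; }
}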

There is probably a way to incorporate this into the original regexp but I'm not too hot on them :)
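
For what it's worth, here is a rough sketch of one way the filtering and the 8-character limit could be combined, rather than a single regexp. It is only an illustration: short_name() and %seen are made-up names, not part of the original script, and it assumes only ASCII letters and digits should be kept.

```perl
use strict;
use warnings;

my %seen;   # how many times each short name has been produced so far

# short_name() is a hypothetical helper, not part of the original script.
sub short_name {
    my ($url) = @_;
    # Take the text after "//" and an optional "www.", up to the first dot.
    my ($host) = $url =~ m{//(?:www\.)?([^.]*)};
    return unless defined $host;
    # Delete everything except ASCII letters and digits, then keep at most 8 characters.
    $host =~ tr/a-zA-Z0-9//cd;
    my $name = substr($host, 0, 8);
    # Repeats get _1, _2, ... appended, as in the accepted answer.
    my $n = $seen{$name}++;
    $name .= "_$n" if $n;
    return $name;
}

# Prints awebsite, awebsite_1 and mysite, one per line.
print short_name($_), "\n"
    for qw(http://www.awebsite.com http://www.awebsite.com http://my-site.co.uk);
```

Note that this version filters before truncating, so an address like my-site still yields up to 8 letters or digits. The loop above truncates first and filters afterwards, which can leave fewer than 8 characters; which behaviour is wanted is a judgment call.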

Hope this helps.

Best regards,
Trevor
 
malkie (author) commented:
Thanks Trevor, works okay for me. God bless.