Biffo
asked on
Removing duplicate URLs from search cache
I made a custom search script and I'd like to remove all the duplicate URLs that get mixed into the results.
My script returns via STDOUT:
Title:
Description:
URL:
I am thinking I had better combine the title and URL together prior to sorting and dupe removal. I can sort it; I have just never seen anything on dupe removal in any of the Perl references I have, and I'm not even sure what the proper method is for combining the title and URL into a one-line link.
Oh, well, that should have been an answer, nu?
ASKER
This is how my results appear below; notice the title, brief description and URL. As you can see, the results need a little formatting to make them presentable :-)
1. (title: modperl Archive: Re: Problem Compiling Mod-Perl -- description: Problems Compiling Mod-Perl -- httpd...
http://outside.organic.com/mail-rchives/modperl/
2. (title: Perl 5 How-To,
description: Perl 5 How-To. The Definitive Perl Programming Problem-Solver. Author: Aidan Humphreys Mike Glover Ed Weiss Publishing Information Publication Date: May...,
http://www.techexpo.com/bookfair/macmillan/perl5ht.html
By the way, it's considered courteous to grade questions when you get
an answer. (You have two ungraded questions right now, some effort
was put forth on your behalf because you requested it... it would seem
proper to at least acknowledge that effort)
There are two different questions here.
1.) Removing duplicates and combining
Throw each URL into a hash, using the URL as the key. For instance, if your
search function comes up with the following URLs
HTTP://www.abc.com/a.html
HTTP://www.def.com/b.html
HTTP://www.abc.com/a.html
HTTP://www.abc.com/c.html
HTTP://www.def.com/a.html
then you say something like
$title = '' unless defined($title);
$URLS{$url} = $title;
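then any duplicate URLs will disappear, because a hash holds each key only once: storing the same URL again just overwrites the earlier entry.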
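A minimal sketch of the hash approach, for illustration only. The @results list and its titles are made up to stand in for whatever your search function returns; only the two assignment lines come from the answer above.

```perl
use strict;
use warnings;

# Hypothetical (title, URL) pairs standing in for the search script's output.
my @results = (
    [ 'Page A', 'HTTP://www.abc.com/a.html' ],
    [ 'Page B', 'HTTP://www.def.com/b.html' ],
    [ 'Page A', 'HTTP://www.abc.com/a.html' ],    # duplicate URL
    [ 'Page C', 'HTTP://www.abc.com/c.html' ],
    [ undef,    'HTTP://www.def.com/a.html' ],    # result with no title
);

my %URLS;
for my $result (@results) {
    my ($title, $url) = @$result;
    $title = '' unless defined($title);
    $URLS{$url} = $title;    # a repeated URL overwrites its earlier entry
}

printf "%d results in, %d unique URLs out\n",
    scalar(@results), scalar(keys %URLS);
```

Note that the keys are compared exactly as written, so two URLs that differ only in letter case would still count as different entries here.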
2.) Sorting, combining, etc.
To sort these by url, you can just
foreach (sort keys %URLS) {
    ## Do whatever you want
}
If you need to do that case-insensitive...
foreach (sort {uc($a) cmp uc($b)} keys %URLS) {
    ## Do whatever you want
}
Finally, if you want to sort by title...
foreach (sort {$URLS{$a} cmp $URLS{$b}} keys %URLS ) {
    ## Do whatever you want
}
I'll let you figure out how to do that last one with no regard to case.
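As for combining the title and URL into a one-line link, here is a rough sketch of what could go inside the sort-by-title loop. It assumes "one-line link" means an HTML anchor, which is a guess at what you want; the sample %URLS data is made up.

```perl
use strict;
use warnings;

# Hypothetical deduplicated data: URL => title.
my %URLS = (
    'HTTP://www.abc.com/a.html' => 'Perl basics',
    'HTTP://www.def.com/b.html' => 'apache notes',
    'HTTP://www.abc.com/c.html' => 'Mod-Perl archive',
);

# Build one link line per URL, ordered by title (case-sensitive, as above).
my @links;
foreach my $url (sort { $URLS{$a} cmp $URLS{$b} } keys %URLS) {
    my $title = $URLS{$url};
    $title = $url if $title eq '';    # fall back to the URL when no title was found
    push @links, qq{<a href="$url">$title</a>};
}

print "$_\n" for @links;
```

With this sample data 'apache notes' comes out last, since cmp puts uppercase letters ahead of lowercase ones. Titles containing characters such as & or < would need HTML escaping before being printed this way.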