Solved

Output and page creation from HTML pages and a template (Perl; wouldn't mind having it in JavaScript)

Posted on 2004-04-02
Last Modified: 2011-09-20
OK, this Perl script scans a directory, looking for a comment of the form <!-- balbal, yad yad --> on the first line of every .htm file. It then reads a template, matches the category markers in it against those comments, and inserts the links there. Finally it displays the total for each category, e.g. blabla : 1000
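
A minimal sketch of the first-line check described above, using the same regular expression as the script below. The helper name parse_first_line and the sample line are made up for illustration, and the line-ending cleanup is an extra assumption that the script itself does not make.

```perl
use strict;
use warnings;

# Parses a first line of the form "<!-- category, name -->".
# Returns (lowercased category, name), or an empty list if the line does not match.
sub parse_first_line {
    my ($line) = @_;
    $line =~ s/\r?\n$//;    # assumption: tolerate Unix and Windows line endings
    return $line =~ /^<!-- (.*), (.*) -->$/ ? (lc $1, $2) : ();
}

# Made-up sample line for illustration only.
my ($cat, $name) = parse_first_line("<!-- Alaska, Some Arena -->\n");
print "$cat: $name\n";
```

The category is lowercased so that it can later be matched against the template markers without regard to capitalization.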

What I would like it to do now is display the GRAND total of all the links, and how many it had to dump because they had no home, after the display of how many are in each category.
Example :

Total Links : 2474
Total lost/dumped : 123

The dumped ones should go into a text file in link format, <a href="http://www.mysite.com/whatever/me.htm">example</a>,
for easy cut and paste.

And the last option to perfect this script:
I need a prompt that asks (Do you want to use the template to make a page for (E)ach category or (M) one main index page?)
The idea: the main-index option is the way it's designed now, with one page that has all the links on it.
The category option would make a page for each <!-- new category, blabla -->, so in effect it takes the template and inserts the links after <!-- START --> instead of matching. The reason I would like this option is that when I have 1000+ links the .htm file is pushing 1 MB, so I'd like a page for each category, be it venues, artists, etc.

------ add.pl ---------

#!/usr/bin/perl

# $dir is the directory with the html pages (e.g. c:\\concert)
# $wdir is the path to be inserted in the URLs (e.g. /whatever/concerts)
# $indextpl is the index template file (e.g. C:\\test\\index.tpl)
# $indexhtm is the final index file (e.g. c:\\test\\index.htm)
# The script has been tested under Unix. For Win32 you might need to change the slashes in paths from / to \\, and also the perl path in the first
# line.

$dir="C:\\web\\ticketstogo.com\\venues";
$wdir="http://www.ticketstogo.com/venues";
$indextpl = "C:\\web\\ticketstogo.com\\venues\\states\\tempindex.htm";
$indexhtm = "C:\\web\\ticketstogo.com\\venues\\states\\aindex.htm";

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
 # $ven{$1}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
    $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
while(<IDXIN>) {
  if(/^\<\!-- (.*) --\>$/) {
    my @vkeys = sort keys %{$ven{lc($1)}};
    print "$1: " . @vkeys . "\n";
    print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_ event tickets</A><BR>\n"} @vkeys;
  } else {
    print IDXOUT;
  }
}
close(IDXIN);
close(IDXOUT);
Question by:Caiapfas
9 Comments
 
LVL 2

Author Comment

by:Caiapfas
ID: 10741611
Oh yeah, could we do this in JavaScript? I would prefer it. I'm hooked on JavaScript.
 
LVL 11

Expert Comment

by:lbertacco
ID: 10741808
This will handle the dump thing:

#!/usr/bin/perl

# $dir is the directory with the html pages (e.g. c:\\concert)
# $wdir is the path to be inserted in the URLs (e.g. /whatever/concerts)
# $indextpl is the index template file (e.g. C:\\test\\index.tpl)
# $indexhtm is the final index file (e.g. c:\\test\\index.htm)
# $dumphtm is the output file containing links to pages not inserted in the index (e.g. c:\\test\\dump.htm)

# The script has been tested under Unix. For Win32 you might need to change the slashes in paths from / to \\, and also the perl path in the first
# line.

$dir="C:\\web\\ticketstogo.com\\venues";
$wdir="http://www.ticketstogo.com/venues";
$indextpl = "C:\\web\\ticketstogo.com\\venues\\states\\tempindex.htm";
$indexhtm = "C:\\web\\ticketstogo.com\\venues\\states\\aindex.htm";
$dumphtm = "C:\\web\\ticketstogo.com\\venues\\states\\dump.htm";

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
 # $ven{$1}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
    $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
my $tot=0;
open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
while(<IDXIN>) {
  if(/^\<\!-- (.*) --\>$/) {
    my @vkeys = sort keys %{$ven{lc($1)}};
    print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_ event tickets</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$1: " . @vkeys . "\n";
    delete($ven{$1});
  } else {
    print IDXOUT;
  }
}
close(IDXIN);
close(IDXOUT);
print "Total linked: $tot\n";

# dump leftovers
print "\nOrphans:\n";
$tot=0;
open(ORPOUT, ">$dumphtm") or die "Can't open $dumphtm for writing: $!";
foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    print ORPOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
}
close(ORPOUT);
print "Total dumped: $tot\n";


-----
For the "each category index page", I think it's better to just make a (slightly different) separate script. I'll try to do that later.
 
LVL 2

Author Comment

by:Caiapfas
ID: 10742093
OK, the dump is outputting the good (linked) pages as well, so I don't know which are good and which are bad. Example:

Total linked : 2993
Total dumped : 3209


There are only 3209 pages?

It's mistaking everything as orphaned.
LVL 11

Expert Comment

by:lbertacco
ID: 10742286
You are right. Change the line
delete($ven{$1});
to
delete($ven{lc($1)});
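
A minimal sketch of why the original line left every category in the hash: the keys are stored lowercased, so deleting with the capitalization used in the template removes nothing. The helper name remaining_after_delete, the marker text, and the venue data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data: keys are stored lowercased, as the script does with lc($1).
my %ven = ( alaska => { 'Some Arena' => 'http://www.example.com/venues/a.htm' } );

# Returns the number of categories left after deleting with the given key.
sub remaining_after_delete {
    my ($hash, $key) = @_;
    my %copy = %$hash;    # shallow copy so the original stays intact
    delete $copy{$key};
    return scalar keys %copy;
}

my $marker = 'Alaska';    # capitalization as it appears in the template
print "delete with \$1:     ", remaining_after_delete(\%ven, $marker), " left\n";
print "delete with lc(\$1): ", remaining_after_delete(\%ven, lc $marker), " left\n";
```

With the mixed-case key the category survives and is later reported as orphaned; with the lowercased key it is removed as intended.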
 
LVL 2

Author Comment

by:Caiapfas
ID: 10742474
OK, perfect.
How would I go about adding which category each link belongs to? For example, dump.htm
is just a long list of URLs/links, but before each link I'd like the category it belongs to, for recognition.
Example:

Alaska << this is the category - bal bal << this is the link
Alaska - lala bala
Alaska - laoal
Texas - meme
Alabama - lolhe
Spain - youto


"For the "each category index page", I think it's better to just make a (slightly different) separate script. I'll try to do that later." <<< Wouldn't it be easier to add it to this script? I have been working on it, but have made very little progress, unless you call errors progress... lol

 
LVL 11

Expert Comment

by:lbertacco
ID: 10742528
add this line after the "foreach" line:
print ORPOUT "<HR><P>\nOrphaned links under category -- <BIG><B> $k </B></BIG> --</P>\n";
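
If one line per link in the "category - link" layout from the example is preferred over a heading per category, a rough sketch could look like this. The helper name orphan_lines and the sample data are hypothetical; the hash layout is assumed to be the same as %ven in the script.

```perl
use strict;
use warnings;

# Builds one "category - link" line per orphaned page.
# %$ven maps lowercased category => { link text => URL }, as in the script above.
sub orphan_lines {
    my ($ven) = @_;
    my @lines;
    foreach my $k (sort keys %$ven) {
        foreach my $name (sort keys %{ $ven->{$k} }) {
            push @lines, "$k - <A href=\"$ven->{$k}{$name}\">$name</A><BR>\n";
        }
    }
    return @lines;
}

# Made-up sample data for illustration only.
my %ven = (
    alaska => { 'bal bal' => 'http://www.example.com/venues/1.htm' },
    texas  => { 'meme'    => 'http://www.example.com/venues/2.htm' },
);
print orphan_lines(\%ven);
```

In the script, the returned lines would be printed to ORPOUT instead of standard output.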
 
LVL 2

Author Comment

by:Caiapfas
ID: 10758860
lbertacco,

Any luck on adding the new feature?
 
LVL 11

Accepted Solution

by:
lbertacco earned 500 total points
ID: 10760340
I believed you were going to open a new question; anyway, here it is. Not much tested.
Run with the command
<scriptname> M
for the old behaviour
and
<scriptname> E
for the new "each category" behaviour.

In the latter case, two new variables define the file and path:
$cattpl is the template for each category, with the string <!-- START --> inside
$catdir is the path where the category files should be created
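
The script takes the mode from the command line rather than prompting. If an interactive prompt is still wanted, a sketch along these lines could fall back to asking when no valid argument is given. The helper names parse_mode and ask_mode are hypothetical, and the demo reads from an in-memory handle standing in for STDIN.

```perl
use strict;
use warnings;

# Returns 'm' or 'e' for a valid answer, undef otherwise.
sub parse_mode {
    my ($input) = @_;
    return undef unless defined $input;
    $input =~ s/^\s+|\s+$//g;    # trim whitespace and the trailing newline
    return $input =~ /^[me]$/i ? lc $input : undef;
}

# Asks on the given input handle until a valid answer arrives (or input ends).
sub ask_mode {
    my ($in) = @_;
    while (1) {
        print "Make a page for (E)ach category or (M) one main index page? ";
        my $answer = <$in>;
        return undef unless defined $answer;    # end of input
        my $mode = parse_mode($answer);
        return $mode if defined $mode;
    }
}

# Demo: an in-memory handle simulates a user typing "x" and then "E".
open(my $fake_stdin, '<', \"x\nE\n") or die "Can't open in-memory handle: $!";
my $mode = parse_mode($ARGV[0]);
$mode = ask_mode($fake_stdin) unless defined $mode;
print "\nmode: $mode\n";
```

In real use, \*STDIN would be passed to ask_mode in place of the in-memory handle.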

#!/usr/bin/perl

$dir="/win/tmp";
$wdir="ven";
$indextpl = "/win/indextpl.htm";
$indexhtm = "/win/index.htm";
$dumphtm = "/win/dump.htm";
$cattpl = "/win/cat.tpl";
$catdir = "/win/tmp/o";

if($#ARGV != 0 || $ARGV[0] !~ /^[me]$/i) {
  print "Usage: $0 {m|e}\n";
  exit;
}

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
  $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
my $tot=0;
if(lc($ARGV[0]) eq "m") {
  open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
  open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
  while(<IDXIN>) {
    if(/^\<\!-- (.*) --\>$/) {
      my @vkeys = sort keys %{$ven{lc($1)}};
      print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_</A><BR>\n"} @vkeys;
      $tot += @vkeys;
      print "\L$1: " . @vkeys . "\n";
      delete($ven{lc($1)});
    } else {
      print IDXOUT;
    }
  }
  close(IDXIN);
  close(IDXOUT);
  print "Total: $tot\n";

  # dump leftovers
  print "\nOrphans:\n";
  $tot=0;
  open(ORPOUT, ">$dumphtm") or die "Can't open $dumphtm for writing: $!";
  foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    print ORPOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
  }
  close(ORPOUT);
  print "Total: $tot\n";

} else {
      
  open(CATIN, "<$cattpl") or die "Can't open $cattpl for reading: $!";
  {
    local $/;
    $catfile = <CATIN>;
  }
  close(CATIN);
  $catfile =~ /^(.*)\<\!-- START --\>(.*)$/si;
  foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    open(CATOUT, ">$catdir/$k.htm") or die "Can't open $catdir/$k.htm for writing: $!";
    print CATOUT $1;
    print CATOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    print CATOUT $2;
    close(CATOUT);
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
  }
  print "Total: $tot\n";
}
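
A small sketch of the <!-- START --> split used in the "each category" branch, pulled out into a hypothetical helper named fill_template. Unlike the script above, it stops with an error when the marker is missing; that check is an addition, not part of the original. The template and link are made up.

```perl
use strict;
use warnings;

# Splits a template around the <!-- START --> marker and inserts the links there.
# Dies if the marker is missing, so a bad template is noticed immediately.
sub fill_template {
    my ($template, @links) = @_;
    my ($head, $tail) = $template =~ /^(.*)<!-- START -->(.*)$/si
        or die "Template has no <!-- START --> marker\n";
    return $head . join('', @links) . $tail;
}

# Made-up template and link for illustration only.
my $template = "<HTML><BODY>\n<!-- START -->\n</BODY></HTML>\n";
my @links    = ("<A href=\"http://www.example.com/venues/1.htm\">Some Arena</A><BR>\n");
print fill_template($template, @links);
```

Capturing the two halves into named variables also avoids relying on $1 and $2 staying unchanged for the whole loop.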
 
LVL 2

Author Comment

by:Caiapfas
ID: 10761003
Thanks, I opened a new question. If possible, I would like it to do a find and replace when run under option E.


http://www.experts-exchange.com/Programming/Q_20944346.html