Solved

Output and page creation from HTML pages and a template (Perl; wouldn't mind making it in JavaScript)

Posted on 2004-04-02
Last Modified: 2011-09-20
OK, this Perl script scans a directory looking for a comment like <!-- blabla, yada yada --> on the first line of every .htm file, then reads a template, matches its <!-- blabla --> lines against those comments and inserts the links. Then it displays the total for each category, e.g. blabla: 1000

What I would like it to do now is display the GRAND total of all the links, and how many it had to dump because they had no home, after the display of how many are in each category.
Example:

Total Links : 2474
Total lost/dumped : 123

As for the dumps, write them into a text file in link format, <a href="http://www.mysite.com/whatever/me.htm">example</a>,
for easy cut and paste.

And the last option to perfect this script:
I need a prompt that asks: Do you want to use the template to make a page for (E)ach category or one (M)ain index page?
The idea: the main index option is the way it's designed now, one page with all the links on it.
The each-category option would make a page for each <!-- new category, blabla -->, so in effect it takes the template and inserts the links after <!-- START --> instead of matching. The reason I would like this option is that when I have 1000+ links the .htm file is pushing 1 MB, so I'd like a page for each category, be it venues, artists, etc.

------ add.pl ---------

#!/usr/bin/perl

# $dir is the directory with the html pages (e.g. c:\\concert)
# $wdir is the path to be inserted in the URLs (e.g. /whatever/concerts)
# $indextpl is the index template file (e.g. C:\\test\\index.tpl)
# $indexhtm is the final index file (e.g. c:\\test\\index.htm)
# The script has been tested under unix. For Win32 you might need to change the slashes in paths from / to \\, and also the perl path in the first
# line.

$dir="C:\\web\\ticketstogo.com\\venues";
$wdir="http://www.ticketstogo.com/venues";
$indextpl = "C:\\web\\ticketstogo.com\\venues\\states\\tempindex.htm";
$indexhtm = "C:\\web\\ticketstogo.com\\venues\\states\\aindex.htm";

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
 # $ven{$1}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
    $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
while(<IDXIN>) {
  if(/^\<\!-- (.*) --\>$/) {
    my @vkeys = sort keys %{$ven{lc($1)}};
    print "$1: " . @vkeys . "\n";
    print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_ event tickets</A><BR>\n"} @vkeys;
  } else {
    print IDXOUT;
  }
}
close(IDXIN);
close(IDXOUT);
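
To make the matching rule concrete, here is a minimal sketch of how the first-line comment splits into a category and a link name. The helper name `parse_marker` and the sample line are assumptions for illustration only; add.pl does this inline with the same pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of add.pl): parse a first-line marker of the
# form "<!-- category, name -->" into a lowercased category and a name.
sub parse_marker {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # tolerate both Unix and Windows line endings
    return unless $line =~ /^<!-- (.*), (.*) -->$/;
    return (lc($1), $2);
}

# Sample data only; any "<!-- category, name -->" line works the same way.
my ($cat, $name) = parse_marker("<!-- Alaska, Sullivan Arena -->\n");
print "$cat / $name\n";
```

Because the first `(.*)` is greedy, a line with several commas splits at the last ", " in the comment, so everything before it becomes the category.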
Question by:Caiapfas
9 Comments
 
LVL 2

Author Comment

by:Caiapfas
ID: 10741611
Oh yeah, could we do this in JavaScript? I would prefer it. I'm hooked on JavaScript.
 
LVL 11

Expert Comment

by:lbertacco
ID: 10741808
This will handle the dump thing:

#!/usr/bin/perl

# $dir is the directory with the html pages (e.g. c:\\concert)
# $wdir is the path to be inserted in the URLs (e.g. /whatever/concerts)
# $indextpl is the index template file (e.g. C:\\test\\index.tpl)
# $indexhtm is the final index file (e.g. c:\\test\\index.htm)
# $dumphtm is the output file containing links to pages not inserted in the index (e.g. c:\\test\\dump.htm)

# The script has been tested under unix. For Win32 you might need to change the slashes in paths from / to \\, and also the perl path in the first
# line.

$dir="C:\\web\\ticketstogo.com\\venues";
$wdir="http://www.ticketstogo.com/venues";
$indextpl = "C:\\web\\ticketstogo.com\\venues\\states\\tempindex.htm";
$indexhtm = "C:\\web\\ticketstogo.com\\venues\\states\\aindex.htm";
$dumphtm = "C:\\web\\ticketstogo.com\\venues\\states\\dump.htm";

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
 # $ven{$1}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
    $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
my $tot=0;
open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
while(<IDXIN>) {
  if(/^\<\!-- (.*) --\>$/) {
    my @vkeys = sort keys %{$ven{lc($1)}};
    print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_ event tickets</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$1: " . @vkeys . "\n";
    delete($ven{$1});
  } else {
    print IDXOUT;
  }
}
close(IDXIN);
close(IDXOUT);
print "Total linked: $tot\n";

# dump leftovers
print "\nOrphans:\n";
$tot=0;
open(ORPOUT, ">$dumphtm") or die "Can't open $dumphtm for writing: $!";
foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    print ORPOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
}
close(ORPOUT);
print "Total dumped: $tot\n";


-----
For the "each category index page", I think it's better to just make a (slightly different) separate script. I'll try to do that later.
 
LVL 2

Author Comment

by:Caiapfas
ID: 10742093
OK, the dump is outputting the good links as well, so I don't know which is good and which is bad. Example:

Total linked : 2993
Total dumped : 3209


There are only 3209 pages..?

It's mistaking everything as orphaned.
 
LVL 11

Expert Comment

by:lbertacco
ID: 10742286
You are right, change the line
delete($ven{$1});
to
delete($ven{lc($1)});
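
The reason for the fix: the categories are stored lowercased when the venue files are read, so deleting with the template's original spelling removes nothing whenever that spelling contains a capital letter. A minimal sketch with made-up sample data (the venue names and paths below are assumptions, not taken from the real site):

```perl
use strict;
use warnings;

# Sample data only: keys are stored lowercased, as the venue-reading loop does.
my %ven = (
    alaska => { 'Sullivan Arena' => 'venues/sullivan.htm' },
    texas  => { 'Some Hall'      => 'venues/hall.htm' },
);

my $from_template = 'Alaska';        # category text as written in the template

delete $ven{$from_template};         # no key 'Alaska': nothing is removed
my $after_wrong = keys %ven;         # category count is unchanged

delete $ven{ lc($from_template) };   # matches the stored key 'alaska'
my $after_right = keys %ven;         # one category fewer

print "categories left after delete without lc: $after_wrong\n";
print "categories left after delete with lc:    $after_right\n";
```

With the original `delete($ven{$1})`, every category written with a capital letter in the template stays in `%ven` and shows up again in the dump.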
 
LVL 2

Author Comment

by:Caiapfas
ID: 10742474
OK, perfect.
How would I go about showing which category each link belongs to? For example, dump.htm is
just a long list of URLs/links, but before each link I'd like the category it belongs to, for recognition.
Example:

Alaska << this is the category - bal bal << this is the link
Alaska - lala bala
Alaska - laoal
Texas - meme
Alabama - lolhe
Spain - youto


for the "each category index page" , I think it's better to just make a (slightly different) separate script. I'll try to do that later. <<< Wouldn't it be easier to add it to this script? I have been working on it, but made very little progress, unless you call errors progress...lol

 
LVL 11

Expert Comment

by:lbertacco
ID: 10742528
add this line after the "foreach" line:
print ORPOUT "<HR><P>\nOrphaned links under category -- <BIG><B> $k </B></BIG> --</P>\n";
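
The line above prints one heading per category. If the "Alaska - bal bal" layout from the request is wanted instead, with the category in front of every link, a sketch along these lines might do. The helper name `orphan_lines` and the sample URLs are hypothetical; the hash has the same shape as `%ven` in the script (category => { name => url }).

```perl
use strict;
use warnings;

# Hypothetical helper: build one "Category - link" line per orphaned page.
# $ven is a reference to a hash shaped like %ven: category => { name => url }.
sub orphan_lines {
    my ($ven) = @_;
    my @lines;
    foreach my $k (sort keys %$ven) {
        foreach my $name (sort keys %{ $ven->{$k} }) {
            # Categories are stored lowercased; ucfirst restores only the first letter.
            push @lines, ucfirst($k) . " - <A href=\"$ven->{$k}{$name}\">$name</A><BR>\n";
        }
    }
    return @lines;
}

# Sample data only.
my %orphans = (
    alaska => { 'bal bal' => 'http://www.mysite.com/whatever/a.htm' },
    texas  => { 'meme'    => 'http://www.mysite.com/whatever/t.htm' },
);
print orphan_lines(\%orphans);
```

In the script itself the same effect could be had by putting the category in front of the link text inside the existing `map` in the "dump leftovers" loop.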
 
LVL 2

Author Comment

by:Caiapfas
ID: 10758860
lbertacco,

Any luck on adding the new feature?
 
LVL 11

Accepted Solution

by:
lbertacco earned 500 total points
ID: 10760340
I believed you were going to open a new question; anyway, here it is. Not tested much.
Run with the command
<scriptname> M
for the old behaviour
and
<scriptname> E
for the new "each category" behaviour.

In the latter case, two new variables define the file and path:
$cattpl is the template for each category, with the string <!-- START --> inside
$catdir is the path where the category files should be created

#!/usr/bin/perl

$dir="/win/tmp";
$wdir="ven";
$indextpl = "/win/indextpl.htm";
$indexhtm = "/win/index.htm";
$dumphtm = "/win/dump.htm";
$cattpl = "/win/cat.tpl";
$catdir = "/win/tmp/o";

if($#ARGV != 0 || $ARGV[0] !~ /^[me]$/i) {
  print "Usage: $0 {m|e}\n";
  exit;
}

# read venue files
opendir(DH, $dir) or die "Can't open $dir for reading: $!";
while(defined($file=readdir(DH))) {
  next unless $file =~ /\.htm$/i;
  open(FILE, "<$dir/$file") or die "Can't open $file for reading: $!";
  $ven{lc($1)}{$2} = "$wdir/$file" if <FILE> =~ /^\<\!-- (.*), (.*) --\>$/;
  close(FILE);
}

# process template
my $tot=0;
if(lc($ARGV[0]) eq "m") {
  open(IDXIN, "<$indextpl") or die "Can't open $indextpl for reading: $!";
  open(IDXOUT, ">$indexhtm") or die "Can't open $indexhtm for writing: $!";
  while(<IDXIN>) {
    if(/^\<\!-- (.*) --\>$/) {
      my @vkeys = sort keys %{$ven{lc($1)}};
      print IDXOUT map {"<A href=\"$ven{lc($1)}{$_}\">$_</A><BR>\n"} @vkeys;
      $tot += @vkeys;
      print "\L$1: " . @vkeys . "\n";
      delete($ven{lc($1)});
    } else {
      print IDXOUT;
    }
  }
  close(IDXIN);
  close(IDXOUT);
  print "Total: $tot\n";

  # dump leftovers
  print "\nOrphans:\n";
  $tot=0;
  open(ORPOUT, ">$dumphtm") or die "Can't open $dumphtm for writing: $!";
  foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    print ORPOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
  }
  close(ORPOUT);
  print "Total: $tot\n";

} else {
      
  open(CATIN, "<$cattpl") or die "Can't open $cattpl for reading: $!";
  {
    local $/;
    $catfile = <CATIN>;
  }
  close(CATIN);
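  # Split the template around the marker; stop if the marker is missing.
  my ($cathead, $cattail) = $catfile =~ /^(.*)\<\!-- START --\>(.*)$/si
    or die "No <!-- START --> marker found in $cattpl";
  foreach $k (sort keys %ven) {
    my @vkeys = sort keys %{$ven{$k}};
    open(CATOUT, ">$catdir/$k.htm") or die "Can't open $catdir/$k.htm for writing: $!";
    print CATOUT $cathead;
    print CATOUT map {"<A href=\"$ven{$k}{$_}\">$_</A><BR>\n"} @vkeys;
    print CATOUT $cattail;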
    close(CATOUT);
    $tot += @vkeys;
    print "$k: " . @vkeys . "\n";
  }
  print "Total: $tot\n";
}
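
The script above takes M or E as a command-line argument, while the question asked for a prompt. A minimal sketch of an interactive version follows; the helper name `ask_mode` is an assumption, and the demonstration feeds it canned input through an in-memory filehandle so it runs without a keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: keep asking until the reply is E or M.
# Returns 'e' or 'm', or undef if the input ends first.
sub ask_mode {
    my ($in, $out) = @_;
    while (1) {
        print {$out} "Make a page for (E)ach category or one (M)ain index page? ";
        my $reply = <$in>;
        return unless defined $reply;
        return lc($1) if $reply =~ /^\s*([em])\s*$/i;
    }
}

# Demonstration with canned input instead of a real keyboard:
# the first reply ("x") is rejected, the second ("E") is accepted.
my $canned = "x\nE\n";
open my $in, '<', \$canned or die "Can't open in-memory input: $!";
my $mode = ask_mode($in, \*STDOUT);
print "\nmode = $mode\n";
```

In the real script one would presumably call `ask_mode(\*STDIN, \*STDOUT)` and compare the result with "m" where the code now tests `lc($ARGV[0])`.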
 
LVL 2

Author Comment

by:Caiapfas
ID: 10761003
Thanks, I opened a new question. If possible, I would like it to do a find and replace when run under option E.


http://www.experts-exchange.com/Programming/Q_20944346.html