Solved

RETRIEVING TEXT FILE CONTENT

Posted on 2001-07-15
Last Modified: 2013-12-25
I have a lot of HTML files. I want a Perl script that will loop through each file and store the title and description values in two separate arrays.

The template of the HTML pages is shown below.

<html>
<head>
<title> i am the title</title>
<description> i describe this html page</description>
</head>
<body>
</body>
</html>

Question by:augblay
2 Comments
 

Accepted Solution

by:
psogaa earned 200 total points
ID: 6285949
Use the Perl script below, passing the target directory as its argument.

****************************************************

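use strict;
use warnings;

# Target directory is the first command-line argument.
my $targetDir = $ARGV[0];
defined $targetDir or die "usage: $0 <directory>\n";
opendir( my $dh, $targetDir ) or die "can't open dir $targetDir: $!";
my @files = grep( /\.html?$/, readdir( $dh ) );
closedir( $dh );

my ( @titles, @descriptions );
foreach my $file (@files) {
  open( my $fh, '<', "$targetDir/$file" ) or die "can't open file $file: $!";
  my $fileContent = do {
    local $/;    # slurp mode, restored when the block exits
    <$fh>;
  };
  close $fh;
  # Push only on a successful match; after a failed match $1 and $2
  # would still hold the captures from the previous file.
  if ( $fileContent =~ /<title>(.*?)<\/title>.*?<description>(.*?)<\/description>/si ) {
    push( @titles, $1 );
    push( @descriptions, $2 );
  }
  else {
    warn "no title/description found in $file\n";
  }
}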
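
A possible refinement, not part of the original answer: with the template from the question, the captured values keep their surrounding whitespace (for example " i am the title"). Assuming trimmed values are wanted, the matching step could be pulled into a small helper along these lines. The name extract_meta is hypothetical, and the sample page is just the template from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (title, description) with surrounding
# whitespace removed, or an empty list when the page does not match.
sub extract_meta {
    my ($html) = @_;
    return unless $html =~ m{<title>(.*?)</title>.*?<description>(.*?)</description>}si;
    my ( $title, $description ) = ( $1, $2 );
    for ( $title, $description ) {
        s/^\s+//;    # strip leading whitespace
        s/\s+$//;    # strip trailing whitespace
    }
    return ( $title, $description );
}

# The template from the question, used as sample input.
my $page = <<'HTML';
<html>
<head>
<title> i am the title</title>
<description> i describe this html page</description>
</head>
<body>
</body>
</html>
HTML

my ( @titles, @descriptions );
if ( my ( $title, $description ) = extract_meta($page) ) {
    push @titles,       $title;
    push @descriptions, $description;
}

print "$titles[0] | $descriptions[0]\n";
```

Because both values are pushed together or not at all, the two arrays stay aligned: index N in @titles and @descriptions always refers to the same file.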

Expert Comment

by:Moondancer
ID: 6419722
Open today, need more?
Moondancer
Community Support Moderator @ Experts Exchange