Solved

RETRIEVING TEXT FILE CONTENT

Posted on 2001-07-15
Last Modified: 2013-12-25
I have a lot of HTML files. I want a Perl script that will loop through each file and store the title and description values in two different arrays.

Please find below the template of the HTML pages.

<html>
<head>
<title> i am the title</title>
<description> i describe this html page</description>
</head>
<body>
</body>
</html>

Question by:augblay
2 Comments
 
Accepted Solution

by:
psogaa earned 800 total points
ID: 6285949
Use the Perl script below, passing the target directory as its argument.

****************************************************

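use strict;
use warnings;

# Directory to scan, passed as the first command-line argument.
my $targetDir = $ARGV[0];
die "usage: $0 <directory>\n" unless defined $targetDir;
opendir( my $dh, $targetDir ) or die "can't open dir: $!";
my @files = grep( /\.html?$/i, readdir( $dh ) );
closedir( $dh );
my ( @titles, @descriptions );
foreach my $file (@files) {
  open( my $fh, '<', "$targetDir/$file" ) or die "can't open file: $!";
  # Slurp the whole file: localizing $/ disables line-by-line reading
  # for this statement only.
  my $fileContent = do { local $/; <$fh> };
  close $fh;
  # Store values only when both tags are found; otherwise $1 and $2
  # would still hold the captures from the previous file.
  if ( $fileContent =~ /<title>(.*?)<\/title>.*?<description>(.*?)<\/description>/si ) {
    push( @titles, $1 );
    push( @descriptions, $2 );
  }
}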
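
To try the matching step on its own, without a directory of files, here is a minimal sketch that applies the same regular expression to the template from the question. The helper name extract_title_description is hypothetical (it is not part of the accepted answer), and trimming the surrounding whitespace is an added assumption about what the asker wants stored.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns the
# title and description found in an HTML string, or an empty list
# when either tag is missing.
sub extract_title_description {
    my ($html) = @_;
    return unless defined $html;
    if ( $html =~ /<title>(.*?)<\/title>.*?<description>(.*?)<\/description>/si ) {
        my ( $title, $description ) = ( $1, $2 );
        # Assumption: leading and trailing whitespace is not wanted.
        s/^\s+|\s+$//g for $title, $description;
        return ( $title, $description );
    }
    return;
}

# The template from the question, used as sample input.
my $sample = <<'HTML';
<html>
<head>
<title> i am the title</title>
<description> i describe this html page</description>
</head>
<body>
</body>
</html>
HTML

my ( $title, $description ) = extract_title_description($sample);
print "title: $title\n";
print "description: $description\n";
```

Because the helper returns an empty list when a file does not match, a caller can skip that file instead of pushing stale values onto the arrays.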
Expert Comment

by:Moondancer
ID: 6419722
Open today, need more?
Moondancer
Community Support Moderator @ Experts Exchange