faithless1

asked on

Content Extraction

Hi,

I'm looking for a way to extract the content from each URL in a file (one URL per line) and output all of the content to a single file.

Thank you.
Cong Minh Vo

TEST = "test.txt";
open(TEST) or die("Could not open log file.");
foreach $line (<TEST>) {
    chomp($line);              # remove the newline from $line.
    # do line-by-line processing.
}
Tintin

What format is the file with the URLs in, and what format do you want the output to be?
faithless1

ASKER

Both files are in .txt format.

Thank you
It's still not clear what you want to do.

OK, you have a file with a list of URLs. Are they listed one per line, e.g.:

http://example.com
http://example.com/page

Then, what do you want written to the new file?
Hi,

I want to extract all content from each URL and pipe the results to a .txt file: only words and numbers, excluding images.

Thank you
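A minimal sketch of what's being asked for, assuming the LWP::Simple module is available from CPAN (it is loaded at run time, so the tag stripper works even without it installed). The file names are illustrative, and the regex-based tag stripping is deliberately crude; a real HTML parser such as HTML::TreeBuilder would handle malformed pages better.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Crude tag stripper: good enough for a sketch, but an HTML parser
# (e.g. HTML::TreeBuilder from CPAN) is more robust on messy pages.
sub strip_tags {
    my ($html) = @_;
    $html =~ s{<script\b.*?</script>}{}gis;   # drop script blocks
    $html =~ s{<style\b.*?</style>}{}gis;     # drop stylesheets
    $html =~ s{<[^>]*>}{ }g;                  # remove remaining tags
    $html =~ s{\s+}{ }g;                      # collapse whitespace
    return $html;
}

# Usage: perl extract.pl urls.txt > output.txt
if (@ARGV) {
    require LWP::Simple;                      # assumes LWP is installed
    while (my $url = <>) {                    # one URL per line
        chomp $url;
        next unless $url;                     # skip blank lines
        my $html = LWP::Simple::get($url);    # fetch the page
        unless (defined $html) {
            warn "Could not fetch $url\n";
            next;
        }
        print strip_tags($html), "\n";        # text only, no markup
    }
}
```

Redirecting STDOUT (`> output.txt`) collects the text from every URL into one file, which matches what was described above.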
Minhvc,

I'm getting this error:
Can't modify constant item in scalar assignment at content.pl line 9, near ""test.txt";"
Global symbol "$line" requires explicit package name at content.pl line 11.
Global symbol "$line" requires explicit package name at content.pl line 12.
Bareword "TEST" not allowed while "strict subs" in use at content.pl line 9.
Execution of content.pl aborted due to compilation errors.

When running:

#!/usr/local/bin/perl

use strict;
use warnings;




TEST = "test.txt";
open(TEST) or die("Could not open log file.");
foreach $line (<TEST>) {
    chomp($line);              # remove the newline from $line.
    # do line-by-line processing.
}
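Those errors all come from `use strict`: `TEST = "test.txt"` is missing the `$` sigil, `TEST` is used as a bareword filehandle, and `$line` is never declared with `my`. A corrected sketch (it writes a two-line sample test.txt first, purely so it runs standalone):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Create a small sample file so the sketch runs on its own;
# in practice test.txt would already exist.
open(my $out, '>', 'test.txt') or die "Could not create test.txt: $!";
print $out "first line\nsecond line\n";
close $out;

my $file = 'test.txt';                 # a plain scalar, not a bareword
open(my $fh, '<', $file) or die "Could not open $file: $!";
my @lines;
while (my $line = <$fh>) {             # $line declared with "my"
    chomp $line;                       # remove the newline from $line
    push @lines, $line;                # line-by-line processing goes here
}
close $fh;
print scalar(@lines), " lines read\n"; # prints "2 lines read"
```

Under strict, a lexical filehandle (`my $fh`) with the three-argument form of `open` is the idiomatic fix.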
ASKER CERTIFIED SOLUTION
Fero45