savantmarketing

Perl WWW-Mechanize issue when downloading CSV files

I'm trying to use Perl and WWW::Mechanize to log in to webmail and download an attachment (a CSV file).

The script is able to get the file, but the downloaded CSV file contains garbled characters.  This does not happen if I download the CSV file using a browser.

Here is the code that I am using:

use WWW::Mechanize;
use HTTP::Cookies;

$mech = WWW::Mechanize->new();
$cook = $mech->cookie_jar(HTTP::Cookies->new(file => "cookies.txt", autosave => 1));
$mech->get('http://mail.mydomain.com/login.php');

$mech->form_number(1);
$mech->field('login_username' => 'username');
$mech->field('secretkey' => 'pass');
$mech->click();

$mech->add_header('Content-Type' => 'text/plain',  'charset' => 'utf-8');
$mech->get('http://mail.mydomain.com');

$mech->follow_link( text => "emaillink", n => 1);
$mech->follow_link( text => "Download", n => 2);
$output = $mech->content();

open(OUTFILE, ">file.csv");
print OUTFILE "$output";
close(OUTFILE);

Below is what I see when I open the CSV file. I just changed the filename extension from .csv to .txt so it could be loaded in a browser:
- - - - - -

http://traffic-director.net/testfiles/test.txt

I'm really not sure what these characters are, so I couldn't do any cleanup programmatically.

Thanks in advance.
Adam314

If you use a browser and inspect the headers, does it use charset utf-8?
What do you get when you download the file with a browser?
savantmarketing

ASKER

Yeah, the headers say that it uses utf8.
When I download the file using a browser and open it in Notepad, though, the encoding shows as Unicode.
Two bytes are used to represent every character.  Do you want to convert it to ASCII?  You will lose information if any characters fall outside the ASCII range (codes above 127).  Otherwise you just need a viewer that can display Unicode files.
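As a rough illustration of the two-bytes-per-character observation, the sketch below checks whether a byte string starts with a byte-order mark (BOM), which is one way to tell what encoding a downloaded file is likely in. The helper name guess_bom_encoding and the sample bytes are hypothetical, made up for this example; a file without a BOM cannot be identified this way.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a Unicode encoding from a leading byte-order mark.
# Returns an Encode-style encoding name, or nothing when no BOM is found.
# Note: a UTF-32LE BOM (FF FE 00 00) would also match the UTF-16LE check.
sub guess_bom_encoding {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Made-up sample: "a,b" stored as little-endian UTF-16 with a BOM,
# i.e. every ASCII character is followed by a zero byte.
my $sample = "\xFF\xFEa\x00,\x00b\x00";
my $guess  = guess_bom_encoding($sample);
print defined $guess ? "$guess\n" : "no BOM\n";
```

Opened in a tool that assumes a single-byte encoding, those interleaved zero bytes would plausibly show up as stray characters between the letters.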
Here is what's happening.

If I download the file manually, I still get the same issue.  I need to open the file in Notepad, go to File > Save As, and change the file encoding from Unicode to ANSI or UTF-8.

Question:

Is it possible to change the encoding programmatically using Perl or PHP?
use Text::Unidecode;
$unaccented = unidecode($output);
print OUTFILE $unaccented;

Thanks for the suggestion.

I tried the code and I'm getting almost the same results.  The only difference is that the garbled characters are now written between each letter in the CSV file.
ASKER CERTIFIED SOLUTION
Adam314

This solution is only available to members.
That did it.

Plus I had to specify the correct encoding when writing to the file itself.
Ex:

open(OUTFILE, ">:utf8", "file.csv");

Thanks for your help.
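For readers following along, here is a minimal sketch of one common way to do this kind of conversion in Perl. It is not the accepted answer. It assumes the attachment is UTF-16 with a byte-order mark, which is what the Notepad "Unicode" description in the thread suggests but does not prove. The sample bytes are made up; in the real script the raw bytes would have to come from the download (for example via $mech->response->content, assuming that returns the undecoded body for this server).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up stand-in for the downloaded attachment: "id,name\n" as
# little-endian UTF-16 with a leading BOM (FF FE).
my $raw = "\xFF\xFE" . join '', map { "$_\x00" } split //, "id,name\n";

# With the generic 'UTF-16' name, Encode reads the BOM to pick the byte
# order and strips it; it dies if the data has no BOM.
my $text = decode('UTF-16', $raw);

# Write the decoded characters back out through a UTF-8 output layer.
open(my $out, '>:encoding(UTF-8)', 'file.csv')
    or die "Cannot open file.csv: $!";
print {$out} $text;
close($out) or die "Cannot close file.csv: $!";
```

The design point is to separate the two steps: decode the incoming bytes into Perl characters first, then let the output layer encode them, rather than printing undecoded bytes through an encoding layer.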