Perl WWW-Mechanize issue when downloading CSV files

savantmarketing
I'm trying to use Perl and WWW::Mechanize to log in to webmail and download an attachment (a CSV file).

The script is able to get the file, but the CSV file contains garbled characters. This does not happen if I download the CSV file using a browser.

Here is the code that I am using:

use WWW::Mechanize;
use HTTP::Cookies;

$mech = WWW::Mechanize->new();
$cook = $mech->cookie_jar(HTTP::Cookies->new(file => "cookies.txt", autosave => 1,));
$mech->get('http://mail.mydomain.com/login.php');

$mech->form_number(1);
$mech->field('login_username' => 'username');
$mech->field('secretkey' => 'pass');
$mech->click();

# These are request headers; they describe the request, not the encoding of the downloaded file.
$mech->add_header('Content-Type' => 'text/plain',  'charset' => 'utf-8');
$mech->get('http://mail.mydomain.com');

$mech->follow_link( text => "emaillink", n => 1);
$mech->follow_link( text => "Download", n => 2);
$output = $mech->content();

open(OUTFILE, ">", "file.csv") or die "Cannot open file.csv: $!";
print OUTFILE $output;
close(OUTFILE);

Below is what I see when I open the CSV file. I just changed the file extension from .csv to .txt so it could be loaded in a browser:
- - - - - -

http://traffic-director.net/testfiles/test.txt

I'm really not sure what these characters are, so I couldn't do any cleanup programmatically.

Thanks in advance.
Top Expert 2009

Commented:
If you use a browser and inspect the headers, does the response use charset utf-8?
What do you get when you download the file with a browser?

Author

Commented:
Yes, the headers say that it uses UTF-8.
When I download the file using a browser and open it in Notepad, though, the encoding is shown as Unicode.
Top Expert 2009

Commented:
There are 2 bytes to represent every character.  Do you want to convert it to ASCII?  You will lose information if any characters fall outside the ASCII range (code points above 127).  Otherwise you just need a viewer that can display Unicode files.
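
One way to confirm the "2 bytes per character" observation is to look at the first bytes of the download. Notepad's "Unicode" option normally means UTF-16LE with a leading byte order mark (BOM). The following is a minimal sketch under that assumption; the helper name sniff_bom and the sample data are hypothetical, not part of the original script, and a UTF-32 BOM is not handled.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Guess the encoding of raw bytes from a leading byte order mark (BOM).
# Returns undef (in scalar context) when no known BOM is present.
# Note: a UTF-32LE BOM also starts with FF FE and is not distinguished here.
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Hypothetical sample: "id,name" as UTF-16LE bytes with a BOM in front.
my $sample = "\xFF\xFE" . encode('UTF-16LE', 'id,name');

my $guess = sniff_bom($sample);
print defined $guess ? $guess : 'no BOM found', "\n";
```

If the check reports UTF-16, the alternating extra bytes in the file are the second byte of each 16-bit code unit rather than corruption.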

Author

Commented:
Here is what's happening.

If I download the file manually, I still get the same issue.  I need to open the file in Notepad, go to File > Save As, and change the file encoding from Unicode to ANSI or UTF-8.

Question:

Is it possible to change the encoding programmatically using Perl or PHP?
Top Expert 2009

Commented:
use Text::Unidecode;
$unaccented = unidecode($output);
print OUTFILE $unaccented;

Author

Commented:
Thanks for the suggestion.

I tried the code and I'm getting almost the same results.  The only difference is that the garbled characters are now written between each letter in the CSV file.
Top Expert 2009
Commented:
I haven't used that module before, but according to its documentation, it should have worked.
Anyway, you can use a regex to remove the extra characters. From the looks of your file, it will work in almost all cases:

$output =~ s/\x00//g;
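
The "almost all cases" caveat can be illustrated with a character outside ASCII. The sketch below assumes the download is UTF-16LE with a BOM (the sample data is hypothetical) and compares stripping NUL bytes against decoding the bytes with the core Encode module, which is an alternative to the regex rather than what the thread used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: a CSV fragment containing a euro sign (U+20AC).
my $text = "price,\x{20AC}5";
my $raw  = "\xFF\xFE" . encode('UTF-16LE', $text);   # BOM + UTF-16LE bytes

# Approach from the thread: drop every NUL byte.
(my $stripped = $raw) =~ s/\x00//g;

# Alternative: decode as UTF-16; the BOM selects the byte order and is removed.
my $decoded = decode('UTF-16', $raw);

print "stripped still starts with BOM bytes: ",
      ($stripped =~ /\A\xFF\xFE/ ? 'yes' : 'no'), "\n";
print "stripped keeps stray euro bytes: ",
      ($stripped =~ /\xAC\x20/ ? 'yes' : 'no'), "\n";
print "decoded matches original text: ",
      ($decoded eq $text ? 'yes' : 'no'), "\n";
```

For pure ASCII content the two approaches give the same letters, apart from the two BOM bytes that the regex leaves at the start of the data.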

Author

Commented:
That did it.

Plus I had to specify the correct encoding when writing to the file itself.
Ex:

open(OUTFILE, ">:utf8", "file.csv");
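
A variation on the same idea, shown as a sketch rather than the exact fix used above: decode the downloaded bytes first, then let an output layer produce UTF-8. It assumes the data is UTF-16LE with a BOM; the sample content and the temporary file stand in for the real download and file.csv.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for the downloaded bytes (UTF-16LE with a BOM).
my $raw  = "\xFF\xFE" . encode('UTF-16LE', "id,name\n1,Zo\x{EB}\n");
my $text = decode('UTF-16', $raw);    # now a Perl character string

# Temporary file used here in place of file.csv; removed when the script exits.
my (undef, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);

# The :encoding(UTF-8) layer converts characters to UTF-8 bytes on output.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
print {$out} $text;
close($out) or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what was actually written.
open(my $in, '<:raw', $path) or die "Cannot reopen $path: $!";
my $bytes = do { local $/; <$in> };
close($in);

print "wrote ", length($bytes), " bytes of UTF-8\n";
```

Lexical filehandles and the three-argument open are used here as a matter of style; the bareword OUTFILE form from the thread works the same way with the layer in the mode string.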

Thanks for your help.
