Avatar of fabiano petrone
fabiano petrone
Flag for Italy asked on

xlsx/xls -> csv conversion

Hello,

***UPDATE****
As I use OpenOffice on Windows 7 for manually making the work that follows, maybe that an OpenOffice Guru can find a better solution than a Perl script, so I've added also OpenOffice in the topics
***************

I'm searching for a good solution for converting Windows xlsx/xls files (Windows 1252 WinLatin 1) in csv (UTF-8) files with a script like:

perl script.pl<file.xlsx>file.csv

my environment is Windows 7 with ActiveState Perl 5.24.0

The script should also "cut-off" the first line & the first column of every Excel spreadsheet

Thanks a lot,

Fabiano
PerlProgrammingOffice Suites-OtherScripting Languages

Avatar of undefined
Last Comment
Bill Prew

8/22/2022 - Mon
Bill Prew

Just to be clear, you don't have Microsoft Office installed, only Apache OpenOffice?


»bp
tel2

Hi fabiano,

After you've answered Bill's question, here's a possible starting point for .xls files:

Have a look at this script on CPAN:
  http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv
Here's how it's run:
  xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

However, this is just a starting point, because:
- It seems to handle .xls files only, not .xlsx.
- It would need some modification to cut off row 1 & column A.
If you don't get any better offers, you might like to try it to see if it works in all other respects, though, and then report back.

For xlsx files, maybe these posts from PerlMonks might help:
  http://perlmonks.org/?node=xlsx+csv

And the Spreadsheet::Read module seems to be compatible with .xls, .xlsx, etc.
  http://search.cpan.org/~hmbrand/Spreadsheet-Read-0.74/Read.pm

What sheet of the workbook should your script work on?  Just the first one?  What would that sheet be called?  "Sheet1"?
fabiano petrone

ASKER
Hi, Bill
yes: for now only OOffice and the files are now "manually saved" in UTF8 csv format (with "tab" as field separator).
To tell the truth I can also go to Excel if no solutions are available: do you have a script for this platform?
to summarize, I need:
1) a command line script that has the xls file as an input and the csv file as an output, say something:
script <file.xls>file.csv
In fact I've a lot of files to process, and I want to put all the lines in an unique .bat
2) the script should crop the first line & the first column of every xls file
2) the content of the xls file should be saved in csv, UTF8 encoding & "tab" separator

Thanks a lot,
Fabiano
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
fabiano petrone

ASKER
Hi, tel2
Thanks for the suggestions.
Regarding xlx2csv I've installed it on my active perl environment, but your sample give me a "bad parameters" error.
calling the .pl file alone I obtain the following output:

Usage: C:\Perl64\eg\alma\personale\test\xls2csv.pl filename sheet-colrow:colrow
[-rotate]
   eg: C:\Perl64\eg\alma\personale\test\xls2csv.pl filename.xls 1-A1:B12
       C:\Perl64\eg\alma\personale\test\xls2csv.pl filename.xls A1:M1 -rotate

Open in new window


so it seems that I must know the number of the rows and of the columns before launching the script, and that's no good.
I'll give a try also to your other suggestions: Thanks
Fabiano
Bill Prew

In Windows, with Microsoft Excel installed, this would be pretty easy to do in a VBscript (VBS file).  Since you have a number of files to process, I would say let the script process all the files in a given directory, this would be more efficient since a copy of Excel would not have to be started and stopped for each file, the same instance could be used.

I don't have one that would be exactly what you are looking for, but this is pretty close and could be a good starting point if we went this way.  Just to give you an idea of the coding.  It could be enhanced to make the other changes to the file you need inside the loop.

Const cExcelCSV = 6

strDir = "C:\Temp\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
objExcel.DisplayAlerts = False

For Each objFile In objFSO.GetFolder(strDir).Files
    strExt = objFSO.GetExtensionName(objFile.Path)
    If LCase(Left(strExt, 3)) = "xls" Then
        Set objWorkbook = objExcel.Workbooks.Open(objFile.Path, False, True)
        objWorkbook.SaveAs Replace(strPath, "." & strExt, ".csv"), cExcelCSV
        objWorkbook.Close False
        Set objWorkbook = Nothing
    End If
Next

objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing

Open in new window



»bp
fabiano petrone

ASKER
Hi, Bill
Thanks a lot for your script. I've modified only the line:
strDir = "C:\Perl64\eg\alma\files"
Then I've launched the script and I've obtained the attached error (an 800A03EC error)
sorry for the italian.
I try to traslate:
row 15 char 9: impossible to open the file.
try one of the following options: etc.
Thanks,
Fabiano
error.jpg
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Bill Prew

Sorry, missed a change as I re-purposed a prior script, adjust the following line:

       objWorkbook.SaveAs Replace(objFilePath, "." & strExt, ".csv"), cExcelCSV


»bp
fabiano petrone

ASKER
hi,
just substituted the line but I've obtained the same error.
I enclose also a sample.xls so you can test your script
Thanks,
Fabiano
sample.xls
Bill Prew

Very sorry, seem to have lost a period there...

objWorkbook.SaveAs Replace(objFile.Path, "." & strExt, ".csv"), cExcelCSV

Assuming you are running this on Windows you might want to run from a command prompt window too, with a command line like:

cscript yourscript.vbs

That way you will see any errors or other logging we might add to the console.


»bp
Your help has saved me hundreds of hours of internet surfing.
fblack61
fabiano petrone

ASKER
Hi,
done, but the output is more or less the same.
I enclose a screenshot
bye,
Fabiano
console.jpg
Bill Prew

Did you add the period into line 15 per my last post, that fixed it here.


»bp
Bill Prew

Here is a version with a couple of displays of the input and output files it processes, run this if still getting error.

Const cExcelCSV = 6

strDir = "B:\EE\EE29066306\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
objExcel.DisplayAlerts = False

For Each objFile In objFSO.GetFolder(strDir).Files
    strExt = objFSO.GetExtensionName(objFile.Path)
    If LCase(Left(strExt, 3)) = "xls" Then
        Wscript.Echo "Reading file: [" & objFile.Path & "]"
        Set objWorkbook = objExcel.Workbooks.Open(objFile.Path, False, True)
        strPath = Replace(objFile.Path, "." & strExt, ".csv")
        Wscript.Echo "Writing file: [" & strPath & "]"
        objWorkbook.SaveAs strPath, cExcelCSV
        objWorkbook.Close False
        Set objWorkbook = Nothing
    End If
Next

objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing

Open in new window


»bp
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
fabiano petrone

ASKER
Hi, Bill
now it's OK: but the separator should be the "tab" and not the "comma" (see my 2h ago comment).
Thanks,
Fabiano
Bill Prew

Yes, I wasn't proposing this as a full solution, just a sample of how it would look.

If you think this could work for you I will adjust further.


»bp
fabiano petrone

ASKER
Hi,
yes, surely it could work for me: if you insert the "tab" separator instead of the "comma" it's perfectly suited to my needs
Thanks,
Fabiano
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
ASKER CERTIFIED SOLUTION
Bill Prew

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
fabiano petrone

ASKER
Hi, perfect Thanks :=)
tel2

Hi Bill,
Well done in your skill and persistence in solving this problem!
I don't know much VBA, but what part of your code does this:
   '"cut-off" the first line & the first column of every Excel spreadsheet'?
And what part does converts from Windows 1252 WinLatin 1 to UTF-8?
And does your code just process the first sheet of each workbook, or what?

Hi Fabiano,
> 'Regarding xlx2csv I've installed it on my active perl environment, but your sample give me a "bad parameters" error.'
Without seeing the exact command you issued, I can't help you.  I've just tested it myself, with the sample parameters I gave in my post, and it worked for me on Linux.  But it seems you have a simpler and more complete solution from Bill now, anyway.

tel2
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
tel2

Nice work, Bill!  Seems to work for me.  Give yourself a pay rise.

BTW, what's the main advantage of doing this in VBScript vs VBA?  Is it just to be able to have the program in a little plain text file without the need to embed it in a spreadsheet?
Would the code be much different to doing in VBA?

TRS
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Bill Prew

Code would be very similar, yes.  For "batch" things like this I find VBS a better fit, where there is no need for a UI, etc.


»bp