xlsx/xls -> csv conversion

Hello,

***UPDATE****
As I use OpenOffice on Windows 7 for manually making the work that follows, maybe that an OpenOffice Guru can find a better solution than a Perl script, so I've added also OpenOffice in the topics
***************

I'm searching for a good solution for converting Windows xlsx/xls files (Windows 1252 WinLatin 1) in csv (UTF-8) files with a script like:

perl script.pl<file.xlsx>file.csv

my environment is Windows 7 with ActiveState Perl 5.24.0

The script should also "cut-off" the first line & the first column of every Excel spreadsheet

Thanks a lot,

Fabiano
fabiano petroneAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Bill PrewCommented:
Just to be clear, you don't have Microsoft Office installed, only Apache OpenOffice?


»bp
0
tel2Commented:
Hi fabiano,

After you've answered Bill's question, here's a possible starting point for .xls files:

Have a look at this script on CPAN:
  http://search.cpan.org/~ken/xls2csv-1.06/script/xls2csv
Here's how it's run:
  xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile.csv" -a UTF-8

However, this is just a starting point, because:
- It seems to handle .xls files only, not .xlsx.
- It would need some modification to cut off row 1 & column A.
If you don't get any better offers, you might like to try it to see if it works in all other respects, though, and then report back.

For xlsx files, maybe these posts from PerlMonks might help:
  http://perlmonks.org/?node=xlsx+csv

And the Spreadsheet::Read module seems to be compatible with .xls, .xlsx, etc.
  http://search.cpan.org/~hmbrand/Spreadsheet-Read-0.74/Read.pm

What sheet of the workbook should your script work on?  Just the first one?  What would that sheet be called?  "Sheet1"?
0
fabiano petroneAuthor Commented:
Hi, Bill
yes: for now only OOffice and the files are now "manually saved" in UTF8 csv format (with "tab" as field separator).
To tell the truth I can also go to Excel if no solutions are available: do you have a script for this platform?
to summarize, I need:
1) a command line script that has the xls file as an input and the csv file as an output, say something:
script <file.xls>file.csv
In fact I've a lot of files to process, and I want to put all the lines in an unique .bat
2) the script should crop the first line & the first column of every xls file
2) the content of the xls file should be saved in csv, UTF8 encoding & "tab" separator

Thanks a lot,
Fabiano
0
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

fabiano petroneAuthor Commented:
Hi, tel2
Thanks for the suggestions.
Regarding xlx2csv I've installed it on my active perl environment, but your sample give me a "bad parameters" error.
calling the .pl file alone I obtain the following output:

Usage: C:\Perl64\eg\alma\personale\test\xls2csv.pl filename sheet-colrow:colrow
[-rotate]
   eg: C:\Perl64\eg\alma\personale\test\xls2csv.pl filename.xls 1-A1:B12
       C:\Perl64\eg\alma\personale\test\xls2csv.pl filename.xls A1:M1 -rotate

Open in new window


so it seems that I must know the number of the rows and of the columns before launching the script, and that's no good.
I'll give a try also to your other suggestions: Thanks
Fabiano
0
Bill PrewCommented:
In Windows, with Microsoft Excel installed, this would be pretty easy to do in a VBscript (VBS file).  Since you have a number of files to process, I would say let the script process all the files in a given directory, this would be more efficient since a copy of Excel would not have to be started and stopped for each file, the same instance could be used.

I don't have one that would be exactly what you are looking for, but this is pretty close and could be a good starting point if we went this way.  Just to give you an idea of the coding.  It could be enhanced to make the other changes to the file you need inside the loop.

Const cExcelCSV = 6

strDir = "C:\Temp\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
objExcel.DisplayAlerts = False

For Each objFile In objFSO.GetFolder(strDir).Files
    strExt = objFSO.GetExtensionName(objFile.Path)
    If LCase(Left(strExt, 3)) = "xls" Then
        Set objWorkbook = objExcel.Workbooks.Open(objFile.Path, False, True)
        objWorkbook.SaveAs Replace(strPath, "." & strExt, ".csv"), cExcelCSV
        objWorkbook.Close False
        Set objWorkbook = Nothing
    End If
Next

objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing

Open in new window



»bp
0
fabiano petroneAuthor Commented:
Hi, Bill
Thanks a lot for your script. I've modified only the line:
strDir = "C:\Perl64\eg\alma\files"
Then I've launched the script and I've obtained the attached error (an 800A03EC error)
sorry for the italian.
I try to traslate:
row 15 char 9: impossible to open the file.
try one of the following options: etc.
Thanks,
Fabiano
error.jpg
0
Bill PrewCommented:
Sorry, missed a change as I re-purposed a prior script, adjust the following line:

       objWorkbook.SaveAs Replace(objFilePath, "." & strExt, ".csv"), cExcelCSV


»bp
0
fabiano petroneAuthor Commented:
hi,
just substituted the line but I've obtained the same error.
I enclose also a sample.xls so you can test your script
Thanks,
Fabiano
sample.xls
0
Bill PrewCommented:
Very sorry, seem to have lost a period there...

objWorkbook.SaveAs Replace(objFile.Path, "." & strExt, ".csv"), cExcelCSV

Assuming you are running this on Windows you might want to run from a command prompt window too, with a command line like:

cscript yourscript.vbs

That way you will see any errors or other logging we might add to the console.


»bp
0
fabiano petroneAuthor Commented:
Hi,
done, but the output is more or less the same.
I enclose a screenshot
bye,
Fabiano
console.jpg
0
Bill PrewCommented:
Did you add the period into line 15 per my last post, that fixed it here.


»bp
0
Bill PrewCommented:
Here is a version with a couple of displays of the input and output files it processes, run this if still getting error.

Const cExcelCSV = 6

strDir = "B:\EE\EE29066306\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
objExcel.DisplayAlerts = False

For Each objFile In objFSO.GetFolder(strDir).Files
    strExt = objFSO.GetExtensionName(objFile.Path)
    If LCase(Left(strExt, 3)) = "xls" Then
        Wscript.Echo "Reading file: [" & objFile.Path & "]"
        Set objWorkbook = objExcel.Workbooks.Open(objFile.Path, False, True)
        strPath = Replace(objFile.Path, "." & strExt, ".csv")
        Wscript.Echo "Writing file: [" & strPath & "]"
        objWorkbook.SaveAs strPath, cExcelCSV
        objWorkbook.Close False
        Set objWorkbook = Nothing
    End If
Next

objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing

Open in new window


»bp
0
fabiano petroneAuthor Commented:
Hi, Bill
now it's OK: but the separator should be the "tab" and not the "comma" (see my 2h ago comment).
Thanks,
Fabiano
0
Bill PrewCommented:
Yes, I wasn't proposing this as a full solution, just a sample of how it would look.

If you think this could work for you I will adjust further.


»bp
0
fabiano petroneAuthor Commented:
Hi,
yes, surely it could work for me: if you insert the "tab" separator instead of the "comma" it's perfectly suited to my needs
Thanks,
Fabiano
0
Bill PrewCommented:
This would be TAB delimited, see if this works for you.

Const xlCSV = 6
Const xlTextWindows = 20

strDir = "B:\EE\EE29066306\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = False
objExcel.DisplayAlerts = False

For Each objFile In objFSO.GetFolder(strDir).Files
    strExt = objFSO.GetExtensionName(objFile.Path)
    If LCase(Left(strExt, 3)) = "xls" Then
        Wscript.Echo "Reading file: [" & objFile.Path & "]"
        Set objWorkbook = objExcel.Workbooks.Open(objFile.Path, False, True)
        strPath = Replace(objFile.Path, "." & strExt, ".csv")
        Wscript.Echo "Writing file: [" & strPath & "]"
        objWorkbook.SaveAs strPath, xlTextWindows
        objWorkbook.Close False
        Set objWorkbook = Nothing
    End If
Next

objExcel.Quit
Set objExcel = Nothing
Set objFSO = Nothing

Open in new window


»bp
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
fabiano petroneAuthor Commented:
Hi, perfect Thanks :=)
0
tel2Commented:
Hi Bill,
Well done in your skill and persistence in solving this problem!
I don't know much VBA, but what part of your code does this:
   '"cut-off" the first line & the first column of every Excel spreadsheet'?
And what part does converts from Windows 1252 WinLatin 1 to UTF-8?
And does your code just process the first sheet of each workbook, or what?

Hi Fabiano,
> 'Regarding xlx2csv I've installed it on my active perl environment, but your sample give me a "bad parameters" error.'
Without seeing the exact command you issued, I can't help you.  I've just tested it myself, with the sample parameters I gave in my post, and it worked for me on Linux.  But it seems you have a simpler and more complete solution from Bill now, anyway.

tel2
0
Bill PrewCommented:
Here is a version that will remove the tor row and first column.

The code page conversion should have been happening by default, but just in case here is a small adjustment to make sure.

Const xlCSV = 6
Const xlTex = -4158
Const xlTextWindows = 20
Const msoEncodingUTF8 = 65001

strDir = "B:\EE\EE29066306\Files"

Set objFSO = CreateObject("Scripting.FileSystemObject")

With CreateObject("Excel.Application")
    .Visible = False
    .DisplayAlerts = False

    For Each objFile In objFSO.GetFolder(strDir).Files
        strExt = objFSO.GetExtensionName(objFile.Path)
        If LCase(Left(strExt, 3)) = "xls" Then
            Wscript.Echo "Reading file: [" & objFile.Path & "]"
            With .Workbooks.Open(objFile.Path, False, True)
                With .Sheets(1)
                    .Rows(1).Delete
                    .Columns(1).Delete
                End With
                strPath = Replace(objFile.Path, "." & strExt, ".csv")
                Wscript.Echo "Writing file: [" & strPath & "]"
                .WebOptions.Encoding = msoEncodingUTF8
                .SaveAs strPath, xlTextWindows
                .Close False
            End With
        End If
    Next

    .Quit
End With

Set objFSO = Nothing

Open in new window



»bp
0
tel2Commented:
Nice work, Bill!  Seems to work for me.  Give yourself a pay rise.

BTW, what's the main advantage of doing this in VBScript vs VBA?  Is it just to be able to have the program in a little plain text file without the need to embed it in a spreadsheet?
Would the code be much different to doing in VBA?

TRS
0
Bill PrewCommented:
Code would be very similar, yes.  For "batch" things like this I find VBS a better fit, where there is no need for a UI, etc.


»bp
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.