Link to home
Create AccountLog in
Avatar of Peter Chan
Peter ChanFlag for Hong Kong

asked on

Slow process to read Excel

Hi,

It does take nearly 30 minutes to finish scanning these two sheets
https://app.box.com/s/zkm75d29hef82zct7bkw784iro8kmn64

using such codes
                    foreach (Excel.Worksheet sheet in Book0.Worksheets)
                    {
                        Excel.Range usedRange = sheet.UsedRange;

                        RowID++; Worksheet0.Cells[RowID, 1] = "^^^^" + sheet.Name + "^^^^"; RowID++;

                        foreach (Excel.Range row in usedRange.Rows)
                        {
                            RowID++;
                            for (int i = 0; i < row.Columns.Count; i++)
                                /*if (!string.IsNullOrEmpty(rowData[i]))
                                    j = 0;
                                else
                                    j++;*/

                                if (!string.IsNullOrEmpty(Convert.ToString(row.Cells[1, i + 1].Value2)))
                                {
                                    Worksheet0.Cells[RowID, i + 1] = row.Cells[1, i + 1].Value2;
                                    Worksheet0.Cells[RowID, i + 1].Interior.Pattern = row.Cells[1, i + 1].Interior.Pattern;
                                    ...

Open in new window

Why? Any other way to efficiently/quickly read the attached two Excel Worksheets?
Avatar of Qlemo
Qlemo
Flag of Germany image

Using the Cell collation and indexing is slow via Automation. You should use a Range var for the current row of Worksheet0 to set properties, and best to just .Offset(1.0) it in each go. You could do the same down to a cell, but that is getting tedious.
Avatar of Peter Chan

ASKER

Can I have more details to your advice? Thanks a lot
I'm not a csharpa, but I guess you get the idea:
                    Excel.Range ws0cell0 = Worksheet0.Cells(1,1);
                    Excel.Range ws0cell;

                    foreach (Excel.Worksheet sheet in Book0.Worksheets)
                    {
                        Excel.Range usedRange = sheet.UsedRange;

                        RowID++;
                        ws0cell0 = "^^^^" + sheet.Name + "^^^^";
                        RowID++;

                        foreach (Excel.Range row in usedRange.Rows)
                        {
                            RowID++;
                            ws0cell = ws0Cell0 = ws0Cell0.Offset(1,0);    // one row down, first cell
                            for (int i = 0; i < row.Columns.Count; i++)
                                /*if (!string.IsNullOrEmpty(rowData[i]))
                                    j = 0;
                                else
                                    j++;*/

                                if (!string.IsNullOrEmpty(Convert.ToString(row.Cells[1, i + 1].Value2)))
                                {
                                    ws0cell.Value2 = row.Cells[1, i + 1].Value2;
                                    ws0cell.Interior.Pattern = row.Cells[1, i + 1].Interior.Pattern;
                                    ...
                                    ws0cell = ws0cell.Offset(0,1);    // one cell to the rightm sane row.

Open in new window

Doing the same for the source cell should improve performance significantly.
I use these
                    Excel.Worksheet Worksheet0;
                    Worksheet0 = (Excel.Worksheet)Book0.Worksheets.Add();
                    Excel.Range ws0cell0 = Worksheet0.Cells[1, 1];
                    Excel.Range ws0cell;

                    foreach (Excel.Worksheet sheet in Book0.Worksheets)
                    {
                        Excel.Range usedRange = sheet.UsedRange;

                        RowID++; ws0cell0.Value2 = "^^^^" + sheet.Name + "^^^^"; RowID++;

                        foreach (Excel.Range row in usedRange.Rows)
                        {
                            RowID++;
                            ws0cell = ws0cell0 = ws0cell0.Offset[1, 0];    // one row down, first cell

                            for (int i = 0; i < row.Columns.Count; i++)
                                /*if (!string.IsNullOrEmpty(rowData[i]))
                                    j = 0;
                                else
                                    j++;*/

                                if (!string.IsNullOrEmpty(Convert.ToString(row.Cells[1, i + 1].Value2)))
                                {
                                    ws0cell.Value2 = row.Cells[1, i + 1].Value2;
                                    ws0cell.Interior.Pattern = row.Cells[1, i + 1].Interior.Pattern;
                                    ws0cell.ColumnWidth = row.Cells[1, i + 1].ColumnWidth;
                                    ws0cell.Font.Color = row.Cells[1, i + 1].Font.Color;
                                    ws0cell = ws0cell.Offset[0, 1]; 
                                    ...

Open in new window

but it does still take 20 minutes to process the same Excel file. It is still too long time to process only 2 worksheets, right?
ASKER CERTIFIED SOLUTION
Avatar of Qlemo
Qlemo
Flag of Germany image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Yes, I cannot use

( )

and I'm using C#.
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
I cannot use C# that easily, so have to rely on what I know from VB Script and PowerShell (which again uses COM). In PowerShell I have to use the function notation, but that is just an implementation detail.

There is a lot of conversion taking place with Automation. E.g. setting Value2 always requires to convert from (display) string to internal format. It is best to keep Automation out of doing much. With PowerShell, the fastest copy of individual values between cells happens at 18 ms per cell (measured over 10 000 iterations).

The usual VBA ways to increase speed are to set Excel.Application.ScreenUpdating and Excel.WorkSheet.EnableCalculation to false while doing bulk operations.

Any reason you are not using the (Special)Paste function of Excel VBA, copying over a range at once? If you need to apply some processing on the copies, it might help to copy over the range, and then iterate thru the resulting cells instead.
Valliappan AN is correct, I didn't spot that. row access should be thru another variable for the current cell, similar to the changes already applied.
Thanks a lot to all.

Valliappan AN,
Can I have more details to your suggestion?
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
By these
                OpenFileDialog openFileDialog1 = new OpenFileDialog();
                if (openFileDialog1.ShowDialog() == System.Windows.Forms.DialogResult.OK)
                {
                    var ExcelObj = new Excel.Application();
                    Excel.Workbook Book0 = ExcelObj.Workbooks.Open(openFileDialog1.FileName, 0, false, 5, "", "", true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, true);
                    ExcelObj.ScreenUpdating = false;

                    Excel.Worksheet Worksheet0;
                    Worksheet0 = (Excel.Worksheet)Book0.Worksheets.Add();
                    Excel.Range ws0cell0 = Worksheet0.Cells[1, 1];
                    Excel.Range ws0cell;

                    foreach (Excel.Worksheet sheet in Book0.Worksheets)
                    {
                        Excel.Range usedRange = sheet.UsedRange;

                        RowID++; ws0cell0.Value2 = "^^^^" + sheet.Name + "^^^^"; RowID++;

                        foreach (Excel.Range row in usedRange.Rows)
                        {
                            RowID++;
                            row.Copy();
                            Worksheet0.Cells[RowID, 1].PasteSpecial(-4104); 
                            ...

Open in new window

                           
1. It is not copying whatever cell property to each row
2. It takes about 9 minutes to process the file attached

How to improve it further?
F20---Copy.xlsx
Since your code snippet makes not much sense, we cannot help much more. For example you are going thru all worksheets to copy rows, including the newly created one where all copies go to.
If I use the following sequence instead of your innermost ForEach (line 19+):
RowID++;
sheet.UsedRange.Copy();
worksheet0.Cells[RowID, 1].PasteSpecial(-4104);

Open in new window

the copy is instantaneous, and cell formatting taken over.
Using PasteSpecial, it does take 9 minutes to copy the sheets within this
https://app.box.com/s/yrwohfmp713si364romjc29pb501y559
to one overall sheet, and such speed is still slow. Why?
Is this XLSM supposed to do anything? Anyway, a complete, complex Excel sheet is hard to check. You really should start with something more simple, and then add more as soon as speed is satisfying with test cases.