why I got null returned in c# code when I am trying to get values from excel sheet using linq and OpenXml?

hi,

I have an excel sheet, I want to get all cells and data from that excel sheet using linq and openxml library,

but I got problem that null records returned from excel sheet, here is my code and the excel sheet:

 
private void btnAddProductType_Click(object sender, EventArgs e)
        {
            string path = @"E:\Leamra Projects\ProductManagment\code\UsingLINQToQueryTablesInExcel2007_CODE\LinqSample.xlsx";
            DataTable dtProductType = ReadDataFromExcel(path);
        }

        private DataTable ReadDataFromExcel(string path)
        {
            //Declare variables to hold refernces to Excel objects
            Workbook workBook;
            SharedStringTable sharedStrings;
            IEnumerable<Sheet> workSheets;
            WorksheetPart sheet;

            //Declare helper variables
            string sheetId;

            DataTable dtProductType = CreateProductTypeTableHeaders();
            using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, true))
            {
                //References to the workbook and Shared String Table.
                workBook = document.WorkbookPart.Workbook;
                workSheets = workBook.Descendants<Sheet>();
                sharedStrings = document.WorkbookPart.SharedStringTablePart.SharedStringTable;

                //Reference to Excel Worksheet with Customer data.
                sheetId = workSheets.First(s => s.Name == @"Customer").Id;
                sheet = (WorksheetPart)document.WorkbookPart.GetPartById(sheetId);

                IEnumerable<Row> dataRows = from row in sheet.Worksheet.Descendants<Row>()
                                            where row.RowIndex > 1
                                            select row;
                foreach (Row row in dataRows)
                {
                    IEnumerable<String> textValues = from cell in row.Descendants<Cell>()
                                                     where cell.CellValue != null
                                                     select
                                                       (cell.DataType != null
                                                         && cell.DataType.HasValue
                                                         && cell.DataType == CellValues.SharedString ? sharedStrings.ChildElements[int.Parse(cell.CellValue.InnerText)].InnerText : cell.CellValue.InnerText);
                    //Check to verify the row contained data.
//here I got textValues is null !!!!
                    if (textValues.Count() > 0)
                    {
                        //Create a Customer and add it to the list.
                        var textArray = textValues.ToArray();
                        if (!AddNewRowTodtProductType(textArray[0], textArray[1], textArray[2], textArray[3], dtProductType))
                        {
                            return null;
                        }
                    }
                    else
                    {
                        //If no Cells assume the end of the table has been reached.
                        break;
                    }
                }


            }
            return dtProductType;
        }

        private bool AddNewRowTodtProductType(string EnProductType, string ArProductType, string EnDescription, string ArDescription, DataTable dtProductType)
        {
            try
            {

                DataRow row = null;
                row = dtProductType.NewRow();
                row["EnProductType"] = EnProductType;
                row["ArProductType"] = ArProductType;
                row["EnDescription"] = EnDescription;
                row["ArDescription"] = ArDescription;
                dtProductType.Rows.Add(row);
                return true;
            }
            catch
            {
                return false;
            }
        }

        private DataTable CreateProductTypeTableHeaders()
        {
            DataTable dtProductType = new DataTable("ProductTypes");
            DataColumn clProduct;
            clProduct = new DataColumn();
            clProduct.DataType = Type.GetType("System.String");
            clProduct.ColumnName = "EnProductType";
            dtProductType.Columns.Add(clProduct);

            clProduct = new DataColumn();
            clProduct.DataType = Type.GetType("System.String");
            clProduct.ColumnName = "ArProductType";
            dtProductType.Columns.Add(clProduct);

            clProduct = new DataColumn();
            clProduct.DataType = Type.GetType("System.String");
            clProduct.ColumnName = "EnDescription";
            dtProductType.Columns.Add(clProduct);

            clProduct = new DataColumn();
            clProduct.DataType = Type.GetType("System.String");
            clProduct.ColumnName = "ArDescription";
            clProduct.DefaultValue = "0";
            dtProductType.Columns.Add(clProduct);
            return dtProductType ;
        }

Open in new window


 ProductTypes.xlsx


the strange thing the code works with original excel sample of msdn (where I got openxml and linq code), I thought the problem might be because I am using office 2007 (.xlsx) and I discovered later that the sample excel was using also office 2007 and the file extension is .xlsx

then I thought it might be because its contain unicode characters, so I have deleted the column which has unicode characters, but that was same problem


then I thought the problem because there are some cells they are having null value, so I fill all cells with dumy data, but that also does not solve the problem

any help please?
 
here is the sample url: http://msdn.microsoft.com/en-us/library/dd920313.aspx
njgroupAsked:
Who is Participating?
 
Fernando SotoConnect With a Mentor RetiredCommented:
Hi njgroup;

I have found two issues with the code. The first is that this queries function is to get all none header rows from the Excel work sheet. But your Excel work sheet has defined two rows as the header and NOT 1. Therefore you need to change the "where" clause from this, where row.RowIndex > 1, to what is shown in the code snippet below.

IEnumerable<Row> dataRows = from row in sheet.Worksheet.Descendants<Row>( )
                            where row.RowIndex > 2
                            select row;

Open in new window


The second issue is with this query:

IEnumerable<String> textValues = from cell in row.Descendants<Cell>( )
                                 where cell.CellValue != null
                                 select
                                   ( cell.DataType != null && cell.DataType.HasValue && cell.DataType == CellValues.SharedString) 
                                        ? sharedStrings.ChildElements[ int.Parse( cell.CellValue.InnerText ) ].InnerText 
                                        : cell.CellValue.InnerText ;

Open in new window


The where clause in it filters out all cells that do not have a value only returning the cells with values in them. In your Excel document columns 3 and 4 are all empty therefore it only returns columns 1 and 2 and when your code executes this statement, var textArray = textValues.ToArray( );, the textArray only contains two elements and NOT four. This causes an issue with your function AddNewRowTodtProductType which is always expecting four elements. At which point your program will throw an exception of index out of range. I re-wrote the query to return an empty strings for columns that do not have a value in them, see code snippet below.

IEnumerable<String> textValues = from cell in row.Descendants<Cell>( )
                                 select
                                   ( cell.CellValue != null ) ?
                                     ((cell.DataType != null && cell.DataType.HasValue && cell.DataType == CellValues.SharedString )
                                        ? sharedStrings.ChildElements[ int.Parse( cell.CellValue.InnerText ) ].InnerText
                                        : cell.CellValue.InnerText) 
                                     : String.Empty;

Open in new window


Fernando
0
 
njgroupAuthor Commented:
in function "ReadDataFromExcel", I got "textValues" variable is null and when I have breakpoint on it I can see the result is empty and it says: "Enumeration yielded no results"

please see the snapshot below

 Enumeration yielded no results
0
 
njgroupAuthor Commented:
I found the problems, I got two problems,

#1 there are some merged cells (that was in table headers)
#2 the linq select can not set empty string of null values (from the excel sheet there are two column do not have values, they are "EnProductTypeDescription", "ArProductTypeDescription")

so how to get value from merged cells (instead of returning null),
and how to force getting empty string from empty cells (because the array of cells returned contains only data about the cells that they are not empty, and I dont get the empty cell at all,
for example if I have 4 cells have values like this: "A", "B", "", "C", I got array of strings containing "A", "B", "C",
please see the following attachment)

 cells not comming in select statement
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

 
Sudhakar PulivarthiProject Lead - EngineeringCommented:
Hi,
Remove this where condition and check.. This filtering might be the reason for reduced array values.
where cell.CellValue != null
0
 
njgroupAuthor Commented:
if I remove the where statement, I got error textValues variable is null
0
 
Fernando SotoRetiredCommented:
Hi njgroup;

Can you post the LinqSample.xlsx The excel you posted was ProductTypes.xlsx

Fernando
0
 
njgroupAuthor Commented:
they are same, I just copy data from ProductTypes.xlsx to LinqSample.xlsx to keep the path just for quick test
0
 
Fernando SotoRetiredCommented:
Hi njgroup;

You say that you just copied the ProductTypes.xlsx to LinqSample.xlsx for a quick test, well that will not work unless ProductTypes.xlsx has the same names for sheets and column headers names as the original LinqSample.xlsx excel document has. For example the line of code from your post has this:

//Reference to Excel Worksheet with Customer data.
sheetId = workSheets.First( s => s.Name == @"Customer" ).Id;

Open in new window


but the excel workbook from ProductTypes.xlsx has the following three sheets, "CCTV", "Sheet2" and
"Sheet3" but the line of code is looking for a Sheet called Customer so you would get an exception. And if you look at the construction of your DataTable non of the column names match that of the Excel document. You need to re-write the program to match the Excel document that you are using.

Fernando
0
 
njgroupAuthor Commented:
hi Fernando,

sorry I have send you an old code, I have just changed sheet name and I did in my code, but this is not my problem,

the program pass from this line of code, but stuck in my previous posts.

the sheet name is not my problem but the previous posts,

can u please just change the sheet name (its only a string)
0
 
njgroupAuthor Commented:
thats perfect :)

thanks very much
0
 
Fernando SotoRetiredCommented:
Not a problem, glad I was able to help.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.