
C# using HtmlAgilityPack or XPath ... HTML data

I have test1.html in a folder and want to capture its embedded data, shown in the image below:

[Image: output]

The code I have now is not producing what I need. Please see below for the code and the result it produces:
           
            HtmlDocument hdoc = new HtmlDocument();
            HtmlNode abcTable;
            HtmlNodeCollection tableCells;

            hdoc.Load("test1.html");
            abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");
            tableCells = abcTable.SelectNodes("//td");

            foreach (HtmlNode cell in tableCells)
            {
                Response.Write(cell.InnerText);
            }

The result of the above code:
Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10

Field11  Field12 Field13 Field14 Field15 Field16 Field17 Field18 Field19 
Valu11  Valu12  Valu13                             Valu16  Valu17   

Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10 

Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10 

Field11  Field12 Field13 Field14 Field15 Field16 Field17 Field18 Field19
Valu11  Valu12  Valu13                             Valu16  Valu17


Most Valuable Expert 2011
Top Expert 2015

Commented:
You need to take into account how your HTML is structured. The XPath I provided you previously just grabs every <td> tag; it doesn't care how those <td> elements are laid out. Your convention is that two adjacent <td> elements go together (a field name followed by its value), so the code has to walk each row and read its cells in pairs. For example:

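using System;
using HtmlAgilityPack;

namespace _28627855
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument hdoc = new HtmlDocument();
            HtmlNode abcTable;

            hdoc.Load("test1.html");
            abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");

            // SelectSingleNode returns null if no table has this summary attribute
            if (abcTable == null)
            {
                Console.WriteLine("Table not found.");
                return;
            }

            // "tr" is relative to abcTable, so only this table's rows are visited;
            // SelectNodes returns null (by default) when nothing matches
            HtmlNodeCollection rows = abcTable.SelectNodes("tr");

            if (rows != null)
            {
                foreach (HtmlNode row in rows)
                {
                    HtmlNodeCollection rowCells = row.SelectNodes("td");

                    if (rowCells != null)
                    {
                        // Adjacent cells form (field, value) pairs; the i + 1 check
                        // skips a trailing unpaired cell instead of throwing
                        for (int i = 0; i + 1 < rowCells.Count; i += 2)
                        {
                            Console.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText);
                        }
                    }

                    Console.WriteLine();
                }
            }

            Console.ReadKey();
        }
    }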
}
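The difference between the question's abcTable.SelectNodes("//td") and the answer's relative "tr" / "td" paths is an XPath rule: an expression that begins with "//" is evaluated from the document root even when it is called on a child node, while ".//td" or "tr/td" stays inside the context node. The sketch below illustrates this with .NET's built-in System.Xml on a small, hypothetical, well-formed stand-in for test1.html. System.Xml is used only so the example runs without extra packages; HtmlAgilityPack's SelectNodes also evaluates XPath 1.0 through .NET's XPath engine, so the same rule should apply there. It assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Xml;

// Hypothetical, well-formed sample: two small tables in one document.
var doc = new XmlDocument();
doc.LoadXml(
    "<html><body>" +
    "<table summary='TopData'><tr><td>Field1</td><td>Value1</td></tr></table>" +
    "<table summary='Details'><tr><td>Field11</td><td>Field12</td></tr></table>" +
    "</body></html>");

// Number of nodes an XPath expression selects from a given context node.
int CountMatches(XmlNode context, string xpath) => context.SelectNodes(xpath)?.Count ?? 0;

XmlNode topData = doc.SelectSingleNode("//table[@summary='TopData']")
    ?? throw new InvalidOperationException("TopData table not found");

// Absolute path: starts at the document root, so cells of BOTH tables match.
int absoluteCount = CountMatches(topData, "//td");

// Relative path: only descendants of topData.
int relativeCount = CountMatches(topData, ".//td");

// Relative child steps, as in the answer: rows of topData, then their cells.
int childPathCount = CountMatches(topData, "tr/td");

// prints "//td: 4, .//td: 2, tr/td: 2"
Console.WriteLine($"//td: {absoluteCount}, .//td: {relativeCount}, tr/td: {childPathCount}");
```

In HtmlAgilityPack terms, abcTable.SelectNodes(".//td") would be the minimal change to the question's code if only that table's cells are wanted.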


Mike Eghtebas, Database and Application Developer

Author

Commented:
Thanks for the post; I will try it shortly. In case it is needed, here is the HTML file:
<table summary="TopData">
     <tr> <td> Field1 </td>    <td>Value1</td> <td> Field6 </td>    <td>Value6</td></tr>
     <tr> <td> Field2 </td>    <td>Value2</td> <td> Field7 </td>    <td>Value7</td></tr>
     <tr> <td> Field3 </td>    <td>Value3</td> <td> Field8 </td>    <td>Value8</td></tr>
     <tr> <td> Field4 </td>    <td>Value4</td> <td> Field9 </td>    <td>Value9</td></tr>
     <tr> <td> Field5 </td>    <td>Value5</td> <td> Field10 </td>    <td>Value10</td></tr>
 </table>

<table summary="Details">
      <tr> <td> Field11 </td><td> Field12 </td><td> Field13 </td><td> Field14 </td><td> Field15 </td><td> Field16 </td><td> Field17 </td> <td> Field18 </td><td> Field19 </td></tr>
     <tr> <td> Value11 </td><td> Value12 </td><td> Value13 </td><td>          </td><td>                </td><td> Field16 </td><td> Field17 </td> <td>              </td><td>                </td></tr>
    <tr> <td>Value11 </td><td>Value12 </td><td> Value13</td><td>Value14</td><td>Value15</td><td> Field16 </td><td> Value17</td> <td>              </td><td>                </td></tr>
 </table>
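The two tables posted above follow different conventions: in TopData each row holds (field, value) pairs side by side, while in Details the first row appears to hold the field names and each later row the values in the same column order. A pairing loop that reads index i + 1 also needs an i + 1 < Count check, since the Details rows have 9 cells. Below is a minimal sketch of reading both layouts, assuming the cell texts have already been pulled out of each row (for example with row.SelectNodes("td") and InnerText.Trim() in HtmlAgilityPack) into string arrays. The sample data and the helper names ReadPairs and ReadColumns are hypothetical, and it assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cell texts per row, as they might come out of the parsed tables.
string[][] topDataRows =
{
    new[] { "Field1", "Value1", "Field6", "Value6" },
    new[] { "Field2", "Value2", "Field7", "Value7" },
};
string[][] detailsRows =
{
    new[] { "Field11", "Field12", "Field13" },
    new[] { "Value11", "Value12", "" },
};

// TopData layout: adjacent cells form (field, value) pairs.
List<(string Field, string Value)> ReadPairs(string[][] rows)
{
    var pairs = new List<(string Field, string Value)>();
    foreach (string[] cells in rows)
    {
        // i + 1 < Length skips a trailing unpaired cell instead of throwing
        for (int i = 0; i + 1 < cells.Length; i += 2)
            pairs.Add((cells[i], cells[i + 1]));
    }
    return pairs;
}

// Details layout (assumed): row 0 holds field names, later rows hold values
// in the same column order. Returns one field -> value map per value row.
List<Dictionary<string, string>> ReadColumns(string[][] rows)
{
    var records = new List<Dictionary<string, string>>();
    if (rows.Length == 0) return records;

    string[] headers = rows[0];
    foreach (string[] cells in rows.Skip(1))
    {
        var entry = new Dictionary<string, string>();
        for (int c = 0; c < headers.Length; c++)
            entry[headers[c]] = c < cells.Length ? cells[c] : "";  // short row -> empty value
        records.Add(entry);
    }
    return records;
}

foreach (var (field, value) in ReadPairs(topDataRows))
    Console.WriteLine($"{field} | {value}");

foreach (var entry in ReadColumns(detailsRows))
    Console.WriteLine(string.Join(", ", entry.Select(kv => $"{kv.Key}={kv.Value}")));
```

Keeping the parsing (HtmlAgilityPack) separate from the layout logic makes it easy to pick the right reader per table, e.g. ReadPairs for summary="TopData" and ReadColumns for summary="Details".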


Mike Eghtebas, Database and Application Developer

Author

Commented:
I am getting the following errors after changing Console.Write to Response.Write:

    // Console.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText);
    Response.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText);
                   (1)            (2)                    (3)
(1): cannot convert from 'string' to 'char[]'
(2): cannot convert from 'string' to 'int'
(3): cannot convert from 'string' to 'int'



Someday I will get the hang of it all.

Thank you for the help.

Mike
Most Valuable Expert 2011
Top Expert 2015
Commented:
Response.Write(String.Format("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText));

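The three errors come from overload resolution. Console.Write has a Write(string format, object arg0, object arg1) overload, but ASP.NET's HttpResponse.Write only accepts a single string, char, or object, or a (char[], int, int) buffer slice, so a three-argument call is matched against the char[] version, which produces "cannot convert from 'string' to 'char[]'" and "to 'int'". The sketch below shows the String.Format fix and two equivalent ways to produce the same text. It runs as a console program (top-level statements, C# 9 or later); the sample values are hypothetical, and a StringWriter stands in for Response.Output, which ASP.NET exposes as a TextWriter.

```csharp
using System;
using System.IO;

// Sample cell texts (hypothetical values).
string field = "Field1";
string value = "Value1";

// Option 1, as in the fix: String.Format builds one string,
// which matches the single-string Write overload.
string formatted = String.Format("{0} | {1} |", field, value);

// Option 2: string interpolation (C# 6 or later) builds the same string.
string interpolated = $"{field} | {value} |";

// Option 3: TextWriter itself has a Write(format, arg0, arg1) overload.
// StringWriter stands in here for Response.Output, which is a TextWriter.
var writer = new StringWriter();
writer.Write("{0} | {1} |", field, value);

Console.WriteLine(formatted);          // Field1 | Value1 |
Console.WriteLine(interpolated);       // Field1 | Value1 |
Console.WriteLine(writer.ToString());  // Field1 | Value1 |
```

In a page, Response.Output.Write("{0} | {1} |", a, b) should therefore behave like the Console.Write call it replaced.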

Mike Eghtebas, Database and Application Developer

Author

Commented:
Fantastic.