Mike Eghtebas


c# using HtmlAgilityPack or xpath ... html data

I have test1.html in a folder and want to capture its embedded data, shown in the image below:

[User generated image]

The code I have now is not producing what I need. Please see below for the code and the result it produces:
           
            HtmlDocument hdoc = new HtmlDocument();
            HtmlNode abcTable;
            HtmlNodeCollection tableCells;

            hdoc.Load("test1.html");
            abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");
            tableCells = abcTable.SelectNodes("//td");

            foreach (HtmlNode cell in tableCells)
            {
                Response.Write(cell.InnerText);
            }

The result using the above code:
Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10

Field11  Field12 Field13 Field14 Field15 Field16 Field17 Field18 Field19 
Valu11  Valu12  Valu13                             Valu16  Valu17   

Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10 

Field1 Value1 Field6 Value6 Field2 Value2 Field7 Value7 Field3 Value3 Field8 Value8 Field4 Value4 Field9 Value9 Field5 Value5 Field10 Value10 

Field11  Field12 Field13 Field14 Field15 Field16 Field17 Field18 Field19
Valu11  Valu12  Valu13                             Valu16  Valu17


kaufmed

You need to take into account how your HTML is structured. The XPath I provided you previously just grabs every <td> tag; it doesn't care how those <td>s are laid out. Your convention is that two adjacent <td>s go together, so the code has to pair them up:

e.g.

using System;
using HtmlAgilityPack;

namespace _28627855
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument hdoc = new HtmlDocument();
            HtmlNode abcTable;

            hdoc.Load("test1.html");
            abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");

            foreach (HtmlNode row in abcTable.SelectNodes("tr"))
            {
                HtmlNodeCollection rowCells = row.SelectNodes("td");

                if (rowCells != null)
                {
                    // Cells are read as field/value pairs; the bound stops before an unpaired trailing cell.
                    for (int i = 0; i + 1 < rowCells.Count; i += 2)
                    {
                        Console.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText);
                    }
                }

                Console.WriteLine();
            }

            Console.ReadKey();
        }
    }
}
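
A side note on why the original attempt printed both tables: in XPath, an expression that starts with `//` is evaluated from the document root, even when it is called on a single table node. The code above avoids this by using relative paths ("tr", "td"). The sketch below illustrates the difference with `System.Xml` instead of HtmlAgilityPack so that it runs without extra packages; the tiny two-table document is made up for the example.

```csharp
using System;
using System.Xml;

class Program
{
    static void Main()
    {
        // Made-up, well-formed sample with two tables (not the asker's real file).
        var doc = new XmlDocument();
        doc.LoadXml(
            "<root>" +
            "<table summary='TopData'><tr><td>Field1</td><td>Value1</td></tr></table>" +
            "<table summary='Details'><tr><td>Field11</td></tr></table>" +
            "</root>");

        XmlNode top = doc.SelectSingleNode("//table[@summary='TopData']");

        // "//td" is absolute: it matches every td in the document.
        Console.WriteLine(top.SelectNodes("//td").Count);   // prints 3

        // ".//td" is relative: it matches only td elements under this table.
        Console.WriteLine(top.SelectNodes(".//td").Count);  // prints 2
    }
}
```

With HtmlAgilityPack the same rule applies, so `abcTable.SelectNodes(".//td")` or a row-by-row walk keeps the results inside the chosen table.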


Mike Eghtebas

ASKER

Thanks for the post. I will try your code shortly. In case it is needed, here is the HTML file:
<table summary="TopData">
     <tr> <td> Field1 </td>    <td>Value1</td> <td> Field6 </td>    <td>Value6</td></tr>
     <tr> <td> Field2 </td>    <td>Value2</td> <td> Field7 </td>    <td>Value7</td></tr>
     <tr> <td> Field3 </td>    <td>Value3</td> <td> Field8 </td>    <td>Value8</td></tr>
     <tr> <td> Field4 </td>    <td>Value4</td> <td> Field9 </td>    <td>Value9</td></tr>
     <tr> <td> Field5 </td>    <td>Value5</td> <td> Field10 </td>    <td>Value10</td></tr>
 </table>

<table summary="Details">
      <tr> <td> Field11 </td><td> Field12 </td><td> Field13 </td><td> Field14 </td><td> Field15 </td><td> Field16 </td><td> Field17 </td> <td> Field18 </td><td> Field19 </td></tr>
     <tr> <td> Value11 </td><td> Value12 </td><td> Value13 </td><td>          </td><td>                </td><td> Field16 </td><td> Field17 </td> <td>              </td><td>                </td></tr>
    <tr> <td>Value11 </td><td>Value12 </td><td> Value13</td><td>Value14</td><td>Value15</td><td> Field16 </td><td> Value17</td> <td>              </td><td>                </td></tr>
 </table>


I am getting the following error because I have changed Console to Response:

 // Console.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText);
    Response.Write("{0} | {1} |", rowCells[i].InnerText, rowCells[i + 1].InnerText)
                   (1)            (2)                    (3)
(1): cannot convert from 'string' to 'char[]'
(2): cannot convert from 'string' to 'int'
(3): cannot convert from 'string' to 'int'



Someday I will get the hang of it all.

Thank you for the help.

Mike
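
A note on the errors above: they are consistent with `Response.Write` having no overload that takes a format string plus arguments, unlike `Console.Write`. With three arguments the compiler tries to match `Write(char[], int, int)`, which explains the `char[]` and `int` conversion messages. One possible fix, offered as a sketch and not necessarily the accepted solution, is to build the text with `string.Format` first and pass the single resulting string. `PairFormatter.FormatPair` is a hypothetical helper name, and `Console.WriteLine` stands in for `Response.Write` so the snippet runs outside an ASP.NET page.

```csharp
using System;

static class PairFormatter
{
    // Hypothetical helper (not from the thread): formats one field/value pair
    // into a single string, so any Write(string) method can accept it.
    public static string FormatPair(string field, string value)
    {
        return string.Format("{0} | {1} |", field, value);
    }
}

class Program
{
    static void Main()
    {
        // In the ASP.NET page the equivalent call would be:
        //   Response.Write(PairFormatter.FormatPair(rowCells[i].InnerText, rowCells[i + 1].InnerText));
        Console.WriteLine(PairFormatter.FormatPair("Field1", "Value1")); // prints "Field1 | Value1 |"
    }
}
```

Calling `Response.Write(string.Format("{0} | {1} |", a, b))` inline works the same way; the helper only keeps the loop body short.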
ASKER CERTIFIED SOLUTION
kaufmed
Fantastic.