C#: Using HtmlAgilityPack to read data from test.html files in a folder

In C:\Folder_1 I have several HTML files:

test1.html
test2.html
test3.html
.
.
These HTML files all have the same structure:

<table summary="ABC">
  <tr>
    <td>Field1 :</td>  <td>Valu1  </td>
    <td>Field2 :</td>  <td>Valu2  </td>
  </tr>
  <tr>
    <td>Field3 :</td>  <td>Valu3  </td>
    <td>Field4 :</td>  <td>Valu4  </td>
  </tr>
  <tr>
  .
  .
  </tr>
</table>

Each HTML file also contains other tables, which is why the table has to be selected by its summary="ABC" attribute.

Question: How can I extract the values from these HTML files and store them in a table?
Mike Eghtebas (Database and Application Developer) asked:
 
käµfm³d 👽 commented:
HtmlAgilityPack (HAP) has good support for XPath expressions, which should give you flexibility in finding your tables within the HTML. For example, you could extract the cells of the table using:

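using System;
using HtmlAgilityPack;

namespace _28627855
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument hdoc = new HtmlDocument();
            hdoc.Load("test1.html");

            // Select the one table whose summary attribute is "ABC".
            HtmlNode abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");
            if (abcTable == null)
            {
                Console.WriteLine("No table with summary='ABC' was found.");
                return;
            }

            // The leading dot keeps the search inside abcTable; a bare "//td"
            // would match the cells of every table in the document.
            HtmlNodeCollection tableCells = abcTable.SelectNodes(".//td");
            if (tableCells == null) // SelectNodes returns null when nothing matches
            {
                return;
            }

            foreach (HtmlNode cell in tableCells)
            {
                Console.WriteLine(cell.InnerText);
            }

            Console.ReadKey();
        }
    }
}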


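The double slash tells the parser to look anywhere in the document. Note that a leading "//" starts from the document root even when it is called on a node, so to stay inside the selected table the cell query should be ".//td" rather than "//td". The double slash is a tad on the slow side on the grand scale, but you probably won't notice for your needs.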
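The scope of the double slash is easy to check in isolation. The sketch below is only an illustration: it uses System.Xml's XmlDocument instead of HtmlAgilityPack (an assumption made so that it runs without any package; HAP's SelectNodes follows the same XPath rule), and the class name, method name, and sample markup are all made up.

```csharp
using System;
using System.Xml;

public static class XPathScopeDemo
{
    // Made-up sample: one unrelated table plus the ABC table.
    public const string Sample =
        "<html><body>" +
        "<table summary='Other'><tr><td>x</td></tr></table>" +
        "<table summary='ABC'><tr><td>Field1 :</td><td>Valu1</td></tr></table>" +
        "</body></html>";

    // Counts the nodes matched by xpath when it is evaluated
    // on the table whose summary attribute is "ABC".
    public static int CountFromAbcTable(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode abcTable = doc.SelectSingleNode("//table[@summary='ABC']");
        return abcTable.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "//td" restarts at the document root: all 3 cells match.
        Console.WriteLine(CountFromAbcTable(Sample, "//td"));   // prints 3
        // ".//td" stays below the ABC table: only its 2 cells match.
        Console.WriteLine(CountFromAbcTable(Sample, ".//td"));  // prints 2
    }
}
```

With several tables in each file, as in the question, the second form is the one that returns only the ABC cells.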
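The original question also asks about storing the values in a table. One possible approach, sketched here under assumptions, is to collect each cell's InnerText into a list and pair the entries up as field/value rows of a System.Data.DataTable. The helper name ToFieldValueTable, the column names, and the choice of DataTable are illustrative, not something the thread specifies; the sample strings mirror the markup in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class CellPairing
{
    // Hypothetical helper: turns the flat sequence
    // "Field1 :", "Valu1", "Field2 :", "Valu2", ... into two-column rows.
    public static DataTable ToFieldValueTable(IList<string> cellTexts)
    {
        var table = new DataTable("ABC");
        table.Columns.Add("Field", typeof(string));
        table.Columns.Add("Value", typeof(string));

        // Step two cells at a time; an unpaired trailing cell is ignored.
        for (int i = 0; i + 1 < cellTexts.Count; i += 2)
        {
            // Strip surrounding whitespace and the trailing " :" of the label.
            string field = cellTexts[i].Trim().TrimEnd(':').TrimEnd();
            string value = cellTexts[i + 1].Trim();
            table.Rows.Add(field, value);
        }
        return table;
    }

    public static void Main()
    {
        // In the real program these strings would come from cell.InnerText.
        var cells = new List<string> { "Field1 :", "Valu1  ", "Field2 :", "Valu2  " };
        DataTable table = ToFieldValueTable(cells);
        foreach (DataRow row in table.Rows)
        {
            Console.WriteLine(row["Field"] + " = " + row["Value"]);
        }
    }
}
```

To cover every file in C:\Folder_1, the same helper could be called once per path returned by Directory.GetFiles, with the rows merged or written to a database as needed.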
 
Mike Eghtebas (Database and Application Developer, author) commented:
How can I add a | between the values that are read?

I am using:

// Console.WriteLine(cell.InnerText);
Response.Write(cell.InnerText + '|');

Also, I want to write each value on its own line, but for some reason the following doesn't work:

Response.Write(cell.InnerText + '|\n');
 
käµfm³d 👽 commented:
This:

Response.Write(cell.InnerText + '|');

...works fine for me. This:

Response.Write(cell.InnerText + '|\n');

...needs double quotes around the second part, since '|\n' is two characters and a single-quoted char literal can hold only one. Use "|\n" instead.
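A minimal sketch of the difference, using a console program so it runs on its own. The helper name FormatCells is made up for illustration, and the sample values come from the table in the question.

```csharp
using System;
using System.Text;

public static class SeparatorDemo
{
    // Hypothetical helper: appends "|" and a newline after each cell text.
    public static string FormatCells(string[] cellTexts)
    {
        var sb = new StringBuilder();
        foreach (string text in cellTexts)
        {
            // '|' is one character, so a char literal compiles.
            // "|\n" is two characters, so it must be a string literal;
            // writing '|\n' is rejected by the compiler.
            sb.Append(text + "|\n");
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(FormatCells(new[] { "Field1 :", "Valu1", "Field2 :", "Valu2" }));
    }
}
```

One caveat, assuming the Response.Write output is viewed as an HTML page in a browser: a "\n" there is treated as ordinary whitespace, so a visible line break would need something like "|<br />" as the separator.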
 
Mike Eghtebas (Database and Application Developer, author) commented:
Thank you
Question has a verified solution.
