C#: using HtmlAgilityPack to read data from test.html in a folder

In C:\Folder_1 I have several HTML files:

test1.html
test2.html
test3.html
.
.
These HTML files all have the same structure:

  <table summary="ABC">
    <tr>
      <td>Field1 :</td>  <td>Valu1  </td>
      <td>Field2 :</td>  <td>Valu2  </td>
    </tr>
    <tr>
      <td>Field3 :</td>  <td>Valu3  </td>
      <td>Field4 :</td>  <td>Valu4  </td>
    </tr>
    <tr>
    .
    .
    </tr>
  </table>

Each HTML file also contains other tables, which is why we need to identify the right one by its summary="ABC" attribute.

Question: How can I extract the values from these HTML files and store them in a table?
Mike Eghtebas, Database and Application Developer, Asked:

käµfm³d 👽 Commented:
HtmlAgilityPack (HAP) has good support for XPath expressions, which should give you flexibility in finding your tables within the HTML. For example, you could extract the table's cells using:

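using System;
using HtmlAgilityPack;

namespace _28627855
{
    class Program
    {
        static void Main(string[] args)
        {
            HtmlDocument hdoc = new HtmlDocument();
            hdoc.Load("test1.html");

            // Find the table by its summary attribute; null if the file has no such table.
            HtmlNode abcTable = hdoc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");
            if (abcTable == null)
            {
                Console.WriteLine("No table with summary='ABC' found.");
                return;
            }

            // ".//td" stays inside abcTable; a bare "//td" would match every cell in the document.
            HtmlNodeCollection tableCells = abcTable.SelectNodes(".//td");
            if (tableCells != null)   // SelectNodes returns null when nothing matches
            {
                foreach (HtmlNode cell in tableCells)
                {
                    Console.WriteLine(cell.InnerText);
                }
            }

            Console.ReadKey();
        }
    }
}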


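The double slash tells the XPath engine to match descendants at any depth. Note that an expression starting with // always searches from the document root, even when it is called on a node such as abcTable; start it with .// to search only beneath the current node. It's a tad on the slow side in the grand scheme of things, but you probably won't notice for your needs.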
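The snippet above only prints the cells, while the question also asks about storing the values in a table and about several files in one folder. The following is a minimal sketch of one way to do that, not a definitive implementation. It assumes the cells strictly alternate between label and value as in the sample markup; the class name AbcTableReader, the method names, and the FileName/Field/Value column layout are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using HtmlAgilityPack;

public static class AbcTableReader
{
    // Returns (field, value) pairs from the table whose summary attribute is "ABC".
    // Assumption: cells alternate label, value, label, value, ...
    public static List<KeyValuePair<string, string>> ExtractPairs(HtmlDocument doc)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@summary='ABC']");
        if (table == null) return pairs;      // this file has no such table

        // ".//td" is relative to the table; SelectNodes returns null when nothing matches.
        HtmlNodeCollection cells = table.SelectNodes(".//td");
        if (cells == null) return pairs;

        for (int i = 0; i + 1 < cells.Count; i += 2)
        {
            // Decode entities, then strip padding and the trailing " :" from the label.
            string field = HtmlEntity.DeEntitize(cells[i].InnerText).Trim().TrimEnd(':').Trim();
            string value = HtmlEntity.DeEntitize(cells[i + 1].InnerText).Trim();
            pairs.Add(new KeyValuePair<string, string>(field, value));
        }
        return pairs;
    }

    // Loads every *.html file in a folder into one DataTable (hypothetical column layout).
    public static DataTable LoadFolder(string folder)
    {
        var result = new DataTable("AbcValues");
        result.Columns.Add("FileName", typeof(string));
        result.Columns.Add("Field", typeof(string));
        result.Columns.Add("Value", typeof(string));

        foreach (string path in Directory.GetFiles(folder, "*.html"))
        {
            var doc = new HtmlDocument();
            doc.Load(path);
            foreach (KeyValuePair<string, string> pair in ExtractPairs(doc))
            {
                result.Rows.Add(Path.GetFileName(path), pair.Key, pair.Value);
            }
        }
        return result;
    }
}
```

Calling AbcTableReader.LoadFolder(@"C:\Folder_1") would then yield one row per field per file. A DataTable is used here only because it is a convenient in-memory table; the same pairs could just as well be inserted into a database table.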

Mike Eghtebas, Database and Application Developer, Author Commented:
How can I add a | between the values that are read?

I am using:

// Console.WriteLine(cell.InnerText);
Response.Write(cell.InnerText + '|');

Also, I want each value written on its own line, but for some reason the following doesn't work:

Response.Write(cell.InnerText + '|\n');
käµfm³d 👽 Commented:
This:

Response.Write(cell.InnerText + '|');

...works fine for me. This:

Response.Write(cell.InnerText + '|\n');

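...needs double quotes around the second operand. Single quotes denote a char literal, which can hold only one character, and '|\n' is two, so it won't compile. Write "|\n" instead.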
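One further point, offered as a hedged sketch: since Response.Write is being used, the output is presumably rendered as HTML by a browser (an assumption, as the page itself isn't shown). In that case "\n" only produces a newline in the page source, and browsers collapse it like any other whitespace, so a visible line break needs "<br />". The helper below is hypothetical (CellFormatter and Format are illustrative names) and takes the line-break string as a parameter so the same code works for console and HTML output.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CellFormatter
{
    // Appends "|" and the given line break after each trimmed cell text.
    public static string Format(IEnumerable<string> cellTexts, string lineBreak)
    {
        var sb = new StringBuilder();
        foreach (string text in cellTexts)
        {
            sb.Append(text.Trim());
            sb.Append("|");          // one character, so the char literal '|' would compile too
            sb.Append(lineBreak);    // "\n" for console output, "<br />" for HTML output
        }
        return sb.ToString();
    }
}
```

In the page, this could be used as Response.Write(CellFormatter.Format(texts, "<br />")), where texts is the collection of cell.InnerText values gathered in the loop.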
Mike Eghtebas, Database and Application Developer, Author Commented:
Thank you