
Parse HTML table data in WebBrowser

I'm trying to get data from the table cells of a web page through the WebBrowser control in C# 2005. Below is the code I used:

HtmlDocument doc = WebBrowser1.Document;
HtmlElementCollection tables = doc.GetElementsByTagName("table");
if (tables.Count > 0)
{
       System.Web.UI.HtmlControls.HtmlTable table = tables[0] as System.Web.UI.HtmlControls.HtmlTable;    //can't work
       .....
}

As expected, tables[0] is typed as an HtmlElement, and C# doesn't allow me to convert that HtmlElement to an HtmlTable object. But I'd really like to use HtmlTable's convenient members, such as Rows and Cells, to get the data faster and more easily.

Is there a way to get data from the HTML table cells using members similar to those of HtmlTable?
Asked by ficstar
 
Bob Learned commented:
1) Get the element's DomElement property and cast that.

2) Add a reference to Microsoft.mshtml.

3) Do it like this:

if (tables.Count > 0)
{
       mshtml.HtmlTableElement table = (mshtml.HtmlTableElement)tables[0].DomElement;
}

Bob

ficstar (author) commented:
Thanks for your help. I had to use mshtml.HTMLTable instead of the type you used, because mshtml.HtmlTableElement isn't available in my VS 2005:

mshtml.HTMLTable table = (mshtml.HTMLTable)tables[i].DomElement;

This works, and I get the table data successfully.

Can you tell me how to get the cell content?

I tried:

mshtml.HTMLTableCell cell = (mshtml.HTMLTableCell)table.cells.item(0, 0);
string cellText = cell.innerText;

The code above doesn't return the text of the first table cell; it returns the text of the whole table body. How can I get the text of each individual cell? Thanks.
sumix commented:
The 'item' method only lets you access an individual element of a collection if that element has a 'name' attribute.
You can use this instead:

foreach (mshtml.HTMLTableCellClass tblCell in table.cells)
{
    Console.WriteLine(tblCell.innerHTML);
}
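Because table.cells enumerates every cell of the table in document order, one way to recover a row/column position from that flat loop is to count the cells and divide by the column count. The sketch below is plain C# with no mshtml dependency, so it can run on its own; it assumes a strictly rectangular table (no colspan or rowspan) whose column count is known, and the strings stand in for the innerHTML values the loop would print. The helper name ToRowCol is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a zero-based position in the flat, document-order
// cell list to a zero-based (row, column) pair. Assumes every row has exactly
// columnCount cells (no colspan/rowspan).
(int Row, int Col) ToRowCol(int flatIndex, int columnCount)
{
    if (columnCount <= 0)
        throw new ArgumentOutOfRangeException(nameof(columnCount));
    return (flatIndex / columnCount, flatIndex % columnCount);
}

// Stand-in for the cell text the foreach loop would see in a 2 x 3 table.
var flatCells = new List<string> { "Name", "Role", "Dept", "Ann", "employee", "Sales" };
const int columns = 3;

for (int k = 0; k < flatCells.Count; k++)
{
    // Integer division gives the row, the remainder gives the column.
    var (row, col) = ToRowCol(k, columns);
    Console.WriteLine("Cell {0},{1} is: {2}", row, col, flatCells[k]);
}
```

If the real table uses colspan or rowspan, this arithmetic no longer lines up with the visual grid, and looping over rows and then cells is the safer approach.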
ficstar (author) commented:
This works great. However, I want to access individual cells by their row and column position. For example, I want to know whether cell(2, 5) contains the keyword "employee". The foreach loop you gave me doesn't let me access individual cells that way.
sumix commented:

The mshtml row and cell collections (table.rows, row.cells) have no C# indexer, so you cannot read a cell as [i, j]; they can be enumerated with foreach, but they are not an IList. A workaround is to loop through the rows collection and, for each row, loop through its cells collection, like this:

int i = 0;
foreach (mshtml.HTMLTableRowClass row in table.rows)
{
    i++;        // 1-based row number
    int j = 0;
    foreach (mshtml.HTMLTableCellClass cell in row.cells)
    {
        j++;    // 1-based column number within this row
        Console.WriteLine("Cell {0},{1} is: {2}", i, j, cell.innerHTML);
    }
}
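The nested loop above can also copy the cell text into a jagged array once, after which any cell can be read by position, for example to check whether cell (2, 5) contains "employee". The sketch below is plain C# so it runs without a browser: the nested string lists are a stand-in for table.rows and row.cells (in the real loop you would collect cell.innerText or cell.innerHTML instead). BuildGrid and CellContains are hypothetical helper names, and the indices are zero-based, so the loop's 1-based "Cell 2,5" corresponds to cells[1][4].

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: copies nested rows/cells into a jagged array so cells
// can be indexed as [row][column]. Rows may differ in length (e.g. colspan).
string[][] BuildGrid(IEnumerable<IEnumerable<string>> rows)
{
    var grid = new List<string[]>();
    foreach (var row in rows)
    {
        grid.Add(new List<string>(row).ToArray());
    }
    return grid.ToArray();
}

// Hypothetical helper: true if the cell exists and its text contains the
// keyword (case-insensitive). Out-of-range positions return false.
bool CellContains(string[][] grid, int row, int col, string keyword)
{
    if (row < 0 || row >= grid.Length) return false;
    if (col < 0 || col >= grid[row].Length) return false;
    string text = grid[row][col] ?? string.Empty;
    return text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
}

// Stand-in for the text gathered from table.rows / row.cells.
var sampleRows = new List<List<string>>
{
    new List<string> { "Name", "Title" },
    new List<string> { "Ann", "Senior Employee" },
    new List<string> { "Bob" },
};

string[][] cells = BuildGrid(sampleRows);
Console.WriteLine(CellContains(cells, 1, 1, "employee"));
```

A jagged array rather than a rectangular string[,] is used here because HTML rows do not have to contain the same number of cells.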
ficstar (author) commented:
Great help. Thanks a lot.