Solved

Trouble with converted code

Posted on 2014-12-16
137 Views
Last Modified: 2016-02-18
I was looking for code to parse an HTML table and came across this one:
http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/

But I need it in VB.NET, so I ran it through a popular converter. Everything except the following two functions seems to have come over properly.
Any idea how to fix these two hiccups?


Private Shared Function ParseColumns(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

        Return (From headerMatch In headerMatchesNew DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function


New is underlined with the error "')' expected" (although that was after I added a space after headerMatches).

    ''' <summary>
    ''' For tables which do not specify header cells we must generate DataColumns based on the number
    ''' of cells in a row (we assume all rows have the same number of cells).
    ''' </summary>
    ''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
    ''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
    Private Shared Function GenerateColumns(rowMatches As MatchCollection) As DataColumn()
        Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

        Return (From index In Enumerable.Range(0, columnCount)New DataColumn("Column " & Convert.ToString(index))).ToArray()
    End Function


New is underlined here too, with the error "')' expected".
Question by:sirbounty
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503306
Sounds like it wants parentheses around the arguments to headerMatches:
From headerMatch In headerMatches (New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
 
LVL 67

Author Comment

by:sirbounty
ID: 40503356
Return (From headerMatch In headerMatches(New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
reports that headerMatch is not declared
and

Return (From index In Enumerable.Range(0, columnCount)(New DataColumn("Column " & Convert.ToString(index))).ToArray())
reports that index is not accessible in this context because it is 'Friend'.
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503386
That's a useful clue. What is headerMatch, and where is it declared?

 
LVL 67

Author Comment

by:sirbounty
ID: 40503578
headerMatch isn't declared anywhere that I can see...
 
LVL 33

Accepted Solution

by:it_saige (earned 500 total points)
ID: 40503700
Here is your converted class:
Imports System.Text.RegularExpressions

''' <summary>
''' HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.
''' </summary>
Public Class HtmlTableParser
	Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase

	Private Const CommentPattern As String = "<!--(.*?)-->"
	Private Const TablePattern As String = "<table[^>]*>(.*?)</table>"
	Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"
	Private Const RowPattern As String = "<tr[^>]*>(.*?)</tr>"
	Private Const CellPattern As String = "<td[^>]*>(.*?)</td>"

	''' <summary>
	''' Given an HTML string containing n HTML tables, parse them into a DataSet containing n DataTables.
	''' </summary>
	''' <param name="html">An HTML string containing n HTML tables</param>
	''' <returns>A DataSet containing a DataTable for each HTML table in the input HTML</returns>
	Public Shared Function ParseDataSet(ByVal html As String) As DataSet
		Dim dataSet As New DataSet()
		Dim tableMatches As MatchCollection = Regex.Matches(WithoutComments(html), TablePattern, ExpressionOptions)

		For Each tableMatch As Match In tableMatches
			dataSet.Tables.Add(ParseTable(tableMatch.Value))
		Next

		Return dataSet
	End Function

	''' <summary>
	''' Given an HTML string containing a single table, parse that table to form a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A DataTable which matches the input HTML table</returns>
	Public Shared Function ParseTable(ByVal tableHtml As String) As DataTable
		Dim tableHtmlWithoutComments As String = WithoutComments(tableHtml)

		Dim dataTable As New DataTable()

		Dim rowMatches As MatchCollection = Regex.Matches(tableHtmlWithoutComments, RowPattern, ExpressionOptions)

		dataTable.Columns.AddRange(If(tableHtmlWithoutComments.Contains("<th"), ParseColumns(tableHtmlWithoutComments), GenerateColumns(rowMatches)))

		ParseRows(rowMatches, dataTable)

		Return dataTable
	End Function

	''' <summary>
	''' Strip comments from an HTML string
	''' </summary>
	''' <param name="html">An HTML string potentially containing comments</param>
	''' <returns>The input HTML string with comments removed</returns>
	Private Shared Function WithoutComments(ByVal html As String) As String
		Return Regex.Replace(html, CommentPattern, String.Empty, ExpressionOptions)
	End Function

	''' <summary>
	''' Add a row to the input DataTable for each row match in the input MatchCollection
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows to add to the DataTable</param>
	''' <param name="dataTable">The DataTable to which we add rows</param>
	Private Shared Sub ParseRows(ByVal rowMatches As MatchCollection, ByVal dataTable As DataTable)
		For Each rowMatch As Match In rowMatches
			' if the row contains header tags don't use it - it is a header not a row
			If Not rowMatch.Value.Contains("<th") Then
				Dim dataRow As DataRow = dataTable.NewRow()

				Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Value, CellPattern, ExpressionOptions)

				For columnIndex As Integer = 0 To cellMatches.Count - 1
					dataRow(columnIndex) = cellMatches(columnIndex).Groups(1).ToString()
				Next

				dataTable.Rows.Add(dataRow)
			End If
		Next
	End Sub

	''' <summary>
	''' Given a string containing an HTML table, parse the header cells to create a set of DataColumns
	''' which define the columns in a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A set of DataColumns based on the HTML table header cells</returns>
	Private Shared Function ParseColumns(ByVal tableHtml As String) As DataColumn()
		Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

		Return (From headerMatch In headerMatches Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
	End Function

	''' <summary>
	''' For tables which do not specify header cells we must generate DataColumns based on the number
	''' of cells in a row (we assume all rows have the same number of cells).
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
	''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
	Private Shared Function GenerateColumns(ByVal rowMatches As MatchCollection) As DataColumn()
		Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

		Return (From index In Enumerable.Range(0, columnCount) Select New DataColumn("Column " & Convert.ToString(index))).ToArray()
	End Function
End Class



The errors were caused by the converter not emitting the Select clause in the two LINQ query expressions.

-saige-
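For comparison, here is a minimal sketch of the clause the converter dropped, next to the equivalent method-syntax form. The module and function names are hypothetical, not from the thread. It declares the range variable As Match explicitly: on .NET Framework, MatchCollection only implements the non-generic IEnumerable, so without that the variable is inferred as Object and headerMatch.Groups relies on late binding (which Option Strict On rejects).

```vb
Imports System
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

' Hypothetical demo module; patterns and options copied from HtmlTableParser.
Module SelectClauseDemo
    Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase
    Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"

    ' Query syntax: the Select clause projects each Match into a DataColumn.
    Function ColumnsQuerySyntax(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)
        Return (From headerMatch As Match In headerMatches
                Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function

    ' Method syntax: Cast(Of Match) turns the collection into IEnumerable(Of Match)
    ' so the lambda parameter is strongly typed.
    Function ColumnsMethodSyntax(tableHtml As String) As DataColumn()
        Dim headerMatches As IEnumerable(Of Match) = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions).Cast(Of Match)()
        Return headerMatches.Select(Function(m) New DataColumn(m.Groups(1).ToString())).ToArray()
    End Function
End Module
```

Called on "<table><tr><th>Name</th><th>Age</th></tr></table>", either function should return two DataColumns named Name and Age.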
 
LVL 67

Author Comment

by:sirbounty
ID: 40506869
Thanks, that seems to work. However, I did have to convert the HTML string to lowercase first (oddly enough, the source HTML has all its cells as <TD).

Thanks for the help.
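A likely explanation for needing the lowercase conversion: the regular expressions all use RegexOptions.IgnoreCase, but the header checks in ParseTable and ParseRows call String.Contains("<th"), which is case-sensitive. If the source also uses uppercase <TH> header cells, those checks fail, so the class falls back to GenerateColumns and treats the header row as data. Lowercasing works around this but also lowercases the cell text. A minimal sketch of a case-insensitive check, using a hypothetical helper name:

```vb
Imports System

Module HeaderCheck
    ' Hypothetical helper: a case-insensitive stand-in for the
    ' case-sensitive Contains("<th") checks in ParseTable and ParseRows.
    Function ContainsHeaderTag(html As String) As Boolean
        Return html.IndexOf("<th", StringComparison.OrdinalIgnoreCase) >= 0
    End Function
End Module
```

Swapping this in for the two Contains("<th") calls should let the class handle uppercase markup without altering the cell text.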