Trouble with converted code

I was trying to find some parsing code for an HTML table and came across this one:
http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/

But I need it in VB.NET, so I used a popular converter, and everything except the following two functions seems to have come over properly.
Any idea how to correct these two hiccups?


Private Shared Function ParseColumns(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

        Return (From headerMatch In headerMatchesNew DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function

New is underlined " ')' expected" (although that was after I added a space after headerMatches)

    ''' <summary>
    ''' For tables which do not specify header cells we must generate DataColumns based on the number
    ''' of cells in a row (we assume all rows have the same number of cells).
    ''' </summary>
    ''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
    ''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
    Private Shared Function GenerateColumns(rowMatches As MatchCollection) As DataColumn()
        Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

        Return (From index In Enumerable.Range(0, columnCount)New DataColumn("Column " & Convert.ToString(index))).ToArray()
    End Function

New is underlined " ')' expected"
sirbounty Asked:
UnifiedIS Commented:
Sounds like it wants () around the arguments for the headerMatches
From headerMatch In headerMatches (New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
sirbounty (Author) Commented:
Return (From headerMatch In headerMatches(New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
reports that headerMatch is not declared, and

Return (From index In Enumerable.Range(0, columnCount)(New DataColumn("Column " & Convert.ToString(index))).ToArray())
reports that index is not accessible in this context because it is 'Friend'.
UnifiedIS Commented:
That's a useful message. What is headerMatch, and where is it declared?
sirbounty (Author) Commented:
headerMatch isn't declared anywhere that I can see...
it_saige (Developer) Commented:
Here is your converted class:
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

''' <summary>
''' HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.
''' </summary>
Public Class HtmlTableParser
	Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase

	Private Const CommentPattern As String = "<!--(.*?)-->"
	Private Const TablePattern As String = "<table[^>]*>(.*?)</table>"
	Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"
	Private Const RowPattern As String = "<tr[^>]*>(.*?)</tr>"
	Private Const CellPattern As String = "<td[^>]*>(.*?)</td>"

	''' <summary>
	''' Given an HTML string containing n tables, parse them into a DataSet containing n DataTables.
	''' </summary>
	''' <param name="html">An HTML string containing n HTML tables</param>
	''' <returns>A DataSet containing a DataTable for each HTML table in the input HTML</returns>
	Public Shared Function ParseDataSet(ByVal html As String) As DataSet
		Dim dataSet As New DataSet()
		Dim tableMatches As MatchCollection = Regex.Matches(WithoutComments(html), TablePattern, ExpressionOptions)

		For Each tableMatch As Match In tableMatches
			dataSet.Tables.Add(ParseTable(tableMatch.Value))
		Next

		Return dataSet
	End Function

	''' <summary>
	''' Given an HTML string containing a single table, parse that table to form a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A DataTable which matches the input HTML table</returns>
	Public Shared Function ParseTable(ByVal tableHtml As String) As DataTable
		Dim tableHtmlWithoutComments As String = WithoutComments(tableHtml)

		Dim dataTable As New DataTable()

		Dim rowMatches As MatchCollection = Regex.Matches(tableHtmlWithoutComments, RowPattern, ExpressionOptions)

		dataTable.Columns.AddRange(If(tableHtmlWithoutComments.Contains("<th"), ParseColumns(tableHtml), GenerateColumns(rowMatches)))

		ParseRows(rowMatches, dataTable)

		Return dataTable
	End Function

	''' <summary>
	''' Strip comments from an HTML string
	''' </summary>
	''' <param name="html">An HTML string potentially containing comments</param>
	''' <returns>The input HTML string with comments removed</returns>
	Private Shared Function WithoutComments(ByVal html As String) As String
		Return Regex.Replace(html, CommentPattern, String.Empty, ExpressionOptions)
	End Function

	''' <summary>
	''' Add a row to the input DataTable for each row match in the input MatchCollection
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows to add to the DataTable</param>
	''' <param name="dataTable">The DataTable to which we add rows</param>
	Private Shared Sub ParseRows(ByVal rowMatches As MatchCollection, ByVal dataTable As DataTable)
		For Each rowMatch As Match In rowMatches
			' if the row contains header tags don't use it - it is a header not a row
			If Not rowMatch.Value.Contains("<th") Then
				Dim dataRow As DataRow = dataTable.NewRow()

				Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Value, CellPattern, ExpressionOptions)

				For columnIndex As Integer = 0 To cellMatches.Count - 1
					dataRow(columnIndex) = cellMatches(columnIndex).Groups(1).ToString()
				Next

				dataTable.Rows.Add(dataRow)
			End If
		Next
	End Sub

	''' <summary>
	''' Given a string containing an HTML table, parse the header cells to create a set of DataColumns
	''' which define the columns in a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A set of DataColumns based on the HTML table header cells</returns>
	Private Shared Function ParseColumns(ByVal tableHtml As String) As DataColumn()
		Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

		Return (From headerMatch In headerMatches Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
	End Function

	''' <summary>
	''' For tables which do not specify header cells we must generate DataColumns based on the number
	''' of cells in a row (we assume all rows have the same number of cells).
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
	''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
	Private Shared Function GenerateColumns(ByVal rowMatches As MatchCollection) As DataColumn()
		Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

		Return (From index In Enumerable.Range(0, columnCount) Select New DataColumn("Column " & Convert.ToString(index))).ToArray()
	End Function
End Class

The errors occurred because the converter dropped the Select keyword from both LINQ queries.

-saige-
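
To illustrate the point about the missing Select keyword, here is a minimal, self-contained sketch of the query form. The module name and the sample HTML are made up for illustration and are not part of the original class. It also types the range variable with As Match, which is an optional addition (not something the answer above uses) so the query should compile with Option Strict On as well:

```vbnet
Imports System
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

Module SelectClauseSketch
    Sub Main()
        ' Hypothetical sample input, for illustration only.
        Dim html As String = "<tr><th>Name</th><th>Age</th></tr>"
        Dim headerMatches As MatchCollection =
            Regex.Matches(html, "<th[^>]*>(.*?)</th>", RegexOptions.IgnoreCase)

        ' C#:  from Match m in headerMatches select new DataColumn(...)
        ' VB:  From m As Match In headerMatches Select New DataColumn(...)
        ' Without the Select keyword, VB sees "New" directly after the
        ' collection expression and reports "')' expected".
        Dim columns As DataColumn() =
            (From headerMatch As Match In headerMatches
             Select New DataColumn(headerMatch.Groups(1).Value)).ToArray()

        ' Prints "Name" and then "Age".
        For Each column As DataColumn In columns
            Console.WriteLine(column.ColumnName)
        Next
    End Sub
End Module
```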
sirbounty (Author) Commented:
Thanks, that seems to work. However, I did have to convert the HTML string to lowercase (oddly enough, this results in all cells as <TD).

Thanks for the help.
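
A possible explanation for the lowercase workaround, offered as an assumption rather than something confirmed in this thread: the regular expressions in the class use RegexOptions.IgnoreCase, but the Contains("<th") checks in ParseTable and ParseRows are case-sensitive, so uppercase <TH> tags would not be detected. Lowercasing the whole string also lowercases the cell contents. The sketch below shows a case-insensitive check instead; ContainsTag is a hypothetical helper, not part of the original class:

```vbnet
Imports System

Module CaseInsensitiveTagCheck
    ' Hypothetical helper: reports whether html contains tagPrefix,
    ' ignoring case, without modifying the HTML itself.
    Function ContainsTag(ByVal html As String, ByVal tagPrefix As String) As Boolean
        Return html.IndexOf(tagPrefix, StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        ' Hypothetical sample input with uppercase tags.
        Dim html As String = "<TABLE><TR><TH>Name</TH></TR><TR><TD>Ann</TD></TR></TABLE>"

        ' String.Contains(String) is case-sensitive, so this prints False.
        Console.WriteLine(html.Contains("<th"))

        ' The ordinal, case-insensitive IndexOf finds the tag, so this prints True.
        Console.WriteLine(ContainsTag(html, "<th"))
    End Sub
End Module
```

If that assumption holds, swapping the two Contains("<th") calls for a check like this would let the parser accept mixed-case markup while leaving cell text untouched.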