Solved

Trouble with converted code

Posted on 2014-12-16
Last Modified: 2016-02-18
I was looking for code to parse an HTML table and came across this:
http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/

I need it in VB.NET, so I ran it through a popular converter. Everything except the following two functions seems to have converted properly.
Any idea how to correct these two hiccups?


Private Shared Function ParseColumns(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

        Return (From headerMatch In headerMatchesNew DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function


"New" is underlined with the error "')' expected" (although that was after I added a space after headerMatches).

    ''' <summary>
    ''' For tables which do not specify header cells we must generate DataColumns based on the number
    ''' of cells in a row (we assume all rows have the same number of cells).
    ''' </summary>
    ''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
    ''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
    Private Shared Function GenerateColumns(rowMatches As MatchCollection) As DataColumn()
        Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

        Return (From index In Enumerable.Range(0, columnCount)New DataColumn("Column " & Convert.ToString(index))).ToArray()
    End Function


"New" is underlined with the error "')' expected".
Question by:sirbounty
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503306
Sounds like it wants () around the arguments for the headerMatches
From headerMatch In headerMatches (New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
 
LVL 67

Author Comment

by:sirbounty
ID: 40503356
Return (From headerMatch In headerMatches(New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
states headermatch is not declared
and

Return (From index In Enumerable.Range(0, columnCount)(New DataColumn("Column " & Convert.ToString(index))).ToArray())
states index is not accessible in this context because it is 'Friend'
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503386
That's a good message. What is headerMatch, and where is it declared?
 
LVL 67

Author Comment

by:sirbounty
ID: 40503578
headerMatch isn't declared anywhere that I can see...
 
LVL 32

Accepted Solution

by:
it_saige earned 500 total points
ID: 40503700
Here is your converted class:
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

''' <summary>
''' HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.
''' </summary>
Public Class HtmlTableParser
	Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase

	Private Const CommentPattern As String = "<!--(.*?)-->"
	Private Const TablePattern As String = "<table[^>]*>(.*?)</table>"
	Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"
	Private Const RowPattern As String = "<tr[^>]*>(.*?)</tr>"
	Private Const CellPattern As String = "<td[^>]*>(.*?)</td>"

	''' <summary>
	''' Given an HTML string containing n HTML tables, parse them into a DataSet containing n DataTables.
	''' </summary>
	''' <param name="html">An HTML string containing n HTML tables</param>
	''' <returns>A DataSet containing a DataTable for each HTML table in the input HTML</returns>
	Public Shared Function ParseDataSet(ByVal html As String) As DataSet
		Dim dataSet As New DataSet()
		Dim tableMatches As MatchCollection = Regex.Matches(WithoutComments(html), TablePattern, ExpressionOptions)

		For Each tableMatch As Match In tableMatches
			dataSet.Tables.Add(ParseTable(tableMatch.Value))
		Next

		Return dataSet
	End Function

	''' <summary>
	''' Given an HTML string containing a single table, parse that table to form a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A DataTable which matches the input HTML table</returns>
	Public Shared Function ParseTable(ByVal tableHtml As String) As DataTable
		Dim tableHtmlWithoutComments As String = WithoutComments(tableHtml)

		Dim dataTable As New DataTable()

		Dim rowMatches As MatchCollection = Regex.Matches(tableHtmlWithoutComments, RowPattern, ExpressionOptions)

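		' Use the comment-stripped HTML for both the header check and the header parse,
		' so <th> cells inside comments are never turned into columns.
		dataTable.Columns.AddRange(If(tableHtmlWithoutComments.Contains("<th"), ParseColumns(tableHtmlWithoutComments), GenerateColumns(rowMatches)))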

		ParseRows(rowMatches, dataTable)

		Return dataTable
	End Function

	''' <summary>
	''' Strip comments from an HTML string
	''' </summary>
	''' <param name="html">An HTML string potentially containing comments</param>
	''' <returns>The input HTML string with comments removed</returns>
	Private Shared Function WithoutComments(ByVal html As String) As String
		Return Regex.Replace(html, CommentPattern, String.Empty, ExpressionOptions)
	End Function

	''' <summary>
	''' Add a row to the input DataTable for each row match in the input MatchCollection
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows to add to the DataTable</param>
	''' <param name="dataTable">The DataTable to which we add rows</param>
	Private Shared Sub ParseRows(ByVal rowMatches As MatchCollection, ByVal dataTable As DataTable)
		For Each rowMatch As Match In rowMatches
			' if the row contains header tags don't use it - it is a header not a row
			If Not rowMatch.Value.Contains("<th") Then
				Dim dataRow As DataRow = dataTable.NewRow()

				Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Value, CellPattern, ExpressionOptions)

				For columnIndex As Integer = 0 To cellMatches.Count - 1
					dataRow(columnIndex) = cellMatches(columnIndex).Groups(1).ToString()
				Next

				dataTable.Rows.Add(dataRow)
			End If
		Next
	End Sub

	''' <summary>
	''' Given a string containing an HTML table, parse the header cells to create a set of DataColumns
	''' which define the columns in a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A set of DataColumns based on the HTML table header cells</returns>
	Private Shared Function ParseColumns(ByVal tableHtml As String) As DataColumn()
		Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

		' "As Match" types the range variable, so this also compiles with Option Strict On.
		Return (From headerMatch As Match In headerMatches Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
	End Function

	''' <summary>
	''' For tables which do not specify header cells we must generate DataColumns based on the number
	''' of cells in a row (we assume all rows have the same number of cells).
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
	''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
	Private Shared Function GenerateColumns(ByVal rowMatches As MatchCollection) As DataColumn()
		Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

		Return (From index In Enumerable.Range(0, columnCount) Select New DataColumn("Column " & Convert.ToString(index))).ToArray()
	End Function
End Class



The error occurred because the converter dropped the Select keyword from both LINQ queries, leaving "New DataColumn(...)" attached directly to the source collection.
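
To make the Select point concrete, here is a minimal sketch of how a VB query expression carries its projection. The sample HTML string and the module name are made up for illustration, and the C# line in the comment is only an assumption about the general shape of the original, not a quote from the blog post.

```vbnet
Imports System
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

Module SelectClauseDemo
    Sub Main()
        ' Sample input invented for this sketch.
        Dim html As String = "<tr><th>Name</th><th>Age</th></tr>"
        Dim headerMatches As MatchCollection =
            Regex.Matches(html, "<th[^>]*>(.*?)</th>", RegexOptions.IgnoreCase)

        ' Assumed C# shape:
        '   (from Match m in headerMatches select new DataColumn(m.Groups[1].ToString())).ToArray()
        ' In VB the projection needs its own Select keyword, and "As Match"
        ' gives the range variable a concrete type.
        Dim columns As DataColumn() =
            (From headerMatch As Match In headerMatches
             Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()

        ' Print the column names built from the header cells.
        For Each column As DataColumn In columns
            Console.WriteLine(column.ColumnName)
        Next
    End Sub
End Module
```

Without the Select keyword, the compiler reads "headerMatches New ..." as a malformed expression, which is consistent with the "')' expected" error in the question.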

-saige-
 
LVL 67

Author Comment

by:sirbounty
ID: 40506869
Thanks - that seems to work; however, I did have to convert the HTML string to lowercase (oddly enough, this results in all cells as <TD).

Thanks for the help.
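
A possible reason lowercasing was needed: the regular expressions in the class use RegexOptions.IgnoreCase, but the Contains("<th") checks in ParseTable and ParseRows are case-sensitive, so uppercase <TH markup would not be detected as a header. This is a guess about the cause rather than something confirmed in the thread. A minimal sketch of a case-insensitive check follows; ContainsHeaderTag is a hypothetical helper, not part of the posted class.

```vbnet
Imports System

Module HeaderDetectionDemo
    ' Hypothetical helper: True when the markup contains "<th" in any letter case.
    ' Like the original Contains("<th") check, it would also match "<thead".
    Function ContainsHeaderTag(ByVal html As String) As Boolean
        Return html.IndexOf("<th", StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        ' Uppercase header markup is detected without lowercasing the input first.
        Console.WriteLine(ContainsHeaderTag("<TR><TH>Name</TH></TR>"))
        ' A row with only data cells is not treated as a header.
        Console.WriteLine(ContainsHeaderTag("<tr><td>Bob</td></tr>"))
    End Sub
End Module
```

Swapping a check like this in for the two Contains("<th") calls would leave the cell text in its original case, whereas lowercasing the whole HTML string also lowercases the data.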