Solved

Trouble with converted code

Posted on 2014-12-16
Last Modified: 2016-02-18
I was looking for code to parse an HTML table and came across this one:
http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/

But I need it in VB.NET, so I ran it through a popular converter. Everything except the following two functions seems to have come over properly.
Any idea how to correct these two hiccups?


Private Shared Function ParseColumns(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

        Return (From headerMatch In headerMatchesNew DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function


New is underlined " ')' expected" (although that was after I added a space after headerMatches)

    ''' <summary>
    ''' For tables which do not specify header cells we must generate DataColumns based on the number
    ''' of cells in a row (we assume all rows have the same number of cells).
    ''' </summary>
    ''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
    ''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
    Private Shared Function GenerateColumns(rowMatches As MatchCollection) As DataColumn()
        Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

        Return (From index In Enumerable.Range(0, columnCount)New DataColumn("Column " & Convert.ToString(index))).ToArray()
    End Function


New is underlined " ')' expected"
Question by:sirbounty

Expert Comment

by:UnifiedIS
ID: 40503306
Sounds like it wants () around the arguments for the headerMatches
From headerMatch In headerMatches (New DataColumn(headerMatch.Groups(1).ToString())).ToArray())

Author Comment

by:sirbounty
ID: 40503356
Return (From headerMatch In headerMatches(New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
states headermatch is not declared
and

Return (From index In Enumerable.Range(0, columnCount)(New DataColumn("Column " & Convert.ToString(index))).ToArray())
states index is not accessible in this context because it is 'Friend'

Expert Comment

by:UnifiedIS
ID: 40503386
That's a good message, what is headermatch, when is it declared?

Author Comment

by:sirbounty
ID: 40503578
headerMatch isn't declared anywhere that I can see...

Accepted Solution

by:it_saige
ID: 40503700
Here is your converted class:
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

''' <summary>
''' HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.
''' </summary>
Public Class HtmlTableParser
	Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase

	Private Const CommentPattern As String = "<!--(.*?)-->"
	Private Const TablePattern As String = "<table[^>]*>(.*?)</table>"
	Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"
	Private Const RowPattern As String = "<tr[^>]*>(.*?)</tr>"
	Private Const CellPattern As String = "<td[^>]*>(.*?)</td>"

	''' <summary>
	''' Given an HTML string containing n HTML tables, parse them into a DataSet containing n DataTables.
	''' </summary>
	''' <param name="html">An HTML string containing n HTML tables</param>
	''' <returns>A DataSet containing a DataTable for each HTML table in the input HTML</returns>
	Public Shared Function ParseDataSet(ByVal html As String) As DataSet
		Dim dataSet As New DataSet()
		Dim tableMatches As MatchCollection = Regex.Matches(WithoutComments(html), TablePattern, ExpressionOptions)

		For Each tableMatch As Match In tableMatches
			dataSet.Tables.Add(ParseTable(tableMatch.Value))
		Next

		Return dataSet
	End Function

	''' <summary>
	''' Given an HTML string containing a single table, parse that table to form a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A DataTable which matches the input HTML table</returns>
	Public Shared Function ParseTable(ByVal tableHtml As String) As DataTable
		Dim tableHtmlWithoutComments As String = WithoutComments(tableHtml)

		Dim dataTable As New DataTable()

		Dim rowMatches As MatchCollection = Regex.Matches(tableHtmlWithoutComments, RowPattern, ExpressionOptions)

		dataTable.Columns.AddRange(If(tableHtmlWithoutComments.Contains("<th"), ParseColumns(tableHtml), GenerateColumns(rowMatches)))

		ParseRows(rowMatches, dataTable)

		Return dataTable
	End Function

	''' <summary>
	''' Strip comments from an HTML string
	''' </summary>
	''' <param name="html">An HTML string potentially containing comments</param>
	''' <returns>The input HTML string with comments removed</returns>
	Private Shared Function WithoutComments(ByVal html As String) As String
		Return Regex.Replace(html, CommentPattern, String.Empty, ExpressionOptions)
	End Function

	''' <summary>
	''' Add a row to the input DataTable for each row match in the input MatchCollection
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows to add to the DataTable</param>
	''' <param name="dataTable">The DataTable to which we add rows</param>
	Private Shared Sub ParseRows(ByVal rowMatches As MatchCollection, ByVal dataTable As DataTable)
		For Each rowMatch As Match In rowMatches
			' if the row contains header tags don't use it - it is a header not a row
			If Not rowMatch.Value.Contains("<th") Then
				Dim dataRow As DataRow = dataTable.NewRow()

				Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Value, CellPattern, ExpressionOptions)

				For columnIndex As Integer = 0 To cellMatches.Count - 1
					dataRow(columnIndex) = cellMatches(columnIndex).Groups(1).ToString()
				Next

				dataTable.Rows.Add(dataRow)
			End If
		Next
	End Sub

	''' <summary>
	''' Given a string containing an HTML table, parse the header cells to create a set of DataColumns
	''' which define the columns in a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A set of DataColumns based on the HTML table header cells</returns>
	Private Shared Function ParseColumns(ByVal tableHtml As String) As DataColumn()
		Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

		Return (From headerMatch In headerMatches Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
	End Function

	''' <summary>
	''' For tables which do not specify header cells we must generate DataColumns based on the number
	''' of cells in a row (we assume all rows have the same number of cells).
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
	''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
	Private Shared Function GenerateColumns(ByVal rowMatches As MatchCollection) As DataColumn()
		Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

		Return (From index In Enumerable.Range(0, columnCount) Select New DataColumn("Column " & Convert.ToString(index))).ToArray()
	End Function
End Class



The error occurred because the converter dropped the Select keyword from the two LINQ queries. Without it, the query ends after the In clause, and the compiler does not expect the New that follows.

-saige-
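
For readers hitting the same converter problem, here is a minimal sketch of the repaired query shape. It also gives the range variable an explicit type (As Match), which should keep the query compiling under Option Strict On, where an untyped range variable over a non-generic collection would be treated as Object. The module name, the function name ParseColumnsTyped and the sample HTML are hypothetical and only serve the illustration; the pattern is copied from the class above.

```vbnet
Imports System
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

Module SelectClauseSketch
    ' Same pattern as HeaderPattern in the class above, repeated so the sketch is self-contained.
    Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"

    ' Hypothetical variant of ParseColumns with a typed range variable.
    Function ParseColumnsTyped(ByVal tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, RegexOptions.Singleline Or RegexOptions.IgnoreCase)

        ' "From ... In ..." names the source; "Select" is the projection the converter lost.
        Return (From headerMatch As Match In headerMatches Select New DataColumn(headerMatch.Groups(1).Value)).ToArray()
    End Function

    Sub Main()
        ' Assumed sample input: one header row with two cells.
        For Each column As DataColumn In ParseColumnsTyped("<table><tr><th>Name</th><th>Age</th></tr></table>")
            Console.WriteLine(column.ColumnName)
        Next
    End Sub
End Module
```

The GenerateColumns query follows the same From ... In ... Select ... shape, with Enumerable.Range as the source.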

Author Comment

by:sirbounty
ID: 40506869
Thanks, that seems to work. However, I did have to convert the HTML string to lowercase first (oddly enough, this results in all cells as <TD).

Thanks for the help.
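
A possible explanation for the lowercase workaround: the regular expressions in the class use RegexOptions.IgnoreCase, but the two Contains("<th") checks are case-sensitive, so markup written with <TH> is not recognised as having a header row. The sketch below shows a case-insensitive check that could replace those calls, assuming that is indeed the cause. ContainsTag is a hypothetical helper, not part of the posted class.

```vbnet
Imports System

Module HeaderCheckSketch
    ' Hypothetical helper: case-insensitive substring test for the start of a tag.
    Function ContainsTag(ByVal html As String, ByVal tagStart As String) As Boolean
        Return html.IndexOf(tagStart, StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        Dim upperCaseRow As String = "<TR><TH>Name</TH></TR>"

        Console.WriteLine(ContainsTag(upperCaseRow, "<th"))  ' prints True
        Console.WriteLine(upperCaseRow.Contains("<th"))      ' prints False (case-sensitive)
    End Sub
End Module
```

With a helper like this in place of Contains in ParseTable and ParseRows, the input string would not need to be lowercased, so the original casing of the cell contents is preserved.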