Solved

Trouble with converted code

Posted on 2014-12-16
Last Modified: 2016-02-18
I was looking for code to parse an HTML table and came across this:
http://blog.hypercomplex.co.uk/index.php/2010/05/parsing-html-tables-into-system-data-datatable/

But I need it in VB.NET, so I ran it through a popular converter. Everything except the following two functions seems to have converted properly.
Any idea how to correct these two hiccups?


Private Shared Function ParseColumns(tableHtml As String) As DataColumn()
        Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

        Return (From headerMatch In headerMatchesNew DataColumn(headerMatch.Groups(1).ToString())).ToArray()
    End Function


"New" is underlined with the error "')' expected" (although that was after I added a space after headerMatches).

    ''' <summary>
    ''' For tables which do not specify header cells we must generate DataColumns based on the number
    ''' of cells in a row (we assume all rows have the same number of cells).
    ''' </summary>
    ''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
    ''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
    Private Shared Function GenerateColumns(rowMatches As MatchCollection) As DataColumn()
        Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

        Return (From index In Enumerable.Range(0, columnCount)New DataColumn("Column " & Convert.ToString(index))).ToArray()
    End Function


Here too, "New" is underlined with the error "')' expected".
Question by:sirbounty
6 Comments
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503306
Sounds like it wants () around the arguments for the headerMatches
From headerMatch In headerMatches (New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
 
LVL 67

Author Comment

by:sirbounty
ID: 40503356
Return (From headerMatch In headerMatches(New DataColumn(headerMatch.Groups(1).ToString())).ToArray())
states headermatch is not declared
and

Return (From index In Enumerable.Range(0, columnCount)(New DataColumn("Column " & Convert.ToString(index))).ToArray())
states index is not accessible in this context because it is 'Friend'
 
LVL 18

Expert Comment

by:UnifiedIS
ID: 40503386
That's a good message. What is headerMatch, and where is it declared?
 
LVL 67

Author Comment

by:sirbounty
ID: 40503578
headerMatch isn't declared anywhere that I can see...
 
LVL 34

Accepted Solution

by:
it_saige earned 2000 total points
ID: 40503700
Here is your converted class:
Imports System.Data
Imports System.Linq
Imports System.Text.RegularExpressions

''' <summary>
''' HtmlTableParser parses the contents of an html string into a System.Data DataSet or DataTable.
''' </summary>
Public Class HtmlTableParser
	Private Const ExpressionOptions As RegexOptions = RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase

	Private Const CommentPattern As String = "<!--(.*?)-->"
	Private Const TablePattern As String = "<table[^>]*>(.*?)</table>"
	Private Const HeaderPattern As String = "<th[^>]*>(.*?)</th>"
	Private Const RowPattern As String = "<tr[^>]*>(.*?)</tr>"
	Private Const CellPattern As String = "<td[^>]*>(.*?)</td>"

	''' <summary>
	''' Given an HTML string containing n tables, parse them into a DataSet containing n DataTables.
	''' </summary>
	''' <param name="html">An HTML string containing n HTML tables</param>
	''' <returns>A DataSet containing a DataTable for each HTML table in the input HTML</returns>
	Public Shared Function ParseDataSet(ByVal html As String) As DataSet
		Dim dataSet As New DataSet()
		Dim tableMatches As MatchCollection = Regex.Matches(WithoutComments(html), TablePattern, ExpressionOptions)

		For Each tableMatch As Match In tableMatches
			dataSet.Tables.Add(ParseTable(tableMatch.Value))
		Next

		Return dataSet
	End Function

	''' <summary>
	''' Given an HTML string containing a single table, parse that table to form a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A DataTable which matches the input HTML table</returns>
	Public Shared Function ParseTable(ByVal tableHtml As String) As DataTable
		Dim tableHtmlWithoutComments As String = WithoutComments(tableHtml)

		Dim dataTable As New DataTable()

		Dim rowMatches As MatchCollection = Regex.Matches(tableHtmlWithoutComments, RowPattern, ExpressionOptions)

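		' Pass the comment-stripped HTML so header cells inside comments are not turned into columns
		dataTable.Columns.AddRange(If(tableHtmlWithoutComments.Contains("<th"), ParseColumns(tableHtmlWithoutComments), GenerateColumns(rowMatches)))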

		ParseRows(rowMatches, dataTable)

		Return dataTable
	End Function

	''' <summary>
	''' Strip comments from an HTML string
	''' </summary>
	''' <param name="html">An HTML string potentially containing comments</param>
	''' <returns>The input HTML string with comments removed</returns>
	Private Shared Function WithoutComments(ByVal html As String) As String
		Return Regex.Replace(html, CommentPattern, String.Empty, ExpressionOptions)
	End Function

	''' <summary>
	''' Add a row to the input DataTable for each row match in the input MatchCollection
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows to add to the DataTable</param>
	''' <param name="dataTable">The DataTable to which we add rows</param>
	Private Shared Sub ParseRows(ByVal rowMatches As MatchCollection, ByVal dataTable As DataTable)
		For Each rowMatch As Match In rowMatches
			' if the row contains header tags don't use it - it is a header not a row
			If Not rowMatch.Value.Contains("<th") Then
				Dim dataRow As DataRow = dataTable.NewRow()

				Dim cellMatches As MatchCollection = Regex.Matches(rowMatch.Value, CellPattern, ExpressionOptions)

				For columnIndex As Integer = 0 To cellMatches.Count - 1
					dataRow(columnIndex) = cellMatches(columnIndex).Groups(1).ToString()
				Next

				dataTable.Rows.Add(dataRow)
			End If
		Next
	End Sub

	''' <summary>
	''' Given a string containing an HTML table, parse the header cells to create a set of DataColumns
	''' which define the columns in a DataTable.
	''' </summary>
	''' <param name="tableHtml">An HTML string containing a single HTML table</param>
	''' <returns>A set of DataColumns based on the HTML table header cells</returns>
	Private Shared Function ParseColumns(ByVal tableHtml As String) As DataColumn()
		Dim headerMatches As MatchCollection = Regex.Matches(tableHtml, HeaderPattern, ExpressionOptions)

		' "As Match" types the range variable, so this also compiles with Option Strict On
		Return (From headerMatch As Match In headerMatches Select New DataColumn(headerMatch.Groups(1).ToString())).ToArray()
	End Function

	''' <summary>
	''' For tables which do not specify header cells we must generate DataColumns based on the number
	''' of cells in a row (we assume all rows have the same number of cells).
	''' </summary>
	''' <param name="rowMatches">A collection of all the rows in the HTML table we wish to generate columns for</param>
	''' <returns>A set of DataColumns based on the number of cells in the first row of the input HTML table</returns>
	Private Shared Function GenerateColumns(ByVal rowMatches As MatchCollection) As DataColumn()
		Dim columnCount As Integer = Regex.Matches(rowMatches(0).ToString(), CellPattern, ExpressionOptions).Count

		Return (From index In Enumerable.Range(0, columnCount) Select New DataColumn("Column " & Convert.ToString(index))).ToArray()
	End Function
End Class



The error occurred because the converter dropped the Select clause from both LINQ statements.
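
To illustrate the point about the Select clause, here is a minimal standalone sketch. It is not part of the original class: the module name SelectClauseSketch and the function HeaderNames are made up for this example, and the C# line in the comment is only an assumed shape of such a query, not a quote from the blog post. It projects to strings rather than DataColumns to keep the example small.

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module SelectClauseSketch
    ' Hypothetical helper: returns the inner text of every <th> cell,
    ' using the same query shape as ParseColumns.
    Function HeaderNames(ByVal tableHtml As String) As String()
        Dim headerMatches As MatchCollection =
            Regex.Matches(tableHtml, "<th[^>]*>(.*?)</th>", RegexOptions.Singleline Or RegexOptions.IgnoreCase)

        ' Assumed C# shape:
        '   (from Match m in headerMatches select m.Groups[1].ToString()).ToArray()
        ' In VB the projection needs an explicit Select clause, and
        ' "As Match" gives the range variable its type.
        Return (From headerMatch As Match In headerMatches
                Select headerMatch.Groups(1).ToString()).ToArray()
    End Function

    Sub Main()
        For Each name As String In HeaderNames("<table><tr><th>Id</th><th>Name</th></tr></table>")
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

Without the Select keyword, the compiler reads the text after the collection as a continuation of the In expression, which is likely why it reports "')' expected" at New.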

-saige-
 
LVL 67

Author Comment

by:sirbounty
ID: 40506869
Thanks, that seems to work. However, I did have to convert the HTML string to lowercase (oddly enough this results in all cells as <TD).

Thanks for the help.
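
One possible reason the lowercase conversion was needed, assuming the source HTML uses uppercase tags such as <TH>: the regular expressions in the class run with RegexOptions.IgnoreCase, but the Contains("<th") checks in ParseTable and ParseRows are case-sensitive. The sketch below shows a case-insensitive alternative. ContainsHeaderCell and the module name are hypothetical and not part of the posted class.

```vbnet
Imports System

Module HeaderDetectionSketch
    ' Hypothetical helper: case-insensitive replacement for html.Contains("<th").
    ' Like the original check, it also matches tags such as <thead>.
    Function ContainsHeaderCell(ByVal html As String) As Boolean
        Return html.IndexOf("<th", StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        Dim upperCaseHtml As String = "<TABLE><TR><TH>Id</TH></TR></TABLE>"

        ' Case-insensitive search finds the uppercase header tag.
        Console.WriteLine(ContainsHeaderCell(upperCaseHtml))

        ' The case-sensitive check used in the class misses it.
        Console.WriteLine(upperCaseHtml.Contains("<th"))
    End Sub
End Module
```

Swapping a helper like this in for the two Contains("<th") calls would avoid lowercasing the whole string, which also changes the case of the cell contents.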