Solved

Parse HTML for specific string pattern

Posted on 2014-03-09
3
491 Views
Last Modified: 2014-03-30
Greetings,
I am trying to compose a query in vb.net that will parse a website looking for all the strings in a particular pattern and placing those strings in a collection:
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User1</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 1.</p>
			</div>
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User2</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 2.</p>
			</div>
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User3</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 3.</p>
			</div>

Open in new window

I want to pull out:

User1   Sample comment 1.
User2   Sample comment 2.
User3   Sample comment 3.

Thanks in advance.

M
0
Comment
Question by:MaxKroy
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
3 Comments
 
LVL 25

Expert Comment

by:apeter
ID: 39917051
Can't you use Linq to xml to parse the xml ?

Use XDocument to parse the xml.  http://msdn.microsoft.com/en-us/library/bb918016.aspx
0
 
LVL 23

Accepted Solution

by:
Ioannis Paraskevopoulos earned 500 total points
ID: 39917105
You may use HtmlAgilityPack (available on Nuget). If you do not use NuGet you may get the binaries from CodePlex .

You may check the following sample code that gets an Enumerable of Anonymous objects that have a User and a Comment properties. Use it as you like:

	Dim html As String
	html = _
	"<div class=""comment-author"">" + _
	"	<img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"	<cite class=""fn""><a href='/bio.html' class='url'>User1</a></cite>		" + _				
	"</div>" + _
	"<div class=""comment_content"">" + _
	"	<p>Sample comment 1.</p>" + _
	"</div>" + _
	"<div class=""comment-author"">" + _
	"		 <img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"		<cite class=""fn""><a href='/bio.html' class='url'>User2</a></cite>" + _
	"</div>" + _
	"<div class=""comment_content"">" + _
	"		<p>Sample comment 2.</p>" + _
	"</div>" + _
	"<div class=""comment-author"">" + _
	"	<img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"	<cite class=""fn""><a href='/bio.html' class='url'>User3</a></cite>" + _				
	"</div>" + _
	"<div class=""comment_content"">" + _
	"	<p>Sample comment 3.</p>" + _
	"</div>"
	
	Dim htmlDoc = New HtmlAgilityPack.HtmlDocument
	htmlDoc.LoadHtml(html)

	Dim Result  = htmlDoc.DocumentNode.Elements("div").Where(Function(x) x.Attributes("class").Value = "comment-author").Select(Function(x) New With {.User = x.Element("cite").InnerText, .Comment = x.NextSibling.InnerText.Trim})
	For Each obj In Result
		Console.WriteLine("User={0}, Comment={1}",obj.User, obj.Comment)
	Next

Open in new window


Giannis
0

Featured Post

Turn Insights into Action

Communication across every corner of your business is essential to increase the velocity of your application delivery and support pipeline. Automate, standardize, and contextualize your communication processes with xMatters.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

The ECB site provides FX rates for major currencies since its inception in 1999 in the form of an XML feed. The files have the following format (reducted for brevity) (CODE) There are three files available HERE (http://www.ecb.europa.eu/stats/exch…
This article shows how to deploy dynamic backgrounds to computers depending on the aspect ratio of display
Come and listen to Percona CEO Peter Zaitsev discuss what’s new in Percona open source software, including Percona Server for MySQL (https://www.percona.com/software/mysql-database/percona-server) and MongoDB (https://www.percona.com/software/mongo-…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

690 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question