Solved

Parse HTML for specific string pattern

Posted on 2014-03-09
Last Modified: 2014-03-30
Greetings,
I am trying to write a query in VB.NET that parses a web page, finds every string that matches a particular pattern, and places those strings in a collection:
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User1</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 1.</p>
			</div>
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User2</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 2.</p>
			</div>
<div class="comment-author">
			 <img  src="/images/avatar.jpg" class="avatar photo"   width="44">
			<cite class="fn"><a href='/bio.html' class='url'>User3</a></cite>						</div>
						<div class="comment_content">
			<p>Sample comment 3.</p>
			</div>


I want to pull out:

User1   Sample comment 1.
User2   Sample comment 2.
User3   Sample comment 3.

Thanks in advance.

Question by:MaxKroy
3 Comments
 
LVL 25

Expert Comment

by:apeter
ID: 39917051
Can't you use LINQ to XML to parse the markup?

Use XDocument to parse it as XML: http://msdn.microsoft.com/en-us/library/bb918016.aspx
LVL 23

Accepted Solution

by:
Ioannis Paraskevopoulos earned 2000 total points
ID: 39917105
You can use HtmlAgilityPack (available on NuGet). If you do not use NuGet, you can get the binaries from CodePlex.

The following sample code returns an enumerable of anonymous objects, each with a User and a Comment property. Use it as you like:

	Dim html As String
	html = _
	"<div class=""comment-author"">" + _
	"	<img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"	<cite class=""fn""><a href='/bio.html' class='url'>User1</a></cite>		" + _				
	"</div>" + _
	"<div class=""comment_content"">" + _
	"	<p>Sample comment 1.</p>" + _
	"</div>" + _
	"<div class=""comment-author"">" + _
	"		 <img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"		<cite class=""fn""><a href='/bio.html' class='url'>User2</a></cite>" + _
	"</div>" + _
	"<div class=""comment_content"">" + _
	"		<p>Sample comment 2.</p>" + _
	"</div>" + _
	"<div class=""comment-author"">" + _
	"	<img  src=""/images/avatar.jpg"" class=""avatar photo""   width=""44"">" + _
	"	<cite class=""fn""><a href='/bio.html' class='url'>User3</a></cite>" + _				
	"</div>" + _
	"<div class=""comment_content"">" + _
	"	<p>Sample comment 3.</p>" + _
	"</div>"
	
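	' Load the markup into an HtmlAgilityPack document.
	Dim htmlDoc = New HtmlAgilityPack.HtmlDocument()
	htmlDoc.LoadHtml(html)

	' Pair each "comment-author" div with the first div that follows it.
	' GetAttributeValue avoids a NullReferenceException on divs that have no class attribute,
	' and the following-sibling XPath skips whitespace text nodes between the two divs.
	Dim Result = htmlDoc.DocumentNode.Descendants("div") _
		.Where(Function(x) x.GetAttributeValue("class", "") = "comment-author") _
		.Select(Function(x) New With {.User = x.Element("cite").InnerText.Trim(), .Comment = x.SelectSingleNode("following-sibling::div[1]").InnerText.Trim()})
	For Each obj In Result
		Console.WriteLine("User={0}, Comment={1}", obj.User, obj.Comment)
	Next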
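
If the markup comes from a live page rather than a string, a minimal sketch along the following lines may help. It assumes HtmlAgilityPack is referenced and that the page uses the same comment-author / comment_content structure as in the question. The URL, the module name CommentScraper, and the comments list are placeholders for illustration only, not anything taken from the real site.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module CommentScraper
	Sub Main()
		' Hypothetical address; replace it with the page that holds the comments.
		Dim url As String = "http://example.com/post.html"

		' HtmlWeb.Load downloads the page and parses it in one step.
		Dim htmlDoc As HtmlDocument = New HtmlWeb().Load(url)

		' Collection of (user, comment) pairs.
		Dim comments As New List(Of KeyValuePair(Of String, String))

		' SelectNodes returns Nothing when no node matches, so check before iterating.
		Dim authors = htmlDoc.DocumentNode.SelectNodes("//div[@class='comment-author']")
		If authors IsNot Nothing Then
			For Each author In authors
				Dim cite = author.SelectSingleNode(".//cite")
				Dim content = author.SelectSingleNode("following-sibling::div[@class='comment_content'][1]")
				' Skip entries where either half of the pair is missing.
				If cite IsNot Nothing AndAlso content IsNot Nothing Then
					comments.Add(New KeyValuePair(Of String, String)(cite.InnerText.Trim(), content.InnerText.Trim()))
				End If
			Next
		End If

		For Each pair In comments
			Console.WriteLine("{0}   {1}", pair.Key, pair.Value)
		Next
	End Sub
End Module
```

The null checks are there because a real page may contain an author block without a matching comment block; the in-memory sample above never hits that case.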

Giannis