Solved

Regex and extract information from webpage (scraping)

Posted on 2010-09-06
Last Modified: 2012-05-10
Hi,
I am trying to scrape the ISIN from Yahoo Finance pages and need some help.

I have built the URL that I need but cannot figure out how to scrape this information. I have attached a text file (which is essentially the web page), and here is the link to the page.

http://uk.finance.yahoo.com/q?s=prty&m=L&d=
Question by:nepaluz
10 Comments
 
LVL 12

Expert Comment

by:ErezMor
ID: 33614125
Download the HTML page, then
use something like the ReadAllText function to read the page's source, then
use the InStr function to find "ISIN" - the value you're after comes right after this location in the text string,
and it ends at a closing parenthesis, which you can easily find too.

Hope this makes sense to you. Quite some work is yet to be done... good luck.
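The steps above can be sketched roughly as follows. This is only an illustration, not tested against the live page: it uses String.IndexOf (the .NET counterpart of InStr), and the module name, the function name and the sample string with its placeholder ISIN "XX0000000000" are all made up for the example. It assumes the page contains the text "ISIN " followed by the value and a closing parenthesis.

```vbnet
Imports System

Module IsinByIndexOf
    ' Returns the text between "ISIN " and the next ")", or Nothing when
    ' either delimiter is missing. The names here are hypothetical.
    Function ExtractIsin(html As String) As String
        Const marker As String = "ISIN "
        If String.IsNullOrEmpty(html) Then Return Nothing

        ' Locate the marker, then step past it to the start of the value.
        Dim startIndex As Integer = html.IndexOf(marker, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += marker.Length

        ' The value ends at the first closing parenthesis after the marker.
        Dim closeIndex As Integer = html.IndexOf(")"c, startIndex)
        If closeIndex < 0 Then Return Nothing

        Return html.Substring(startIndex, closeIndex - startIndex).Trim()
    End Function

    Sub Main()
        ' Made-up fragment standing in for the downloaded page source.
        Dim sample As String = "<span>(ISIN XX0000000000 )</span>"
        Console.WriteLine(ExtractIsin(sample))
    End Sub
End Module
```

In real use the string passed to ExtractIsin would be the page source, for example the result of File.ReadAllText on the saved HTML file.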
0
 
LVL 3

Expert Comment

by:smash_pants
ID: 33614134
you could use xpath to access the text for the ISIN:

/html/body/div/div[2]/div[4]/div[3]/div/div/span

If you really need a regex for it then here you go:
^.*ISIN (.{12}) \).*$
 
LVL 17

Author Comment

by:nepaluz
ID: 33615880
ErezMor, thanks for the contrib. I will give it a bash and see what I come up with.

smash_pants (what a name!), I really am interested in the regex but cannot get it to work. Here's the code I am using (and don't laugh!)


Dim fxk = Regex.Matches(sr.ReadToEnd.ToString, "^.*ISIN (.{12}) \).*$")
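A likely reason the line above finds nothing: without RegexOptions.Multiline, ^ and $ only anchor to the start and end of the whole string, and . does not match a newline, so the pattern can only succeed if "ISIN" sits on the very first line of the page. The sketch below shows the difference on a made-up three-line string; the module name, the helper name and the placeholder ISIN are assumptions for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module MultilineDemo
    ' Same pattern as in the thread.
    Const Pattern As String = "^.*ISIN (.{12}) \).*$"

    ' Counts how many times the pattern matches under the given options.
    Function CountMatches(source As String, options As RegexOptions) As Integer
        Return Regex.Matches(source, Pattern, options).Count
    End Function

    Sub Main()
        ' Made-up page source with the ISIN on the second line.
        Dim page As String = "<html>" & vbLf &
                             "<span>(ISIN XX0000000000 )</span>" & vbLf &
                             "</html>"

        ' Default options: ^ and $ anchor to the whole string only.
        Console.WriteLine(CountMatches(page, RegexOptions.None))

        ' Multiline: ^ and $ anchor to every line.
        Console.WriteLine(CountMatches(page, RegexOptions.Multiline))
    End Sub
End Module
```

On this sample the first call should report no matches and the second a single match. Note also that Regex.Matches returns a MatchCollection, so the ISIN itself is read from Groups(1).Value of each Match.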

 
LVL 17

Author Comment

by:nepaluz
ID: 33648943
OK - I have been unable to implement either of the approaches suggested here, and though they may be correct, I am unable to award points for them. I am thus closing this thread and will try another one.
 
LVL 12

Expert Comment

by:ErezMor
ID: 33657588
Since the last post from the user was "I'll give it a try and get back to you...", I don't think his closing reason is justified.
I happen to have some actual experience in doing just what I suggested to the user, so in terms of whether it works or not, there's no doubt here.
Unless I didn't understand his question, or his closing request reason...

Erez.
 
LVL 17

Author Comment

by:nepaluz
ID: 33657700
OK then, Erez, I have been unable to use InStr as suggested (I think I have to repeat the obvious). Could you post some code to this end?
 
LVL 3

Expert Comment

by:smash_pants
ID: 33659002
I've tested this code and it works.
Sorry I haven't been around much... I'm moving house.

// Requires: using System; using System.IO; using System.Net;
// using System.Text; using System.Text.RegularExpressions;
string result = null;
string url = "http://uk.finance.yahoo.com/q?s=prty&m=L&d=";
WebResponse response = null;
StreamReader reader = null;

try {
    // Download the page source into a single string.
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    response = request.GetResponse();
    reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8);
    result = reader.ReadToEnd();
}
catch (Exception ex) {
    // handle error
    Console.WriteLine(ex.Message);
}
finally {
    if (reader != null)
        reader.Close();
    if (response != null)
        response.Close();
}
// Skip matching if the download failed and left result null.
if (result != null) {
    // Multiline lets ^ and $ match at every line of the page.
    foreach (Match txt in Regex.Matches(result, @"^.*ISIN (.{12}) \).*$", RegexOptions.Multiline)) {
        Console.WriteLine(txt.Groups[1].Value);
    }
}

 
LVL 3

Accepted Solution

by:
smash_pants earned 2000 total points
ID: 33659006
Here it is in VB.NET

' Requires: Imports System.IO, System.Net, System.Text, System.Text.RegularExpressions
Dim result As String = Nothing
Dim url As String = "http://uk.finance.yahoo.com/q?s=prty&m=L&d="
Dim response As WebResponse = Nothing
Dim reader As StreamReader = Nothing

Try
	' Download the page source into a single string.
	Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
	request.Method = "GET"
	response = request.GetResponse()
	reader = New StreamReader(response.GetResponseStream(), Encoding.UTF8)
	result = reader.ReadToEnd()
Catch ex As Exception
	' handle error
	Console.WriteLine(ex.Message)
Finally
	If reader IsNot Nothing Then
		reader.Close()
	End If
	If response IsNot Nothing Then
		response.Close()
	End If
End Try
' Skip matching if the download failed and left result as Nothing.
If result IsNot Nothing Then
	' Multiline lets ^ and $ match at every line of the page.
	For Each txt As Match In Regex.Matches(result, "^.*ISIN (.{12}) \).*$", RegexOptions.Multiline)
		Console.WriteLine(txt.Groups(1).Value)
	Next
End If

 
LVL 17

Author Closing Comment

by:nepaluz
ID: 33659055
Thanks a lot and happy moving! If I may ask for a tweak, is it possible to have the regex NOT limit the result string to 12 characters as you have it, but pick up everything up to the closing bracket?

The ISIN (could?) be longer than 12 characters, in which case it would return an incomplete number (or am I wrong?)

I have awarded the marks anyhow.
 
LVL 3

Expert Comment

by:smash_pants
ID: 33698833
In the regex:
^.*ISIN (.{12}) \).*$

just change the .{12} to what you need.

Capital letters and numbers, at least 12 chars long:
[A-Z0-9]{12,}

Any character, any length:
.*
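For the "up to the closing bracket" request, a greedy .* can overshoot when a line contains more than one ")", because it runs to the last " )" on that line. A negated character class that stops at the first whitespace or ")" avoids that. The sketch below is one possible variant, not the poster's own code: the pattern, the module name, the function name and the sample text are assumptions. As a side note, ISO 6166 defines an ISIN as exactly 12 characters, so the original .{12} should not truncate a valid one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IsinUpToBracket
    ' [^)\s]+ captures everything up to the first whitespace or ")".
    ' No ^ or $ anchors are used, so RegexOptions.Multiline is not needed.
    Const Pattern As String = "ISIN\s+([^)\s]+)\s*\)"

    ' Returns every captured value found in the source text.
    Function ExtractAll(source As String) As List(Of String)
        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(source, Pattern)
            found.Add(m.Groups(1).Value)
        Next
        Return found
    End Function

    Sub Main()
        ' Made-up text with two bracketed values of different lengths on one line.
        Dim sample As String = "(ISIN XX0000000000 ) and (ISIN XX00000000001234 )"
        For Each isin As String In ExtractAll(sample)
            Console.WriteLine(isin)
        Next
    End Sub
End Module
```

If only well-formed codes are wanted, the capture group could be swapped for [A-Z0-9]{12} instead.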

