Yogesh_Agarwal
asked on
How to Extract URL
Hi,
I want to extract URLs from the Google search engine, more than 5k of them. The results page is built entirely in JavaScript, so I can't sort anything out of it. Please give me some code or resources to build it.
ASKER
Yeah, I want to harvest URLs from the search engine and add them to a ListBox. I will send the search string programmatically, and it should return all the URLs Google has for that phrase.
Did you check here?
http://urenjoy.blogspot.com/2008/10/extract-links-from-string.html
Google provides a JavaScript/AJAX API for search requests, and its results are easier to read:
http://code.google.com/apis/ajaxsearch/
ASKER
I'd be glad if somebody could give me working code directly. I'm confused about how to use it. I will accept the solution of whoever gives me the full code.
Please clarify which language you want this code in.
It sounds like you are asking two questions, both of which have been answered quite a few times on this website:
1. How do I make an HTTP request? (HttpWebRequest or WebClient class)
2. How do I scrape data from HTML? (normally using the Regex class)
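Here is a minimal VB.NET sketch of both steps, assuming a console project: WebClient.DownloadString fetches a page as a string, and a Regex pulls out the href values of its links. The names LinkExtractor, ExtractLinks and GetLinksFromUrl are hypothetical, and the pattern is only a rough approximation. It misses unquoted attributes and can be fooled by markup inside comments or scripts.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module LinkExtractor

    ' Rough pattern (an assumption, not a full HTML parser): captures the
    ' href value of each <a ...> tag quoted with " or '.
    Private ReadOnly HrefPattern As New Regex("<a\s[^>]*?href\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase)

    ' Returns every href found in the HTML string, in page order.
    Public Function ExtractLinks(html As String) As List(Of String)
        Dim links As New List(Of String)()
        For Each m As Match In HrefPattern.Matches(html)
            links.Add(m.Groups(1).Value)
        Next
        Return links
    End Function

    ' Step 1: download the page with WebClient. Step 2: extract its links.
    Public Function GetLinksFromUrl(url As String) As List(Of String)
        Using client As New WebClient()
            Dim html As String = client.DownloadString(url)
            Return ExtractLinks(html)
        End Using
    End Function

    Sub Main()
        ' Offline demo on a fixed string, so no network access is needed.
        Dim sample As String = "<p><a href=""http://example.com/a"">A</a> <A HREF='/b'>B</A></p>"
        For Each link As String In ExtractLinks(sample)
            Console.WriteLine(link)
        Next
    End Sub

End Module
```

For the sample string, Main prints http://example.com/a and then /b. Relative links such as /b can be resolved against the page address with New Uri(baseUri, relativeLink). For anything beyond simple pages, an HTML parser such as HtmlAgilityPack is more robust than a regex.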
ASKER
I don't know how to code them. I'm a noob, so I want some correct code in VB.NET.
ASKER
Can anyone give me code? I have code, but it only fetches the first page of Google results. It's written using the HttpClient and WebRequest objects.
Post the code and maybe we can work on it.
ASKER
Yeah, sure:
Imports System
Imports System.Net
Imports System.Web          ' HttpUtility (requires a reference to System.Web)
Imports HtmlAgilityPack

Public Class cls2
    Public Function GetResults(ByVal query As String) As String()
        ' Encode the query; UrlEncode also replaces spaces with +
        query = HttpUtility.UrlEncode(query)
        ' Build the query string
        Dim url As String = "http://www.google.de/search?q=" & query
        ' Use a WebClient that identifies itself as "Internet Explorer 7"
        Using client As New WebClient()
            client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.04506.30; .NET CLR 3.0.04506.648)")
            ' Read the HTML page and select the root node
            Dim doc As New HtmlDocument()
            Using stream As System.IO.Stream = client.OpenRead(url)
                doc.Load(stream)
            End Using
            Dim rootNode As HtmlNode = doc.DocumentNode
            ' Select all result links using XPath
            Dim resultNodes As HtmlNodeCollection = rootNode.SelectNodes("//a[@class='l']")
            ' SelectNodes returns Nothing when no node matches
            If resultNodes Is Nothing Then Return New String() {}
            ' Keep the hrefs as strings: New Uri(...) throws on relative hrefs
            Dim links As String() = New String(resultNodes.Count - 1) {}
            For i As Integer = 0 To resultNodes.Count - 1
                links(i) = resultNodes(i).GetAttributeValue("href", "")
            Next
            Return links
        End Using
    End Function
End Class

' Form code (goes inside the Form class). The original post omitted the
' Sub declaration; the name Button1_Click is an assumption.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim c As New cls2()
    ListBox1.Items.Clear()
    Dim results() As String = c.GetResults(TextBox1.Text)
    For Each result As String In results
        ListBox1.Items.Add(result)
    Next
End Sub
ASKER CERTIFIED SOLUTION
ASKER
Hey, I got an error. I have posted a picture.
Untitled.jpg
ASKER
Please give me two to four code suggestions. I have to submit my project in the next few days.
ASKER
I converted the Uri values to strings and used an array. Now everything works. Thanks for the help!
ASKER
Nice solution, very fast.
ASKER
How do I make WebClient use a proxy? Google is blocking me for making so many requests.
As this indicates that the activity breaks Google's TOS (Terms of Service), I would rather not help develop a way to try to fool Google.
ASKER
OK, can you tell me where I can find consistent and reliable proxies (IP and port)?
ASKER
Mate, why is there no timeout option? client.Timeout does not show up.
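WebClient indeed has no Timeout property. A common workaround, sketched here under the assumption that synchronous calls such as DownloadString or OpenRead are used, is to subclass WebClient and set WebRequest.Timeout (in milliseconds) in an override of GetWebRequest. The class name TimeoutWebClient and the 10-second default are hypothetical.

```vbnet
Imports System
Imports System.Net

' WebClient exposes no Timeout property, so this hypothetical subclass
' applies a timeout to every WebRequest it creates internally.
Public Class TimeoutWebClient
    Inherits WebClient

    ' Timeout in milliseconds for each request (assumed default: 10 seconds).
    Public Property TimeoutMilliseconds As Integer = 10000

    Protected Overrides Function GetWebRequest(address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        If request IsNot Nothing Then
            request.Timeout = TimeoutMilliseconds
        End If
        Return request
    End Function
End Class

Module Program
    Sub Main()
        ' Usage: create the client, set the timeout, then call
        ' DownloadString/OpenRead exactly as with a plain WebClient.
        Using client As New TimeoutWebClient()
            client.TimeoutMilliseconds = 5000
            Console.WriteLine(client.TimeoutMilliseconds)
        End Using
    End Sub
End Module
```

Because the override sits in GetWebRequest, every call on the client picks up the timeout without further changes. WebRequest.Timeout does not govern the asynchronous methods such as DownloadStringAsync.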
What is in JavaScript?
You placed the question in the VB.NET and Windows zones, so is this going to be a Windows application?
By search engine, do you mean scraping URLs from search engine results for particular search phrases?