ASP crawler

Hi
I need to know if there is any free ASP code or package available that can crawl a given web site (link to link) and create/update a database of all its pages. I'm going to create one, but before I start, I'd like to hear about others' experiences.
Your comments will be appreciated,
Huji
hujiAsked:
 
cjinsocal581Commented:
 
aprestoCommented:
Hey Huji, how you doing?

I haven't seen one around. I looked into it a little a while back but didn't get very far. I suppose it would involve scanning directories with FSO for the pages, but I don't know how you would identify the link-to-link relationships. How are you planning on doing it?
 
hongjunCommented:
I have not tried it but I somehow found this in my bookmarks :)
http://www.asp101.com/articles/chris/spider/default.asp

hongjun
 
hujiAuthor Commented:
Hi my friends,
@hongjun: Good link, though I had already found it before reading your message here. The thing is, I want to see one in action. The article explains how to do it but does not provide an example to download.
@apresto: The article hongjun linked above is a complete answer to your question. In brief: the crawler should fetch the first page of my site, store its text (i.e. the content returned by the web site, after stripping its HTML tags and so on) in a database (in the second column; the first column holds the URL of the page), find every link on that page, and store those links in a list. When it finishes one page, it does the same thing with the next page in the list. This way it visits every page of my site and stores the resulting content in a database, so I can build a search engine for my site based on it.
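For illustration only, here is a minimal sketch of the loop described above. It is written in Python rather than ASP, and nothing in it comes from the linked article: the `pages` table layout, the `PageParser` class, and the `crawl` and `fetch_http` functions are all assumed names. It keeps a queue of links, stores each page's stripped text next to its URL, and follows only links that stay on the starting host.

```python
import sqlite3
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def fetch_http(url):
    """Default fetcher (assumed): download a page and decode it leniently."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, db, fetch=fetch_http, max_pages=100):
    """Breadth-first crawl of one site; stores (url, content) rows in db."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, content TEXT)"
    )
    site = urlparse(start_url).netloc
    queue = deque([start_url])  # the list of pages still to visit
    seen = {start_url}          # every URL ever queued, to avoid repeats
    stored = 0
    while queue and stored < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        parser = PageParser()
        parser.feed(html)
        parser.close()
        db.execute(
            "INSERT OR REPLACE INTO pages VALUES (?, ?)",
            (url, " ".join(parser.text_parts)),
        )
        stored += 1
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    db.commit()
    return stored
```

Calling `crawl("http://www.example.com/", sqlite3.connect("site.db"))` would fill the table for a hypothetical site. The `max_pages` cap and the `seen` set are there so that a large or circularly linked site cannot keep the loop running forever.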
You may say I can do all of this with Indexing Service. I say no! Searching with Indexing Service is not that customizable. With this type of search engine I can control the results (for example, which page appears at the top), and I can control which parts of my pages' HTML are stored in the database and which are not (to avoid hits that only occur because the keyword was found in an advertisement at the bottom of the page). There are many more respects in which this type of search engine is preferable to any other. It could be called a mini-Google for my site. (And perhaps the hard part is building a PageRank system! *L* )
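To make those two kinds of control concrete, here is a rough sketch, again in Python and again built on assumptions. The `<!-- noindex -->` and `<!-- /noindex -->` comment markers are a made-up convention for fencing off an advertisement block, and the `boosts` mapping is a hypothetical way to push chosen pages up the results. It is not a PageRank system, and unlike a full indexer it does not filter out script or style content.

```python
import re
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collects page text, skipping regions between noindex comment markers."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skipping = False

    def handle_comment(self, data):
        marker = data.strip().lower()
        if marker == "noindex":
            self.skipping = True      # start of a region to leave out
        elif marker == "/noindex":
            self.skipping = False     # end of that region

    def handle_data(self, data):
        if not self.skipping and data.strip():
            self.parts.append(data.strip())


def indexable_text(html):
    """Return only the text that should go into the search database."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def search(pages, keyword, boosts=None):
    """Rank URLs by whole-word keyword count times an optional per-URL boost."""
    boosts = boosts or {}
    word = re.compile(r"\b%s\b" % re.escape(keyword), re.IGNORECASE)
    scored = []
    for url, text in pages.items():
        hits = len(word.findall(text))
        if hits:
            scored.append((hits * boosts.get(url, 1.0), url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored]
```

Here `pages` is a plain dict of URL to stored text, standing in for the database table. Because the score is just a count multiplied by a boost, changing which page appears on top is a matter of editing one number.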

@both: I accept that coding the whole system in ASP alone can be slow and CPU-intensive, and that I could easily hit a page timeout right at the beginning or, worse, crash the server. For that reason, I would prefer the crawler (not the search engine) to be built another way, for example as a COM object. I have very little understanding of ASP.NET, but I can manage it to some extent. So if you post a link to an ASP.NET solution, it will be just as valuable to me.

Finally, I'd be glad if you can help me with this question as well:
http://www.experts-exchange.com/Web/Web_Languages/ASP/Q_21399124.html

Special thanks,
Huji
 
cjinsocal581Commented:
One more thing: it uses both Access and SQL, and it makes use of SQL's full-text search.

Read the details and you can see it work.
 
hujiAuthor Commented:
Yider is a great example. I'm focusing on it for now. (Actually, I didn't know about it until you gave me the link in my other open question here: http://www.experts-exchange.com/Web/Web_Languages/ASP/Q_21399124.html)
I'd like more examples, and as I stated before, ASP.NET examples are welcome, if there are any.
Thanks a lot CJ.
Huji
 
cjinsocal581Commented:
Keep in mind that Yider is fully customizable. In fact, I have modified it so that you can type in the links you want parsed. The nice thing about Yider is the ranking it does on page searches.
 
hujiAuthor Commented:
Again, I should repeat that I'm not going to use Yider or any other example as-is. I'm going to develop one of my own in the future, so I need to hear about others' experiences with it.
Huji
 
hujiAuthor Commented:
Thanks all, for your help
Huji