Solved

Crawler in C#

Posted on 2003-10-30
Last Modified: 2008-02-01
I am writing an application in C#. There is a table that contains a lot of URLs. I want to take one URL at a time and download its content until all of them have been downloaded. I think I should use a thread to send each request and download the response. What should I do when a URL has to be requested with the POST method?
   Please give me a rough idea of how I can achieve this in C#. I have some experience with VC++, where I could use ISAPI sessions, connections, etc.
Question by:NavinKaushik
7 Comments
 
LVL 7

Expert Comment

by:psdavis
ID: 9650417
This is the code I use to run a cgi script and download an image.
Sorry I don't have time to give you more, running to airport in a few mins.

            using( WebClient pWebClient = new WebClient( ))
            {
               pWebClient.BaseAddress = @"http://" + this.Server;
               byte[] byImage = pWebClient.DownloadData( @"cgi-bin/getimage.cgi" );
            }

 

Author Comment

by:NavinKaushik
ID: 9656103
Thanks for your valuable time :) but I want code or APIs in C#.
 
LVL 6

Accepted Solution

by:
purpleblob earned 1000 total points
ID: 9659433
The code demonstrated by psdavis is C# and should fulfill your requirements - here's the same thing but showing the download of a page which might make it more in line with what you want to do with it.

byte[] buffer = null;

using(WebClient client = new WebClient())
{
   buffer = client.DownloadData("http://www.google.com");
}

Now we could convert the buffer to a string using System.Text.Encoding.ASCII.GetString(buffer); (note that ASCII decoding replaces any non-ASCII characters, so use the page's actual encoding if that matters) and then find any URLs within it to download as well. Also, as psdavis has shown, you could assign the base URL to the BaseAddress property so that DownloadData accepts relative paths - this is useful if you want to download images etc. that the page references using relative paths.
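
As a rough illustration of the "find any URLs within it" step, here is a minimal sketch. The class name LinkExtractor, the method ExtractLinks and the regular expression are assumptions made for this example, not part of the original answer, and a regex is only a crude way to read HTML (a real crawler would probably use an HTML parser). Relative links are resolved against the page address with Uri.TryCreate, which plays the same role here as BaseAddress.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls http/https links out of a downloaded page.
public static class LinkExtractor
{
    // Naive pattern for href="..." or href='...' attributes (an assumption,
    // it will miss unquoted attributes and links built by script).
    private static readonly Regex HrefPattern = new Regex(
        "href\\s*=\\s*[\"'](?<url>[^\"']+)[\"']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // buffer: bytes returned by WebClient.DownloadData
    // pageUri: the address the buffer was downloaded from
    public static List<Uri> ExtractLinks(byte[] buffer, Uri pageUri)
    {
        // ASCII is used as in the answer above; non-ASCII characters are lost.
        string html = Encoding.ASCII.GetString(buffer);
        var links = new List<Uri>();

        foreach (Match match in HrefPattern.Matches(html))
        {
            Uri absolute;
            // Resolve relative paths against the page address and
            // skip anything that is not a web link (mailto:, javascript:, ...).
            if (Uri.TryCreate(pageUri, match.Groups["url"].Value, out absolute)
                && (absolute.Scheme == Uri.UriSchemeHttp
                    || absolute.Scheme == Uri.UriSchemeHttps))
            {
                links.Add(absolute);
            }
        }
        return links;
    }
}
```

Calling LinkExtractor.ExtractLinks(buffer, new Uri("http://www.google.com")) on the buffer from the snippet above would give a list of absolute addresses that can be fed back into DownloadData.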

Hope this helps

Author Comment

by:NavinKaushik
ID: 9662070
Thanks, purpleblob. I would be even more grateful if you could help me further. I don't have any practical experience in .NET. Previously there was a desktop application in VC++, multi-tiered, using MFC for the UI and ATL COM in the middle layer. The application crawled the URLs stored in a database table and downloaded all the pages for those URLs into a folder.
  First I create the URLs. For example:
    If you log in to your mail account through a web page, it displays your inbox. What I do is build
    www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz
   and then I pass this URL to the download code:
 byte[] buffer = null;

using(WebClient client = new WebClient())
{
   // The address needs an explicit scheme; without "http://" it is not parsed as an absolute HTTP URL.
   buffer = client.DownloadData("http://www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz");
}
 This is one of the modules of my application.
 I think my objective is clearer to you now. My question is: should I go for C#, since I may later integrate it with the web as well?
 How can I make a multi-tiered application in C#, and what are the .NET equivalents of the ISAPI APIs in VC++ 6.0?
  Thanks.
 
LVL 6

Expert Comment

by:purpleblob
ID: 9662393
You can use your existing COM controls within C# by selecting Add Reference from a Visual Studio project, then selecting the COM tab and either browsing for your COM objects or selecting them from the list of registered objects. VS will then create a wrapper DLL (like #import in VC++).

I'm afraid I don't think you can write ISAPI filters in C#, as C# DLLs don't expose methods in the same way as "standard" C DLLs. That said, if you wanted to use managed C++ you could create ISAPI filters and call C# objects from them, so you could write all the main logic in C# and have the C++ DLL act as a proxy to your C# code.

As for should you go for C#, I would say at the moment it sounds like C# will not offer you too much that your current system doesn't already have - however if you migrated your existing system to managed C++ you would then be able to work with the .NET libraries and also write C# objects which you could then call from managed C++.

Hope this helps
 

Author Comment

by:NavinKaushik
ID: 9670102
Hmmm, that is a lot more confusing!
   Considering an application written from scratch, is there anything that I can't do in C# but can do in VC++.NET?
   I think the answer to this will solve my problem.
 
LVL 6

Expert Comment

by:purpleblob
ID: 9670217
Basically, things like ISAPI filters, which require standard-style DLLs, are not possible within C# at this time. Also, C# does not create native COM objects directly (although .NET classes can be exposed to COM clients through COM interop). However, by using VC++.NET in conjunction you could write most or all of the functionality in C# DLLs and then create VC++.NET wrappers around them. So this is very much achievable.

With regards your application, I'm not sure where ISAPI fits in with your current system (if at all) so this might not be an issue.

From what I understand of your application, i.e. you have a database of URLs, you fetch these URLs, navigate to them and download whatever is contained at each URL - this can easily be done with C#. If you are also creating the web-based application, i.e. the pages at those URLs, then you could use ASP.NET with C# as your chosen language - so again probably no problems there.
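
To make that description a little more concrete, here is a minimal sketch of such a download loop, under several assumptions: the CrawlJob class, the Crawler class, the OutputFile field and the limit of four parallel downloads are all invented for this example, and reading the rows from the database is left out. It also shows one possible way to handle the POST case from the original question, which the replies do not cover: WebClient.UploadValues sends form fields with the POST method, while DownloadData issues a plain GET. On recent .NET versions WebClient is marked obsolete in favour of HttpClient, so treat this as an outline rather than a finished design.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape of one row from the URL table.
public class CrawlJob
{
    public string Url;
    public NameValueCollection PostFields; // null means a plain GET
    public string OutputFile;              // file name inside the target folder
}

public static class Crawler
{
    // Downloads one job: GET via DownloadData, POST via UploadValues.
    public static byte[] Fetch(CrawlJob job)
    {
        using (var client = new WebClient())
        {
            return job.PostFields == null
                ? client.DownloadData(job.Url)
                : client.UploadValues(job.Url, "POST", job.PostFields);
        }
    }

    // Runs the jobs on a few worker threads and saves each result to a file.
    // The fetch delegate is a parameter so the loop can be tried without a network.
    // Returns the number of pages saved.
    public static int CrawlAll(IEnumerable<CrawlJob> jobs, string folder,
                               Func<CrawlJob, byte[]> fetch)
    {
        Directory.CreateDirectory(folder);
        int saved = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        Parallel.ForEach(jobs, options, job =>
        {
            try
            {
                byte[] content = fetch(job);
                File.WriteAllBytes(Path.Combine(folder, job.OutputFile), content);
                Interlocked.Increment(ref saved);
            }
            catch (WebException ex)
            {
                // One failed URL should not stop the rest of the crawl.
                Console.Error.WriteLine(job.Url + ": " + ex.Message);
            }
        });
        return saved;
    }
}
```

A call such as Crawler.CrawlAll(jobs, @"C:\crawl", Crawler.Fetch) would then download every job in the list; for a login form, PostFields would carry entries like login_name and pwd instead of putting them in the query string.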

Now, not wishing to confuse the issue further, I suppose what I was saying earlier about C# not offering you much that you don't currently have is this - if I were writing such an application from scratch I would definitely write it in C# and .NET using ADO.NET and ASP.NET. If all I was doing was adding a little bit of code to download data from URLs and the rest of the system was already written in C++, I'd probably use VC++.NET just for consistency.

I hope I haven't confused matters further - it's difficult to say what's best for YOU to do as I do not know the ins and outs of your whole application.

Hope this helps