Solved

Crawler in C#

Posted on 2003-10-30
Last Modified: 2008-02-01
I am writing an application in C#. There is a table that contains a lot of URLs. What I want is to take one URL at a time and download its content until all of them have been downloaded. I think I should use threads to send the requests and download the responses. What should I do when a URL has to be sent using the POST method?
   Please give me a rough idea of how I can achieve this in C#. I have a little experience in VC++, where I can use an ISAPI session, connection, etc.
Question by:NavinKaushik

Expert Comment

by:psdavis
ID: 9650417
This is the code I use to run a CGI script and download an image.
Sorry I don't have time to give you more; I'm running to the airport in a few minutes.

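            // Requires: using System.Net;
            using( WebClient pWebClient = new WebClient( ))
            {
               // Relative addresses passed to DownloadData are resolved against BaseAddress
               pWebClient.BaseAddress = @"http://" + this.Server;
               // Runs the CGI script and returns the response body (the image bytes)
               byte[] byImage = pWebClient.DownloadData( @"cgi-bin/getimage.cgi" );
            }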
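The question also asked about URLs that must be sent with the POST method, which a `DownloadData` call (a GET request) does not cover. Here is a minimal sketch, assuming the form fields for each URL are available as name/value pairs; the `FormPoster` class and its method names are hypothetical. It relies on `WebClient.UploadData`, which sends a request body with the method you name. `WebClient.UploadValues(url, "POST", NameValueCollection)` is an alternative that performs the form encoding itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper for URLs that must be requested with POST instead of GET.
public static class FormPoster
{
    // Builds an application/x-www-form-urlencoded body such as "login_name=navin%20kaushik&pwd=xyz".
    // Uri.EscapeDataString percent-encodes reserved characters (a space becomes %20).
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends the fields to the URL with the POST method and returns the raw response body.
    public static byte[] Post(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        using (WebClient client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
            // The escaped body contains only ASCII characters.
            byte[] body = Encoding.ASCII.GetBytes(BuildFormBody(fields));
            return client.UploadData(url, "POST", body);
        }
    }
}
```

Keeping the body builder separate from the network call makes the encoding easy to check without a server.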


Author Comment

by:NavinKaushik
ID: 9656103
Thanks for your valuable time :) but I want code or APIs in C#...

Accepted Solution

by:purpleblob (earned 250 total points)
ID: 9659433
The code demonstrated by psdavis is C# and should fulfill your requirements. Here's the same thing, but showing the download of a page, which might be more in line with what you want to do.

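// Requires: using System.Net;
byte[] buffer = null;

using(WebClient client = new WebClient())
{
   // Fetches the page at an absolute URL and returns the raw response bytes
   buffer = client.DownloadData("http://www.google.com");
}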

Now you could convert the buffer to a string using System.Text.ASCIIEncoding.ASCII.GetString(buffer); and then find any URLs within it to download as well. Also, as psdavis has shown, you could assign the base URL to the BaseAddress property, so that DownloadData will accept relative paths. This is obviously useful if you want to download images etc. that the page references with relative paths.
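To make the "find any URLs within it" step concrete, here is a rough sketch. It assumes a simple regular expression over `href`/`src` attributes is good enough (a real crawler would probably want an HTML parser), and the `LinkExtractor` class is hypothetical. Relative links are resolved against the page URL with `System.Uri`, which is the same base/relative idea as `BaseAddress`. Note that decoding with ASCII garbles non-ASCII characters; `Encoding.UTF8` or the page's declared charset is usually safer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls href/src targets out of a downloaded page.
public static class LinkExtractor
{
    // Matches href="..." or src='...' (case-insensitive); group 2 captures the value.
    static readonly Regex LinkPattern = new Regex(
        "(?:href|src)\\s*=\\s*([\"'])(.*?)\\1",
        RegexOptions.IgnoreCase);

    // Returns absolute http/https URLs, resolving relative links against pageUrl.
    public static List<string> ExtractLinks(string html, string pageUrl)
    {
        Uri baseUri = new Uri(pageUrl);
        List<string> links = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            string value = m.Groups[2].Value;
            Uri candidate;
            Uri relative;
            // Try the relative form first so that paths such as "/img/a.gif"
            // are always resolved against the page's own host.
            if (Uri.TryCreate(value, UriKind.Relative, out relative))
            {
                if (!Uri.TryCreate(baseUri, relative, out candidate)) continue;
            }
            else if (!Uri.TryCreate(value, UriKind.Absolute, out candidate))
            {
                continue;
            }
            // Skip ftp:, mailto: and other schemes the crawler cannot download.
            if (candidate.Scheme == Uri.UriSchemeHttp || candidate.Scheme == Uri.UriSchemeHttps)
                links.Add(candidate.AbsoluteUri);
        }
        return links;
    }
}
```

The returned list can be fed back into the same download loop, ideally with a set of already-visited URLs to avoid fetching pages twice.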

Hope this helps

Author Comment

by:NavinKaushik
ID: 9662070
Thanks, purpleblob. I would be very grateful if you could help me further. I don't have any practical experience in .NET. Previously there was a multi-tier desktop application in VC++, using MFC for the UI and ATL COM in the middle layer. The application crawled the URLs stored in a database table and downloaded the pages for those URLs into a folder.
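Since the original application saved each downloaded page into a folder, here is a hedged sketch of how that step might look in C#. The `PageStore` class, its methods, and the file-naming rule (replace every character other than ASCII letters, digits, '.' and '-' with '_') are assumptions for illustration. Two different URLs can map to the same name, so using the table's row ID as the file name may be more robust.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper: stores each downloaded page as a file in a folder.
public static class PageStore
{
    // Turns a URL into a file name by replacing anything other than ASCII letters,
    // digits, '.' and '-' with '_'. Different URLs can collide, e.g. "a?b" and "a/b".
    public static string SafeFileName(string url)
    {
        StringBuilder name = new StringBuilder(url.Length);
        foreach (char c in url)
        {
            bool keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') || c == '.' || c == '-';
            name.Append(keep ? c : '_');
        }
        return name.ToString();
    }

    // Writes the page bytes into folder (created if missing) and returns the full path.
    public static string Save(string folder, string url, byte[] content)
    {
        Directory.CreateDirectory(folder);
        string path = Path.Combine(folder, SafeFileName(url));
        File.WriteAllBytes(path, content);
        return path;
    }
}
```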
  First I create the URLs. For example:
    If you log in to your mail account through a web page, it displays your inbox. What I do is build a URL like
    www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz
   Now, when I put this URL into the code:
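// Requires: using System.Net;
byte[] buffer = null;

using(WebClient client = new WebClient())
{
   // The address must be absolute (include "http://"); without a scheme and
   // without a BaseAddress, WebClient does not treat it as a web URL.
   buffer = client.DownloadData("http://www.abc.com/inbox.asp?login_name=navin+kaushik&pwd=xyz");
}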
 This is one of the modules of my application.
 I think my objective is clearer to you now. My question is: should I go for C#, since later I may also integrate it with the web?
 How can I make a multi-tier application in C#, and what are the .NET APIs corresponding to ISAPI in VC++ 6.0?
  Thanks.
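When the login URL is built from values stored in the table, the address passed to `DownloadData` needs to be absolute (including `http://`), and each query value should be escaped in case it contains spaces or characters such as `&`. A minimal sketch, assuming the login name and password come from the database; the `LoginUrl.Build` helper is hypothetical and `www.abc.com/inbox.asp` is only the example used in this comment. A password in a GET query string can also end up in server logs, so POST may be preferable where the site supports it.

```csharp
using System;

// Hypothetical helper that builds a GET URL such as the inbox example in this comment.
public static class LoginUrl
{
    // Escapes each value so spaces, '&', '=' etc. cannot break the query string.
    public static string Build(string baseUrl, string loginName, string password)
    {
        return baseUrl
            + "?login_name=" + Uri.EscapeDataString(loginName)
            + "&pwd=" + Uri.EscapeDataString(password);
    }
}
```

For example, `LoginUrl.Build("http://www.abc.com/inbox.asp", "navin kaushik", "xyz")` produces a URL with `login_name=navin%20kaushik`, which can be passed straight to `DownloadData`.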

Expert Comment

by:purpleblob
ID: 9662393
You can use your existing COM components from C# by choosing Add Reference in a Visual Studio project, then selecting the COM tab and either browsing for your COM objects or selecting from the list of registered objects. VS will then create a wrapper DLL (like #import in VC++).

I'm afraid I don't think you can write ISAPI filters in C#, as C# DLLs don't expose functions the way "standard" C DLLs do. That said, if you used managed C++ you could create ISAPI filters that call C# objects, so you could write all the main logic in C# and just have the C++ DLL act as a proxy to your C# code.

As for whether you should go for C#, I would say that at the moment it sounds like C# will not offer you much that your current system doesn't already have. However, if you migrated your existing system to managed C++, you would then be able to work with the .NET libraries and also write C# objects that you could call from managed C++.

Hope this helps

Author Comment

by:NavinKaushik
ID: 9670102
Hmm, this is a lot more confusing!
   Is there anything that I can't do in C# (considering an application built from scratch) but can do in VC++.NET?
   I think this will solve my problem.

Expert Comment

by:purpleblob
ID: 9670217
Basically, things like ISAPI filters, which require standard-style DLLs, are not possible in C# at this time. Also, C# does not create COM objects. However, by using VC++.NET alongside it, you could write most or all of the functionality in C# DLLs and then create VC++.NET wrappers around them, so this is very much achievable.

With regard to your application, I'm not sure where ISAPI fits into your current system (if at all), so this might not be an issue.

From what I understand of your application (you have a database of URLs, you navigate to each of them and download whatever is at that URL), this can easily be done with C#. If you are also creating the web-based application, i.e. the one at the URL, you could use ASP.NET with C# as your chosen language, so again there are probably no problems there.
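As a rough illustration of the "take one URL at a time, using threads" approach from the question, here is a sketch in which a few worker threads share a queue of URLs until it is empty. It assumes the URLs have already been read from the table into a list. The `UrlCrawler` class, and passing the download step in as a delegate (so it can wrap `WebClient.DownloadData`, a POST, or a test stub), are hypothetical design choices rather than anything from this thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical crawler: worker threads pull URLs from a shared queue until it is empty.
public class UrlCrawler
{
    private readonly Func<string, byte[]> download;   // e.g. a method wrapping WebClient.DownloadData
    private readonly int workerCount;

    public UrlCrawler(Func<string, byte[]> download, int workerCount)
    {
        this.download = download;
        this.workerCount = workerCount;
    }

    // Downloads each queued URL; failures are recorded in errors instead of stopping the crawl.
    public ConcurrentDictionary<string, byte[]> Run(IEnumerable<string> urls,
                                                    ConcurrentDictionary<string, Exception> errors)
    {
        var queue = new ConcurrentQueue<string>(urls);
        var results = new ConcurrentDictionary<string, byte[]>();
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // TryDequeue hands each queued item to exactly one thread.
                while (queue.TryDequeue(out url))
                {
                    try { results[url] = download(url); }
                    catch (Exception ex) { errors[url] = ex; }
                }
            });
            worker.Start();
            workers.Add(worker);
        }

        foreach (var worker in workers)
            worker.Join();   // wait until every worker has drained the queue
        return results;
    }
}
```

A real run might use `new UrlCrawler(url => { using (var c = new WebClient()) return c.DownloadData(url); }, 4)`. Creating one `WebClient` per call matters here, because a single `WebClient` instance does not support concurrent operations.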

Now, not wishing to confuse the issue further, I suppose what I meant earlier about C# not offering you much that you don't currently have is this: if I were writing such an application from scratch, I would definitely write it in C# and .NET using ADO.NET and ASP.NET. If all I were doing was adding a little bit of code to download data from URLs, and the rest of the system was already written in C++, I'd probably use VC++.NET just for consistency.
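For the ADO.NET part, the list of URLs could be loaded from the table through any provider's data reader. A minimal sketch, assuming a column that holds the URL text; the `UrlTable` class, and a query such as `SELECT Url FROM Urls`, are hypothetical. In a real application the reader would come from something like `SqlCommand.ExecuteReader()`.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical helper: reads the crawl list from any ADO.NET data reader.
public static class UrlTable
{
    // Collects the non-null values of the given column, e.g. from
    // "SELECT Url FROM Urls" executed through SqlCommand.ExecuteReader().
    public static List<string> ReadUrls(IDataReader reader, string columnName)
    {
        List<string> urls = new List<string>();
        int ordinal = reader.GetOrdinal(columnName);
        while (reader.Read())
        {
            if (!reader.IsDBNull(ordinal))
                urls.Add(reader.GetString(ordinal));
        }
        return urls;
    }
}
```

Accepting `IDataReader` rather than a specific provider type keeps the crawler independent of the database, and it can be exercised with an in-memory `DataTable`.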

I hope I haven't confused matters further - it's difficult to say what's best for YOU to do as I do not know the ins and outs of your whole application.

Hope this helps