Solved

Searching html files

Posted on 1997-07-12
Last Modified: 2013-12-25
I'm trying to write a search engine in C for several WWW pages. The idea is that the search engine will search through these pages and record the number of times the required word is found. It should also return the URLs of the pages where the word was found. I've written an HTML form asking for the search word, but I'm not having much luck opening the required HTML files and searching through them. I don't really know the best way to search through HTML files. Can anyone please help? I originally posted this question in the C programmers' questions area, but was advised to try here instead.
               Phil H.
Question by:ee96m17

Accepted Solution

by:
icd earned 200 total points
ID: 1829037
I assume you are able to run CGI scripts on your server (always worth asking).

You can find several search engine scripts in C at the following URL.

http://www.cgi-resources.com/

Follow the links to 'Scripts', then 'C', and then 'Search Engines'.

Don't discount scripts written in other languages (such as Perl).

One further point: you have two options. The first is that the search takes place at the point the user submits the form. This is OK if there is a small number of pages that are updated frequently.
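As a rough illustration of the first option, here is a minimal sketch in C that counts whole-word, case-insensitive matches in the visible text of a page, skipping anything between '<' and '>'. The names (count_word_in_html, count_word_in_file, MAX_WORD) are made up for this example and are not taken from any script on the site above. It assumes plain ASCII pages and does not handle comments, scripts, or character entities such as &amp;.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length */

/* Scanner state, kept in a struct so a string or a file can be fed to it. */
struct word_counter {
    const char *word;      /* word being searched for */
    int in_tag;            /* nonzero while between '<' and '>' */
    char token[MAX_WORD];  /* word currently being read */
    size_t len;            /* length of that word so far */
    long hits;             /* matches found so far */
};

/* Return 1 if a and b are equal, ignoring case. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Close the current word and count it if it matches the search word. */
static void finish_token(struct word_counter *wc)
{
    if (wc->len > 0 && wc->len < MAX_WORD) {
        wc->token[wc->len] = '\0';
        if (equal_ignore_case(wc->token, wc->word))
            wc->hits++;
    }
    wc->len = 0;   /* over-long words are simply discarded */
}

/* Feed one character of the page to the scanner. */
static void feed_char(struct word_counter *wc, int c)
{
    if (wc->in_tag) {
        if (c == '>')
            wc->in_tag = 0;
    } else if (c == '<') {
        finish_token(wc);
        wc->in_tag = 1;
    } else if (isalnum((unsigned char)c)) {
        if (wc->len < MAX_WORD - 1)
            wc->token[wc->len] = (char)c;
        wc->len++;
    } else {
        finish_token(wc);
    }
}

/* Count matches of `word` in an HTML string held in memory. */
long count_word_in_html(const char *html, const char *word)
{
    struct word_counter wc = { 0 };

    wc.word = word;
    for (; *html; html++)
        feed_char(&wc, (unsigned char)*html);
    finish_token(&wc);
    return wc.hits;
}

/* Count matches of `word` in the file at `path`; returns -1 if it cannot be opened. */
long count_word_in_file(const char *path, const char *word)
{
    struct word_counter wc = { 0 };
    FILE *fp = fopen(path, "r");
    int c;

    if (fp == NULL)
        return -1;
    wc.word = word;
    while ((c = fgetc(fp)) != EOF)
        feed_char(&wc, c);
    fclose(fp);
    finish_token(&wc);
    return wc.hits;
}
```

A CGI program along these lines would call count_word_in_file once per page and print the URL of every page whose count is greater than zero. How a file path maps to its URL depends on your server setup, so that part is left out.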

The second option is probably the most effective: you do the same as the big Internet search engines do. You have an independent process that periodically scans all your pages and compiles a database of keywords. When the user submits the search form, you can go straight to the database to find the keywords. This is *far* more efficient when the documents change infrequently compared to the number of search requests.
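To give an idea of what the second option involves, here is a minimal sketch in C of such a keyword database: a small in-memory table of (word, URL, count) entries that the periodic scanner fills in and writes out as a tab-separated text file. All the names and limits (word_index, index_page, index_count, index_save, MAX_ENTRIES and so on) are assumptions for the sake of the example. A real index would use a hash table or a proper database rather than a linear search, and the tag skipping is the crude "ignore anything between '<' and '>'" kind.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 1024  /* assumption: a small site fits in a fixed table */
#define MAX_WORD    32
#define MAX_URL     128

/* One index entry: how many times `word` appears on page `url`. */
struct entry {
    char word[MAX_WORD];
    char url[MAX_URL];
    int  count;
};

struct word_index {
    struct entry entries[MAX_ENTRIES];
    size_t used;
};

/* Record one occurrence of `word` (already lower-case) on page `url`. */
static int index_add(struct word_index *idx, const char *word, const char *url)
{
    size_t i;

    for (i = 0; i < idx->used; i++) {
        if (strcmp(idx->entries[i].word, word) == 0 &&
            strcmp(idx->entries[i].url, url) == 0) {
            idx->entries[i].count++;
            return 0;
        }
    }
    if (idx->used == MAX_ENTRIES ||
        strlen(word) >= MAX_WORD || strlen(url) >= MAX_URL)
        return -1;  /* table full or string too long */
    strcpy(idx->entries[idx->used].word, word);
    strcpy(idx->entries[idx->used].url, url);
    idx->entries[idx->used].count = 1;
    idx->used++;
    return 0;
}

/* Split the visible text of `html` into lower-case words and index each one.
   Words longer than MAX_WORD - 1 characters are truncated. */
void index_page(struct word_index *idx, const char *url, const char *html)
{
    char token[MAX_WORD];
    size_t len = 0;
    int in_tag = 0;
    const char *p;

    for (p = html; ; p++) {
        unsigned char c = (unsigned char)*p;

        if (!in_tag && isalnum(c)) {
            if (len < MAX_WORD - 1)
                token[len++] = (char)tolower(c);
        } else {
            if (len > 0) {          /* a word just ended */
                token[len] = '\0';
                index_add(idx, token, url);
                len = 0;
            }
            if (c == '<')
                in_tag = 1;
            else if (c == '>')
                in_tag = 0;
            if (c == '\0')
                break;
        }
    }
}

/* Look up how often lower-case `word` occurs on `url`; 0 if not indexed. */
int index_count(const struct word_index *idx, const char *word, const char *url)
{
    size_t i;

    for (i = 0; i < idx->used; i++)
        if (strcmp(idx->entries[i].word, word) == 0 &&
            strcmp(idx->entries[i].url, url) == 0)
            return idx->entries[i].count;
    return 0;
}

/* Write the index as "word<TAB>url<TAB>count" lines; returns 0 on success. */
int index_save(const struct word_index *idx, const char *path)
{
    FILE *fp = fopen(path, "w");
    size_t i;

    if (fp == NULL)
        return -1;
    for (i = 0; i < idx->used; i++)
        fprintf(fp, "%s\t%s\t%d\n", idx->entries[i].word,
                idx->entries[i].url, idx->entries[i].count);
    return fclose(fp) == 0 ? 0 : -1;
}
```

The scanner process would call index_page for every page and then index_save. The CGI program then only has to read the saved file and report the lines whose first field matches the search word, so no HTML file is opened at search time.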

I think you will find scripts for both these approaches at the resource I gave above.