Solved

Web Site Search Engine software

Posted on 2004-09-08
Last Modified: 2013-11-19
I am running IIS 5.0 on Windows 2000 Server.  I found a packaged search engine product called 'Zoom Search Engine 3.1'.  This particular product says it can run on one box and use the space on that box to index the site, versus running on the actual server.  I was wondering if anyone had any info about this product, or if there is something better out there.  I have a lot of documents on the web site, both Word and .pdf, that folks need to find via the search engine.  Apparently Zoom has a plug-in that can do this for me.  I also have space issues at this time, so I need to run it from another box.  I don't want to write any code myself, as I am the only person who handles all of this and other projects as well, and I don't have time for coding.  Thanks for any information or references as to what other products can do this.
Question by:a182612
 
LVL 6

Expert Comment

by:Fahdmurtaza
ID: 12006098
OK, what I'll recommend is my favorite, and I think the best free one, i.e. the Web Wiz Site Search. You can search for it on Google and you will find its link right away. You can download it from there and configure it for your site in about 5 minutes.

Regards
Fahd Murtaza
 
LVL 3

Expert Comment

by:passmark
ID: 12013760
If you really have no free disk space on your web site to upload the Zoom index files, then you can put the search script and the index files on another site.

For example, the CIA's World Factbook web site was indexed with Zoom, but the search function was put on a different host. See this page:
http://www.wrensoft.com/zoom/worldfactbook/search.php

You can see the files that are generated by Zoom here,
http://www.wrensoft.com/zoom/worldfactbook/

The search function is on this domain,
http://www.wrensoft.com/
but the results point to this domain
http://www.cia.gov/

However, in your case you will probably need to select the ASP option instead of the PHP option in Zoom (because you are using IIS, and by default PHP is not installed on IIS).

If you have specific questions about Zoom, list them here and I'll try to answer them point by point.

----
David
 

Author Comment

by:a182612
ID: 12018358
David, what if I add another drive and then create a virtual directory that the web server recognizes as part of the root? Could I use that drive to run the Zoom application and then store the index that it requires?
 
LVL 3

Expert Comment

by:passmark
ID: 12023544

I don't see any problem with what you are suggesting. It doesn't really matter what directory the index files are in, as long as you can access them via a URL.

For the creation of the index there are two modes in Zoom: offline mode and spider mode. In general, spider mode is a better choice because it indexes all your dynamic content (such as ASP content and database content exposed through web pages).

----
David
 

Author Comment

by:a182612
ID: 12043117
Can I set this up so it will only index file names for .pdf and .doc documents?  
Could I also set it up to search actual .pdf and .doc files for specific keywords?
 
LVL 3

Expert Comment

by:passmark
ID: 12043259
In the Configuration window of Zoom, on the Scan Options tab, you can enter a list of file extensions to search. You can remove all the extensions except .doc and .pdf if you like. The potential problem with this, however, is that you probably have a number of HTML pages that provide the navigation for your web site (i.e. your home page is probably an HTML or ASP document, rather than a PDF).

So for the spider to get to your PDF files you’ll probably need to index and follow the links on a number of HTML pages. Removing the HTML extension may therefore not give the result you want. There are a couple of ways around this problem, but it would help to know what your site is like. Can you post the URL?
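To illustrate why the HTML pages still have to be followed even when only PDF and Word files are wanted, here is a minimal Python sketch. It is not how Zoom is implemented; the in-memory SITE, the example.com URLs, and the crawl function are all made up for illustration.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
import posixpath
import re

# Hypothetical site held in memory: URL -> HTML body, or None for a binary document.
SITE = {
    "http://example.com/index.html": '<a href="safety/index.asp">Safety</a>',
    "http://example.com/safety/index.asp": (
        '<a href="cssd/Hurricane.pdf">Plan</a> '
        '<a href="cssd/Checklist.doc">Checklist</a>'
    ),
    "http://example.com/safety/cssd/Hurricane.pdf": None,
    "http://example.com/safety/cssd/Checklist.doc": None,
}

FOLLOW_EXTENSIONS = {".html", ".htm", ".asp"}  # pages scanned for links only
INDEX_EXTENSIONS = {".pdf", ".doc"}            # documents that end up in the index


def extension(url):
    """Return the lower-cased file extension of the URL path."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def crawl(start_url, site, follow=FOLLOW_EXTENSIONS, index=INDEX_EXTENSIONS):
    """Breadth-first walk that follows links on `follow` pages and
    collects the URLs whose extension is in `index`."""
    seen = {start_url}
    queue = deque([start_url])
    to_index = []
    while queue:
        url = queue.popleft()
        ext = extension(url)
        if ext in index:
            to_index.append(url)
            continue
        if ext not in follow:
            continue  # neither indexed nor followed, so its links are never seen
        body = site.get(url) or ""
        for href in re.findall(r'href="([^"]+)"', body):
            link = urljoin(url, href)
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return to_index
```

With the default sets, crawl("http://example.com/index.html", SITE) reaches both documents. If the page extensions are dropped altogether (follow=set()), the walk stops at the home page and finds nothing, which is the problem described above.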

I am not sure if I understand your question about specific keywords. By default Zoom will index the entire content of PDF and Word documents and store all the words it finds in its index (not just specific keywords). Can you give an example of the behaviour you want?

------
David
 

Author Comment

by:a182612
ID: 12055904
Indexing all the documents will probably take up a lot of space, especially since I have so many.  Can the product use a meta tag, or can it just search on the document names in the URL, such as 'safety/cssd/Hurricane.pdf', without actually having to store an entire .pdf file as an index?
 
LVL 3

Accepted Solution

by:
passmark earned 250 total points
ID: 12059906
The index is coded and compressed, so it will be much smaller than your entire collection of PDFs. Nevertheless, the entire text of each document is indexed. The text of the document has to be stored in the index for the following reasons:

1) For exact phrase matching to occur. e.g. the user searches for "Hurricane safety procedures". The search results will be just the pages that have these three words in the text AND have the words in the same order that the user entered them.

2) So that the search results can display the context of the search. e.g. "Your local government recommends that you follow the <bold>hurricane safety procedures</bold> described below"

You can disable both of these features and save some space in the index.
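As a rough illustration of why both features need more than a plain word list, here is a small Python sketch of a positional index. It is only a toy under my own assumptions, not Zoom's actual index format; DOCS, the file names, and the function names are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical document texts, as they might look after extraction from the PDFs.
DOCS = {
    "a.pdf": ("Your local government recommends that you follow the "
              "hurricane safety procedures described below"),
    "b.pdf": "Safety procedures for hurricane season",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return {doc_id: [start positions]} where the words of the phrase
    occur consecutively and in the order given."""
    words = tokenize(phrase)
    if not words:
        return {}
    hits = {}
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.setdefault(doc_id, []).append(start)
    return hits


def context(docs, doc_id, start, length, width=2):
    """Return the matched words plus `width` words on either side.
    This needs the document text itself, not just the word list."""
    tokens = tokenize(docs[doc_id])
    return " ".join(tokens[max(0, start - width):start + length + width])
```

Both sample documents contain all three words, but only a.pdf has them in the order typed, so only it matches the phrase. Drop the positions and the stored text and the index shrinks, but neither the phrase check nor the context line can be produced any more.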

There is also an option in Zoom called "Index meta information only". This option allows you to index only the meta information found on a page (i.e. the title, keywords, description, and zoomwords). Note that the page content will still have to be scanned (in order to find links to other pages), so the scanning process will not be significantly faster, but it does allow the index data files to be smaller. This is useful for sites where the page contents are less meaningful or searchable than the meta information available (e.g. technical papers, charts, etc.). However, we have found that in a lot of cases the meta data is not accurate or not kept up to date by webmasters. There is also a feature for adding meta data to PDF files, using .desc files. See the Zoom users guide for more details.
http://www.wrensoft.com/ftp/zoom.pdf
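To give a feel for what a meta-only index keeps, here is a minimal Python sketch that pulls just the title, keywords, and description out of an HTML page and ignores the body. It uses the standard library's html.parser and is my own illustration, not Zoom's code; the PAGE sample and the extract_meta name are assumptions.

```python
from html.parser import HTMLParser

# Hypothetical page used only for this example.
PAGE = (
    "<html><head><title>Hurricane Plan</title>"
    '<meta name="keywords" content="hurricane, safety">'
    '<meta name="description" content="Site safety procedures">'
    "</head><body><p>Long body text that is never stored.</p></body></html>"
)


class MetaExtractor(HTMLParser):
    """Collect the <title> text and selected <meta> tags; skip everything else."""

    WANTED = {"keywords", "description"}

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in self.WANTED:
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is dropped.
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data.strip()


def extract_meta(html):
    """Return a dict with whichever of title/keywords/description were found."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta
```

Calling extract_meta(PAGE) returns only those three short fields, so the stored data stays small however long the body is. The trade-off is the one mentioned above: searches are only as good as the meta data the webmaster maintains.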

-----
David