Solved

Web Site Search Engine software

Posted on 2004-09-08
Last Modified: 2013-11-19
I am running IIS 5.0 on Windows 2000 Server.  I found a packaged search engine product called 'Zoom Search Engine 3.1'.  This particular product says it can run on one box and use the space on that box to index the site, versus indexing on the actual server.  I was wondering if anyone has any info about this product, or if there is something better out there.  I have a lot of documents on the web site, both Word and .pdf, that folks need to find via the search engine.  Apparently Zoom has a plug-in that can do this for me.  I also have disk space issues at this time, so I need to run it from another box.  I don't want to write any code myself, as I am the only person who handles all of this and other projects as well, and I don't have time for coding.  Thanks for any information or references as to what other products can do this.
Question by:a182612
8 Comments
 
LVL 6

Expert Comment

by:Fahdmurtaza
ID: 12006098
OK, what I'd recommend is my favorite, and I think the best free one: Web Wiz Site Search. Search for it on Google and you will find its link right away. You can download it from there and configure it for your site in about 5 minutes.

Regards
Fahd Murtaza
 
LVL 3

Expert Comment

by:passmark
ID: 12013760
If you really have no free disk space on your web site to upload the Zoom index files, then you can put the search script and the index files on another site.

For example, the CIA's World Factbook web site was indexed with Zoom, but the search function was put on a different host. See this page:
http://www.wrensoft.com/zoom/worldfactbook/search.php

You can see the files that are generated by Zoom here,
http://www.wrensoft.com/zoom/worldfactbook/

The search function is on this domain,
http://www.wrensoft.com/
but the results point to this domain
http://www.cia.gov/

However, in your case you will probably need to select the ASP option instead of the PHP option in Zoom (because you are using IIS, and by default PHP is not installed on IIS).

If you have specific questions about Zoom, list them here and I'll try to answer them point by point.

----
David
 

Author Comment

by:a182612
ID: 12018358
David, what if I add another drive and then create a virtual directory that the web server recognizes as part of the root? Could I use that drive to run the Zoom application and store the index that it requires?
LVL 3

Expert Comment

by:passmark
ID: 12023544

I don't see any problem with what you are suggesting. It doesn't really matter what directory the index files are in, as long as you can access them via a URL.

For the creation of the index there are two modes in Zoom: offline mode and spider mode. In general, spider mode is the better choice because it indexes all your dynamic content (such as ASP content and database content exposed through web pages).

----
David
 

Author Comment

by:a182612
ID: 12043117
Can I set this up so it will only index file names for .pdf and .doc documents?  
Could I also set it up to search actual .pdf and .doc files for specific keywords?
 
LVL 3

Expert Comment

by:passmark
ID: 12043259
In the Configuration window of Zoom, on the Scan Options tab, you can enter a list of file extensions to index. You can remove all the extensions except .doc and .pdf if you like. The potential problem with this, however, is that you probably have a number of HTML pages that provide the navigation for your web site (i.e. your home page is probably an HTML or ASP document, rather than a PDF).

So for the spider to get to your PDF files, you’ll probably need to index and follow the links on a number of HTML pages. Removing the HTML extensions may therefore not give the result you want. There are a couple of ways around this problem, but it would help to know what your site is like. Can you post the URL?
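
To illustrate the issue, here is a minimal Python sketch of a spider that follows navigation pages for their links but only indexes .pdf and .doc files. This is not Zoom's actual code; the `crawl` function, the `get_links` callback, and the example.com URLs are all made up for illustration, and a small in-memory site stands in for a real web server.

```python
from collections import deque
from urllib.parse import urlparse
import posixpath

INDEX_EXTENSIONS = {".pdf", ".doc"}                 # documents to put in the index
FOLLOW_EXTENSIONS = {".html", ".htm", ".asp", ""}   # pages scanned only for links

def extension(url):
    """Return the lowercase file extension of the URL path ('' if none)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()

def crawl(start_url, get_links):
    """Breadth-first crawl: follow navigation pages, index only documents.

    get_links(url) is a hypothetical callback returning the links on a page.
    """
    indexed, seen, queue = [], {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        ext = extension(url)
        if ext in INDEX_EXTENSIONS:
            indexed.append(url)      # a document: index it, do not scan it for links
            continue
        if ext not in FOLLOW_EXTENSIONS:
            continue                 # e.g. images: neither indexed nor followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed

# Tiny in-memory site standing in for a real web server (made-up URLs).
SITE = {
    "http://example.com/": ["http://example.com/safety.asp",
                            "http://example.com/logo.gif"],
    "http://example.com/safety.asp": ["http://example.com/docs/Hurricane.pdf",
                                      "http://example.com/docs/plan.doc",
                                      "http://example.com/"],
}

print(crawl("http://example.com/", lambda url: SITE.get(url, [])))
```

If `FOLLOW_EXTENSIONS` were emptied, the spider in this sketch would stop at the home page and never reach the two documents, which is the problem described above.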

I am not sure if I understand your question about specific keywords. By default Zoom will index the entire content of PDF and Word documents and store all the words it finds in its index (not just specific keywords). Can you give an example of the behaviour you want?

------
David
 

Author Comment

by:a182612
ID: 12055904
Indexing all the documents will probably take up a lot of space, especially since I have so many.  Can the product use a meta tag, or can it just search on the document names in the URL, such as 'safety/cssd/Hurricane.pdf', without actually having to store an entire .pdf file as an index?
 
LVL 3

Accepted Solution

by:
passmark earned 250 total points
ID: 12059906
The index is coded and compressed, so it will be much smaller than your entire collection of PDFs. Nevertheless, the entire text of each document is indexed. The text of the document needs to be stored in the index for the following reasons:

1) For exact phrase matching to occur. e.g. the user searches for "Hurricane safety procedures". The search results will be just the pages that have these three words in the text AND have the words in the same order that the user entered them.

2) So that the search results can display the context of the search. e.g. "Your local government recommends that you follow the <bold>hurricane safety procedures</bold> described below"

You can disable both of these features and save some space in the index.
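
As a rough illustration of why word positions have to be kept for these two features, here is a minimal Python sketch of a positional index with exact phrase matching and a context snippet. It is not how Zoom stores its index; the function names and the two sample documents are assumptions made up for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [positions of that word in the document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return {doc_id: start position} for documents with the words in order."""
    words = tokenize(phrase)
    if not words:
        return {}
    hits = {}
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Every following word must sit at the next consecutive position.
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits[doc_id] = start
                break
    return hits

def context(text, start, length, margin=3):
    """Return the matched words plus a few words on either side."""
    words = tokenize(text)
    return " ".join(words[max(0, start - margin):start + length + margin])

# Made-up sample documents (plain text standing in for extracted PDF/Word text).
DOCS = {
    "safety/cssd/Hurricane.pdf": "Your local government recommends that you "
                                 "follow the hurricane safety procedures described below",
    "safety/general.doc": "Safety procedures for a hurricane are listed in a separate document",
}

index = build_index(DOCS)
for doc_id, start in phrase_search(index, "Hurricane safety procedures").items():
    print(doc_id, "->", context(DOCS[doc_id], start, 3))
```

In this sketch the second document contains all three words but not in the order entered, so it is not returned. Dropping the position lists (and the stored text used for the snippet) is what would make an index smaller, at the cost of both features.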

There is also an option in Zoom called "Index meta information only". This option lets you index only the meta information found on a page (i.e. the title, keywords, description, and zoomwords). Note that the page content will still have to be scanned (in order to find links to other pages), so the scanning process will not be significantly faster, but the index data files will be smaller. This is useful for sites where the page contents are less meaningful or searchable than the meta information available (e.g. technical papers, charts, etc.). However, we have found that in a lot of cases the meta data is not accurate or not kept up to date by webmasters. There is also a feature for adding meta data to PDF files, using .desc files. See the Zoom user's guide for more details.
http://www.wrensoft.com/ftp/zoom.pdf
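
To show the general idea of meta-only indexing, here is a minimal Python sketch using the standard library `html.parser` module. It is only an illustration under my own assumptions, not Zoom's implementation: the `MetaExtractor` class and the sample page are made up, and it handles only the title, keywords, and description (not zoomwords or .desc files).

```python
from html.parser import HTMLParser

META_NAMES = {"keywords", "description"}   # meta tags kept in this sketch

class MetaExtractor(HTMLParser):
    """Keep the title and selected meta tags; collect links but drop body text."""

    def __init__(self):
        super().__init__()
        self.meta = {}      # what would go into the index
        self.links = []     # still needed so a spider can reach other pages
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in META_NAMES:
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Body text reaches this method too, but only title text is stored.
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data

# Made-up sample page.
PAGE = """<html><head><title>Hurricane Safety</title>
<meta name="keywords" content="hurricane, safety, evacuation">
<meta name="description" content="Procedures to follow before a storm.">
</head><body><p>Long body text that is scanned but not stored.</p>
<a href="docs/Hurricane.pdf">Checklist</a></body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)
print(parser.meta)
print(parser.links)
```

The whole page is still parsed in order to find the link, which matches the point above that scanning is not much faster even though far less is stored.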

-----
David
Question has a verified solution.
