Link to home
Start Free TrialLog in
Avatar of Qsorb
QsorbFlag for United States of America

asked on

Full Text Indexing witih SQL 2000 and ColdFusion 8 Search Server

We run a large public news website and would like our readers to search through hundreds of thousands of stories saved in NTEXT and TEXT data types.  Stories are generally no larger than 15,00 to 20,000 characters. We considered varchar but because of the size limitaion in sql2000, we have not. We're using SQL 2000 and Coldfusion 8 standard. I used sql2005 as the category because I could not find any here for sql 2000.

We've been looking at Full Text Indexing then also saw a reference to ColdFusion 8 Search server and wondered if that "server" can do the job, or if Full Text Indexing works with it, separately, etc.  Or what.

I'd sure like a narrative here and easy to understand overview of the best and easilest way to set up fulltext seaches on our news site. I know it would be good to upgrade to SQL 2005 but I don't think that's going to happen so we'll need to stay with SQL 2000 for now.

I really need not only to undstand how to best set this up in SQL but also use CF8 to work with it.

Comments. Suggestions. What to avoid? Which does what? If you just paste links please first be sure they're written for novices, easy to understand, and for sql2000. My best hope is to find a very easy but complete overview

As it is now, I  know next to nothing about either and am not sure in what direction to do.
Avatar of Brijesh Chauhan
Brijesh Chauhan
Flag of India image

With ColdFusion 8, you would have VERITY search engine, here is how you can use it

1. Create a COLLECTION of the content you want to search, you can create a collection in 2 ways a) through COLDFUSION ADMINISTRATOR b) by <cfcollection> tag.

2. After creating a collection you index it, this is done by <cfindex> tag. This index can be added as a coldfusion schedule tag, so that it indexes the collection every day or week, just as any other spider indexes your website.

3. Use <cfsearch> tag to search the indexed collection.





Avatar of Qsorb

ASKER

Thanks for the info and the links. I'll check them out. II've seen the livedocs already.

I was really interested in the difference between CF8 Search Server and Full-Text Search.  Are they totally separate or can them be used somehow together?

Understand that we'll soon have over a million news files, many rather large. We want the best search ability for the news files in the sql 2000 database but don't want to drag the system down. As it is the dual core CPU is running at an average of 55% most of the time, and short of a different system and upgraded software, we have to make the right decision.
CF8 has VERITY K2 Search Engine which provides Full-Text Search,  Verity Spider lets you index more than two hundred of the most popular application document formats, including Microsoft Office, WordPerfect, ASCII text, HTML, SGML, XML and PDF (Adobe Acrobat) documents.

Verity comes with a BOT to index the docs, which is just like any other bot /spider, you can go to Verity folder under coldfusion8 directory and use 'vspider - ' interface for most of the stuff.

Also while indexing you can specify what to do, whether to refrest, update, delete etc.

We have sucessfully implemented search based on Verity for our semiconductor company which has millions of docs and data.

Also, if you are looking for KEYWORD BASED search, then GOOGLE is the best, you can use their services, but for Database search, verity would perfom better in terms of relavence of results and indexing.
Avatar of Qsorb

ASKER

Thanks for all the info but I'm still not getting what I need.

I'm trying to compare cf8 verity (Search Server) to SQL Full Text Searching.

We're only searching pure text stored in sql 2000 data type netxt or text. And some of the text may not be indexed on Google, and anyway, we want to control the search parameters.

So, would someone please compare the two, speed, options, factors, etc?
The main advantage of using Verity is that it can ndex all your documents like pdfs which you won't be able to do with SQL.

With SQL the main advantages are that you can do more than levels of filtering on the results, which will NOT be possible with verity, as it only gives 4 custom columns.

SQL will also not restrict you with the description, verity will return trimed discription.

For Controlling search parameters and text searches, SQL would perform better, also if you are using SQL 2008, they have a superior way of ranking then verity.
Avatar of Anthony Perkins
>>So, would someone please compare the two, speed, options, factors, etc? <<
I cannot compare and I very much doubt that you will find anyone who has used both to give you any meaningful feedback.  But what I can do is answer any specific questions regarding using Full-Text Search.  I also suspect you will find members here who can do the same for your other option.  You will then have to make a choice or at the very least educate yourself on the subject by doing a small proof of concept.  I suspect this is one of those cases where your own experience is far more meaningful than any comparison you get on these two products, as they do not appear to be even remotely alike, either in functionality or deployment.
ASKER CERTIFIED SOLUTION
Avatar of Anthony Perkins
Anthony Perkins
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
As you can see my previous point has proven to be true:
"I very much doubt that you will find anyone who has used both to give you any meaningful feedback."
>>The main advantage of using Verity is that it can ndex all your documents like pdfs which you won't be able to do with SQL (AS FAR AS I KNOW). << 

>>also if you are using SQL, they have a superior way of ranking then verity. <<
HERE IS THE EXPLANATION FOR IT --
Verity automatically provides stemming (horse -> horses and vice versa) which
is nice BUT it ranks matches the same (horse is worth the same as horses). It
supports thesaurus (fast -> quick and vice versa), but the same ranking
problem exists and MS SQL supports theasurus as well (without the ranking
problem).

@acperkins: SQL 2000 DOES NOT support PDF's, but there are ways arround it, like storing it as a text or image, but that would NOT be as fast as verity search .. here is an MSDN article for your reference
http://msdn.microsoft.com/en-us/library/aa175787(v=sql.80).aspx
I beg to differ with you.  In order to use Full-Text Search with SQL Server 2000 to search documents in their native format they had to be saved using the deprecated image data type.  There was no other choice to save binary data at the time.

Also, Hilary's article you posted appears to contradict you.  He specifically states:
As you know, SQL Server 2000 gives you the option of storing what are often large and format-heavy documents such as PDFs in their native format in an image data type column. By specifying the document extension, you ensure that the appropriate iFilter will be applied to the documents.

But as I said before, this point is totally moot and while the memory trip was fun, we really need to get back to addressing the author's question...
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Verity is used to search indexed documents, so if you need to search database fields, you need to use SQL full text search
Avatar of Qsorb

ASKER

Thanks a bunch for the dialogue. Looks like we need to stick with SQL.
>>Looks like we need to stick with SQL. <<
If you have any specific question regarding SQL Server 2000's Full-Text Search, feel free to post a related question to this thread, that way I see it.