Question

Fulltext large (200.000) count query performace too slow

Asked by: albertung

Hi,

We are having too slow response times for a simple count on a fulltext in SQL Server 2005. Even using the most simple query on a good server (64 bit Dual Opteron 4GB Ram with high speed 16 raid disk storage)):

select count(*) from content_books where contains(searchData,'"english"')

Takes 4 seconds to count the avg 300.000 results

I have removed all the joins with real table data so that the query is only inside the fulltext engine..

I would expect this to be down to 10 milli seconds. Isn't it just getting the size of the "english" word result index?

It  seems the engine is going through all the results because if a do a more complex search that returns less results the performance is better.

Any clues of how to do this faster? We never need to read the thousands of records BUT we ALWAYS need to count them. The 4 seconds of CPU/IO usage can't go live as it won't scale up.

Has anyone gone through this?

Thank you very much,
Filipe

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2008-02-15 at 07:46:23ID23166305
Tags

SQL Server 2005 Fulltext

Topics

MS SQL Server

,

SQL Server 2008

,

SQL Server 2005

Participating Experts
8
Points
0
Comments
25

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. how to tell if a fulltext is indexed?
    let's suppose we have the two following indexes: ALTER TABLE xyz ADD FULLTEXT index1 ( field1(15) ), ADD FULLTEXT INDEX index2 ( field2(15) this works, but using SHOW INDEX FROM xyz, how can I tell if the fulltext is FULLTEXT or FULLTEXT INDEX ? I need a way that works on...
  2. MySQL fulltext search benchmark
    Anyone has a benchmark on MySQL's fulltext indexing and search? My plan is to index 4M of "contact info" records, each with different fields (name, address, phone #...etc), where name and address field has to be fulltext indexed. However, I need to have some idea a...
  3. Case-insensitive matches on fulltext search problem
    Trying to perform a search using fulltext on a mysql 4.x database and php. The collation of the database is latin1_sweedish_ci the query is: $query = $_POST["query"]; $SQL = "SELECT * FROM claims where MATCH(IA,claim_no,id,assigned_by,claims_rep,insurance_comp...

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: gladxmlPosted on 2008-02-15 at 08:08:30ID: 20903002

try to count only one field and do not use *.

Try to use the SQL below where field_id is an index in your table

select count(field_ id) from content_books
where contains(searchData,'"english"')

 

by: albertungPosted on 2008-02-15 at 08:15:07ID: 20903070

I am using count(1) not * sorry for the typo...

 

by: gladxmlPosted on 2008-02-15 at 09:27:32ID: 20903885

<<<
We never need to read the thousands of records BUT we ALWAYS need to count them.
<<<

However using the count property is also reading trough the record the difference is you just only display the count. Therefore if you are counting a large amount of data the speed will also get affected.

BUt you can try to use  the GROUP BY might speed up the query...

select count(field_ id) from content_books
GROUP BY fieldname
where contains(searchData,'"english"')

or

select count(field_ id), commonfieldname from content_books
GROUP BY commonfieldname
where contains(searchData,'"english"')

Please note that in each commonfieldname, there will be a seperate count. Assuming that there are few common field this might work.

HTH.

 

by: Cedric_DPosted on 2008-02-15 at 09:29:02ID: 20903908

One of most common problems on VLDB when users try to exact count huge values, for example your "english" word occured 284,327 times among 100mln entries.

This is almost always nonsense, very resource-consuming operation.
You can speed-up all searches to one second, when only returns "More than 10000" occurences, provided by SELECT TOP (N) clause.

 

by: Cedric_DPosted on 2008-02-15 at 09:33:14ID: 20903948

try:

select count(*) from (select top 10000 1 from content_books where contains(searchData,'"english"')) as t

 

by: albertungPosted on 2008-02-15 at 09:43:14ID: 20904032

Hi,

Thank you for the replies.

Cecrid D: I cannot do a work-around and the "more than N" can only be shown if i can know how many there are in a fast way, and that is THE problem...

Cecrid D & gladxml: the suggestions you make will INCREASE cost of processing this query (when testing it might not seem so because of internal SQL Server cache).

To improve it i need to get the info directly from the fulltext catalog. Extracting info from it will always have more cost and won't scale up.

See you!

 

by: acperkinsPosted on 2008-02-15 at 10:17:40ID: 20904366

Not to start a contraversy that is totally beside the point, but you will not find any difference between doing COUNT(*), COUNT(1), COUNT(YourColumnName).  The SQL Server optimizer is intelligent enough to know what you want.  That is not to say that it is always better to do IF EXISTS (SELECT 1 FROM ...

But now for the real problem.  Have you done a Full Population recently?

 

by: acperkinsPosted on 2008-02-15 at 10:21:20ID: 20904404

If the answer is yes, than there is not much more you can do.  It is sequentially counting the returned results.  Incidentally, there is a limit as to the total number of rows returned when using Full-Text Search.  With SQL Server 2000 it was only 20,000, but this max has been increased with SQL Server 2005

 

by: Cedric_DPosted on 2008-02-15 at 10:41:49ID: 20904606

>> Cecrid D: I cannot do a work-around and the "more than N" can only be shown if i can know how many >> there are in a fast way, and that is THE problem...

when you learn SQL storage fundamentals well you'll be surprising that counting of "more than X" can be many times faster than determining exact value of COUNT(*)   :)

because: to determine exact COUNT server should read much more data, but for TOP N it stops counting after N matches.

 

by: weellioPosted on 2008-02-17 at 20:05:25ID: 20917489

have you tried creating a Clustered Index?
http://www.devx.com/dbzone/Article/29530/1954

or possibly shringking the field sizes to make the database smaller, thus quicker easier to sort because of using less cache?

 

by: albertungPosted on 2008-02-18 at 04:54:14ID: 20919399

Hi,

I don't know your experience with fulltext, but i it seems to me that you are missing the point. Please read this http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/ftslesld.mspx to undestand how far i have gone and what i am trying to do to have the top performance of a "simple" fulltext count.

Resuming, there is a boundary between the fulltext catalog and the sql server so to have the top performance that i need i am looking for some additional parameters to pass to the fulltext, not inside sql server 2005. These

The optimization i did so far (althoug it has a huge cost on disk space) that is giving better results is to create a seperate table with the same data (?) only for the search. I don't know why, but IO and CPU usage is 1/10 less than with the original table. We are still trying to figure out what is hapenning (yes we rebuilt all indexes from scratch).

I am still looking for the solution and it should be in the area of "telling" the fulltext engine that we only need the count (inverted index UNSORTED queries with count is of milliseconds for millions of results).

Thank you again.

 

by: acperkinsPosted on 2008-02-18 at 05:50:08ID: 20919847

>>I don't know your experience with fulltext, but i it seems to me that you are missing the point.<<
He/she are just trying to help, cut them some slack.  So they are not familiar with Full-Text Search, let alone experience with it? Jjust say so and move on.

 

by: acperkinsPosted on 2008-02-18 at 06:03:17ID: 20919938

Now back to the question.  Have you implemented any of the recommendations from the link you posted?  If so, what were they and were there any improvements?

My guess is that you are now down to bigger and better hardware, I cannot possible see how you can significantly improve the performance to retrieve a count over so many rows.  This is no different than getting a count on a large table withour any indexes.

 

by: albertungPosted on 2008-02-18 at 08:46:30ID: 20921380

@acperkins:
Sometimes writen words sound "agressive" but they aren't (or where not when i wrote them :-) ).

Back to the subject:
We are testing the query recommendations found here: http://technet.microsoft.com/en-us/library/ms142560.aspx
The ones in the first document i posted that can be applied to our case had already been done/tested.

"This is no different than getting a count on a large table withour any indexes."
I hope this is not true (for the sake of IO :-) ). For UNSORTED queries (i am only asking for a count) the fulltext engines usually do reverse index matching that is VERY high performance (pure pointer navigation on "compressed" arrays).

Thks.

 

by: acperkinsPosted on 2008-02-18 at 10:09:23ID: 20922087

>>The ones in the first document i posted that can be applied to our case had already been done/tested.<<
The two that I found interesting were using CONTAINSTABLE instead of CONTAINS and partitioning the table into multiple tables and then doing a UNION ALL to get the total.

>>I hope this is not true (for the sake of IO :-) ) <<
Try it some time. :)
<quote>
From the original link you posted, this is the part that you may want to highlight:
Full-text query performance is roughly proportional to the number of matching keys returned from the catalog. On a large catalog, querying for words that are only rarely present in the indexed column returns fewer keys than querying for words that are contained in many columns, and accordingly the queries that return fewer matching keys generally perform better.
</quote>

 

by: xyzzerPosted on 2008-02-29 at 14:30:58ID: 21018234

Try using manual change tracking. Query performance might be lower during indexing.

 

by: xyzzerPosted on 2008-02-29 at 14:36:06ID: 21018271

Another solution might be a fragmented index - you might need to do a full merge
ALTER FULLTEXT CATALOG catname REORGANIZE

 

by: Cedric_DPosted on 2008-03-07 at 06:38:49ID: 21070396

What a dumb to try and try again :)

Please understand that determining exact COUNT(*) = 247,348,458,434 will ALWAYS be slow because SQL Server should read ~100k pages even on the index!

Even Google returns approximation "of about 1,620,000,000", not exact count!

 

by: xyzzerPosted on 2008-03-14 at 08:36:16ID: 21126482

Well, that's not exactly right. Number of occurrences is kept on top of the index and if the index is not fragmented - the result should be almost instantaneous.

 

by: Cedric_DPosted on 2008-03-14 at 10:18:21ID: 21127373

Well, but not exactly :) even for perfect indexing case, to determine N=535,456,735,986 count server should read LOG2(N) ~ 26 pages, but to determine "More than 500,000,000,000" it should read only one page...

Anyway, 26 times faster, and for non-perfect index matching (for ex LIKE '%WORD%') it can be thousands times.

 

by: shyamhangalPosted on 2008-05-26 at 08:41:06ID: 21646893

Wondering if somebody got the final solution to this problem ? I am currently facing the same issue where the Count(*) is taking a very long time...

thanks
shyam

 

by: albertungPosted on 2008-07-25 at 15:15:07ID: 22092689

hi,
I ended up using the index on the other table.
Thx to all. Next time I face this will get into more detail..,, no time now...

 

by: Computer101Posted on 2008-10-10 at 16:39:27ID: 22691614

PAQed with points refunded (250)

Computer101
EE Admin

 

by: AlexanderSchweighoferPosted on 2009-04-16 at 03:58:50ID: 24156264

Hi,
Does anyone know, if there were any changes in SQL Server 2008? I mean there was huge performance boost in the FT engine, at least MS says so. But is there any function provided by the full text engine to ONLY determine the count of reults? Or at least approximately.

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...