pzozulka asked:
SQL query performance in high traffic environment

What is the best approach in high-traffic environments with millions of records:

1) Have your query SELECT a filtered-down subset of the data into a temp table, so that all subsequent work runs against a much smaller data set.

2) Avoid temp tables altogether, because the INSERTs into them are so expensive.
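For concreteness, here is a minimal T-SQL sketch of option 1; the table and column names (Orders, OrderDate, etc.) are hypothetical:

```sql
-- Option 1: filter the large table down into a temp table first
SELECT OrderID, CustomerID, TotalDue
INTO #RecentOrders
FROM dbo.Orders
WHERE OrderDate >= DATEADD(MONTH, -1, GETDATE());

-- Subsequent work runs against the much smaller set
SELECT CustomerID, SUM(TotalDue) AS MonthTotal
FROM #RecentOrders
GROUP BY CustomerID;
```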
ASKER CERTIFIED SOLUTION: lcohan (Canada)
pzozulka (ASKER):
Scott, my question was more about architecture decisions for future queries: whether it's better to work with smaller aggregates (#temp tables) or with the "real" tables directly to avoid expensive inserts.

Granted, good indexes will improve performance no matter what. But I think that's beside the point and doesn't address my question (query architecture).
OK, as you wish. But without that foundation, your question is misguided, because you can't solve performance problems (except with bigger and bigger hardware, which is a half-solution at best). It's like trying to improve gas mileage while hauling a half-ton boulder. Good luck with it, though.
" future queries' architecture decisions" should be made based on needs right?
If you do not need to modify/aggregate subsets of data in a SQL code object you can use CTE or subqueries just make sure to filter your data as much as possible aside of the good indexes - use query plans to make sure there are no huge intermediate recored sets - eliminate all those "thick" lines.

If you need the opposite do not be afraid to use them - for instance in some large and longer running jobs I use tens even hundreds of millions of rows in temp tables for that matter but very "short" records 1 maybe 2 columns - Id and something else at the most - just make sure you optimize your tempdb.
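A sketch of the narrow-temp-table pattern described above, with hypothetical table and column names:

```sql
-- Narrow temp table: just the key column, so even tens of
-- millions of rows stay cheap to write and scan
CREATE TABLE #MatchingIds (Id INT NOT NULL PRIMARY KEY);

INSERT INTO #MatchingIds (Id)
SELECT Id
FROM dbo.BigTable
WHERE SomeFilterColumn = 'X';   -- filter as early as possible

-- Join back to fetch the wide columns only for matching rows
SELECT b.Id, b.WideColumn1, b.WideColumn2
FROM dbo.BigTable AS b
JOIN #MatchingIds AS m ON m.Id = b.Id;
```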
I'm confused. While indexes are likely the MOST important part of performance, isn't a crappy query the other part?

Also, I have no control over the indexes on our "real" tables. I have to work under the assumption that they are correct and optimized. The only indexes I have control over are the ones I can create on my temp tables (if I choose to go that route).

lcohan, are CTEs the same as #temp tables in terms of performance? I always thought CTEs were weaker in performance, primarily because you can't assign a clustered index to them and there are no primary keys involved. I also try to avoid subqueries, since most of the ones I need turn out to be "correlated" subqueries, which are very slow from my understanding.
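For reference, indexes on a #temp table (the one place the asker says he controls indexing) can be declared inline or added after creation. A minimal sketch with hypothetical names:

```sql
-- A temp table whose indexing you fully control
CREATE TABLE #Work (
    Id     INT   NOT NULL,
    Amount MONEY NOT NULL
);

-- Clustered index added after creation
-- (alternatively, declare Id as PRIMARY KEY inline)
CREATE CLUSTERED INDEX IX_Work_Id ON #Work (Id);
```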
CTEs are not "weaker in performance primarily because you can't assign a clustered index on them", and you don't need primary keys on them either, in my opinion.
Sub-queries are sometimes actually faster, but it all depends on how well your data is organized/indexed and how the code is written.

You can't just "pour butter over the entire bread" and say ALL #temp tables or NO #temp tables in your database architecture. Aside from that, compilation/recompilation may be a factor when using #temp tables, etc. There are lots of things to consider, but as Scott said, the question is somewhat misguided, because #temp tables and "high traffic environments with millions of records" do not include or exclude each other. Good luck!
Can you be a bit more clear on what the uses are/will be?

Do you need a roll-up of data for reporting/ad hoc querying? Is it too slow running against the live data?