?
Solved

Caching queries in Application scope

Posted on 2003-03-04
24
Medium Priority
?
1,341 Views
Last Modified: 2013-12-24
I am developing an ecommerce website.  Obviously, many of the pages need to access product information.  I don't want to hit the database (SQL Server 2k) everytime these pages are accessed.  Therefore I've made the tentative decision to cache all the product information from the database into one or more Application scope variables.  I will only refresh these App-scope variables when the product information is changed.

My question: is this a bad practice?  How much information is too much to store in the Application scope.  As far as I understand it, these apps are stored in RAM.  My server has 512 MB RAM which will soon be upgraded to 1024 MB RAM.  Since my entire SQL Server databse claims to be less than 20 MB, I could store the entire database inside RAM in the App-scope and not have to worry about it, right?  Or is there overhead in doing this that I am missing?

If you recommend against doing what I plan to do, what would you do instead to avoid hitting the database?

Thanks,

Justin Kohlhepp
justin@coffeegeek.net
0
Comment
Question by:curmudgeon42
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 9
  • 8
  • 3
  • +2
24 Comments
 
LVL 1

Accepted Solution

by:
DA_JED earned 300 total points
ID: 8066666
What you can do is declare you application in your application.cfm file
     <cfapplication name="Application_Name" applicationtimeout="#CreateTimeSpan(7,0,0,0)#" clientmanagement="Yes" setclientcookies="Yes">

Create A cache array
     <cfset Application.CacheWithin = ArrayNew(1)>
     <cfset Application.CacheWithin[1] = "1">
     <cfset Application.CacheWithin[2] = "0">
     <cfset Application.CacheWithin[3] = "0">
     <cfset Application.CacheWithin[4] = "0">

User the cache array in your cachewithin attribute in your queries
     <cfquery  username="#Application.User_Name#" password="#Application.User_Password#"  name="Query_Name" datasource="DataSource" cachedwithin="#CreateTimeSpan(Application.CacheWithin[1], Application.CacheWithin[2], Application.CacheWithin[3], Application.CacheWithin[4])#">
     
What does this do?

If Person A has visited your site already within the cachewithin time span, your database is not hit.

Let me know if this helps, there is more I want to add, but this should suffice.  Let me know if you need more


DA JED

 
0
 

Author Comment

by:curmudgeon42
ID: 8066867
I don't need instructions on HOW to cache information within the application scope.  I want to know if doing this with a large amount of data taxes the server too much.  How much Application- and Server-scope data is too much for a relatively fast and high-RAM server?

justin
0
 
LVL 1

Expert Comment

by:DA_JED
ID: 8066968
Hi Justin

That's what I'm saying, you are going along the right path,  you don't need to put the entire DB in your cache however.  The opinion where you have 512 MB and only 20 of DB is not a good road to go to go down.

Also, consider using Drill Down's  in your queries.  That will help to cut down on server loads.


0
Supports up to 4K resolution!

The VS192 2-Port 4K DisplayPort Splitter is perfect for anyone who needs to send one source of DisplayPort high definition video to two or four DisplayPort displays. The VS192 can split and also expand DisplayPort audio/video signal on two or four DisplayPort monitors.

 

Author Comment

by:curmudgeon42
ID: 8067023
Why is the 20 / 512 thing not a good road to go down?

What is a Drill Down?

It seems like we're having communication issues. :-)

Justin
0
 
LVL 1

Expert Comment

by:DA_JED
ID: 8067072
hahahaah

don't worry about it.  Ok here goes

Drill Down: You know when you visit a search engine website, do your search and you see the results.  It tells you 1 of X records found viewing Y records at a time? And you see the links for next previous and so on?  That's drill down.

Instead of returning all of the records that satisfy the query, return only a subset.  That will cut down DRAMATICALLY the server loads.

20/512:  The space you specify for cache is used as the need increases.  If the need exceeds a limit, it takes out the oldest element in the list.  Depending on your site (hits) and what else you are doing on that server, you may or may not be in a situation where you will be getting a slow responses from your pages, because all of your resources are being used.  What I suggest is monitor the average and max hits you tend to get as well as server loads.  Adjust your cache according to that.


Hope that helps, let me know if you need more.

DA JED
0
 

Author Comment

by:curmudgeon42
ID: 8067149
I'm not talking about using CACHEDWITHIN on a query.  I'm talking about manually instantiating CFCs inside of the Application scope.  If the Application scope gets to big, then it will beginning swapping out of RAM to the HD.  This would be bad because it would slow down the pages to a crawl.

However, it seems to me that this wouldn't happen because the entire DB right now is only 20 MB and the ram is at least going to be 512 MB if not 1024 MB.  So, even if the DB tripled in size, it would still not even be taking up 20% of the available RAM on the server.

My question is about the internal mechanics of the Application scope?  Will the scope actually be using more RAM than what I think it is?  How much more?  How does CFMX deal with this scope if it runs out of RAM?

Justin
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8067885
OK, first let's correct something: Application variables are NOT stored in the RAM of the server, they are stored in the registry. Session variables are stored in RAM. So having an abundance of Application variables will simply increase the size of your registry, NOT use excessive memory. (On a side note, don't use Session variables unless you HAVE TO!!!)

OK, so back to the topic of Application Variables. There's nothing wrong with allowing these variables to stay in the Application as far as performance goes. It is correct that CF will not contact SQL to get the new data, so it does mean that if you have updated the data in the DB, you will not see the new values until you reload the Application varaibles. This can be accomplished by a number of methods, if you'd like to discuss let me know.

What IS bad about putting these queries into Application variables is that you need to lock Application variables everytime you read/write them. This means that during any process that you're looking at these queries, you'll need to lock them, at least as Read-Only.

The other suggestions that people are giving (using the cachedwithin attribute on CFQUERY tags) might be the best way to go, as this will store the information as if it was an application variable, but you won't have to lock it, and it will be refreshed in whatever timespan you put into cachedwithin.
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8067902
One thing to consider about caching queries, or putting them in Application variables is that although they're available for quick reference, if you need to filter that query somehow, it's of no use. Let's say you wanted to display all the products in alphabetical order, and then later by product_id, are you planning on caching both of these queries?
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8067911
curmudgeon42, you a Nirvana fan? Where in CT do you work?
0
 
LVL 17

Expert Comment

by:anandkp
ID: 8069713
hi,

U do have a option in CFadmin to set the query caching ... here u can set the no of queries u wld want to cache !

this limits the maximum number of cached queries that the server will maintain. Cached queries allow for retrieval of result sets from memory rather than through a database transaction. Since the queries reside in memory, and query result set sizes differ, there must be some user imposed limit to the number of queries that are cached. When this value is exceeded, the oldest query is dropped from the cache and is replaced with the specified query.

PS : i wldnt recomment using application/session variable's ... going this route [query caching] is much better

K'Rgds
Anand
0
 

Author Comment

by:curmudgeon42
ID: 8069762
hmmm...interesting...so App vars are stored in the Registry, huh?  Same goes for server vars?  however, isn't the Windows registry resident in memory - or, at least, the most recently-accessed keys

also, in CFMX i thought we didn't have to lock as many variable scopes anymore - do we still have to lock app and server-level scopes?

i don't want to use query caching b/c then i don't have control over when the queries refresh

TallerMike: To sort first by name and then later by product ID, I would use CF's "query of a query" functionality - <CFQUERY DBTYPE="query" etc..> to reorder, i.e.

<CFQUERY NAME="reorderedQuery" DBTYPE="query">
  SELECT * FROM origQuery
  ORDER BY name
</CFQUERY>


also TallerMike: I work in Milford, Ct.  I like Nirvana but I don't listen to them as much as I did a decade ago. :-)

thanks for all the responses guys

justin
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8072247
I've done sme research, and it looks like I am incorrect about the registry bit, maybe this was in an older version. Both Session and Application variables are stored in the RAM of the server. Client variables are stored in the registry. I apologize for saying otherwise, I was under the wrong impression.

I was afraid that you were going to say "query of a query". I can't find much documentation on this, but as far as I would guess, this can't be a very fast option. CF Server was not meant to run queries, and I just can't imagine that it can do this very quickly. I would think the time that you saved by caching the query would be lost here somehow. If anyone has a better idea of how this "query a query" method works, please shed some light...

You can control when the queries will be refreshed in query caching by basically setting the "cachedwithin" date to a far off date (like a year). Or maybe even better, to use the "cachedafter" date, and set it to a variable that you can change either through the DB or in the code somehow. Have this default always to today (for when the server reboots) and then have a way of changing this variable (it would be an Application variable as I see it).

Here is a link with some info about query caching:

http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=16&lngWId=9

***********************************************************************************************
***********************************************************************************************

I saw curmudgeon and thought Nirvana...anyhow. Very interesting that you live in Milford. I actually just moved from Milford about 6 months ago, I lived in an apartment complex on Shell Ave (Out near the hospital, right on the water). Now I live in Michigan, though I still work for a company in Danbury. It's a small world... Where in Milford do you work? I may have interviewed there...
0
 

Author Comment

by:curmudgeon42
ID: 8072299
that's good about the RAM being used and not the Registry - i feel much better now

about the "query of a query" - it seems to me the time it takes CFMX to process the query itself would be less than the time it would take to communicate to SQL Server via the datasource and then ODBC or whatever - but I could be wrong - i wonder if there is some data out there about this

i don't know that the word curmudgeon was used by Nirvana, but i'm not expert :-) i've been using this handle for so many years i don't remember why it started

i work for New Standard Institute - very small company

justin
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8072467
I can't find all the info, but I got an email from Tessa on June 6, 2001 concerning an interview, and I'm pretty sure I followed through. Maybe I'm confusing this with another interview, but I think I had met with 2 young guys that were in need of some help doing something with PDF forms maybe?

Curmudgeon was the title of one of Nirvanas songs.
0
 

Author Comment

by:curmudgeon42
ID: 8072629
small world - one of those young guys was probably me - when was this?
0
 

Author Comment

by:curmudgeon42
ID: 8072635
which Nirvana album?
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8072771
I had a feeling one of them might have been you. I remember being in a room with 2 young guys, computers side by side. I think there was a board on the wall. And you were telling me that your boss was off peddling the product all over the country, so you don't see him often.

I'm guessing since I got an email from Tessa early June 2001 that I went on the interview shortly after. I can't think of how I'd track it down though. I think I was called after the interview so I don't have any records, I imagine my resume could be floating around your HR department somewhere though...

As for the Nirvana tune, it was on the Lithium single. It's a rare song, not really radio friendly. I had a friend in High School that obsessed with Nirvana so I've heard just about every song they put out...
0
 
LVL 5

Expert Comment

by:heathprovost
ID: 8077017
This is my recommendation for dealing with this type of scenerio. I have done it sucessfully on a number of sites with excellent results.

1. Create a number of "base" queries that will be cached. The idea is to create the minimum number of cached resultsets possible. So you want to create a few queries that are basically static, i.e. no variable WHERE clauses or anything like that. Just a big resultset that represents all the data you might need later. You should think of these as the "Tables" that your real queries will be using, because that is basically what the idea boils down to.

2. All site code should use query on query to do their work, using the base queries as there targets. These queries should not be cached, or should have very small caching intervals. Unlike your "big" queries, these will be difficult or impossible to flush effectively. This does of course impose some limitations because of the limited feature set of QonQ, but if you are creative with how you build your base tables alot of these limitations can be addressed there.

3. To refresh your data on demand, all you have to do is run the big queries again with the caching turned off. Since they are basically static queries, you dont have to concern yourself with variations of the query getting cached. The other queries will then pull the changes. I usually create a template to flush all my queries in a single step.

The advantage of this approach is that is gives a good tradeoff between memory use and cpu utilization. Doing aggresive query caching site wide (i.e. on a large number of queries) tends to use alot more memory than this method (also has the problem you mentioned about being very hard to control flushing on demand). You do, however, incur some CPU overhead from the query on query stuff, but if you are careful with how you build stuff you can usually keep things acceptable.

Heath

ps. IMO using application scope for query caching, while a very powerful approach, is painful to implement in practice, and alot harder to keep updated as the site changes. Also, you will incur quite a bit of locking overhead during reads that isnt necessary with the method I described.

If you _really_ need to be able to do per row updates to your caches, the application scope is pretty much your only option. But I find that for most uses, the method I described performs nearly as well while being much easier to program and manage.

0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8077162
0
 

Author Comment

by:curmudgeon42
ID: 8079866
well, the thing is healthprovost, that i'm not just keeping QUERIES in the app scope, i'm keeping CFCs in the app scope.  these CFCs all manage their query data in an OOP approach.  so query caching isn't really the only thing i'm aiming for.  i'm aiming for instantiating the app-scope CFCs only once in the application.cfm (if they aren't already instatiated), and then any pages that need them can refer to them

however, i am essentially doing your "base query" approach, where you load all the data from the DBase into a table, and then "query on query" from there

i guess i'll just have to see if this approach works out or bogs down the server too much

justin
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8080075
I would be VERY interested to see the performance of this. I don't have CFMX installed yet, still running on 4.5 But how does this query a query stuff work? Who is doing the joining of the tables? If it's Cold Fusion, it just can't be fast. That's why we build seperate Database servers is so that the Database actions can be performed by a seerate machine through a process finely tuned for such actions. And if Cold Fusion is returning the cached query to SQL to join to, this just can't be faster either. Now instead of sending a SQL statement and returning a recordset, you'd have to send a recordset AND SQL and return a recordset.

Try (if you would) to create a rather time consuming query (oh, that should be hard) and then compare the time it takes to add an order by to the statement as opposed to running a query of queries order by of the cached statement. This won't give ALL the numbers, but it will give an idea of how the performance works out.

Does the CF debugging show the processing time for a query of a query? We'd need that of course...

And if ANYONE can find any details on specifically HOW this query of a query works (who's doing the processing, efficiency, speed, CPU and memory considerations) I'd be really interested...
0
 
LVL 5

Expert Comment

by:heathprovost
ID: 8080550
curmudgeon42:

I see what you are after now. I didnt realize that you were using an OOP wrapper around your data. You should be able to get away with it, just be careful about your locking, as it is real easy to corrupt stuff in the app scope if you dont lock consistently.

Assuming all your page level code is just calling methods on the CFC, you should be able to move all your locking complexity up into the CFC code.

TallerMike:

>But how does this query a query stuff work?

It basically boils down to CF implementing a simple SQL parser and execution engine which operates against data stored within native CF query recordsets. Indexes are built on the fly where needed and all the grunt work is done by CF.

>Who is doing the joining of the tables?

That depends on what you are doing. QonQ can do simple joins between previous query recordsets, but ideally you leave the heavy lifting to the real DB.

>If it's Cold Fusion, it just can't be fast.

You would be very surprised I think. Admittedly it depends on what you are actualing doing, and also on the size of the recordsets, but it is often MUCH MUCH faster than doing a full query to the DB, orders of magnitude in many cases. But this is only true in general if you already have a query lying around to utilize as a datasource.

>And if Cold Fusion is returning the cached query to SQL
>to join to, this just can't be faster either. Now instead
>of sending a SQL statement and returning a recordset,
>you'd have to send a recordset AND SQL and return a
>recordset.

That isnt how it works. The join would be done CF-side, the db server would only be responsible for returning the data for its part of the join.

>Try (if you would) to create a rather time consuming
>query (oh, that should be hard) and then compare the time
>it takes to add an order by to the statement as opposed
>to running a query of queries order by of the cached
>statement. This won't give ALL the numbers, but it will
>give an idea of how the performance works out.

I have done this many times, in fact that is one of the best ways to utilize QonQ. If you are doing nothing but an order by on the data, it is nearly free. Compare the differences in each approach and think about which would be faster...

Reruning the query with a new order by:

1. cfquery sends query to db (usually over network)
2. db server parses and executes query
3. db server returns resultset to CF (usually over network)
4. CF converts native result set to a CF recordset

Using QonQ with a cached query

1. CF parses query itself.
2. CF generates an adhoc index as needed.
3. CF executes query against data in memory.
4. CF builds a new recordset based on the query.

The primary difference isnt processing speed, it is latency. Sure, the db server actually does the work faster in general, but it requires a roundtrip over the network to do so. CF does it all in locally and in memory. And, at least in CFMX (not sure about CF5) it keeps these indexes laying around for future use. So the indexing step will get eliminated in subsequent runs. In practice I find it to be much faster than you would expect. Youd be surprised.

>Does the CF debugging show the processing time for a
>query of a query? We'd need that of course...

Yes it does.

>And if ANYONE can find any details on specifically HOW
>this query of a query works (who's doing the processing,
>efficiency, speed, CPU and memory considerations) I'd be
>really interested...

I think I covered that already. I dont know all the details about internal data representation or anything like that, but through experience I find it to be pretty darn fast. The exception would be CF5 though, it isnt nearly as fast as MX in this particular area. It is still usable, but MX blows it away in this regard.

Heath
0
 
LVL 8

Expert Comment

by:TallerMike
ID: 8080632
Well, with that said. I'm ready to start delivering points! Good call Health, looks like you know your stuff.

But I'm still not convinced... Remind me again why we seperate the DB server from the web server? If you take this to a certain extent, and your website is not heavily hit, it would seem easiest just to marry the 2 servers into one? I mean what's the point (other than data storeage and the occasional refresh) of having a seperate server if you're making the webserver to the processing?

I guess I'd have to see it on a heavily hit system using large recordsets before I was convinced of anything. It just seems like a bad idea though.
0
 
LVL 5

Expert Comment

by:heathprovost
ID: 8081260
Well, it is still a much better idea to run a db server on a seperate machine, that hasnt changed or anything. Most SQL servers (I use MSSQL) are memory hogs, and dont really like sharing CPU sycles with other processes. So running on the same box as the web server, while faster in simple cases, often leads to slowdowns because CF is competing with SQL for processor time and memory (and in unpredictable ways).

The thing is that QonQ isnt useful in and of itself. It is only really useful when combined with cached queries. IMO there really isnt any point to using it otherwise. The only exception is if you need to do a join between data from seperate datasources, it is occasionally useful for that since it cant be done otherwise.

But when combined with cached queries, it opens up alot of options you dont have otherwise. Im not claiming it is the fastest way to go about things... The fastest way would be to either just cache everything or try and maintain very small resultsets. But either of those approaches are difficult to manage.

Ill give you an example from the real world. I did a site which has an elaborate db search. The data returned from this search comes from a 3 way join, and the total number of rows was about 7000, which included 4 text fields with quite a bit of data in them. The search is done using multiple criteria, and the results are displayed in pages of 10 records, with a drill down to display all the data. The client wanted one of the text fields to be displayed in the result lising, along with the items name and a few other criteria. I calculated that the entire recordset weighs in at approximately 20MB.

Caching everything would be a nightmare here, because there are literally thousands of possibilites for the where clauses and such. Even if you had 4GB of memory, people tend to all search for different keywords and such, so the caching would be ineffective most of the time.

Doing very small resultsets would be very very difficult here. you would have to come up with a query that returns only the records to be displayed for the current results page (i.e. the query returns only the needed 10 records). While it is possible to do this, it makes for some very ugly and generally poorly performing queries. I have tried this approach in the past and although it is better than nothing, it isnt fun to deal with.

Your only other conventional option is to just write the code normally, but impose limits on searching parameters. You dont want people doing searches that return the whole recordset because that means CF is fetching 20MB of data every page load...

Using QonQ is an elegant way to deal with this. You have a base query that just fetches everything (with an order by specifying the most common case ordering). You cache it in memory using the normal query caching methods. Then on searches you use QonQ to filter this recordset. Once CF builds up the indexes, it is pretty fast. If the ordering is the common case (the user hasnt specified a different ordering) you dont even need to deal with order by clauses, as the data is already in order. You are generally just running simple where clauses. It is simple to code and predictably easy to flush when new data is available. It isnt as fast as the db server, but it doesnt need to be, it eliminates virtually ALL overhead with regards to the size of the resultset being transfered over the network. In fact, the bigger the underlying resultset, the more effective this is. If the result set was very small there would be no need for it.

Heath

Heath
0

Featured Post

Get real performance insights from real users

Key features:
- Total Pages Views and Load times
- Top Pages Viewed and Load Times
- Real Time Site Page Build Performance
- Users’ Browser and Platform Performance
- Geographic User Breakdown
- And more

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction This article explores the design of a cache system that can improve the performance of a web site or web application.  The assumption is that the web site has many more “read” operations than “write” operations (this is commonly the ca…
What You Need to Know when Searching for a Webhost Provider
In this video we outline the Physical Segments view of NetCrunch network monitor. By following this brief how-to video, you will be able to learn how NetCrunch visualizes your network, how granular is the information collected, as well as where to f…
Visualize your data even better in Access queries. Given a date and a value, this lesson shows how to compare that value with the previous value, calculate the difference, and display a circle if the value is the same, an up triangle if it increased…
Suggested Courses
Course of the Month13 days, 20 hours left to enroll

801 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question