Caching to achieve better Performance and Scalability

Throughout my experience working on eCommerce web applications, I have seen applications succumb to increased user demand and throughput. As load increases, response times spike, which leads to user frustration and lost sales. I have seen several outages during the holiday season and promotional events, which can be catastrophic for a business.

The cause is usually a performance bottleneck or a scalability issue. Web applications usually get their data from relational databases, and if an application is not designed well, the database becomes the bottleneck. Scaling an enterprise database such as Oracle or DB2 is expensive and only takes us so far, because most relational databases do not scale horizontally with ease. The other common bottleneck is web service calls. Under high load, slow web service calls can bring down a website that was not designed for scalability and performance.

I recommend caching for almost any application. Although I have used memcached and EHCache as caching providers, and use memcached for the examples in this article, I am not here to propose any one caching solution. For simple websites, a simple map-based solution such as WhirlyCache, OSCache or cache4j might work well, though you can still use more advanced solutions such as EHCache, memcached, JCS, Terracotta, Oracle Coherence, and many more. Some of the advanced distributed caching solutions carry an upfront cost, but it pays off over the longer term through better scalability and maintainability.

I will use memcached to explain the caching concepts and examples in this article. More information about memcached can be found at http://memcached.org/

Let’s take a hypothetical example where a customer is logged in to the site and is browsing different pages. We might show a welcome message in the header of each page and some personalized content on different pages. Fetching the user information from the database on every page is unnecessary, because that information does not change with each request. So we will cache the customer information to reduce the number of database requests.

memcached is a distributed memory object caching system, which means the cache runs outside the application's Java virtual machine. memcached has 2 components: the server and the client. For web applications written in Java, one client library that implements the memcached protocol is spymemcached. For client libraries in other popular languages, refer to https://code.google.com/p/memcached/wiki/Clients

Once you set up the server, the client refers to it by host and port. On login, you can cache the customer object as:
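    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    // The constructor throws IOException if the connection cannot be set up.
    MemcachedClient c = new MemcachedClient(
            new InetSocketAddress("hostname of memcache server", portNum));

    // Cache the customer object for 1 hour (3600 seconds).
    // customerKey must be a unique key per customer.
    c.set(customerKey, 3600, customerObject);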


You can create the customer key from the session ID, the customer ID, or a combination of both, depending on your application's needs.
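As a minimal sketch of such a key scheme, the helper below builds a per-customer key from a customer ID, optionally combined with a session ID. The class name `CustomerKeys`, the `customer-` prefix, and the method names are hypothetical. The only memcached-specific facts it relies on are that keys are limited to 250 bytes and must not contain whitespace or control characters.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds per-customer cache keys. */
final class CustomerKeys {
    private static final String PREFIX = "customer-"; // assumed naming convention
    private static final int MAX_KEY_BYTES = 250;     // memcached key length limit

    private CustomerKeys() {}

    /** Key based on the customer ID only, e.g. "customer-42". */
    static String forCustomer(long customerId) {
        return validate(PREFIX + customerId);
    }

    /** Key that combines the customer ID and the session ID. */
    static String forCustomerSession(long customerId, String sessionId) {
        return validate(PREFIX + customerId + "-" + sessionId);
    }

    // Rejects keys memcached would not accept: too long, or containing
    // whitespace or control characters.
    private static String validate(String key) {
        if (key.getBytes(StandardCharsets.UTF_8).length > MAX_KEY_BYTES) {
            throw new IllegalArgumentException("key longer than " + MAX_KEY_BYTES + " bytes");
        }
        for (int i = 0; i < key.length(); i++) {
            char ch = key.charAt(i);
            if (Character.isWhitespace(ch) || Character.isISOControl(ch)) {
                throw new IllegalArgumentException("key contains whitespace or a control character");
            }
        }
        return key;
    }
}
```

Because no two customers share a key, a request from customer B can never be served customer A's cached object.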

We can now bypass the database for any future use of customer object by fetching the object from cache:
    // Retrieve the customer object (synchronously).
    // get() returns null if the key is not in the cache.
    Object myObject = c.get(customerKey);
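The retrieval above assumes the object is already cached. In practice the entry may be missing (never cached, expired, or evicted), so reads usually fall back to the database and repopulate the cache. The following is a minimal sketch of that read path, with hypothetical names throughout: `SimpleCache` is a stand-in interface whose `get` and `set` mirror the two calls used in this article, `InMemoryCache` replaces a real memcached client so the example runs on its own (it ignores the expiry time), and the loader function stands in for the database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the two cache calls used in this article. */
interface SimpleCache {
    Object get(String key);
    void set(String key, int expirySeconds, Object value);
}

/** In-memory stand-in so the sketch runs without a memcached server; expiry is ignored. */
final class InMemoryCache implements SimpleCache {
    private final Map<String, Object> store = new ConcurrentHashMap<>();
    public Object get(String key) { return store.get(key); }
    public void set(String key, int expirySeconds, Object value) { store.put(key, value); }
}

/** Reads through the cache and falls back to a loader (e.g. a database query) on a miss. */
final class CustomerLookup {
    private static final int ONE_HOUR = 3600;
    private final SimpleCache cache;
    private final Function<String, Object> loader;

    CustomerLookup(SimpleCache cache, Function<String, Object> loader) {
        this.cache = cache;
        this.loader = loader;
    }

    Object find(String customerKey) {
        Object cached = cache.get(customerKey);
        if (cached != null) {
            return cached;                          // cache hit: no database call
        }
        Object loaded = loader.apply(customerKey);  // cache miss: go to the database
        if (loaded != null) {
            cache.set(customerKey, ONE_HOUR, loaded);
        }
        return loaded;
    }
}
```

With a real client, a thin adapter that forwards `get` and `set` to the memcached client could take the place of `InMemoryCache`; the lookup logic would stay the same.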



One might now ask why not use a session-scoped object instead. The answer is that session-scoped objects load the application heap: we would offload calls from the database but only push the problem to a different layer. With memcached we can scale horizontally by adding more application servers and/or memcached instances.

Similarly, one can cache web service responses when the same request always produces the same response (or at least the same response for calls made within a short period).
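As a rough sketch of this idea, the class below caches one response per request key and reuses it until a short time-to-live has passed. All names (`ResponseCache`, `callService`) are hypothetical. The in-process map is only a stand-in for memcached, where the expiry passed to `set` would play the role of the time-to-live, and the clock is injected only so that expiry can be demonstrated without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical short-lived cache for web service responses, keyed by request. */
final class ResponseCache {
    private static final class Entry {
        final String response;
        final long expiresAtMillis;
        Entry(String response, long expiresAtMillis) {
            this.response = response;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // returns the current time in milliseconds

    ResponseCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns a fresh cached response, or calls the service and caches its result. */
    String get(String request, Function<String, String> callService) {
        long now = clock.getAsLong();
        Entry entry = entries.get(request);
        if (entry != null && now < entry.expiresAtMillis) {
            return entry.response;                  // still fresh: skip the service call
        }
        String response = callService.apply(request);
        entries.put(request, new Entry(response, now + ttlMillis));
        return response;
    }
}
```

In an application the cache might be created as `new ResponseCache(5_000, System::currentTimeMillis)`; how long a response can safely be reused depends on the service and is a judgment call for each integration.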

If you are using Spring, spymemcached offers an integration that avoids the boilerplate code for putting objects into and getting them from the cache: http://code.google.com/p/spymemcached/wiki/SpringIntegration

Caching is a huge topic and I am pretty sure I have left lots of room for questions and clarification. Feel free to send me your comments and feedback. 

Comments (2)

CERTIFIED EXPERT

Commented:
I think I am missing a point here. If you store it as "currentCustomer" and 2 clients are browsing your website at the same time, what happens when customer B makes a request? He will appear as customer A because of the shared "currentCustomer" cache key.

I agree session-scoped caches are bad but I think your cache-key should be something like "currentCustomer-<id>" then the session cookie should provide <id> and <auth_nonce> and then you can retrieve from memcache currentCustomer-<id> and verify that the auth_nonce matches, or something like that.

I really don't see how the proposed solution could work in a multi-user environment.

Author

Commented:
Hi ThG,

Good catch!!! I will fix it in the article. To keep it simple and focused on caching only, I will not talk about nonces here.

Thanks for your feedback.
