Jeong Kim asked:

How to Best Manage High PostgreSQL CPU Use?

I am trying to fix a high PostgreSQL CPU use problem. We are using PostgreSQL 8.0.9, and when our JEE web application (in JBoss) is used under certain load-increase conditions, top shows a slow increase in the number of PostgreSQL processes. When the problem occurs, there are approximately 12-15 PostgreSQL processes, each showing SELECT at the far right of the process information and each using approximately 6-7% CPU, and the application slows down dramatically.

JBoss version: JBoss (MX MicroKernel) 4.0.3
Operating system: CentOS Linux 5.5
Kernel and CPU: Linux 2.6.18-194.26.1.el5 on x86_64
Processor information: 2 x Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 8 cores

Currently, our thought is to throw more hardware at it. If we do this, would the best option be something like Option A below or Option B?

Option A: 4 x AMD Opteron™ 6100 Series Processors each with 12 Cores
Option B: 4 x Intel® Xeon® 7500 series Processors each with 8 Cores

Is it correct to assume that CentOS Linux 5.5 with PostgreSQL 8.0.9 will scale proportionately with the addition of this many processors and cores (e.g. 4 processors, each with 12 cores)? Is there something else I should consider in terms of throwing more hardware at it?
MrNed replied:

CentOS 5.5 will scale fine with that many cores; I believe it's not until you get to 64 cores that scaling starts to level off.

An upgrade to PostgreSQL 8.3+ will also help, as there have been many performance improvements from 8.3 onwards.

Firstly, you should analyze your top queries and try to tune them. That is much cheaper and more likely to yield bigger benefits than just throwing hardware at the problem :)
Alternatively, simply adding more memory might be cheaper and have a big benefit if you KNOW that increasing it will improve the query times. First things first - you need to understand why the queries are slow rather than blaming it on a lack of CPU power.
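To find those top queries, you can ask the server what the busy backends are actually running. A minimal sketch - the database name foodb is just a placeholder, and on 8.0 current_query is only populated when stats_command_string = on in postgresql.conf:

$ psql -d foodb -c "SELECT procpid, usename, current_query FROM pg_stat_activity;"

If current_query shows only <command string not enabled>, turn stats_command_string on first and reload.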
Jeong Kim (ASKER) replied:

Hello, regarding the expert comment:
Alternatively, simply adding more memory might be cheaper and have a big benefit if you KNOW that increasing it will improve the query times. First things first - you need to understand why the queries are slow rather than blaming it on a lack of CPU power.
The server has a lot of extra memory, and we would like to explore this option while we are exploring others. Please refer to This Intro or another as you may prefer. We are not ready to upgrade to a newer release of PostgreSQL for another month, and this problem is urgent. In that intro or another, please specify which parameter or parameters in postgresql.conf I should try increasing, to what degree, and which other parameters should be kept in specific proportion as those are increased. Which parameters, and how much, for a server with 17.38 GB of total real memory on which JBoss is allocated a significant portion (Xms=1024m and Xmx=6144m, though it usually uses less than 3072m)?
What are your current settings?

Assuming you have 10GB of memory for PostgreSQL use, I would set these as a starting point and tweak them as necessary for your queries:

shared_buffers = 2GB
effective_cache_size = 8GB
wal_buffers = 1MB
work_mem = 64MB
maintenance_work_mem = 64MB

For example, are your queries predominantly large selects with lots of temporary usage (e.g. sorting, hashing, GROUP BY)? If so, increasing work_mem further might help.
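If you're not sure what the current values are, you can pull them straight from the running server; a quick sketch (the database name is a placeholder):

$ psql -d foodb -c "SHOW ALL;" | egrep 'shared_buffers|effective_cache_size|wal_buffers|work_mem'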
Hello, regarding the expert comment
What are your current settings?

Assuming you have 10GB of memory for PostgreSQL use, I would set these as a starting point and tweak them as necessary for your queries:

shared_buffers = 2GB
effective_cache_size = 8GB
wal_buffers = 1MB
work_mem = 64MB
maintenance_work_mem = 64MB

For example, are your queries predominantly large selects with lots of temporary usage (e.g. sorting, hashing, GROUP BY)? If so, increasing work_mem further might help.

The current settings are much lower than the settings you have suggested, and it seems increasing them a lot might help.

Regarding shared_buffers, This Intro mentions a concern when it says:
It's likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value this high; see Managing Kernel Resources for details

What do you think about that comment about needing to increase the amount of memory the operating system allows allocating at once in order to set the value this high (e.g. 2GB)?

In the 8.0 documentation, Here it says:
    Sets the number of shared memory buffers used by the database server. The default is typically 1000, but may be less if your kernel settings will not support it (as determined during initdb). Each buffer is 8192 bytes, unless a different value of BLCKSZ was chosen when building the server. This setting must be at least 16, as well as at least twice the value of max_connections; however, settings significantly higher than the minimum are usually needed for good performance. Values of a few thousand are recommended for production installations. This option can only be set at server start.

    Increasing this parameter may cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows. See Section 16.5.1 for information on how to adjust those parameters, if necessary.

In that portion of the documentation, the recommendation is a few thousand buffers! That is a lot different from 2GB!

What about the reference to a ratio between this parameter and the max_connections parameter? How high should max_connections be set in the scenario we are talking about?

This portion of the documentation also mentions the shared memory that the operating system will allow. What are your thoughts on this?

Also, it seems that 8.0 needs the memory amounts to be entered in bytes - does this assumption seem correct to you?
Ah, sorry - I forgot the 8.0 parameter is measured in pages rather than bytes. In that case, try a value of 262144 (x 8KB per page = 2GB). The default values are ridiculously small and are designed to run on very little hardware. It's true you may need to increase some kernel settings (shared memory and semaphores), but PostgreSQL will tell you when you try to restart if the value doesn't work, and if it fails it will just come back up with the old settings. Sorry, I can't remember whether the default kernel parameters are enough for 2GB; on CentOS 5.5 I would expect they are.
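If you'd rather check before restarting, the limits that matter for shared memory are kernel.shmmax (bytes for a single segment) and kernel.shmall (total, in 4KB pages). A sketch - the 4GB values here are just an example ceiling comfortably above the 2GB we're aiming for:

$ sysctl kernel.shmmax kernel.shmall        # show current limits
$ sysctl -w kernel.shmmax=4294967296        # allow a single 4GB segment
$ sysctl -w kernel.shmall=1048576           # 1048576 x 4KB pages = 4GB total
$ sysctl -p                                 # re-read /etc/sysctl.conf after adding the same lines there to persist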

The max_connections is referenced because every connection requires memory and semaphores - it's just a calculation to ensure you allocate enough. As an example, I have these semaphore and memory settings on one server, and they work fine:


$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

Memory Parameters...
  shared_buffers = 16GB
  effective_cache_size = 32GB
  wal_buffers = 1MB
  work_mem = 64MB
  maintenance_work_mem = 256MB


Also, make sure you are looking at the 8.0 manual, not the 9.0 one, e.g. http://www.postgresql.org/docs/8.0/static/runtime-config.html
Hello, thank you again for your quick reply.  Regarding the expert comment above beginning with:
Ah, sorry - I forgot the 8.0 parameter is measured in pages rather than bytes. In that case, try a value of 262144 (x 8KB per page = 2GB)....
It seems we are really narrowing down the steps I should take in exploring this option. However, I would very much like you to re-phrase the comment in a step-by-step format where everything is spelled out for these specific circumstances (e.g. 8.0 and 10GB of memory available).

For example, what specifically should I do and look for to check the operating system allowances?

Also, what specifically should I set in postgresql.conf, considering that 8.0 uses bytes and in some cases measures parameters in pages rather than bytes?
Hello, does the following output from the server indicate that there should be no concern about operating system allowances?
$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

Hello, regarding the expert comment:
For example, are your queries predominantly large selects with lots of temporary usage (e.g. sorting, hashing, GROUP BY)? If so, increasing work_mem further might help.
I believe we do have a lot of these types of queries.
Your semaphore settings should be fine for the memory settings we're talking about.

Try these settings in postgresql.conf, suitable for 8.0:

shared_buffers = 262144
effective_cache_size = 4194304
wal_buffers = 8192
work_mem = 65536
maintenance_work_mem = 262144

After saving the file, restart the cluster for them to take effect.
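For example, assuming a standard RPM install on CentOS (adjust the data directory if yours differs):

$ service postgresql restart
# or, as the postgres user:
$ pg_ctl -D /var/lib/pgsql/data restart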
Keep an eye on the log file and turn on temp-file logging (using the log_temp_files parameter). If you see lots of temp-file usage, you can increase work_mem, which will let PostgreSQL use more memory for temporary operations before spilling to disk.

See here for an example of how to set up log monitoring.
Whoops sorry, log_temp_files doesn't seem to exist in 8.0. Save that for after you upgrade :)
Hello, thank you again for your quick reply.  Regarding the expert comment beginning with:
Your semaphore settings should be fine for the memory settings we're talking about.

Try these settings in postgresql.conf, suitable for 8.0:
It seems that you may have converted things wrong. For example, assuming the postgresql.conf integers for these parameters are understood to be KB, shared_buffers = 262144 would be 0.25 gigabytes (see This Link).

Is 8.0 seeing the integers as KB? Isn't your conversion off?
According to the 8.0 manual, it's measured in pages, each of which is 8KB by default.

http://www.postgresql.org/docs/8.0/static/runtime-config.html#shared_buffers
Hello, thank you again for your quick replies.  Regarding your comment:
According to the 8.0 manual, it's measured in pages, each of which is 8KB by default.
Thank you for that explanation.  I see that in postgresql.conf there is a comment that says:
8KB each
There are other important comments in the file, as follows:
#effective_cache_size = 1000      # typically 8KB each
#wal_buffers = 8            # min 4, 8KB each
However, work_mem and maintenance_work_mem have comments that include:
size in KB
Please advise how everything should be adjusted given these comments.
That is correct, and it agrees with the documentation. I didn't mean every parameter was in pages:

shared_buffers = 262144 (pages)
effective_cache_size = 4194304 (pages)
wal_buffers = 8192 (pages)
work_mem = 65536 (KB)
maintenance_work_mem = 262144 (KB)

Good thing you're checking my conversions though; don't trust anything you read on the Internet ;)
Hello, thank you again for your quick replies.  Regarding your comment:
That is correct, and it agrees with the documentation. I didn't mean every parameter was in pages:

shared_buffers = 262144 (pages)
effective_cache_size = 4194304 (pages)
wal_buffers = 8192 (pages)
work_mem = 65536 (KB)
maintenance_work_mem = 262144 (KB)

For the effective_cache_size, if each page is 8192 bytes and the setting is set to 4194304 pages, this would be 32 gigabytes!  Please advise.
Hello, for:
wal_buffers = 8192 (pages)
This would be 64 megabytes if each page is 8192 bytes.  Earlier, you wrote you wanted this to be set to 1 megabyte.  Please advise.
Sorry, I was looking at my own settings when I converted them! I'm sure you've already figured it out, but these are your conversions:

shared_buffers = 262144 (2GB in pages)
effective_cache_size = 1048576 (8GB in pages)
wal_buffers = 128 (1MB in pages)
work_mem = 65536 (64MB in KB)
maintenance_work_mem = 65536 (64MB in KB)
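
Putting it together, the corresponding postgresql.conf lines would look like this (a sketch to paste in, with the units spelled out in comments):

shared_buffers = 262144             # 8KB pages -> 2GB
effective_cache_size = 1048576      # 8KB pages -> 8GB
wal_buffers = 128                   # 8KB pages -> 1MB
work_mem = 65536                    # KB -> 64MB
maintenance_work_mem = 65536        # KB -> 64MB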
Hello, it seemed you adjusted maintenance_work_mem up to 256 megabytes after some further consideration.  Isn't this correct?
Sorry for the confusion, see my previous comment.
So should maintenance_work_mem be 64 or 256?
SOLUTION by MrNed (member-only content)
For a database that grows by approximately 1MB per day and is now about 20MB, how often should I vacuum?
I don't think I've ever worked on a database that small, I hope it's a typo :)

Since 8.0 has no autovacuum (upgrade!), I'd suggest scheduling a weekly VACUUM ANALYZE. It really depends on the type of changes happening - i.e. whether they tend to bloat the tables/indexes - but with such a small database and growth rate, weekly should be fine.
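A simple way to schedule that is a cron entry for the postgres user; a sketch (Sunday at 2am, against all databases; the log path is just an example):

$ crontab -e    # as the postgres user, then add:
0 2 * * 0 vacuumdb --all --analyze >> /tmp/vacuumdb.log 2>&1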
Hello, I have applied these new settings. It will take time to determine whether there has been an improvement, but an initial test was not very encouraging. I accessed a particular URL with a large query in 9 browsers, with the GET requests sent at approximately 3-second intervals. The 5th browser started to show a big slowdown. Watching top, the postgres processes were consuming a lot of CPU, with SELECT at the far right of the process information output.

Is there something that can be done to cache these queries so they don't use so much CPU?  (if at the application source code level, please be specific for a JBoss application that uses EJB 3.0 session and entity beans)
Can you please confirm that your database is only 20MB in size? If that's the case, the default parameters you would have had (but didn't show, so I can't confirm) would have been fine too. FYI, it's a tiny database and will definitely already be fully cached. For it to overload an 8-core server so fast, you must have some incredibly bad queries and/or insufficient indexes.

1. Confirm database size
2. Post an example slow query
3. Post an explain analyze of that slow query

If you don't know #2, turn on the log_min_duration_statement parameter (see the above doc for values to use) and then get it from the log file.
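For example, something like this in postgresql.conf (the 1000ms threshold is just a starting point - lower it if nothing gets logged):

log_min_duration_statement = 1000   # log any statement taking longer than 1000ms
# then reload, e.g.: pg_ctl -D /var/lib/pgsql/data reload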
I don't think it's bad queries - I think it is query frequency or volume. The most significant factor is that there are 3 small queries run every second via AJAX on almost every page.
The database backup tar archive is less than 30MB.
3 small queries per second x 5 users against a tiny, fully cached database should use next to zero CPU if done correctly. I still think you should identify those queries and post their analysis. If they take a third of a second to run, that's too slow!
I apologize for emphasizing the 3 queries/second x 5 users without mentioning that there were also 5 huge queries being run at the same time on those pages (i.e. the GET requests I mentioned earlier, sent approximately 3 seconds apart). Without the big queries, performance is fine.
Please elaborate on "if done correctly". Maybe you could provide a URL to an article, or an explanation of the key things that make a query correct in terms of what would use CPU and what wouldn't.
Sorry, I don't have anything to link to. I mean there has to be a schema design suitable for querying - for example, primary keys, foreign keys, and indexes on columns commonly referenced in WHERE clauses. SQL-101 stuff. If you post a query along with its EXPLAIN ANALYZE output, I can see whether all of that is being done properly.
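To make that concrete with a made-up example - a filter on an unindexed column shows up as a Seq Scan, and adding an index turns it into an Index Scan (table and column names are invented, plan output abbreviated):

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
--  Seq Scan on orders ... (actual time=0.02..55.0 rows=10 loops=1)

CREATE INDEX orders_customer_id_idx ON orders (customer_id);

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
--  Index Scan using orders_customer_id_idx on orders ... (actual time=0.03..0.08 rows=10 loops=1)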
You mentioned that a small, fully cached database usually doesn't have CPU problems. Would throwing a lot of hardware at it (e.g. 4 x 12-core CPUs and RAID 10 with 8 x 15k RPM drives) help significantly if the queries were unchanged and it was still small and fully cached? How would the extra processor resources help? How would the RAID 10 with 8 x 15k RPM drives help?
ASKER CERTIFIED SOLUTION (member-only content)
PostgreSQL is not so well parallelized - each session uses a single processor thread.
For example, EnterpriseDB uses a very small shared_buffers and assumes the operating system's unified cache does its job well.
Also, you need to VACUUM ANALYZE once in a while.

What do you have in your datasource file?
http://www.coderanch.com/t/89928/JBoss/Jboss-connection-pool-not-releasing
You can tune the pool to keep a reasonable number of connections open.
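For example, a JBoss 4 local-tx datasource accepts pool sizing elements like these inside <local-tx-datasource> (the numbers are just a sketch - cap max-pool-size at what the database handles comfortably):

<min-pool-size>5</min-pool-size>
<max-pool-size>20</max-pool-size>
<idle-timeout-minutes>5</idle-timeout-minutes>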
Hello, thank you for your quick reply.  Regarding the expert comment:
What do you have in your datasource file?
http://www.coderanch.com/t/89928/JBoss/Jboss-connection-pool-not-releasing
You can tune the pool to keep a reasonable number of connections open.
This is what I have in my datasource file:
<datasources>
  <local-tx-datasource>
    <jndi-name>Foo</jndi-name>
    <connection-url>jdbc:postgresql://localhost:5432/foodb</connection-url>
    <driver-class>org.postgresql.Driver</driver-class>
    <user-name>foouser</user-name>
    <password>foopassword</password>
      <!-- corresponding type-mapping in the standardjbosscmp-jdbc.xml (optional) -->
      <metadata>
         <type-mapping>PostgreSQL 8.0.9</type-mapping>
      </metadata>
  </local-tx-datasource>
</datasources>
SOLUTION (member-only content)
Problem not solved yet.
Did pooling change anything?