Solved

How to Best Manage High PostgreSQL CPU Use?

Posted on 2011-02-24
1,417 Views
Last Modified: 2012-05-11
I am trying to fix a high PostgreSQL CPU usage problem. We are using PostgreSQL 8.0.9, and when our JEE web application (running in JBoss) comes under certain increased-load conditions, top shows a gradual increase in the number of PostgreSQL processes. When the problem occurs there are approximately 12-15 PostgreSQL processes, each showing SELECT at the far right of its process information and using approximately 6-7% CPU, and the application slows down dramatically.

JBoss version: JBoss (MX MicroKernel) 4.0.3
Operating system: CentOS Linux 5.5
Kernel and CPU: Linux 2.6.18-194.26.1.el5 on x86_64
Processor information: 2 x Intel(R) Xeon(R) CPU E5420 @ 2.50GHz, 8 cores

Currently, our thought is to throw more hardware at it. If we do this, would the best option be something like Option A below or Option B?

Option A: 4 x AMD Opteron™ 6100 Series Processors each with 12 Cores
Option B: 4 x Intel® Xeon® 7500 series Processors each with 8 Cores

Is it correct to assume that CentOS Linux 5.5 with PostgreSQL 8.0.9 will scale proportionately with the addition of this many processors and cores (e.g. 4 processors, each with 12 cores)? Is there something else I should consider in terms of throwing more hardware at it?
Question by:gwheato200
41 Comments
 
LVL 7

Expert Comment

by:MrNed
Linux 5.5 will scale fine with that many cores; I believe it's not until you get to 64 cores that things start to level off.

An upgrade to PostgreSQL 8.3+ will also help, as there have been many performance improvements from 8.3 onwards.

First, though, you should analyze your top queries and try to tune them. That is much cheaper and more likely to deliver bigger benefits than just throwing hardware at the problem :)
 
LVL 7

Expert Comment

by:MrNed
Alternatively, simply adding more memory might be cheaper and have a big benefit, if you KNOW that increasing it will improve the query times. First things first: you need to understand why the queries are slow rather than blaming it on a lack of CPU power.
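A quick way to see what those postgres processes showing SELECT in top are actually running is pg_stat_activity. A minimal sketch from psql (on 8.0 the query text only shows up if stats_command_string = true in postgresql.conf, which is off by default; the column names below are the 8.0 ones):

SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;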
 

Author Comment

by:gwheato200
Hello, regarding the expert comment
Or alternatively, simply adding more memory might be cheaper and have a big benefit if you KNOW that increasing it will improve the query times. First things first - you need to understand why the queries are slow rather than blaming it on lack of CPU power.
, the server has a lot of spare memory and we would like to explore this option while we explore the others.  Please refer to This Intro or another guide as you prefer.  We are not ready to upgrade to a newer release of PostgreSQL for another month, and this problem is urgent.  In that intro or another, please specify which parameter or parameters in postgresql.conf I should try increasing, to what degree, and which other parameters should be kept in specific proportion as those are increased.  Which parameters, and by how much, for a server with 17.38 GB of total physical memory, on which JBoss is allocated a significant portion (Xms=1024m and Xmx=6144m, though it usually uses less than 3072m)?
 
LVL 7

Expert Comment

by:MrNed
What are your current settings?

Assuming you have 10GB of memory for PostgreSQL use, I would set these as a starting point and tweak them as necessary for your queries:

shared_buffers = 2GB
effective_cache_size = 8GB
wal_buffers = 1MB
work_mem = 64MB
maintenance_work_mem = 64MB

For example, are your queries predominantly large selects with lots of temporary usage (e.g. sorting, hashing, GROUP BY, etc.)? If so, then increasing work_mem further might help.
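If you are not sure what the current values are, SHOW reads them back from a psql session, for example:

SHOW shared_buffers;
SHOW effective_cache_size;
SHOW wal_buffers;
SHOW work_mem;
SHOW maintenance_work_mem;
SHOW max_connections;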
 

Author Comment

by:gwheato200
Hello, regarding the expert comment
What are your current settings?

Assuming you have 10GB of memory for PostgreSQL use, I would set these as a starting point and tweak them as necessary for your queries:

shared_buffers = 2GB
effective_cache_size = 8GB
wal_buffers = 1MB
work_mem = 64MB
maintenance_work_mem = 64MB

For example, are your queries predominantly large selects with lots of temporary usage (eg sorting, hashing, group by, etc) - if so then increasing work_mem further might help.

The current settings are much lower than the settings you have suggested and it seems increasing them a lot might help.

Regarding shared_buffers, This Intro mentions a concern when it says
It's likely you will have to increase the amount of memory your operating system allows you to allocate at once to set the value this high; see Managing Kernel Resources for details

What do you think about that comment about increasing the amount of memory the operating system allows to be allocated at once, when the value is set this high (e.g. 2GB)?

In the 8.0 documentation, Here it says
    Sets the number of shared memory buffers used by the database server. The default is typically 1000, but may be less if your kernel settings will not support it (as determined during initdb). Each buffer is 8192 bytes, unless a different value of BLCKSZ was chosen when building the server. This setting must be at least 16, as well as at least twice the value of max_connections; however, settings significantly higher than the minimum are usually needed for good performance. Values of a few thousand are recommended for production installations. This option can only be set at server start.

    Increasing this parameter may cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows. See Section 16.5.1 for information on how to adjust those parameters, if necessary.

In that portion of the documentation, the recommendation is a few thousand buffers! That seems a lot different from 2GB!

What about the reference to a ratio between this parameter and the max_connections parameter? How much should max_connections be set to in the scenario we are talking about?

This portion of the documentation also mentions the shared memory that the operating system will allow. What do you think? Please share your thoughts on this.

Also, it seems that 8.0 needs the memory amounts to be entered in bytes - does this assumption seem correct to you?
 
LVL 7

Expert Comment

by:MrNed
Ah sorry, I forgot the 8.0 parameter is measured in pages rather than bytes. In that case try a value of 262144 (262144 pages x 8kB per page = 2GB). The default values are ridiculously small and are designed to run on very little hardware. It's true you may need to increase the kernel's shared memory (and possibly semaphore) settings, but PostgreSQL will tell you at restart if they are not large enough; if the restart fails, revert to the old value and it will come back up. Sorry, but I can't remember whether the default kernel parameters are enough for 2GB. On Linux 5.5 I would expect they should be fine.

The max_connections is referenced because every connection requires memory+semaphores. It's just a calculation to ensure you allocate enough. As an example, I have these semaphore + memory settings on one server that work fine:


$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

Memory Parameters...
  shared_buffers = 16GB
  effective_cache_size = 32GB
  wal_buffers = 1MB
  work_mem = 64MB
  maintenance_work_mem = 256MB


Also, make sure you are looking at the 8.0 manual, not the 9.0 one, e.g. http://www.postgresql.org/docs/8.0/static/runtime-config.html
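If you would rather check the operating system limits up front than wait for a failed restart, this is roughly what I would look at on CentOS 5 (the sysctl value shown is only an example, not a recommendation):

$ cat /proc/sys/kernel/shmmax    # largest single shared memory segment, in bytes
$ cat /proc/sys/kernel/shmall    # total shared memory allowed, in pages
$ ipcs -lm                       # the same limits, summarised

# if shmmax is smaller than the shared memory PostgreSQL will ask for,
# raise it in /etc/sysctl.conf and apply it with sysctl, e.g.:
# kernel.shmmax = 2415919104
$ sysctl -p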
 

Author Comment

by:gwheato200
Hello, thank you again for your quick reply.  Regarding the expert comment above beginning with:
Ah sorry, I forgot the 8.0 parameter was measured in pages rather than bytes. In that case try a value of 262144 (x 8k per page = 2gb)....
It seems we are really narrowing down the steps I should take in exploring this option.  However, I would very much like for you to re-phrase the comment in a step-by-step format where everything is spelled out for these specific circumstances (e.g. 8.0 and 10GB of memory available).

For example, what specifically should I do and look for to check the operating system allowances?

Also, what specifically should I set in postgresql.conf, considering that 8.0 uses bytes and that in some cases it measures parameters in pages rather than bytes?
 

Author Comment

by:gwheato200
Hello, this output from the server may help:
$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

 

Author Comment

by:gwheato200
Hello, does the following output from the server indicate that there should be no concern about operating system allowances?
$ ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
 

Author Comment

by:gwheato200
Hello, regarding the expert comment:
For example, are your queries predominantly large selects with lots of temporary usage (eg sorting, hashing, group by, etc) - if so then increasing work_mem further might help.
I believe we do have a lot of these types of queries.
 
LVL 7

Expert Comment

by:MrNed
Your semaphore settings should be fine for the memory settings we're talking about.

Try these settings in postgresql.conf, suitable for 8.0:

shared_buffers = 262144
effective_cache_size = 4194304
wal_buffers = 8192
work_mem = 65536
maintenance_work_mem = 262144

After saving the file, restart the cluster for them to take effect.
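Something like this, assuming the stock init script is installed (otherwise use pg_ctl with your data directory):

$ service postgresql restart
$ psql -U postgres -c "SHOW shared_buffers;"    # should come back as 262144 on 8.0

If the restart fails because of shared memory limits, the error in the log will say so; revert the value or raise the kernel limits as discussed above.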
 
LVL 7

Expert Comment

by:MrNed
Keep an eye on the log file and turn on temp file logging (using parameter log_temp_files). If you see lots of occurrences of it using temp files, you can increase the work_mem parameter which will make it use more memory for temporary operations before spilling to disk.

See here for an example of how to set up log monitoring.
 
LVL 7

Expert Comment

by:MrNed
Whoops sorry, log_temp_files doesn't seem to exist in 8.0. Save that for after you upgrade :)
 

Author Comment

by:gwheato200
Hello, thank you again for your quick reply.  Regarding the expert comment beginning with:
Your semaphore settings should be fine for the memory settings we're talking about.

Try these settings in postgresql.conf, suitable for 8.0:
It seems that you may have converted things wrong.  For example, assuming the postgresql.conf integers for these parameters are understood to be KB, shared_buffers = 262144 would be 0.25 gigabytes (see This Link).

Is 8.0 interpreting the integers as KB?  Isn't your conversion off?
 
LVL 7

Expert Comment

by:MrNed
According to the 8.0 manual it's measured in pages, each of which is 8KB by default.

http://www.postgresql.org/docs/8.0/static/runtime-config.html#shared_buffers
 

Author Comment

by:gwheato200
Hello, thank you again for your quick replies.  Regarding your comment:
According to the 8.0 manual its measured in pages, each of which are 8KB by default.
Thank you for that explanation.  I see that in postgresql.conf there is a comment that says:
8KB each
There are other important comments in the file as follows:
#effective_cache_size = 1000      # typically 8KB each
#wal_buffers = 8            # min 4, 8KB each
However, work_mem and maintenance_work_mem have comments that include:
size in KB
Please advise how everything should be adjusted given these comments.
 
LVL 7

Expert Comment

by:MrNed
That is correct, and it agrees with the documentation. I didn't mean every parameter was in pages:

shared_buffers = 262144 (pages)
effective_cache_size = 4194304 (pages)
wal_buffers = 8192 (pages)
work_mem = 65536 (KB)
maintenance_work_mem = 262144 (KB)

Good thing you're checking my conversions though, don't trust anything you read on the Internet ;)
 

Author Comment

by:gwheato200
Hello, thank you again for your quick replies.  Regarding your comment:
That is correct, and it agrees with the documentation. I didn't mean every parameter was in pages:

shared_buffers = 262144 (pages)
effective_cache_size = 4194304 (pages)
wal_buffers = 8192 (pages)
work_mem = 65536 (KB)
maintenance_work_mem = 262144 (KB)

For the effective_cache_size, if each page is 8192 bytes and the setting is set to 4194304 pages, this would be 32 gigabytes!  Please advise.
 

Author Comment

by:gwheato200
Hello, for:
wal_buffers = 8192 (pages)
This would be 64 megabytes if each page is 8192 bytes.  Earlier, you wrote you wanted this to be set to 1 megabyte.  Please advise.
 
LVL 7

Expert Comment

by:MrNed
Sorry, I was looking at my own settings when I converted them! I'm sure you've already figured it out, but these are your conversions:

shared_buffers = 262144 (2GB in pages)
effective_cache_size = 1048576 (8GB in pages)
wal_buffers = 128 (1MB in pages)
work_mem = 65536 (64MB in KB)
maintenance_work_mem = 65536 (64MB in KB)
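Written out as they would appear in postgresql.conf, with the arithmetic in the comments (this assumes the default 8kB block size):

shared_buffers = 262144             # 262144 pages x 8kB = 2GB
effective_cache_size = 1048576      # 1048576 pages x 8kB = 8GB
wal_buffers = 128                   # 128 pages x 8kB = 1MB
work_mem = 65536                    # 65536 kB = 64MB
maintenance_work_mem = 65536        # 65536 kB = 64MB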
 

Author Comment

by:gwheato200
Hello, it seemed you adjusted maintenance_work_mem up to 256 megabytes after some further consideration.  Isn't this correct?
 
LVL 7

Expert Comment

by:MrNed
Sorry for the confusion, see my previous comment.
 

Author Comment

by:gwheato200
So should maintenance_work_mem be 64 or 256?
 
LVL 7

Assisted Solution

by:MrNed
MrNed earned 333 total points
The number I converted is for 64MB. I would suggest this as a minimum and you can increase it later if required. It's used for maintenance operations for example vacuuming.
 

Author Comment

by:gwheato200
For a database that grows by approximately 1MB per day and is now about 20MB, how often should I vacuum?
 
LVL 7

Expert Comment

by:MrNed
I don't think I've ever worked on a database that small, I hope it's a typo :)

Since 8.0 has no built-in autovacuum (upgrade!), I'd suggest scheduling a weekly VACUUM ANALYZE. It really depends on the type of changes happening, i.e. whether they tend to bloat the tables/indexes, but with such a small database and growth rate, weekly should be fine.
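A sketch of how that could be scheduled in the postgres user's crontab (the binary path and log location are only examples; vacuumdb ships with 8.0):

# run VACUUM ANALYZE on every database, Sundays at 03:00
0 3 * * 0  /usr/bin/vacuumdb --all --analyze >> /tmp/vacuumdb.log 2>&1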
 

Author Comment

by:gwheato200
Hello, I have applied these new settings.  It will take time to determine whether there has been an improvement, but an initial test was not very encouraging.  I accessed a particular URL with a large query in 9 browsers, with the GET requests sent at approximately 3-second intervals.  The 5th browser started to show a big slowdown.  Watching top, the postgres processes were consuming a lot of CPU, with SELECT at the far right of the process information output.

Is there something that can be done to cache these queries so they don't use so much CPU?  (if at the application source code level, please be specific for a JBoss application that uses EJB 3.0 session and entity beans)
 
LVL 7

Expert Comment

by:MrNed
Can you please confirm that your database is only 20MB in size? If that's the case, the default parameters you would have had (but didn't show, so I can't confirm) would have been fine too. FYI it's a tiny database and will definitely already be fully cached. For it to overload an 8-core server so fast you must have some incredibly bad queries and/or insufficient indexes.

1. Confirm database size
2. Post an example slow query
3. Post an explain analyze of that slow query

If you don't know #2 then turn on the log_min_duration_statement parameter (see the above doc for values to use) and then get it from the log file.
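A rough sketch of those three steps (the 500ms threshold is only an example):

-- 1. approximate size from the catalogs (relpages is only refreshed by VACUUM/ANALYZE):
SELECT sum(relpages) * 8 / 1024 AS approx_mb FROM pg_class;

-- 2. in postgresql.conf, then reload the server:
--    log_min_duration_statement = 500    (milliseconds in 8.0; -1 disables it)

-- 3. for each statement that shows up in the log:
EXPLAIN ANALYZE <paste the slow query here>;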
 

Author Comment

by:gwheato200
I don't think it's bad queries - I think it is query frequency or volume.  The most significant factor is that there are 3 small queries run every second via AJAX on almost every page.
 

Author Comment

by:gwheato200
The database backup tar archive is less than 30MB.
 
LVL 7

Expert Comment

by:MrNed
3 small queries per second x 5 users against a tiny fully cached database should have next to zero CPU usage if done correctly. I still think you should identify those queries and post their analysis. If they take 1/3sec to run, that's too slow!
 

Author Comment

by:gwheato200
I apologize for emphasizing the 3 queries/second x 5 users without mentioning that there were also 5 huge queries being run at the same time on those pages (i.e. the GET requests I mentioned earlier, sent approximately 3 seconds apart).  Without the big queries, performance is fine.
 

Author Comment

by:gwheato200
Please elaborate on "if done correctly".  Maybe you could provide a URL to an article, or an explanation of the key things that make a query correct in terms of what would use CPU and what wouldn't.
 
LVL 7

Expert Comment

by:MrNed
Sorry, I don't have anything to link to. I mean there has to be a schema design suitable for querying: for example, primary keys, foreign keys, and indexes on columns commonly referenced in WHERE clauses. SQL-101 stuff. If you post a query along with its EXPLAIN ANALYZE output, I can see whether all of that is being done properly.
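As an illustration only (the table and column are made up), this is the kind of thing I mean:

-- a query that filters on customer_id should have a matching index:
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- EXPLAIN ANALYZE on the query should then show an Index Scan on orders
-- rather than a Seq Scan:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;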
 

Author Comment

by:gwheato200
You mentioned that a small, fully cached database usually doesn't have CPU problems.  Would throwing a lot of hardware (e.g. 4 x 12-core CPUs and RAID 10 with 8 x 15k RPM drives) at it help significantly if the queries were unchanged and it was still small and fully cached?  How would the extra processor resources help?  How would the RAID 10 with 8 x 15k RPM drives help?
 
LVL 7

Accepted Solution

by:MrNed
MrNed earned 333 total points
The disk will only help if there are a lot of writes happening. But it sounds like your problem is the large query that runs on each page, which is all read activity; since the entire db is in RAM, disk is out of the equation. More CPUs would probably help you get a few more users on there, but that's the difference between 5 users and maybe 50 users if you're lucky. Fixing the application code is required to get to 5000 users, which is what that hardware should be handling. How many concurrent users do you need to cater for?

Your choice of course, but if it were me I wouldn't even be suggesting a hardware upgrade. Instead, find the slow queries and tune them, or at least ask the developers about tuning them if you are unsure.
 
LVL 61

Expert Comment

by:gheist
PostgreSQL is not so well parallelized: each session uses a single processor thread.
For example, EnterpriseDB uses a very small shared_buffers and assumes the system's unified cache does its job well.
Also, you need to VACUUM ANALYZE once in a while.

What do you have in your datasource file?
http://www.coderanch.com/t/89928/JBoss/Jboss-connection-pool-not-releasing
You can tune the pool to keep a reasonable number of connections open.
 

Author Comment

by:gwheato200
Hello, thank you for your quick reply.  Regarding the expert comment:
What do you have in your datasource file?
http://www.coderanch.com/t/89928/JBoss/Jboss-connection-pool-not-releasing
You can tune the pool to keep reasonable number of connections open
This is what I have in my datasource file:
<datasources>
  <local-tx-datasource>
    <jndi-name>Foo</jndi-name>
    <connection-url>jdbc:postgresql://localhost:5432/foodb</connection-url>
    <driver-class>org.postgresql.Driver</driver-class>
    <user-name>foouser</user-name>
    <password>foopassword</password>
      <!-- corresponding type-mapping in the standardjbosscmp-jdbc.xml (optional) -->
      <metadata>
         <type-mapping>PostgreSQL 8.0.9</type-mapping>
      </metadata>
  </local-tx-datasource>
</datasources>
 
LVL 61

Assisted Solution

by:gheist
gheist earned 167 total points
Give it a try:
<min-pool-size>0</min-pool-size> <- just in case the DB server has a memory leak, so idle processes get released sometimes.
<max-pool-size>100</max-pool-size> <- double the number of CPU cores/threads looks like a reasonable value....
<max-idle-timeout>5</max-idle-timeout>
<blocking-timeout-millis>20000</blocking-timeout-millis>
<idle-timeout-minutes>1</idle-timeout-minutes>
<track-statements>true</track-statements>
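Dropped into your existing foodb datasource, that would look roughly like this (verify the element names and their order against the jboss-ds DTD that ships with JBoss 4.0.3 before deploying):

<datasources>
  <local-tx-datasource>
    <jndi-name>Foo</jndi-name>
    <connection-url>jdbc:postgresql://localhost:5432/foodb</connection-url>
    <driver-class>org.postgresql.Driver</driver-class>
    <user-name>foouser</user-name>
    <password>foopassword</password>
    <!-- pool tuning as suggested above -->
    <min-pool-size>0</min-pool-size>
    <max-pool-size>100</max-pool-size>
    <blocking-timeout-millis>20000</blocking-timeout-millis>
    <idle-timeout-minutes>1</idle-timeout-minutes>
    <track-statements>true</track-statements>
    <metadata>
      <type-mapping>PostgreSQL 8.0.9</type-mapping>
    </metadata>
  </local-tx-datasource>
</datasources>

(I have left <max-idle-timeout> out of the sketch because I could not confirm it against the DTD; add it back if your version accepts it.)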
 

Author Closing Comment

by:gwheato200
Problem not solved yet.
 
LVL 61

Expert Comment

by:gheist
Did pooling change anything?