Solved

Server bottleneck - CPU, mem, hd ?

Posted on 2003-12-10
Last Modified: 2013-12-16
Hello,
  I have a server:

Athlon XP 1400
hda : IDE 60gb
hdc : IDE 40gb
RAM 1gb
3COM 100mb net

running on Debian 2.2, Apache, MySQL, PHP, ASP, Qmail

The server feels slow when serving PHP pages.

The traffic is about 250 GB/month; about 20 million pages are served (mostly PHP), and about 50 million files in total.

I monitor the main things, like

CPU
http://www.valka.cz/cacti2/cpu.gif (one K means 10%)

MEM
http://www.valka.cz/cacti2/mem.gif

PROC RUNNING
http://www.valka.cz/cacti2/proc.gif

NET TRAFFIC
http://www.valka.cz/cacti2/net_traf.gif

But as it seems to me, the CPU is fine (mostly running at some 5-10% load), memory is fine? (I didn't see a big difference between 0.5 and 1 GB of RAM when I added more), and the network is fine too (a 100 Mbps card shouldn't be overloaded by 2.5 mb/s of traffic).

Are my conclusions wrong? Is there really something else, the disk maybe? How can I find out the usage of the drives and whether they are working fine? iostat doesn't work on my machine; maybe I need to install it or something?
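As a rough cross-check of the network figure, the monthly transfer volume can be converted into an average line rate. The sketch below is only an illustration: it assumes a 30-day month and decimal units (1 GB = 10^9 bytes), the function name is made up for this example, and an average says nothing about peaks.

```python
def average_rate_mbit_per_s(gb_per_month: float, days: int = 30) -> float:
    """Average line rate in Mbit/s for a monthly transfer volume in GB.

    Assumes decimal gigabytes and a month of `days` days.
    """
    total_bits = gb_per_month * 1e9 * 8      # decimal GB -> bits
    seconds = days * 24 * 60 * 60            # length of the month in seconds
    return total_bits / seconds / 1e6        # bits/s -> Mbit/s


if __name__ == "__main__":
    rate = average_rate_mbit_per_s(250)
    print(f"average: {rate:.2f} Mbit/s on a 100 Mbit/s card")
```

Under these assumptions 250 GB/month averages out to well under 1 Mbit/s. Even if the 2.5 figure quoted above is megabytes per second (about 20 Mbit/s), that would still leave most of the card's capacity unused.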

My current HDA setting looks like this :

 Model=MAXTOR 6L060J3, FwRev=A93.0500, SerialNo=663201911680
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=32256, SectSize=21298, ECCbytes=4
 BuffType=3(DualPortCache), BuffSize=1819kB, MaxMultSect=16, MultSect=16
 DblWordIO=no, OldPIO=2, DMA=yes, OldDMA=2
 CurCHS=4047/16/255, CurSects=16511760, LBA=yes, LBAsects=117266688
 tDMA={min:120,rec:120}, DMA modes: mword0 mword1 mword2
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, PIO modes: mode3 mode4
 UDMA modes: mode0 mode1 mode2 mode3 mode4 *mode5 mode6

/dev/hda:
 multcount    = 16 (on)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 7299/255/63, sectors = 117266688, start = 0

and the speed :

Timing buffer-cache reads:   128 MB in  6.13 seconds = 20.88 MB/sec
Timing buffered disk reads:  64 MB in  6.30 seconds = 10.16 MB/sec

I am looking for help finding the problem area of my server. I want to invest in new components, but first I want to be sure where the problem is.

If I should provide some more info, please let me know.

Thanks for any help

Question by:Letus
37 Comments
 
LVL 5

Expert Comment

by:willy134
ID: 9914037
What is the bandwidth of your external pipeline (assuming you are serving pages to the world, not just internally)? You may find the slowdown is not in your local connection but in your upstream bandwidth. I know you have the 100 Mb/s card, but if your outgoing line is only 1 Mb/s you may need to upgrade it.
0
 
LVL 2

Author Comment

by:Letus
ID: 9914532
Nope, the problem shouldn't be there; the server is connected to a backbone provider on some gigabit line. I'm sure the pages are already slow "on the server".
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 9914888
Could it be that each of your PHP pages makes its own connection to MySQL?
Are there a lot of pages using MySQL?
0
 
LVL 2

Author Comment

by:Letus
ID: 9915059
MySQL serves millions of queries; there are about twenty websites with PHP pages. The average numbers for the whole MySQL server are:

total (5 hrs, 22mnts) 2 293 862  
avg - hr 426 390.42  
avg - min  7 106.51  
avg - sec     118.44  

I also use caching in MySQL. There are not many processes running at the same time (about 5 max), except when the server is slowing down; then the number of processes grows to 20.

For PHP in our code, we tried changing the connection from persistent to non-persistent and back, but it makes no difference. ASP (via ChiliSoft) is also slow, so I think the problem is in the machine itself, not in the code (various applications such as phpBB and even phpMyAdmin, which I think are pretty well written).
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 9919182
what does top tell you?
0
 
LVL 2

Author Comment

by:Letus
ID: 9919209
top is fine; as I mentioned, it doesn't show a problem with CPU, memory, or anything else. There are a lot of mysql and httpd processes, in line with the load on the web server; I do not see anything special there. Do you want me to paste the results here?
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 9919504
top is dynamic, you need to watch it for some time; pasting results usually makes no sense.
But the header lines would be interesting, please post these (first 5 or so).
Have you checked what's going on in mysql?
There are some commands inside mysql to get statistics about performance, like status (sorry, I'm not very experienced there).
0
 
LVL 2

Author Comment

by:Letus
ID: 9919674
TOP results :


  1:21pm  up 51 days, 14:52,  0 users,  load average: 4.85, 3.42, 3.81
319 processes: 310 sleeping, 6 running, 3 zombie, 0 stopped
CPU states:  0.6% user,  0.5% system,  0.0% nice,  0.6% idle
Mem:  1034076K av, 984304K used,  49772K free,      0K shrd,  24116K buff
Swap: 2048248K av, 243232K used, 1805016K free                275764K cached

MySQL performance: I cannot read it; I don't know what all the numbers mean or what they should look like :( As I said, caching the queries improved the performance; it is set up as a "large" MySQL server with a lot of memory.
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 9919813
Sounds good. So if there are no %CPU- and/or %MEM-consuming processes in top, then it might be a network problem.
Check with   ifconfig -a  and see if there are significant values for errors:, dropped:, collisions:
0
 
LVL 2

Author Comment

by:Letus
ID: 9919895
nope, no errors present :(

eth0      Link encap:Ethernet  HWaddr 00:04:75:E4:BE:6C  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:576752212 errors:0 dropped:0 overruns:1 frame:0
          TX packets:529002209 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xec00

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1339569994 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1339569994 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

That's weird, isn't it? Thanks to all of you for the help. I hope we will find the problem :)
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 9921316
hmm, best I can think of is if you can compare the two machines.

You're assuming that it is a hardware and/or system(-configuration) problem. But if you find that it is an application problem (httpd, mysqld, etc.), it's probably worth testing with a professional tool like Quantify (to be found at http://www.rational.com/)
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10068475
You posted:
CPU states:  0.6% user,  0.5% system,  0.0% nice,  0.6% idle
Those numbers should add to approx 100%. Where has all the CPU gone?
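The sanity check described here can be written down as a small sketch. It assumes the old-style summary line format shown in the post ("CPU states: ... user, ... system, ... nice, ... idle"); the function names and the 2-point tolerance are illustrative choices, not part of top.

```python
import re


def parse_cpu_states(line: str) -> dict:
    """Parse an old-style top summary line such as
    'CPU states:  0.6% user,  0.5% system,  0.0% nice,  0.6% idle'."""
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", line)
    return {name: float(value) for value, name in pairs}


def looks_plausible(states: dict, tolerance: float = 2.0) -> bool:
    """True when the percentages add up to roughly 100."""
    return abs(sum(states.values()) - 100.0) <= tolerance


if __name__ == "__main__":
    posted = "CPU states:  0.6% user,  0.5% system,  0.0% nice,  0.6% idle"
    states = parse_cpu_states(posted)
    print(states, "sum:", round(sum(states.values()), 1),
          "plausible:", looks_plausible(states))
```

For the posted line the four values add up to only 1.7, so the check fails and the sample should not be trusted as a picture of CPU usage.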
0
 
LVL 2

Author Comment

by:Letus
ID: 10068961
This is what it says today :

TOP

 8:38am  up 79 days, 10:10,  1 user,  load average: 3.21, 2.85, 2.03
273 processes: 267 sleeping, 3 running, 3 zombie, 0 stopped
CPU states:  0.2% user,  0.0% system,  0.0% nice,  0.2% idle
Mem:  1034076K av, 986052K used,  48024K free,      0K shrd,  38216K buff
Swap: 2048248K av, 230256K used, 1817992K free                318520K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
27458 mysql     18   0  157M  79M 59700 R    2984 30.0  7.8   5:20 mysqld
 7757 root      14   0  1564 1564   680 R       0 16.6  0.1   0:01 top
 7600 nobody    12   0 27684  21M  4520 S       0 10.0  2.1   0:00 httpd
 7597 nobody    14   0 22212  16M  4460 R       0  4.3  1.6   0:01 httpd
 7582 nobody     9   0 22256  19M  7720 S       0  2.5  1.9   0:01 httpd
 7317 nobody     9   0 22652  17M  4720 S       0  1.5  1.7   0:03 httpd
 7430 nobody     9   0 22320  17M  4796 S       0  1.5  1.6   0:02 httpd
 7416 nobody     9   0 12560 7476  4808 S       0  1.2  0.7   0:01 httpd
 7583 nobody    12   0 11776 9152  7276 S       0  1.2  0.8   0:00 httpd
 7594 nobody    10   0 23332  17M  4468 S       0  1.0  1.7   0:03 httpd
 7309 nobody     9   0 21272  19M  7972 S       0  0.7  1.9   0:04 httpd
 7444 nobody     9   0 21688  18M  7212 S       0  0.7  1.8   0:02 httpd
17762 mysql      9   0  157M  79M 59696 S    2984  0.5  7.8   5:56 mysqld
28152 mysql      9   0  157M  79M 59700 S    2984  0.5  7.8   6:16 mysqld
 6976 nobody     9   0 22932  17M  4920 S       0  0.5  1.7   0:03 httpd
 7040 nobody     9   0 23220  18M  4880 S       0  0.5  1.7   0:08 httpd
 7148 nobody    10   0 22732  20M  7980 S       0  0.5  2.0   0:04 httpd


APACHE STATUS

Server Version: Apache
Server Built: Oct 6 2003 14:17:15

--------------------------------------------------------------------------------
Current Time: Thursday, 08-Jan-2004 08:39:30 CET
Restart Time: Wednesday, 07-Jan-2004 17:43:34 CET
Parent Server Generation: 5
Server uptime: 14 hours 55 minutes 56 seconds
Total accesses: 871742 - Total Traffic: 6.0 GB
CPU Usage: u107.78 s27.13 cu.08 cs.08 - .251% CPU load
16.2 requests/sec - 117.5 kB/second - 7.2 kB/request
19 requests currently being processed, 8 idle servers
RWWWWRWW_WWWW__RR.__WW__W_.W...W................................
................................................................
................................................................
................................................................

IFCONFIG -a

dummy0    Link encap:Ethernet  HWaddr 00:00:00:00:00:00
          BROADCAST NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:04:75:E4:BE:6C
          inet addr:xxxx  Bcast:xxxx  Mask:255.255.255.224
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:825108085 errors:0 dropped:0 overruns:1 frame:0
          TX packets:801001004 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0xec00

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1932302551 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1932302551 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0


MYSQL

This MySQL server has been running for 2 days, 22 hours, 13 minutes and 48 seconds. It started up on Jan 05, 2004 at 10:28 AM.
Server traffic: These tables show the network traffic statistics of this MySQL server since its startup.

                 Traffic   ø per hour  
 Received   1,775 MB   25,878 KB  
 Sent   4,008 MB   58,439 KB  
 Total   5,783 MB   84,317 KB  

                   Connections   ø per hour   %  
 Failed attempts   3   0.04   0.00 %  
 Aborted   482   6.86   0.06 %  
 Total   778,901   11,090.72   100,00 %  
 
SQL Queries
Total               ø per hour      ø per minute     ø per second  
 21,976,754     312,925.44      5,215.42          86.92  


0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 10069170
You have about 86 SQL queries per second and roughly 11,000 connections per hour (according to your posted statistics).

This does not sound like cached queries (as you stated), it also sounds like a lot of new connections, probably multiple connections per HTTP-request.
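The rates quoted here can be rederived from the posted MySQL status page, and the same figures also give the average number of queries per connection. This is a minimal sketch using only the posted totals; the function names are illustrative.

```python
def per_hour(total: int, uptime_hours: float) -> float:
    """Average events per hour over the server's uptime."""
    return total / uptime_hours


def queries_per_connection(queries: int, connections: int) -> float:
    """Average number of queries issued on each connection."""
    return queries / connections


if __name__ == "__main__":
    # Figures from the MySQL status page posted above.
    uptime_hours = 2 * 24 + 22 + 13 / 60 + 48 / 3600   # 2 d 22 h 13 min 48 s
    connections = 778_901
    queries = 21_976_754

    print(f"connections/hour:   {per_hour(connections, uptime_hours):,.0f}")
    print(f"queries/second:     {per_hour(queries, uptime_hours) / 3600:.1f}")
    print(f"queries/connection: {queries_per_connection(queries, connections):.1f}")
```

On these figures each connection carries about 28 queries on average. Whether that means one connection per page or several depends on how the individual pages are written, which the statistics alone cannot show.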

It will be hard to identify the initial culprit: mysql, apache, or hardware.
But I'd start by analyzing Apache's logfile for request statistics (requests/hour, requests/second, requests requiring MySQL, etc.).

Another idea is to fine-tune Apache itself. There are several options.
I'd do this after analyzing the traffic statistics (see above), for example:
  + compile apache (httpd) with special Athlon optimization flags
  + remove unused modules in httpd.conf
  + properly use and configure mod_php, mod_fastcgi
  + httpd.conf samples:
    HostnameLookups Off
    Timeout 120
    KeepAlive On
    KeepAliveTimeout 10
    # MaxSpareServers      # you need to experiment here: comment out, or use a high value like 30
    # MinSpareServers       # you need to experiment too
To get a feeling for how much these tunings help, you may use ab: http://httpd.apache.org/docs/programs/ab.html

But as I said in a previous comment: my assumption is that PHP/MySQL is the culprit.
0
 
LVL 2

Author Comment

by:Letus
ID: 10069297
Ok, thanks for ideas ... (I have no idea of compiling Apache for Athlon, I'm too newbie for this, I'm glad I was able to compile it the normal way :) )

Maybe I do not understand the concept of caching that well, here is the part of mySQL status report about cache :

 Qcache queries in cache   2025  
 Qcache inserts   10468505  
 Qcache hits   8619511  
 Qcache lowmem prunes   262916  
 Qcache not cached   54897  
 Qcache free memory   4107160  
 Qcache free blocks   957  
 Qcache total blocks   5329  

Which kinds of queries can it cache and speed up? All? SELECT only? Some of the sites do a lot of SELECT queries, but also INSERT and UPDATE (article read counts, new users and so on).
 
The numbers from above should be followed by these (totals only):

change db   1,130,132  
delete   408,722  
insert   86,116  
select   10,514,547  
update   1,091,558  
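On the question of what gets cached: the MySQL query cache stores the results of SELECT statements only, and cached results for a table are discarded whenever that table is modified, so frequent INSERT and UPDATE traffic on a table limits how much the cache can help for it. The posted counters allow a rough hit-rate estimate. The sketch below assumes that the "select" counter (Com_select) counts only SELECTs that were actually executed, not those answered from the cache; the function names are illustrative.

```python
def query_cache_hit_rate(qcache_hits: int, com_select: int) -> float:
    """Share of SELECTs answered from the query cache.

    Assumes com_select excludes cache hits, so the total number of
    SELECTs is the sum of both counters.
    """
    return qcache_hits / (qcache_hits + com_select)


def prunes_per_insert(lowmem_prunes: int, inserts: int) -> float:
    """Fraction of cache inserts that forced an eviction for lack of memory."""
    return lowmem_prunes / inserts


if __name__ == "__main__":
    # Counters from the status report posted above.
    hit_rate = query_cache_hit_rate(qcache_hits=8_619_511, com_select=10_514_547)
    prune_rate = prunes_per_insert(lowmem_prunes=262_916, inserts=10_468_505)
    print(f"hit rate:          {hit_rate:.1%}")
    print(f"prunes per insert: {prune_rate:.1%}")
```

Under that assumption a little under half of all SELECTs are served from the cache. The non-zero "lowmem prunes" counter means entries are sometimes evicted to make room, which may be worth keeping an eye on.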

I do not use DNS lookups. After experimenting with the ab tool for Apache (a load/stress tool), I found that turning KeepAlive off speeds it up by almost 75%. I know that sounds crazy, but for some reason KeepAlive On wasted resources by keeping connections open and waiting too long, even with the timeout set to 8. Our sites contain a lot of pictures (design and content), which was probably the reason for the KeepAlive leak. (I don't want to discuss optimizing the pages themselves; the content and size of the pages are no problem for the users. The problem is just that the server sometimes works fine and sometimes slows down.)


0
 
LVL 2

Author Comment

by:Letus
ID: 10069307
One more point: we do not use persistent connections in PHP, because that meant a lot of concurrent connections to MySQL, reaching more than 200 (mostly one for each user), and MySQL then started to refuse connections.
0
 
LVL 2

Author Comment

by:Letus
ID: 10069422
About the Apache statistics: our biggest site (there are two major PHP sites and a few small static HTML ones) has the following statistics:

December 2003
Pages viewed: 16,987,810 (mostly PHP / MySQL)
Hits: 35,541,192
Data transferred: 190.62 GB

which gives an average of

page views: 22,833 / hr
page views: 381 / min
page views: 6.3 / sec

and

hits: 47,770 / hr
hits: 796 / min
hits: 13.26 / sec

and

data: 261.5 MB / hr
data: 4.36 MB / min
data: 0.073 MB / sec
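These averages come from dividing the monthly totals by the length of the month. A minimal sketch of that arithmetic, assuming a 31-day December (744 hours); the function name is illustrative:

```python
def monthly_averages(total: float, days: int = 31) -> tuple:
    """Return (per hour, per minute, per second) averages for a monthly total."""
    hours = days * 24
    per_hour = total / hours
    return per_hour, per_hour / 60, per_hour / 3600


if __name__ == "__main__":
    for label, total in (("page views", 16_987_810), ("hits", 35_541_192)):
        hr, mn, sec = monthly_averages(total)
        print(f"{label}: {hr:,.0f} / hr, {mn:,.0f} / min, {sec:.2f} / sec")
```

Note that these are averages over the whole month, nights included, so daytime peaks will be noticeably higher.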



0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 10069582
All in all, according to your last comments, I still assume that the huge number of connections from PHP to MySQL is the problem.
I've no experience with PHP and mod_fastcgi, but when using perl with mod_fastcgi it should be possible to reduce the number of connections and use persistent ones.
0

 
LVL 2

Author Comment

by:Letus
ID: 10071074
But mod_fastcgi doesn't seem to support PHP, does it?
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 10072136
AFAIK it does not support PHP.
And AFAIK PHP's mysql_pconnect is a lousy implementation that is not recommended for use.

Can you rewrite your PHP in perl, or whatever? Just the one with the mysql part.
0
 
LVL 2

Author Comment

by:Letus
ID: 10072366
Well, I do not think so. I cannot work with Perl as well as with PHP, and our site does MySQL work on every page in various ways (posting and rating photos, comments, discussion board and so on), so it is not just one point to change; it would mean rewriting the whole site :(
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 10075070
OK, then try to benchmark somehow (as I described before).
When you see a bottleneck somewhere, you can probably cut it down.
My feeling is still to check PHP first. You have probably hit the limits of PHP here.
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10076642
Returning to your TOP post:
I can see at least 70% of your CPU, so the totals line is bogus. I think we really need to see a good one, so please follow these instructions:-
0. These instructions assume a triple mouse click will highlight a line, and that the selection is persistent (does not go away if the display changes). I know this works in an xterm or on the linux console as long as you're running GPM (you can get to the console by Ctrl-Alt-F6). It may not work for other terminal emulators.
1. Run top. WAIT for it to refresh a few times
2. Triple-click on the totals line. It will highlight briefly.
3. You can now paste the totals line with the middle button, or with a 2-button mouse in the console the right button (into some temporary file maybe).
4. Post the totals line here.

I already suspect that you're going to see the CPU is fully utilised. If that is the case and you have money to throw at the hardware, then throw it. A quad xeon should do the trick.
0
 
LVL 2

Author Comment

by:Letus
ID: 10078540
Thanks, but the TOP command shows these numbers all the time, except during some problems when the server is getting really slow; then it shows 20, 30, 50%. To be sure, please see
http://www.valka.cz/cacti2/cpu.gif (one K means 10%)
where the CPU utilisation is logged into a graph. It doesn't show any high load :((( Even Apache's status page doesn't show it using more than 1-5% of the CPU :(

But, to be sure, I will try to do what you say.
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10081057
I looked at that already. I am not familiar with that tool, I trust TOP. Looking at your post, mysqld is using 30.0% CPU, TOP 16.6% (because there are so many processes, so you probably don't want to run it all the time), the first httpd is using 10.0%, the next 4.3, then 2.5, 1.5, 1.5 and so on. Total of those we can see is 73% CPU or so, and there's a whole lot we *can't* see. top -n 0 or top -n 1 will always give you that low total line, but top -n 2 gave me a good one. Maybe you could just try that.
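The addition done by hand above can be sketched as a small parser that sums the %CPU column of a top listing. It assumes the column layout shown in the posted output and that every field in a process row is a single whitespace-free token; the function name and the shortened sample are illustrative.

```python
def sum_cpu_column(top_output: str) -> float:
    """Add up the %CPU column of the process list in top's output."""
    total = 0.0
    cpu_index = None
    for line in top_output.splitlines():
        fields = line.split()
        if cpu_index is None:
            # Skip everything until the header row tells us the column position.
            if "%CPU" in fields:
                cpu_index = fields.index("%CPU")
            continue
        if len(fields) > cpu_index:
            try:
                total += float(fields[cpu_index])
            except ValueError:
                pass  # not a process row
    return total


# First three process rows from the listing posted earlier in the thread.
SAMPLE = """\
  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
27458 mysql     18   0  157M  79M 59700 R    2984 30.0  7.8   5:20 mysqld
 7757 root      14   0  1564 1564   680 R       0 16.6  0.1   0:01 top
 7600 nobody    12   0 27684  21M  4520 S       0 10.0  2.1   0:00 httpd
"""

if __name__ == "__main__":
    print(f"visible CPU usage: {sum_cpu_column(SAMPLE):.1f}%")
```

Fed the full 17-row listing instead of this three-row excerpt, the same function would reproduce the roughly 73% total mentioned above.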
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 10083022
If mysql is using 30% CPU, it's most likely due to the huge number of SQL queries, and if there are no persistent connections, mysql needs to handle the connection setup too.
I'd really think about a redesign which uses persistent connections.
Or you may analyse the queries and try to set up more sophisticated indexing and caching in mysql.
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10084693
Oops! I just re-read the TOP man page and it warns that you will always see a big usage for TOP in the initial display because the interval is short. So even with all the processes you are running, it won't really use so very much CPU and I should not have warned you against running it. Please just post the output from "top -n 2", then we can help you some more.
0
 
LVL 2

Author Comment

by:Letus
ID: 10086442
top -n 2 was the same, but -n 5 made the difference:

1:01pm  up 81 days, 14:33,  1 user,  load average: 1.76, 1.61, 1.64
188 processes: 180 sleeping, 6 running, 2 zombie, 0 stopped
CPU states: 68.2% user, 22.4% system,  0.0% nice,  9.2% idle
Mem:  1034076K av, 953036K used,  81040K free,      0K shrd,  35528K buff
Swap: 2048248K av, 228400K used, 1819848K free                449932K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 7090 mysql     10   0  128M  62M 36960 S    1232 21.7  6.2   8:17 mysqld
29763 nobody    17   0 27604  21M  8184 R       0 10.7  2.0   0:02 httpd
29883 nobody    19   0 26604  19M  7992 R       0 10.1  1.9   0:01 httpd
29636 nobody    17   0 27472  21M  8304 R       0 10.0  2.0   0:03 httpd
29886 nobody     9   0 16096 5816  4428 S       0  7.2  0.5   0:00 httpd
29884 root      19   0  1492 1492   688 R       0  4.0  0.1   0:01 top
20204 mysql     14   0  127M  62M 36960 R    1232  3.3  6.1   7:43 mysqld
 7092 mysql      9   0  128M  62M 36960 S    1232  2.3  6.2   8:20 mysqld
29514 nobody     9   0 17344 7864  4820 S       0  1.7  0.7   0:01 httpd
29898 nobody    15   0 16132 5924  4456 S       0  1.5  0.5   0:00 httpd
 6817 mysql     17   0  126M  61M 36960 R    1232  1.1  6.2   8:11 mysqld
29776 nobody    10   0 27432  17M  4728 S       0  0.8  1.7   0:01 httpd
29882 nobody    12   0 17212 7164  4632 S       0  0.8  0.6   0:00 httpd
29889 nobody    14   0 16192 5920  4520 S       0  0.8  0.5   0:00 httpd
29896 nobody     9   0 16244 9608  7924 S       0  0.8  0.9   0:00 httpd
29244 nobody     9   0 28504  18M  5160 S       0  0.7  1.8   0:12 httpd
29891 nobody    10   0 16156 5848  4396 S       0  0.7  0.5   0:00 httpd

So what does it say now? Does it mean my statement about the CPU was wrong? Is the CPU heavily utilized?
0
 
LVL 2

Author Comment

by:Letus
ID: 10086525
Yes, the CPU graph was wrong!! :((( I found it out: when coding the graphs, I used the following
(part of a perl script):
top b -n 1 | grep "load average" | awk '{printf substr(\$10,1,4)}'

which takes the load average, not the CPU usage; I thought these numbers were the same :(

I think it should be something like this:

top b -n 2 | grep "CPU states" | awk '{printf substr(\$3,1,4)}'

But this doesn't work well, because the "real" number is on the second line of output, and I don't know how to access it (this gives me a result like "0.540.", which is the 0.5% of the first result and the 40.3% of the second one run together). As I wrote, my perl knowledge is limited; I don't know how to fix it so that I get the real CPU utilisation graph :(
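One way around the mixed-up output is to keep only the last "CPU states" line from the batch output, since the first sample is the unreliable one. In the shell pipeline itself, adding `tail -n 1` after the `grep` should have the same effect. The sketch below shows the idea on captured text; it assumes the old-style line format from this thread, and the sample values other than 0.5 and 40.3 are made up for illustration.

```python
import re


def last_cpu_states(top_batch_output: str) -> dict:
    """Return the percentages from the last 'CPU states' line in the output."""
    lines = [l for l in top_batch_output.splitlines() if "CPU states" in l]
    if not lines:
        raise ValueError("no 'CPU states' line found")
    # Only the final sample is used; earlier ones are discarded.
    pairs = re.findall(r"([\d.]+)%\s+(\w+)", lines[-1])
    return {name: float(value) for value, name in pairs}


# Two samples as produced by two iterations of top in batch mode (illustrative values).
SAMPLE = """\
CPU states:  0.5% user,  0.3% system,  0.0% nice,  0.2% idle
CPU states: 40.3% user, 12.1% system,  0.0% nice, 47.6% idle
"""

if __name__ == "__main__":
    states = last_cpu_states(SAMPLE)
    print("user:", states["user"], "busy:", round(100.0 - states["idle"], 1))
```

Plotting 100 minus the idle value gives total CPU usage in a single number, rather than the user share alone.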
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10086664
Well yes, this particular snapshot shows 90.6% CPU usage. mysqld 21.7%, 3xhttpd 10.something%, TOP down at 4% - you can read the 3rd last number in each detail line as well as I can I am sure. Oh, it looks to me like there are extra  mysqld threads - that's good, they could run on another processor in a multi-cpu system.
There is *some* idle (9.2%), that *could* be disk i/o (is the disk light on much?).
You had to use top -n 5 to get this? Is that consistent (top -n 5 always gives this)? Maybe you just got a lucky busy snapshot.

Ok there is another, more sensitive tool than TOP which I use, but it just displays a bar in an X display so you'd have to watch it and make a value judgement.
The tool is xcpustate. It displays a small multicoloured bar: blue for idle, green for user CPU, yellow for nice'd user CPU, red for system CPU. It updates every second but it's very lightweight; I generally can't see it in TOP even on my 96% idle system. I think it generally comes in distros (I built my current copy from a source RPM), but just post if you have trouble finding a copy.

Do you have an X display on your server? If not, but you have the X libraries, you can run it to a remote display. If there are no X libraries, you can do what I do on my router: run rstatd to provide data to a remotely running xcpustate. Again, post if you need assistance. (On my installation rstatd also shows in yellow what should be green, but that's not a big deal.)

Once you have xcpustate running, glance at it when the system feels "slow". Does the blue part disappear frequently? That would indicate a CPU bottleneck. If the blue part *grows* during slow periods, it's not the CPU. Could be page thrashing (not enough RAM) (but your posted top o/p looks fine in that regard - swap disk usage is only 20% of RAM size and up to 50% is usually no problem); could be actual disk i/o; could be interrupt servicing for the N.I.C.'s.
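The swap-to-RAM comparison mentioned here can be computed directly from the "Mem:" and "Swap:" lines of the top header. This is a minimal sketch, assuming the old-style header format posted in this thread; the function name is illustrative, and the 50% guideline is the rule of thumb from the comment above, not a hard limit.

```python
import re


def swap_to_ram_ratio(top_header: str) -> float:
    """Swap in use divided by total RAM, from old-style 'Mem:' and 'Swap:' lines."""
    mem_total = int(re.search(r"Mem:\s+(\d+)K av", top_header).group(1))
    swap_used = int(
        re.search(r"Swap:\s+\d+K av,\s+(\d+)K used", top_header).group(1)
    )
    return swap_used / mem_total


# Header lines from the top output posted earlier in the thread.
HEADER = """\
Mem:  1034076K av, 953036K used,  81040K free,      0K shrd,  35528K buff
Swap: 2048248K av, 228400K used, 1819848K free                449932K cached
"""

if __name__ == "__main__":
    print(f"swap in use is {swap_to_ram_ratio(HEADER):.0%} of RAM")
```

For the posted header this comes out at a little over a fifth of RAM, in line with the estimate in the comment above.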

Could be a whole lot of things *if* the blue area increases on system slowdown, but let's cross that bridge when we come to it.
0
 
LVL 2

Author Comment

by:Letus
ID: 10092187
Thank you. But at home I do not run Linux; I just use it as a server, and there is no X, which is why my knowledge is so limited. But I fixed the CPU graph; now it shows the 5-minute average of the top -n 2 CPU states statistic:

http://www.valka.cz/cacti2/cpu.gif

I will soon add the other statistics (this is just the user figure; it needs to show everything, such as system and nice usage) to see the full utilisation of the CPU.
But I think we are on the right track, thanks to you, Duncan!
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10092665
I'd suggest the most helpful number you could plot would be %IDLE. Then the total CPU usage is 100 minus that. That's what we really want to know.
0
 
LVL 2

Author Comment

by:Letus
ID: 10094427
CPU - User
http://www.valka.cz/cacti2/cpu.gif

CPU - System
http://www.valka.cz/cacti2/cpu_system.gif

CPU - Idle
http://www.valka.cz/cacti2/cpu_idle.gif

I started measuring it just now, so it will take a few minutes before the graph starts plotting.

But from the user utilisation of the CPU, it really seems the CPU is the slow part :(
0
 
LVL 34

Expert Comment

by:Duncan Roe
ID: 10098760
Nice CGI monitoring tool. You had 1/4 hr or so of zero idle around 14:00. Was the system down for 3/4 hr around 16:00?
I look forward to viewing a full day's graph tomorrow.
0
 
LVL 2

Author Comment

by:Letus
ID: 10102323
Oh yes, I tried to modify it (it runs as 5-minute averages, so I tried to run the measurement only every 5th minute, but that produced the gap; measuring every 2 minutes is better). And around 16:00 the system was not running due to a restart.

(If you want, I can share my .pl files to fill the rrd tables and to create the graphs ... based on RRD tool and SNMP)
0
 
LVL 34

Accepted Solution

by:
Duncan Roe earned 350 total points
ID: 10109978
Yes, it would be interesting to see the .pl files; why not put them up on your website and post the link here?
Looking at yesterday's graph- you have zero idle for several 5-minute periods, so I think you would definitely benefit from more CPU power. I suggest 2x current power is a bare minimum, and 5x would last you a lot longer. You are probably looking at a new multi-CPU  motherboard for anything over the bare 2x.
0
 
LVL 2

Author Comment

by:Letus
ID: 10119481
Duncan, thanks for your help. I think the problem is the CPU, as you say. We are going to split the load across two servers, one for each website, each with two Opteron CPUs. Thanks for the help; at the beginning I thought the problem was something completely different. You helped me a lot.

Here are the files :
http://www.valka.cz/cacti2/pl/cpu.pl (user load)
http://www.valka.cz/cacti2/pl/cpu_system.pl (system load)
http://www.valka.cz/cacti2/pl/cpu_idle.pl (cpu idle)
http://www.valka.cz/cacti2/pl/disk.pl (generates three graphs for three devices (hda, hdc), showing the total size and occupied space)
http://www.valka.cz/cacti2/pl/mem.pl (showing on positive side the swap memory on the  disk, on negative side the real hardware memory, how much is used, and the total)
http://www.valka.cz/cacti2/pl/proc.pl (showing the number of currently running processes)
http://www.valka.cz/cacti2/pl/net_traf.pl (showing the network traffic in and out for one device)

All the scripts contain paths for where the script, the include files, and the output are located, and where the graph should be generated. To use them, create a batch job that runs all these files about every 2 minutes; this will generate the numbers and redraw the graphs.

0
