Solved

Server very slow. CPU usage reaches 100% for sqlservr.exe

Posted on 2005-03-07
Last Modified: 2008-07-17
Originally we had a Dell PowerEdge 2600/1.8GHz model with 1024 MB (512 MB x 2) of RAM. It has one Xeon processor, 2400 MHz with 512 KB of cache.

Recently, we added another 1024 MB (512 MB x 2) of RAM to the server to increase its performance.
 
But the server speed does not seem to have improved. In fact, the users (who reach the server through a browser-based application that retrieves data from the SQL database on the server) complain that it has slowed down. Before we added the RAM, the heaviest task, a Search function that retrieves data from many different tables, would just make the server slower. After adding the RAM, users complain that the server sometimes gets so slow when they run that Search function that they think it has hung. When one user is running the Search function, other users have a hard time using the application because even simple tasks become very slow.
 
I have checked Task Manager, and CPU usage always reaches 100% when a user is running the Search function (although the server sometimes gets slow when this function is not in use, too). The process causing this is sqlservr.exe.

Furthermore, java.exe may also go up to 50% usage while Tomcat is starting, but it goes back to 0% after that. As for memory, only about 20% is shown as in use when I try the Search function.
 
I tried removing the new RAM to see if it would make any difference, but the outcome is still the same: CPU usage still reaches 100%.

Lastly, some other applications that work fine on a normal PC (Windows XP with only 512 MB of RAM) also slowed down when they were transferred to the server.

I have also updated Windows 2000 and the server BIOS to the latest versions.

Does anyone have any suggestions for solving the problem?

My colleague has also mentioned that the performance of the SQL database depends on the type of hard disk. Can anyone confirm this?
Question by:minevra
30 Comments
 
LVL 69

Expert Comment

by:Callandor
ID: 13476201
Your problem doesn't sound like a hardware problem, since you see no change when you remove the RAM. Fast SCSI hard disks can improve performance, but your 100% CPU utilization seems to indicate the problem is somewhere else, perhaps in the software or server configuration. How many users are simultaneously hitting the database?

You can also post a 20-point pointer question in the SQL TA http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/ to get some opinions from the software/configuration side.
 
LVL 7

Assisted Solution

by:crazijoe
crazijoe earned 60 total points
ID: 13477756
SQL is disk intensive and CPU intensive because of the constant access to the database. RAM usually isn't a problem because you are accessing different information all the time, so a lot of it is being swapped back and forth to the hard drive constantly. You might try adding a second CPU to the server. We have a Dell 6650 server with dual Xeon 1.8GHz, 2GB memory and 2 18GB SCSI drives running SQL. Our processors seem to be fine but our drives are a big bottleneck because they are configured in a RAID 1 array.
You might try MS AD Sizer to see how your network stacks up.
http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=77C0A895-3DFC-469F-BE40-6A0EE594821C
 
LVL 9

Expert Comment

by:MrAruba
ID: 13478014
SQL is known in some cases not to release memory it isn't using. SQL will take as much free memory as it can find to work with, but if wrongly configured it will keep that amount even when it's not needed.
Try allocating a fixed amount of RAM for SQL Server.
You can do this through SQL Server Enterprise Manager: choose your server, right-click it and choose Properties, click the Memory tab and specify the amount you want to allocate. Reboot to make sure the changes take effect.
(Never allocate less than 256 MB.)
 
LVL 30

Expert Comment

by:IanTh
ID: 13479228
 
LVL 4

Expert Comment

by:divi2323
ID: 13479293
have you run the sql profiler to see which queries are holding up the server?  are these full text queries?  if so, is your search engine properly running?

you also might check for indexes.  if you're searching an unindexed field, table scans will occur and disk reads and cpu utilization will increase.  take your sql profiler data and trace the actual sql code being sent to the server.  then check those tables being accessed and ensure that those columns in the tables have been indexed.  indexing DRAMATICALLY increases read speeds in sql databases (with a few exceptions).
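The index-versus-table-scan point can be tried out on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module rather than SQL Server, so the planner output differs from what Query Analyzer would show; the table and column names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"cust{i % 500}", float(i)) for i in range(5000)],
)

def query_plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT total FROM orders WHERE customer = 'cust42'"

plan_before = query_plan(query)   # no index on customer: whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = query_plan(query)    # the planner can now search the index

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table and the second a search using the new index, which is the same contrast to look for in the execution plans of the queries captured by Profiler.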
 
LVL 1

Author Comment

by:minevra
ID: 13483222
I'm not sure of the exact number of people who access the database simultaneously, because they don't use it for the whole of office hours. But there are around 15 PCs whose users access the database quite often.
 

Assisted Solution

by:mattdewey
mattdewey earned 60 total points
ID: 13483582
I would follow the recommendations at this site:
http://www.sql-server-performance.com/sql_server_performance_audit2.asp

Also using profiler you can store the data off to a table and look for the queries that have the highest number of reads, writes, cpu usage, and duration.  If you look at the queries with high cpu usage you might be able to fix that.  Take those queries put them in query analyzer, use control K and see if the statistics are invalid.  Maybe all you need to do is run a reindex and optimize statistics job on the database.  Do not run a complete reindex on the server while people are using it, as it can lock the tables.
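Once the trace has been saved to a table, picking out the worst offenders is a simple ranking. Here is a rough sketch of that step, assuming the trace rows have been exported into Python; the query names, column names and numbers are invented for illustration and are not taken from the server in this thread.

```python
# Hypothetical trace rows, as they might look after saving a Profiler trace
# to a table and exporting it. All names and numbers are made up.
trace_rows = [
    {"query": "search_across_tables", "cpu_ms": 4200, "reads": 910000, "duration_ms": 15800},
    {"query": "load_user_profile", "cpu_ms": 15, "reads": 40, "duration_ms": 22},
    {"query": "monthly_report", "cpu_ms": 900, "reads": 120000, "duration_ms": 3100},
    {"query": "insert_audit_row", "cpu_ms": 3, "reads": 6, "duration_ms": 5},
]

def top_queries(rows, column, limit=3):
    """Return the `limit` rows with the largest value in `column`."""
    return sorted(rows, key=lambda row: row[column], reverse=True)[:limit]

# Rank by each metric in turn; a query that tops several lists is the
# first candidate for a closer look at its execution plan.
for column in ("cpu_ms", "reads", "duration_ms"):
    names = [row["query"] for row in top_queries(trace_rows, column, limit=2)]
    print(column, names)
```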
 
LVL 2

Expert Comment

by:mingtze
ID: 13484468
This would be better suited in the database section as mentioned. Based on your specs and description of the scenario (I am assuming your 512mb winXP machine has the application but not the full database) I would think the query is just not optimized.

I would suggest grabbing the SQL queries being run by the application, post it up here or EE DB section as described by Callandor. Include any indexes, table schema, and table rows if possible for all tables involved in the query etc.
 
LVL 12

Expert Comment

by:GinEric
ID: 13513649
Attack the problem, not the results.

SQL should not have priority over the system, network, etc. Lower its priority by setting it in Task Manager.

Get Process Explorer from http://www.sysinternals.com/ and find out which process is eating the cpu time.

Find it first, then set its priority.

Our SQL Server, using Process Explorer, shows 17meg of memory, 0.250 seconds kernel time, 0.380 user time.  With Process Explorer, you can see all of this, all of the threads, and how much time each thread and process is using.

We had a similar problem with some form of DBRelDataBase or something spinning its wheels.  Something like kernel32.dll!Rtl something or other, under threads for svchost.exe, and if I remember correctly, it was the network service host, not the local service host.

There are viruses that attack these, and do so through MSSQL.  Check for them.

Also, printers under the Process Explorer seem to be always trying to mutex.

I think you'll find the Process Explorer will pinpoint the problem for you.
 
LVL 1

Author Comment

by:minevra
ID: 13597804
Just an update (before turn123 posts a 'no response' warning message here :)). I'm currently working through the SQL Performance Audit. I have not got the results yet.
 
LVL 11

Expert Comment

by:TreyH
ID: 13638280
If you are running a RAID, you might check for a defective drive. One bad drive in a RAID 5 could cause such a problem.
 
LVL 1

Author Comment

by:minevra
ID: 13642528
We don't have a RAID controller in our server. I just talked to a computer guy today. He pointed out that not having a RAID controller, plus not having enough processors (we only have 1), might be a reason for the slowness too.
 
LVL 12

Expert Comment

by:GinEric
ID: 13688522
I don't think it is a database problem, not at 100% cpu.  It tends to look more like a Windows thread asymptotic critical point loop problem.  Usually, one of the threads is in a tight loop, but doing nothing.

Which is why I suggested Process Explorer; it will identify, exactly, which thread is using the cpu time.

This is a thread:

MSDTCPRX.dll!CConnectionManager::Create+0x2d0

from SQLSERVR.exe

One of the thrashers from service host looks something like:

ntdll.dll!Rtl - something, something about replication followed by a relative address

Like ntdll.dll!RtlDbAllocateObject+0xxx

The Process Explorer goes right to the problem.  Although logs will help, they don't exactly identify the threaded function that is using all the time.  This little program does and from there you can investigate as to why it is spinning its wheels.

By double-clicking on SQLSERVR.exe in Process Explorer when it's running, and then selecting the third tab, you get the CPU Usage and the Memory as Private Bytes in two graphs. The other tabs provide information, such as the Threads tab, which shows all threads and cpu time activity, updated for CSwitchDelta, telling you when it's in the Execution Unit of the microprocessor.

I don't see how you can get better information than this.
 
LVL 1

Author Comment

by:minevra
ID: 13724375
I tested with Process Explorer. I found that the thread taking the most CPU is MSVCRT.DLL+0x83d3. There are a lot of threads with this same name. While the heavy-duty Search function is running, 2 threads take up a lot of CPU. Each takes 40+%, which makes the total ~100%. With other functions, only 1 thread occupies 40+%. The others are usually in single digits. Can anyone tell me what these threads are for, or direct me to a reference page where I can look it up myself?

Thanks
 
LVL 11

Expert Comment

by:TreyH
ID: 13728730
msvcrt.dll is a code library for programs written in Visual C++.
 
LVL 12

Expert Comment

by:GinEric
ID: 13733241
MicroSoft C Run Time Library dynamic link library for some very basic stuff.

If you do a "Find" on msvcrt.dll you will get a list of all the programs calling it. Among them are alg.exe, iexplore.exe, cmd.exe, notepad.exe, just about everything.

You need to look a little deeper by getting the properties of the threads and tell us the names of the actual callers that are using so much msvcrt.dll time.

The name of the thread will be as above, of the form:  msvcrt.dll!endthreadex+03a

We need the name of the thread. You mentioned "the sql database in the server", so in Process Explorer you would highlight SQLSERVR.EXE with one left click, then right click for Properties. You should see a bunch of MSVCRT40.dll!beginthread+0x82 or something similar. It may be 50 for XP or later versions for SQL.

Also, as you do the search, highlighting any item found will also highlight in the backscreen; right clicking that and properties will show all processes.  

0x83d3 is the "entry point" or a pointer to it.  With the proper debug symbols, from this site, you will see there are many threads:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_crt__endthread.2c_._endthreadex.asp

in the lefthand menu.  You need to know the complete name of the thread to go further.
 
LVL 1

Author Comment

by:minevra
ID: 13742406
"We need the name of the thread.  You mentioned "the sql database in the server" so that in Process Explorer you would highlight SQLSERVR.EXE by by one left click, then right click for properties.  You should see a bunch of MSVCRT40.dll!beginthread+0x82 or something similar.  It may be 50 for XP or later versions for SQL."

That is what I did. I right-clicked sqlservr.exe and went to its properties. Then I clicked the 'Threads' tab, and all I see is lots of MSVCRT.DLL+0x83d2. I didn't see any thread that resembles 'MSVCRT40.dll!beginthread+0x82'. Did I go to the wrong place?
 
LVL 1

Author Comment

by:minevra
ID: 13899372
Hi, I have done the audit for SQL Server Hardware Bottlenecks. The 2 items that always exceed their limits whenever the server slows down are "Processor: % Processor Time" and "System: Processor Queue Length".
The recommendation for this is getting a faster CPU or more CPUs or getting CPU with a larger on-board L2 cache.
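For anyone repeating this check on logged Performance Monitor data, a small sketch of the comparison follows. The limits used (80% average processor time, a queue length of 2 per CPU) are common rules of thumb and are assumptions here, not the values from the audit checklist; the sample readings are invented.

```python
from statistics import mean

# Rule-of-thumb limits: assumptions for this sketch, not the audit's figures.
CPU_LIMIT_PERCENT = 80.0
QUEUE_LIMIT_PER_CPU = 2.0

def cpu_bottleneck(cpu_samples, queue_samples, cpu_count=1):
    """Compare averaged counter samples against the assumed limits."""
    avg_cpu = mean(cpu_samples)
    avg_queue = mean(queue_samples)
    return {
        "avg_cpu": avg_cpu,
        "avg_queue": avg_queue,
        "cpu_over_limit": avg_cpu > CPU_LIMIT_PERCENT,
        "queue_over_limit": avg_queue > QUEUE_LIMIT_PER_CPU * cpu_count,
    }

# Invented readings taken while a heavy query is running on a 1-CPU server.
result = cpu_bottleneck([96, 100, 100, 98], [5, 7, 6, 8], cpu_count=1)
print(result)
```

Both flags being set together points at the CPU being saturated, but it does not say whether the work itself is necessary; an unindexed query can saturate any number of processors.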

I asked Dell technical support about this, and he told me to make sure whether our program can actually utilise the extra CPU or not. The program that uses the most CPU is sqlservr.exe, followed by java.exe. So, what do you think?

The support guy also suggested enabling hypertrading. Has anyone tried this before? Is it safe? Our server currently has only 1 CPU and no RAID controller.
 
LVL 12

Expert Comment

by:GinEric
ID: 13900115
I suggested www.sysinternals.com Process Explorer above.  That is what Microsoft itself suggests now.  You need more than just the processor and cpu times, you need to know the exact name of the thread or process that is using all of the time.  These microthreads have names that are very descriptive of what the process is doing while it's off on a binge.

What difference is it going to make if you add hardware and don't fix the basic software problem?

This is how Microsoft, and others, troubleshoot their threading module problems. Usually, some process is banging away at an endless loop. Java is a real goodie at this! The Queue length means that lots of applications and services are backing up waiting for this runaway process to release the cpu time. Faster cpu ain't gonna do it, that will just use even more time in a faster cpu. I think he meant "hyperthreading" also. Regardless, he is at a loss of what to do.

It won't hurt to install little Process Explorer and re-read above, then report back. Usually, it's some thread or handle that is not even used or necessary, which therefore bangs away trying to get attention which the Operating System denies because it doesn't see the handle as necessary any longer, or, there is no answer for the handle to get. There are actually three conditions in Binary Logic, True, False, and Undefined. Undefined is infinite, as in "infinite loop."

There's a lot of database stuff in SQL and a lot of Internet and Network interaction in Java.  either of these can "go off" from time to time, spin their wheels, and accomplish absolutely nothing, with the exception of thrashing the computer system.

The symptoms are Classic.  At least try the "free" Process Explorer.
 
LVL 1

Author Comment

by:minevra
ID: 13923038
Dear  GinEric,

It's not that I never took your advice. The first suggestion I tried was yours (please read my comments of 04/06/2005 and 04/08/2005). I installed Process Explorer and have run it every time the server slows down. The program is still on my server. It is only because I could not find the specific thread you are talking about that I tried the SQL audit suggestion. Receiving no answer when I reported back on 04/08/2005 and asked for further advice (on using Process Explorer) left me at a dead end, so I had to try other options. I can't just kill MSVCRT.DLL+0x83d2 because it may affect the SQL server greatly. I can't afford to make a mistake like that because that program is important for my company's operations.

Our database is increasing everyday. Is it not possible that 1 CPU is not enough for our operation?

P.S. Sorry if I sound rude and ungrateful, but I can't contain my frustration when someone accuses me of doing things I didn't do or not doing things that I'm supposed to do. Thanks for still trying to help. But I did reach a dead end when it comes to Process Explorer.
 
LVL 1

Author Comment

by:minevra
ID: 13924439
"The Queue length means that lots of applications and services are backing up waiting for this runaway process to release the cpu time.  Faster cpu ain't gonna do it,that will just use even more time in a faster cpu."

Why would a faster processor use more time? What if the heavy threads are really needed?
 
LVL 12

Expert Comment

by:GinEric
ID: 13924563
It sure is possible that 1 CPU, even one server, is not enough!

It's okay, frustration is part of the job.  I don't want you to kill anything.

So, are you saying that it is exactly msvcrt.dll?   The Dynamic Link Library?

msvcrt.dll shows no cpu time in any threads, and there are thousands of them, as this is a linking library [set of pointers to other code].

In Process Explorer, under SQLSERVER.EXE, doubleclick to get running process, what is top lefthand CPU?  That is the time the process is using.  There are at least 20

MSVCRT40.DLL!beginthread+0x82

Is this where you're looking, under SQLSERVER.EXE?

Even with Microsoft SQL Enterprise Manager up and on screen, executing queries, it rarely goes above 4%

You're doing a lot with SQL if your users are running browser queries, and I guess you're also doing email through it.

I can only assume that you have "load balance" set on the server.

You should also find out about Provisioning and Quality of Service.

If one user is running some query, getting results or tables or whatever, and using something like BitTorrent, this will definitely affect all other users! BitTorrent is not really such a good idea for real networks, nor is any "accelerator" that basically steals time from everyone else.

This is why most networks "provision" and limit bandwidth by users.

"When 1 user is running the search function, other users has a hard time using that application because simple task become very slow."

First, ask yourself, "who is this user?"  Why is one user "leadpiping" your server?  Is he "SETI at home" or something similar?  Who is really using your bandwidth?

If it's just one user, that is.

We all work in real networks that can't afford to go down.  We have to troubleshoot and fix "on the fly."  Understand, therefore, that I too have other responsibilities and sometimes it takes me time to get back to a problem, especially if working on our own.

Maybe it's time to get a Server that can handle it.

http://www.tyan.com/products/html/thunderk8we.html
http://www.tyan.com/products/html/opteron.html
http://www.tyan.com/products/html/thundergche.html

Suggested.

Even if you just take some old Asus and throw a server together, You should have no problem with the basic Windows 2000 5 Server Licenses Included.

Then, you can migrate MS SQL to a standalone, with backup servers, if you can add yet another server, and isolate the critical database with redundancy.  That, before your SQL server crashes, which it seems like it's headed for as it overheats that one cpu; something's gotta give eventually.

You're already running at critical mass; company needs to address this issue, now.

If you have 2,000 users, I can understand the load, but not something like 20 or 200.

This leads me to the "one user" conclusion, BitTorrent, etc..

The MSVCRT.DLL is probably a core component which, I agree, you should not mess with, but below it, in its threads, such as "begin," or whatever, there should be shown excessive times; again, not in msvcrt itself, as it is only a linker, but in what it is calling, i.e., a subthread, threading module, and so on. There are thousands of DbConnect and other threads; it only takes one with an absent-minded priority to bottleneck a server or any machine.

MSVCRT40.DLL is just the Foundation Class 4.0, as it was in an NT 4.0 box.  Yours will be different for Server 2000, maybe 5.1 or something similar, unless they dropped the version, or, you are running a really old version, or, installed old or faulty software that has overwritten the proper .dll

PHP will also bang away at your system. A lot of programs will. Beyond a limit, one cpu is definitely inadequate; shortly thereafter, one server is inadequate. Microsoft SQL can be spread over more than one server, with improved results.

The databases can be "expanded," if necessary to increase the size, as far as I know.  Always back them up before doing so.

Check all your sizes, loads, balance, etc. Then run the Microsoft SQL Enterprise Manager and try the help file for Replication, slow, and so on, under the Find tab. In the end, I think the solution is to re-evaluate your MS SQL configuration and setup, by reading the manual, and then deciding what to do: add more or newer servers, adjust load balance of PDC and network, provision bandwidth to users and establish quotas, etc.

Tell me how many users you have, and I'll tell you if your one server can handle it.




 
LVL 1

Author Comment

by:minevra
ID: 13932559
"Tell me how many users you have, and I'll tell you if your one server can handle it."

Around 20 users, give or take. The maximum number of connections I have recorded using Performance Monitor is 26. But the number of connections can be as low as 5 when the server slows down.
 
LVL 1

Author Comment

by:minevra
ID: 13933585
"In Process Explorer, under SQLSERVER.EXE, doubleclick to get running process, what is top lefthand CPU?  That is the time the process is using.  There are at least 20

MSVCRT40.DLL!beginthread+0x82"

I did not see SQLSERVER.EXE, only sqlservr.exe, so I double-clicked that one. The value under 'CPU' changes very quickly; currently it ranges from 0.00+ to 47.00+. When CPU usage is 100%, it's usually the combination of 2 threads with 40+ values under 'CPU'.

I did not see any "MSVCRT40.DLL!beginthread+0x82" thread, but I saw around 50 "MSVCRT.DLL+0x83d2" threads.

What is CSwitch Delta? I see that the threads that hog the CPU have very high values for CSwitch Delta (from 1000+ to 3000+).
These threads do not hog the CPU all the time, only when users are doing certain queries (through the web interface our programmer made).

The tech support guy also says that the RAM might not be compatible, since the slowness got worse after we installed the additional RAM (the server was already very slow originally, which is why we added RAM). But even after removing the new RAM, the server is still as slow, with CPU usage reaching 100%. (The tech guy seems very discouraging about the idea of adding a CPU or changing to a faster one.) The performance Bottleneck audit also shows the RAM to be mostly ok. The first 2 modules are from Dell but the later 2 are from Kingston (the dealer said Kingston has already tested the modules and that they are specifically for the PowerEdge 2650 and PowerEdge 2600).
 
LVL 1

Author Comment

by:minevra
ID: 13933881
Here's the server's spec:

Operating System   :
Windows 2000 Server Service Pack 4 (build 2195)

System Model  :
Dell Computer Corporation PowerEdge 2600

Processor  :
2.40 gigahertz Intel Pentium III (Xeon)
8 kilobyte primary memory cache
512 kilobyte secondary memory cache

Drives  :
72.78 Gigabytes Usable Hard Drive Capacity
53.62 Gigabytes Hard Drive Free Space

SAMSUNG CD-ROM SN-124
3.5" format removeable media [Floppy drive]

MAXTOR ATLASU320_36_SCA SCSI Disk Device (36.41 GB) -- drive 0
MAXTOR ATLASU320_36_SCA SCSI Disk Device (36.41 GB) -- drive 1


Memory Modules  :

2048 Megabytes Installed Memory

Slot 'DIMM_1A' has 512 MB
Slot 'DIMM_1B' has 512 MB
Slot 'DIMM_2A' has 512 MB
Slot 'DIMM_2B' has 512 MB

Controllers   :

Standard floppy disk controller
Primary IDE Channel [Controller]
Secondary IDE Channel [Controller]
Standard Dual Channel PCI IDE Controller


Bus Adapters   :
LSI Logic 1020/1030 Ultra320 SCSI Adapter
LSI Logic 1020/1030 Ultra320 SCSI Adapter
Standard Universal PCI to USB Host Controller

 
LVL 7

Expert Comment

by:crazijoe
ID: 13934441
My opinion is there isn't enough CPU.
 
LVL 12

Accepted Solution

by:GinEric
GinEric earned 90 total points
ID: 13938646
Problem Considerations:

The disks are small for your database, however, they will hold up for a year or two.

2048 MB RAM is top grade server configuration.  You should have the bandwidth to match it, which I don't see mentioned, at least T1 or DSL [basically the same thing anyway].

Server shows no Partnering Servers.

L1 Cache is small at 8KB; 64KB Associative Memory Cache would be preferred.
L2 Cache is okay at 512 KB.

Pentium III Xeon is dated.

Kingston is fine RAM and reliable. However, there are known issues with Server 2000 and full 2 GB RAM, that is, slots full. Some have reduced to 1 GB and have seen improvement; I suspect that is due to the Memory Bus having to adjust for the Most Significant Address bit or byte. I realise, however, that you cannot take it off line; this is for future reference when you decide it's time to look for a server motherboard. Secondly, no RAM is all that much guaranteed.

20 Users should be no problem at all.

Your server should handle this quite easily.

"The performance Bottleneck audit also show the RAM to be mostly ok"

Then it is your disk caching.  Pagefiles for multiple hard drives should be split across hard drives and should range from 1xRAM to 1.5XRAM.  Windows should not manage pagefile.  This is a long standing problem with Windows Operating Systems, overkill on pagefile sizes.  Resizing and redefining pagefiles often clears up disk caching problems.
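As a worked example of that sizing rule, here is a small sketch that applies the 1x to 1.5x RAM range to this server's 2048 MB and splits it evenly across the two SCSI drives. The even split and the function name are assumptions made for illustration.

```python
def pagefile_plan(ram_mb, drive_count):
    """Apply the 1x to 1.5x RAM rule and split the total evenly across drives."""
    low_total = ram_mb
    high_total = int(ram_mb * 1.5)
    return {
        "total_mb": (low_total, high_total),
        "per_drive_mb": (low_total // drive_count, high_total // drive_count),
    }

# 2048 MB of RAM and two drives, as listed in the server spec above.
sizes = pagefile_plan(2048, 2)
print(sizes)
```

For this machine that works out to a total of 2048 to 3072 MB, or 1024 to 1536 MB on each drive.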

Is this one IDE?:
"72.78 Gigabytes Usable Hard Drive Capacity
53.62 Gigabytes Hard Drive Free Space"

What are these?:
"MAXTOR ATLASU320_36_SCA SCSI Disk Device (36.41 GB) -- drive 0
MAXTOR ATLASU320_36_SCA SCSI Disk Device (36.41 GB) -- drive 1 "

at 40 GB each?

Too late to partition properly.  Remember for next server build; one gigantic partition is not well thought out, once lost, everything is lost; not so with better partition planning.

Faster cpu is really just a myth, as none of them run anywhere near that crystal oscillator frequency in the gigaHertz range.  This is a "marketing disinformation myth."  But recent 64-bit machines are actually faster.  But that, of course, means a whole new Server and it may be best to wait and see a little longer.  Microsoft is just starting on a 64-bit Operating System  and Intel and AMD have not got their hardware working right yet.  Maybe in two years these will be functional.

But Xeons and Opterons are definitely better performers.

"CSwitchDelta" purely from engineering math interpretation, the delta time it takes for C switching.  That is, the rate at which something changes its switching from one thing to another.  It's a slope of the tangent thingee, a rate, like a skew, most likely the context switches, which really only means how fast the cpu is switching contexts.
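In Process Explorer, the CSwitch Delta column shows how many context switches a thread made since the previous refresh of the display. A minimal sketch of that calculation follows, assuming you have a series of cumulative context-switch counts for one thread; the sample numbers are invented.

```python
def cswitch_deltas(cumulative_counts):
    """Turn cumulative context-switch counts into per-interval deltas."""
    return [
        later - earlier
        for earlier, later in zip(cumulative_counts, cumulative_counts[1:])
    ]

# Invented cumulative counts for one thread, one sample per refresh.
cumulative = [120_000, 121_500, 124_300, 124_350]
deltas = cswitch_deltas(cumulative)
print(deltas)  # [1500, 2800, 50]
```

A thread showing deltas in the thousands on every refresh is being scheduled very often, which matches the 1000+ to 3000+ values reported for the busy threads.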

A similar problem to yours:

http://www.dotnet247.com/247reference/msgs/55/276840.aspx

Note that he too is having problems with MS SQL and cpu racing or thrashing, owing to, apparently, a core thread that will not let go of a context switch [no return from some microroutine].  This means the programmer wrote "Do Until" and "Until" never happens.  Without a trap, the micro spins its wheels.  For example, "Do <this> Until <interface disconnected>" with no trap before hand would cause runaway race.  More correctly it would be something like:
"Do <trap: cpu time excessive> <this> Until <interface disconnected>"  The first trap makes sure that it does not hog the cpu.  Also called a break point.
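The guarded loop described above can be sketched in a few lines. This is only an illustration of the idea in Python, not code from SQL Server or from any product mentioned in this thread; the function name and the default timings are hypothetical.

```python
import time

def wait_until(condition, timeout_s=1.0, poll_s=0.01):
    """Poll condition() until it is true, but give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while not condition():
        if time.monotonic() >= deadline:
            return False          # the "trap": stop instead of spinning forever
        time.sleep(poll_s)        # yield the cpu between checks
    return True

# A condition that never becomes true no longer hangs the caller.
print(wait_until(lambda: False, timeout_s=0.05))  # False
print(wait_until(lambda: True))                   # True
```

Without the deadline check and the sleep, the first call would loop at full speed indefinitely, which is the runaway behaviour being described.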

Server admins usually don't know a whole lot about this stuff.  Which is why engineers tell them to read some more and give more data about the problem.  From there it gets fed back to OEM's and their programmers and they are suggested to fix their code.  Now, the newer processors are starting to have this trap functionality built in, finally!  We hope the programmers "get it."
Some just don't.

Okay, so in the above article he is mentioning similar conditions, web MS SQL usage, .NET apps, and such.  As I said before, these place a heavy load on the server.

Two threads with 40+ values: okay, if you mean cpu, there is your hog. You seem to confirm with "through the web interface our programmer made"; is somebody compiling code there? That is intensive!

"MSVCRT40.DLL!beginthread+0x82": from its name, it begins a new thread. Threads are started for each subtask in a process. It only opens more cans of worms trying to define it here. I will note that 0x82 is also a Bug Check code, one that is stated as appearing "very infrequently."

Idea of Contexts:
Session Context: user session
Process Context: virtual address to physical device - deals with pagefiles, etc..
Register Context: thread owns register values - can also get from other threads
Local Context: local as in "on stack" - no outside .frames - reset by tilde's "~"

Okay, what does all of that mean?   You're in a virtual, or .NET, environment.  You have multiple users and multiple logins.  Everyone is executing at the same time.  But this is not really how a computer works.  What it actually does is switch the users very fast, so it seems simultaneous, at least in theory.

What goes wrong? Either one user overrides the settings of contexts, or the number of threads is too many for the switching rate to handle, giving, say, the cpu 20 microseconds on each thread, which is not enough, and thereby just switching between threads without really accomplishing anything. It's like, you want to add A and B, but this takes 30 microseconds, which cpu time you are never given. What happens? You try, forever, to add A + B, but you never can because you don't have sufficient cpu time in one pass to accomplish even this simple task. This is called thrashing.
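A toy model can put rough numbers on the time-slice argument. It counts only the overhead of switching: if every slice is followed by a fixed switch cost, the share of cpu time left for real work shrinks as the slices get shorter. The 5 microsecond switch cost is an assumed figure for illustration, and the model ignores everything else a real scheduler does.

```python
def useful_fraction(slice_us, switch_cost_us):
    """Share of cpu time spent on real work when each slice costs one switch."""
    return slice_us / (slice_us + switch_cost_us)

SWITCH_COST_US = 5  # assumed cost of one context switch, in microseconds

for slice_us in (20, 200, 2000):
    share = useful_fraction(slice_us, SWITCH_COST_US)
    print(f"{slice_us:>5} us slice: {share:.1%} useful work")
```

With these assumed numbers a 20 microsecond slice leaves 80% of the cpu for useful work, while a 200 microsecond slice leaves about 97.6%.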

Priority has caused your process to thrash. It is not adjusting properly to the time frames needed or the environment [number of users and threads]. You call in load balancing to try and adjust, but that doesn't work either because there is a more basic problem: your time slice is too small. Suppose you gave each of your users 30 microseconds to connect, auth, and ack; do you think anyone could log in in such a short time? Similarly, the cpu itself couldn't do it in 30 microseconds. So, 20 users' computers keep trying to talk to a cpu that has programmatically been given inadequate "per-thread" or "per-user" time. With the exception maybe of one user who is causing all of this with super-priority. That user could be NT Authority itself.

You have 20 threads in the process.  Standard limit for Windows threading.  Nearly the same as 20 users.  However, you also have 26 logged in!  So 6 are banging away at trying to "get in."  You see the problem, right?  Windows allows 20 users, but does not account for the 6 NT Authority Processes that are running as User NT Authority.  What you needed were more than 26 threads.

But NT, MS SQL, load balancing, won't let you have more than 20 threads.  So now what do you do?

Basically, what you want to do is try to increase the number of max threads and heap size; I know you probably don't know what I'm talking about, but basically what I'm saying is that the number of max logins needs to be greater than the number of actual logins at any point in time, and you must account for all programs and services that are running under a User guise.  And it should be at least ten more than the max, if not double the max [for small networks only; 2,000 users would not double to 4,000 simultaneous, that is, but 20 should have no problem doubling to 40].

I know there are ways to modify this, as in MS SQL Enterprise Manager and service configuration, as well as similar items in .NET and other applications, including the PDC's and BDC's logins and connections themselves.  These are found in the Resource Kit, Books Online, and other documentation.  Right now, it simply will not come to mind as to which book it is in.

Okay, so how to fix?  As I suggested, and if you're really doing programming there, you need more servers, faster and better cpu's, and better motherboards.  .NET does have this drawback: it requires scads of cpu time.  In other words, it requires more cpu's and servers.

Normally, programming, as in compiling, is dedicated to standalone servers as well, and not to the main domain or network server, particularly not the PDC.  Or, programming is compiled at the network nodes [users and programmers] and not the domain server.

Transactions:

Businesses that do anything from banking to programming, and all things in between, such as sales, need dedicated servers and departments.  In your basic Microsoft "How to plan a network," Terraflora can't even send flowers with only one server.  It uses about five, but your next step would be two.

I obviously don't know much about your business or company, so I can't second-guess what is causing the problem.  However, I'm giving it my best shot by logical extrapolation.

So far, we are down to 2 threads beating up on the cpu with 40+% cpu time each.  Are you sure that is correct?

In that case, they seem to be fighting it out, locked in a battle for processor time, excluding all others from their "mighty priority" and beating each other up with no true victor.

Again with Process Explorer, double-click the 40+% thread, either one, and use ALT+PrintScrn to capture an image of what appears: the stack for that thread and process.  Paste it into some viewer, like LView Pro, and save it.  There will also be numbers and similar threads listed for this stack, usually 4 or 5.

I need the exact names of those two threads, and the exact names of the ones captured in the images.  From there, I can narrow it down to perhaps the exact cause, which should give a good indication of how to fix it.
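If you would rather pick out the busy threads from numbers than from screenshots, the idea can be sketched like this. It assumes you have noted each thread's cumulative cpu seconds at two moments (Process Explorer shows per-thread figures); the thread IDs and values below are hypothetical, not taken from your server, and the 40% threshold is simply the figure we have been discussing.

```python
def hot_threads(before: dict, after: dict, interval_s: float, threshold_pct: float = 40.0) -> list:
    """Return (thread_id, cpu_percent) pairs at or above the threshold.

    before/after map thread id -> cumulative cpu seconds at each sample.
    A thread missing from `before` is treated as having started at zero.
    Percentages are relative to a single cpu.
    """
    shares = []
    for tid, cpu_after in after.items():
        delta = cpu_after - before.get(tid, 0.0)
        pct = 100.0 * delta / interval_s
        if pct >= threshold_pct:
            shares.append((tid, pct))
    # Busiest thread first.
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Hypothetical samples taken 10 seconds apart for three threads.
before = {1: 10.0, 2: 5.0, 3: 1.0}
after = {1: 14.5, 2: 9.25, 3: 1.5}
print(hot_threads(before, after, interval_s=10.0))
```

Two threads each landing above 40% over the same interval is the pattern I am asking you to confirm.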

Lastly, when done, I will send it off to Microsoft.

0
 
LVL 1

Author Comment

by:minevra
ID: 13951689
Hi GinEric,

Here are the screen captures of the Process Explorer results.

When sqlservr.exe is occupying ~100% of the CPU usage:

http://www.promserv.com.my/pics/CPU100.jpg (note: the CPU usage in this image shows 34.31% because I captured the screen at the same moment the CPU usage went down. The sqlservr.exe properties window had not changed yet at that time, hence the inconsistency.)

http://www.promserv.com.my/pics/CPU98-16.jpg

When the sqlservr.exe is not occupying ~50% of the CPU usage:

http://www.promserv.com.my/pics/CPU77-78.jpg





0
 
LVL 1

Author Comment

by:minevra
ID: 14158771
I did not get the exact solution to my problem, but I do not want this question deleted since it contains valuable information.

I've decided to upgrade some of our server's hardware (e.g. add a CPU).

So, I'll split the points among the experts whose input helped me in making my decision.
0


Question has a verified solution.
