Oracle taking 50% CPU constantly

Question for the oracle experts:

We have a VM running oracle 11g (I'm not sure which version) that has 1 vCPU with 4 cores. It has 2 DBs and 2 oracle.exe processes which a few minutes after restart each start start taking 50% of one of the cores. The other 2 other cores are left unused. That makes a total usage of 25% (2 x 50% of 2 cores on a total of 4 cores).

We don't know what is generating the CPU usage and why it's a constant. In any case it is a flat line we see all the time once it starts. But we do know that also when the machine is completely network disconnected we see the same thing happening. It only stops after you stop the service and restart it. But after a few minutes it starts again.

Question is why is it doing this what is the easiest way to figure out what is causing this constant 50% CPU load on 2 cores.

It has to be something internally on the Oracle since it happens even if the machine is completely isolated.

Thanks
J ZSysAdminAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

slightwv (䄆 Netminder) Commented:
Check for any running database jobs:
select job_name,state from dba_scheduler_jobs;

Also check for the older jobs:
select job, this_date from dba_jobs;

That would explain why it starts back up after you bounce the database.  The job scheduler knows it hadn't finished a job so it will restart it.
0
J ZSysAdminAuthor Commented:
OK, thank you. It seems to me that no jobs are running.

select job_name,state from dba_scheduler_jobs;

SM$CLEAN_AUTO_SPLIT_MERGE	SCHEDULED
RSE$CLEAN_RECOVERABLE_SCRIPT	SCHEDULED
FGR$AUTOPURGE_JOB	DISABLED
BSLN_MAINTAIN_STATS_JOB	SCHEDULED
DRA_REEVALUATE_OPEN_FAILURES	SCHEDULED
HM_CREATE_OFFLINE_DICTIONARY	DISABLED
ORA$AUTOTASK_CLEAN	SCHEDULED
FILE_WATCHER	DISABLED
PURGE_LOG	SCHEDULED
MGMT_STATS_CONFIG_JOB	SCHEDULED
MGMT_CONFIG_JOB	SCHEDULED

Open in new window


select job, this_date from dba_jobs;

1	(null)
2	(null)

Open in new window


Any other ideas?

I want to start the ADDM. But I can't find how to open it. It says the Database Homepage. But that doesn't seem to run.
0
slightwv (䄆 Netminder) Commented:
I believe ADDM requires an additional license.  Make sure you have the license before you run it.

If you can get into OEM Express you should be able to see what sessions are running what.
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

schwertnerCommented:
Check carefully if this is not Oracle 12.1.0.2
This version of Oracle has a generic defect and you have to turn off defect functionalities.

Check the current values of the  following parameters:

OPTIMIZER_DYNAMIC_SAMPLING
DYNAMIC_SAMPLING

Try to set the existing parameters to 0 and check the processor load.
0
J ZSysAdminAuthor Commented:
I'm positive that it is version 11g.

Anyone any more proposals?

An Oracle expert to take a look is also welcome...
0
slightwv (䄆 Netminder) Commented:
11g has DBConsole instead of OEM Express.  It is a browser based management GUI.  It would be the easiest way to help track down the session using the CPU and what it is doing.

If you have to do things via SQL, it is doable, just not as pretty.

There are many queries out there to locate sessions and CPU usage:
http://www.oracle-wiki.net/startsqlshowcpuusagebysess
0
schwertnerCommented:
Oracle has 2 very aggressive processes playing important role in RAC and ASM installations.
One of them is VKTM.
His activity can be suppressed.
It is relatively easy to understand that the process is extremely active:

%  top

If VKTM is often on the top of the Oracle processes, you can suppress this (look in the net or let me know, I am in vacation but will see details in my USB disk).

Do not underestimate the problem with DYNAMIC_SAMPLING.
Oracle fails to introduce this new feature and now there is a Note in MOS to suppress the feature because of the bad performance.

So look the value of these parameters and give a try. They are dynamic and reversible. No danger!
I have succeeded to solve the performance problems of the biggest Health Insurance Company in Germany doing this
0
J ZSysAdminAuthor Commented:
OK, It is a Windows installation but in task manager the processes that are taking the constant CPU load of 50% each on one of the cores is just oracle.exe. Other processing are doing nothing. Does you explanation then still apply?
0
schwertnerCommented:
In most cases the Windows Oracle problems came from the small amount of the free memory.
This causes swapping and concurrency for the rest of the available RAM space.
Normally the Windows Server installations of Oracle need 50% free RAM.
Investigate the memory load and look the advices in the Net.

From other hand the DYNAMIC_SAMPLING problem is generic in Oracle 12c and I suppose in 11g versions.
This is an unsuccessful innovation of Oracle.
There is a special Note in MOS asking for immediate turning off of this feature.
The symptoms are not only 100% CPU load but also very slow performance.

So it doesn't hurt to try also this variant.
In 12c the change of the parameters doesn't harm, are reversible and give immediate effect.

But if you found a parameter change proposal like:

fix:nnnnnn:off

be aware that this change can bring you problems.
To reverse this parameter you will need restart of the instance and possible creation of pfile and SPFILE after that to get rid from the parameter.
Also it can bring you problems by upgrade to 12c - the instance can not startup with a non understandable message


If you need advice how to mitigate the VKTM process let me know.

So you have 3 directions for work: RAM size, VKTM process and DYNAMIC_SAMPLE parameter(s).
0
slightwv (䄆 Netminder) Commented:
Use DBConsole or run the SQL and look at the session.
0
slightwv (䄆 Netminder) Commented:
From another recent question, might it be DBConole actually causing the issue?

It will require tracking down the session and seeing what it is running but it might be related:
https://www.experts-exchange.com/questions/29087982/Oracle-Server-in-CPU-loop.html
0
J ZSysAdminAuthor Commented:
I appreciate all the help. But we are looking in the wrong direction I think.

The reason that makes me believe that is that the server had exactly the same config before they upgraded their code and made change to the database. Only then it started.

The strange thing I see is that on 2 database we see unusual high portions of CPU for certain queries that it should be doing when querying the TOP SQL calls by CPU. Seeing that looks like t a loop in the code to me. The problem is how do I proof that to the developers? :-)
0
slightwv (䄆 Netminder) Commented:
Capture the stats and show them.

If it is a loop in PL/SQL then capture the statement and show them the stats.

If it is a loop in the code and it is executing a statement over and over and over, show them the number of executions.

Best evidence:
start from a known point and show them no cpu usage.  Then have them run their code and show them the high usage.  If they stop their code and cpu goes back to normal, it mush be their code?
0
J ZSysAdminAuthor Commented:
OK, I took all the screenshots and queries in a PDF. But it seems I forgot to upload them. So here's a second attempt...

I hope these queries and screenshots help a bit. Mind that it's 2 databases which both are showing a loop or bug in the code according to me.
Top-SQL-queries-for-Experts-Exchange.pdf
0
J ZSysAdminAuthor Commented:
additional problem is that it's not in-house developed software. It's a package we buy from fuji. So I don't have a view on the code. But the problem started ever since they made changes in November.
0
slightwv (䄆 Netminder) Commented:
Well, if you cannot control the developers then you have an uphill battle to climb.

I would look at the top usage in those reports.  There is the one with 400 million executions.  Is that "normal" or "expected"?

Then there is the pl/sql block MeasuredDataDelete.DeleteData.  Low executions but high disk reads and cpu.

Sorry but there is no magic wand when it comes to tuning.
0
J ZSysAdminAuthor Commented:
I's like to thank everyone for the help. But the suggestions didn't solve my problem because (1) it's an application we didn't write ourselves so we have no control over the code and (2) I'm not familiar enough with Oracle to use all the tips you gave me.

We decided we will be opening a case with the software vendor of the package that is querying the Oracle and see what they can come up with.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
J ZSysAdminAuthor Commented:
I chose my own solution since it's not really solved and that is the solution people will see as the final answer when they search for this problem. Thanks again to everyone
0
J ZSysAdminAuthor Commented:
I explained it in my last answer. The problem was not solved.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.