Solved

Fedora Linux problem caused by avc and/or dmidecode?

Posted on 2009-02-23
Last Modified: 2013-12-15
We have this strange problem in which our main application (the sole purpose for us running Linux) has mysterious outages.
During the main loop of this application there are frequent instances of the loop hanging up for around 380 to 390 msec.

Ordinarily, this loop takes about 5 or 6 msec to execute.

We aren't absolutely certain yet, but the part of the system consuming this time appears to be a TCP/IP socket "select server" that we implemented about three years ago using pselect(). When we originally did this installation we were running Red Hat Linux 9 (2.4 kernel), and we had to jump through a lot of hoops to work around a select() problem involving SIGHUP signals when sockets disconnected. We put some hacks in place to get around it. I'm not suggesting this is part of the problem now, but for the past 1.5 years we have been running this version:

Linux localhost.localdomain 2.6.20-1.2320.fc5 #1 Tue Jun 12 18:50:38 EDT 2007 i686 i686 i386 GNU/Linux
with a FC5 distro.

The error message I find routinely in the dmesg log looks like this:

Feb 22 21:00:55 localhost kernel: audit(1235336455.988:348): avc:  denied  { read write } for  pid=16351 comm="dmidecode" name="[7282857]" dev=sockfs ino=7282857 scontext=root:system_r:dmidecode_t:s0-s0:c0.c255 tcontext=system_u:system_r:unconfined_t:s0-s0:c0.c255 tclass=unix_stream_socket

... and it seems to correlate very strongly with the occurrence of these 380 msec outages.

We aren't intentionally using any security features of SELinux. Our system is essentially a well-isolated embedded system that communicates with a network of real-time devices over a loose, non-real-time CANBUS network for data acquisition and slow (non-real-time) feed-forward control.

Even though we don't expect real-time performance, these 380 msec outages are causing havoc for our CANBUS driver.

What process or daemon is likely to have spawned this message? Can I just turn it off?
Our system runs nicely whenever we aren't getting this message and faults whenever we encounter a sequence of these messages.

The outages seem to recur on a roughly 12 hour cycle.

Question by:fklein23
LVL 19

Assisted Solution

by:jools
jools earned 400 total points
ID: 23727167
Is SELinux enabled? Use `getenforce` to find out.

If SELinux is enabled, you can run `setenforce 0` to make it permissive. It will still log warnings, but you will find out whether SELinux is the cause, which the error above suggests.

Check the /var/log/audit log for some more detailed error messages and post back here.

Author Comment

by:fklein23
ID: 23728512
Thanks - I was beginning to think no one would respond!!!

There is no /var/log/audit on my system.

Here's the closest I could come:

# find -name audit
./usr/share/logwatch/scripts/services/audit
./usr/src/kernels/2.6.20-1.2320.fc5-i686/include/config/audit


My system was enforcing.
I set it to permissive. We'll see what happens overnight!

Thanks - Frank

Author Comment

by:fklein23
ID: 23821134
Well!!!

It has been a long road to get here, but this is the result:

I tried playing with SELinux enforcement, changing the priority of my application process, and a dozen other approaches, and nothing made a significant difference.

I went through the services installed on our Linux system and removed absolutely everything that wasn't clearly needed. Nothing made any difference.

Each main loop of the app should take about 5 msec.
What was really disturbing is that at seemingly random times we saw over 380 msec of time being consumed by the app. We also discovered that at 4 PM and 4 AM clumps of such occurrences were happening.


I planted log messages at critical points in the application code. These messages printed out elapsed time since the last message.
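For anyone wanting to reproduce this kind of measurement, here is a minimal sketch in C of a checkpoint logger that prints the time elapsed since the previous message. The names `elapsed_ms` and `log_checkpoint` are hypothetical and are not from our application; the sketch assumes a POSIX system where `clock_gettime()` with `CLOCK_MONOTONIC` is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Milliseconds between two timestamps taken from the same clock. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Hypothetical helper: print the time since the previous checkpoint to
 * stderr and return it in milliseconds (0.0 on the first call).
 * Uses static state, so it is meant for a single thread only. */
double log_checkpoint(const char *label)
{
    static struct timespec last;
    static int have_last = 0;
    struct timespec now;
    double ms = 0.0;

    /* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
    clock_gettime(CLOCK_MONOTONIC, &now);
    if (have_last)
        ms = elapsed_ms(&last, &now);
    fprintf(stderr, "%s: +%.3f ms\n", label, ms);
    last = now;
    have_last = 1;
    return ms;
}
```

Calling `log_checkpoint("before pselect")` and `log_checkpoint("after pselect")` around a suspect call shows how long that call took, which is how a stall of a few hundred milliseconds stands out against a loop that normally takes about 5 msec.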

What I found was that pselect() delays for more than 375 msec every time a new ssh connection is opened.

Our solution will have to be moving the pselect() service loop into a separate thread.
The original application ran the pselect() service loop in-line in the main loop. These delays therefore caused the functionality of our whole application to stop for multiples of 375 msec at random times.
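A rough sketch of that separate-thread arrangement, in C with POSIX threads, might look like the following. It is not our actual code: `struct service`, `service_init`, `service_thread` and `service_take_events` are hypothetical names, a single descriptor stands in for the whole socket set, and a pipe is assumed as the way to tell the thread to stop. It only helps if the stall is confined to the thread that calls pselect().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical shared state between the main loop and the service thread. */
struct service {
    int fd;               /* descriptor to watch (a socket in the real app) */
    int stop_fd;          /* read end of a pipe; any byte stops the thread */
    int events;           /* readable events seen since the last poll */
    pthread_mutex_t lock; /* protects events */
};

int service_init(struct service *svc, int fd, int stop_fd)
{
    assert(svc != NULL);
    svc->fd = fd;
    svc->stop_fd = stop_fd;
    svc->events = 0;
    return pthread_mutex_init(&svc->lock, NULL);
}

/* Thread body: all blocking in pselect() happens here, not in the main loop. */
void *service_thread(void *arg)
{
    struct service *svc = arg;
    int maxfd = svc->fd > svc->stop_fd ? svc->fd : svc->stop_fd;

    for (;;) {
        fd_set rfds;
        char buf[256];

        FD_ZERO(&rfds);
        FD_SET(svc->fd, &rfds);
        FD_SET(svc->stop_fd, &rfds);

        /* NULL timeout: block until ready. NULL sigmask: keep current mask. */
        if (pselect(maxfd + 1, &rfds, NULL, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        if (FD_ISSET(svc->stop_fd, &rfds))
            break;
        if (FD_ISSET(svc->fd, &rfds)) {
            if (read(svc->fd, buf, sizeof buf) <= 0)
                break; /* EOF or error */
            pthread_mutex_lock(&svc->lock);
            svc->events++;
            pthread_mutex_unlock(&svc->lock);
        }
    }
    return NULL;
}

/* Called from the main loop: returns and clears the event count.
 * It only takes a short mutex, so it never waits on pselect(). */
int service_take_events(struct service *svc)
{
    int n;

    assert(svc != NULL);
    pthread_mutex_lock(&svc->lock);
    n = svc->events;
    svc->events = 0;
    pthread_mutex_unlock(&svc->lock);
    return n;
}
```

The main loop would start the thread once with `pthread_create(&tid, NULL, service_thread, &svc)` and then call `service_take_events(&svc)` on each pass. Build with `-pthread` on toolchains that require it.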

I'd really like to know why this delay happens. Has anyone out there ever heard of this before?

Accepted Solution

by:fklein23
fklein23 earned 0 total points
ID: 23909262
OK, I seem to have corrected the problem.

Thanks to jools for the initial response. Your input was helpful, but did not solve the problem. I am awarding 100 of the points to you and closing the question.

I haven't yet been able to find conclusive evidence of this, but I suspect the problem is with pselect() not being in the kernel. I thought I read somewhere that Fedora Core 6 added it to the kernel, but I can't find the webpage where I originally read that.

We just brought CentOS 5 up on one of our lab boxes to evaluate it as a replacement for Fedora and discovered that in CentOS the problem simply went away.

Now I find that when new SSH connections come on-line, the main loop is extended by no more than 10 or 15 ms rather than 350+ ms. A significant improvement!

LVL 19

Expert Comment

by:jools
ID: 23909310
Thx,

I've always found CentOS to be solid.

Author Comment

by:fklein23
ID: 23909401
That's what I've heard too. Hence why we chose it!
Thanks - Frank