sunhux asked:

Need Shell script to parse thru suitable filesystems/files/folders for AV scanning

https://www.experts-exchange.com/questions/28702372/Help-troubleshoot-a-Shell-script.html
Refer to above EE post that I've raised:


I'm required to use a command-line AV scanner for hundreds of Solaris 10 x86 servers
but faced a few issues / limitations :

I'm trying to merge 2 options of scanning so as to get the best of both worlds ie
1. not missing any files/folders which may potentially get infected
    (on one Solaris server, we may have  /app1 folder but on another server, may
     have /app2 as the servers belong to our tenants.  Also, if a folder/filesystem
     were to be created in future, the script must include it in the scan)

2. if I use a script with   "avscan `find /* -print -type f ...`", it will cover all files
    but there's a major issue with this AV scanner: if a list of files is passed
    to it for scanning, it reads the huge signature pattern file
    (about 70MB), scans one file, outputs a banner page to the logfile & keeps
    repeating this for each file, which is super-inefficient: it takes more than 30
    hours to scan a server with only 70000 files.
    If we just do "avscan /folder_name", even though the folder contains 9000
    files, the scanner only reads the pattern file once & outputs the banner to
    the logfile only once (instead of 9000 times if we pass the list of files in the
    folder to it)

3. A few folders should not be scanned as they contain Fifo & socket files
    & could cause the scanner to run into an endless loop.  The ones I've
    identified in our tenants' VMs are mostly under the following folders:
    ^/cdrom
    ^/boot
    ^/platf
    ^/proc
    ^/sys
    ^/dev
    ^/net


So I'll need a script that will do the following:

a) parse thru /   (perhaps `ls -lad /* |grep dr` ?)  & do a grep -v of the above
     7 folders to exclude them from the list : let's call it list1

b) then identify which folders/filesystems contain socket & Fifo files, possibly do
     something like:
      `find /each_folder_in_list1 -type p` >> Pipeslist
      `find /each_folder_in_list1 -type s` >> Socketlist

c) for folders that are not among the 7 listed above AND not found in either Pipeslist or Socketlist,
    scan the folders, ie     "avscan -s /list_of_folders_not_among_the_7_and_not_in_Pipeslist_&_Socketlist"

d) scan last the folders listed in Pipelist & Socketlist but exclude Fifo & Socket files in them, something like:
     avscan /list_of_folders_in_Pipelist_&_Socketlist |grep -v list_of_fifo_or_Socket_files.
     If you can fine-tune the script/code further as follows, it will be good:
       eg: if /usr    has /usr/1, /usr/11, /usr/1/1/1, /usr/1/1/2, /usr/2, /usr/2/2
              & a problem socket is found under /usr/1/1/mysock, then the
              scanner should scan as many files/folders in /usr other than those
              found to contain the offending Fifo & socket files.  For efficiency,
              we'll need a few scan commands, eg:
              Parse thru the list of folders under /usr, filtering out those folders
              found to contain the sockets & Fifo  & output to list5 & 
                eg:
                                 avscan -s /usr/1
                                 avscan -s /usr/11
                                 avscan -s /usr/1/1/2
                                 avscan -s /usr/2/2
                           & only for the subfolder containing the offending file (ie socket)
                                 avscan `find /usr/1/1/* |grep -v mysock`
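
Steps a) to c) above could be sketched roughly as follows. This is only an illustration under assumptions: `classify_dirs`, the list file names and the scratch tree are all made up, and it runs against a throwaway directory rather than `/`. It walks the immediate subdirectories of a root, skips the excluded names, and splits the rest into those with no Fifo or socket anywhere below them and those with at least one.

```shell
#!/bin/sh
# Sketch: split the subdirectories of a root into "clean" (no FIFOs or
# sockets anywhere below them) and "mixed" (at least one FIFO or socket).
# classify_dirs and the output file names are hypothetical.

classify_dirs() {
    root=$1; exclude_re=$2; clean=$3; mixed=$4
    : > "$clean"
    : > "$mixed"
    for d in "$root"/*; do
        [ -d "$d" ] || continue
        # skip the directories that must never be scanned
        if echo "$d" | grep "$exclude_re" >/dev/null; then
            continue
        fi
        # does the subtree hold at least one FIFO (p) or socket (s)?
        hit=`find "$d" \( -type p -o -type s \) -print 2>/dev/null | sed 1q`
        if [ -z "$hit" ]; then
            echo "$d" >> "$clean"
        else
            echo "$d" >> "$mixed"
        fi
    done
}

# Demonstration on a scratch tree (not on /).
demo=${TMPDIR:-/tmp}/classify_demo.$$
mkdir -p "$demo/etc" "$demo/var/tmp" "$demo/proc"
echo data > "$demo/etc/hosts"
mkfifo "$demo/var/tmp/mypipe"

classify_dirs "$demo" '/proc$' "$demo/clean.lst" "$demo/mixed.lst"
cat "$demo/clean.lst"    # directories that could be handed to the scanner whole
cat "$demo/mixed.lst"    # directories that need file-by-file handling
```

On a real server the "clean" list would feed one `avscan -s` call per directory, and only the "mixed" list would need the slower per-file treatment described in d).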


I'm on Solaris 10 x86
Tags: Shell Scripting, Scripting Languages, Unix OS

Last comment: skullnobrains, 8/22/2022 (Mon)

sunhux

ASKER
Typo correction:  should read as follows:
 if we just do "avscan -s /folder_name", even though the folder contains 9000
simon3270

You could do this in two phases: first the directories which only contain files you want to process, then individual files in directories which contain "bad" filenames.  Assuming that the patterns for the files to miss are in the grepv.txt file in the current directory:

    # Find files in all directories, keep the ones which match the "bad" patterns, strip off the filename part (after the last "/") to get the directories, and strip out duplicate directories.
    find /* -type f | /usr/xpg4/bin/grep -f grepv.txt | sed 's,/[^/]*$,,' | sort -u > baddirs.lst

    # Now put "^" and "$" around each line, to match exact directory names
    sed -e 's,^,^,' -e 's,$,$,' > baddirs.pat

    # Find all of the directories which *don't* match the bad ones, and process each directory.
    find /* -type d | grep -v -f baddirs.pat | while read dirnam; do
         # Now process the $dirnam directory (the scan command goes here)
         :
    done

   # Loop round each directory in the "bad" list.
   # For each one, list the files in the directory, add the directory name to the beginning of the file name (to get the full path), strip out the "bad" names, and process any remaining files individually.
   # Note that the "grep -v" may remove all of the names in that directory, particularly if the "bad" part is the directory name - that will simply mean that no files in that directory are processed.
    while read dirnam; do
        ls $dirnam| sed "s,^,${dirnam}/," | /usr/xpg4/bin/grep -v -f grepv.txt | while read filename; do
            # Now process the single file that doesn't match the patterns
            :
        done
     done < baddirs.lst


sunhux

ASKER
Thanks Simon.

Was anticipating in the script that there will be 2 places to do the AV scan (ie process) the scans:
one is for directories that I know will never contain 'problem' files like Fifo & sockets & the other
is what's listed in your script ie :
   ls $dirnam| sed "s,^,${dirnam}/," | /usr/xpg4/bin/grep -v -f grepv.txt | while read filename; do
  "Now process the single file that doesn't match the patterns"

So basically there are 2 lists of directories: one known to be good (ie they don't contain any sockets
& Fifo/named pipes, which cause the AV scan to go into endless loops), while the other list has
all the other directories (ie it excludes directories that I have manually screened to be safe but
includes directories that we can't predict what our tenants will create in future), possibly containing sockets &
Fifo files, which we'll track down in advance by `find /dir_name -type s` &  `find /dir_name -type p`
(for adding into 'grep -v' of the AV scan).

Or for the 1st list of known 'good' directories, I just need the scan commands into your script?


I've used a centralized management tool to screen the 500+ Solaris servers & found that there's
consistently a list of directories (eg: /etc, /opt) which doesn't have sockets & Fifo files.  

Directories which contain Oracle, MySQL etc don't cause the AV scan to go into endless
loops, but they should not be scanned because the AV scan may potentially cause locking and
severe performance issues. Are they filtered out in the 2nd list?
skullnobrains

you can also try something like this

find / -xdev -type f -not -path /dev [ ... other exclusions ] | xargs -n 10000 sh avscript.sh

with avscript containing

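#!/bin/sh

TMPDIR=/tmp/avscript

# start each batch with an empty staging directory
rm -rf "$TMPDIR"
mkdir -p "$TMPDIR"

for f in "$@"
do
  # hard-link the file into the staging dir; "/" in the path becomes "%"
  name=`echo "$f" | tr / %`
  ln "$f" "$TMPDIR/$name"
done

avscan "$TMPDIR"

# [ ... do whatever you need based on the scan results ]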


obviously you'll want to be able to stop/pause/resume so this is a little simplistic
simon3270

My code will do the name matching (scan entire directories if none of the bad names appear in them, and step through the remaining directories, skipping over the bad names), but it won't do the fifo or socket check.

As far as I can tell, it will omit all oracle and mysql directories.

@skullnobrains - interesting idea (gather all of the "good" files into one directory and scan them there), but the target directory needs to be on the same partition as the files you are gathering.
sunhux

ASKER
I'll only get to access our Solaris servers this Wed as it's public holiday here.

Essentially, for most efficient scan, we'll only use "find" to feed files to the
AV scanner for directories or subdirectories that contain the socket, fifo,
Oracle, MySQL files.

Eg: /var/tmp contains Fifo files in our environment (as sieved out by
`find /var/* -type p` in advance), so /var/other_directories (eg: /var/adm) are
still scanned by  "avscan -s /var/other_directories" while /var/tmp is
scanned by "avscan `find /var/tmp/* -type f -print |grep -v racle| grep -v ysql`".
Just to reconfirm: are the scripts given going in this direction?
skullnobrains

but the target directory needs to be on the same partition as the files you are gathering

if you are using hardlink as in my example, yes. is that a problem ? if so, you can use symlinks all the same. btw, files are not "gathered", just linked so the actual data is not copied which would be much slower.

unless i'm missing something, this is much less overkill : i believe hardlinking a file and removing the link later on to be negligible compared to the scanner's work. my example scans 10k files at a time but you probably can use anything between 1k and 10M, or possibly replace the xargs with a small awk script if you need to scan a volume of data at a time rather than a number of files.

with very little modifications, you also should be able to run multiple parallel scans : just add a -P argument to xargs and make sure the children scripts work in separate directories by using something like TMPDIR=/tmp/avscript.$$ for example

the downside of this method, is that the av scanner will not be able to actually remove infected files. some av scanners probably can scan symlinks and remove their targets when they detect an infection.

---

btw which scanner is dumb enough to reread the virus definitions between each args ? are you sure there is no dummy wrapper script that actually calls a separate instance of the scanner for each argument ? no daemon scanner that needs to be launched ?
simon3270

Hard links aren't a problem - you could easily make the location dependent on the partition you are scanning.  You'd also have to scan all relevant partitions, but that should be a simple loop over a small set of directories.  They are being "gathered" in that references to them are being gathered in a single directory, so that the limitation of the scanner is worked round - copying them would indeed be wasteful (and difficult if the files took more space than was available on the target partition!)
sunhux

ASKER
Simon,

So the line below is for the folders that contain those socket, Fifo files?
  ls $dirnam| sed "s,^,${dirnam}/," | /usr/xpg4/bin/grep -v -f grepv.txt | while read filename; do


I've scanned through hundreds of the Solaris x86 servers & confirmed that only the following
folders don't contain sockets, Fifo files :
/usr
/etc
/Desktop
/Documents
/kernel
/lib

So the script given above will loop around folders other than the above to scan by file list
while for the above folders, they will scan by the folder name (using "-s /folder_name"
scans much much faster)
sunhux

ASKER
and I'll need to create grepv.txt  in that same folder in advance, right?

grepv.txt
=======
\.sock
^/cdrom
^/boot
^/platf
^/proc
mnttab
^/sys
^/dev
^/net
\.dbf
\.arc
\.lock
\.ctl
\.rdo
[Oo]racle
[Mm]ysql
sulog
syslog
messages
sunhux

ASKER
Think I figured it:  the "Now process the ..." line below covers the 6 good folders I mentioned above

 # Find all of the directories which *don't* match the bad ones, and process each directory.
    find /* -type d | grep -v -f baddirs.pat | while read dirnam; do
         Now process the $dirnam directory
    done
sunhux

ASKER
The script fails without any error & I suspect baddirs.lst was missing in the line below:

    # Now put "^" and "$" around each line, to match exact directory names
    sed -e 's,^,^,' -e 's,$,$,'  > baddirs.pat
                                           ^
                                           |
                                      baddirs.lst

-rwxr-xr-x   1 root     root      199537 Aug 13 20:50 baddirs.lst
-rwxr-xr-x   1 root     root           0 Aug 13 20:54 baddirs.pat
sunhux

ASKER
Another error surfaced:

grep: illegal option -- f
Usage: grep -hblcnsviw pattern file . . .
/proc/2210/lwp/382914: No such file or directory
/proc/2210/lwp/382915: No such file or directory
/proc/2210/lwp/382916: No such file or directory
/proc/6947: No such file or directory
/proc/6947/lwp/1: No such file or directory
/proc/6947/object: No such file or directory
/proc/6949: No such file or directory
/proc/6949/lwp/1: No such file or directory
/proc/6949/object: No such file or directory
/proc/6967: No such file or directory
simon3270

/usr/xpg4/bin/grep

and yes, you need baddirs.lst in that sed line.
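
Putting both corrections together, the pattern-building step might look like the sketch below. It runs on scratch copies with made-up directory names (`alldirs.lst` and `gooddirs.lst` are hypothetical file names standing in for the output of `find /* -type d` and the filtered result).

```shell
#!/bin/sh
# Sketch: build anchored patterns from baddirs.lst and use them to drop
# exactly those directories from a directory list.
work=${TMPDIR:-/tmp}/baddirs_demo.$$
mkdir -p "$work"
if [ -x /usr/xpg4/bin/grep ]; then GREP=/usr/xpg4/bin/grep; else GREP=grep; fi

# made-up "bad" directories and a made-up full directory list
printf '%s\n' /var/tmp /opt/app/db > "$work/baddirs.lst"
printf '%s\n' /var /var/tmp /var/tmp2 /opt/app /opt/app/db /etc > "$work/alldirs.lst"

# sed needs baddirs.lst as its input file
sed -e 's,^,^,' -e 's,$,$,' "$work/baddirs.lst" > "$work/baddirs.pat"

# the XPG4 grep understands -f; drop the exact bad directories
$GREP -v -f "$work/baddirs.pat" "$work/alldirs.lst" > "$work/gooddirs.lst"
cat "$work/gooddirs.lst"
```

Note that `/var` stays in the result even though `/var/tmp` was dropped, so a recursive scan of `/var` would still reach `/var/tmp`. Directory names containing regex characters such as `.` are also treated as patterns here.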
sunhux

ASKER
There's some improvements in the scan timings : from 10-12 days to about 3 to 5 days.

Just found the script scans some device files despite the "-type f" :
Scanning /dev/dsk/c0t0d0s3-> <<ERROR (-94)>>

# file /dev/dsk/c0t0d0s3
/dev/dsk/c0t0d0s3:      block special (30/67)
sunhux

ASKER
Appears that there's too many subfolders : still a bit fragmented:
attached is the actual script

Most appreciate if it can be further enhanced
g3.zip
sunhux

ASKER
Getting dozens of folders that have no files to be scanned & the scanner repeatedly
reports "no files to be scanned" ie:

0 files have been checked.
 No file-type viruses found.
LibraryPath: ./lib...


Possibly will need a new algorithm/strategy : maintain a list of good folders (ie
which does not contain  sockets, Fifo  &  files to be skipped like Oracle, Mysql)
& in the next scan, check against this list if any new folders/filesystems has
been added .....
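
The "check against the list" part of that idea could be sketched as below. Everything here is an assumption for illustration: the file names `known_good.lst`, `current.lst` and `new.lst` are made up, and the demo uses a scratch tree instead of `/`.

```shell
#!/bin/sh
# Sketch: report directories that are not yet on a maintained "known good" list.
work=${TMPDIR:-/tmp}/newdirs_demo.$$
mkdir -p "$work/root/etc" "$work/root/usr" "$work/root/app2"

# hypothetical list of directories already vetted (one per line, sorted)
printf '%s\n' "$work/root/etc" "$work/root/usr" | sort > "$work/known_good.lst"

# current directories directly under the root being checked
for d in "$work/root"/*; do
    [ -d "$d" ] && echo "$d"
done | sort > "$work/current.lst"

# comm -13 prints lines that appear only in the second (current) file
comm -13 "$work/known_good.lst" "$work/current.lst" > "$work/new.lst"
cat "$work/new.lst"
```

Anything printed is a folder that appeared since the list was last vetted (here the made-up `app2`) and would go through the Fifo/socket sieving before being scanned.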
sunhux

ASKER
Currently using a centralized management tool, I've so far identified the
following folders to be good (& they are practically always present in all
the hundreds of servers we are scanning):

/Desktop
/Documents
/etc
/home
/kernel
/lib
/usr

Then only subject all the other filesystems to the sieving process to identify if
they contain sockets, Fifos, Oracle, Mysql, .....   ie the above 7 folders are not
subject to the sieving process
simon3270

I'm not sure why anything in /dev is being scanned - that directory should be weeded out by the "/usr/xpg4/bin/grep -v".

Let me have a try of a couple of things.....
sunhux

ASKER
Hi Simon

Have you managed to figure out why it scanned /dev, or could you test the script at your end?
SOLUTION
simon3270

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
skullnobrains

you probably should not rely on a fixed list to determine where fifos and sockets might be since they can be created anywhere on the filesystem, and don't even need to be persistent : many programs will happily create temporary sockets in /home, and some of them will only exist for a split second. scanning with a tool on one host at one specific moment and expecting to get the same results later on hundreds of hosts seems overconfident.

but if you want to, you should at least exclude directories such as /dev and /proc from both your find commands, and exclude the list of directories you don't want to scan from the second find command so it does not recursively and uselessly scan them rather than grepping things out.

find location -not -path /dev -not -path /proc ...

you'll also find a flag that instructs find not to traverse filesystem boundaries and another one that prevents it from following symlinks so you don't end up with loops or various stuff being scanned several times.
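
A sketch of that kind of exclusion, written with `-prune` and `!` because the `-not` and `-path` primaries may not be available in the stock Solaris 10 `find` (that is an assumption worth checking against the local man page). The scratch tree and its file names are made up.

```shell
#!/bin/sh
# Sketch: walk a tree but never descend into excluded directories.
# -prune stops find from entering a matched directory; -xdev keeps it on
# the filesystem it started on.
demo=${TMPDIR:-/tmp}/prune_demo.$$
mkdir -p "$demo/dev/dsk" "$demo/proc/1" "$demo/etc"
echo a > "$demo/dev/dsk/fake"
echo b > "$demo/proc/1/status"
echo c > "$demo/etc/hosts"

# Simplification: -name matches a directory called dev or proc at ANY depth.
find "$demo" -xdev \
    \( -name dev -o -name proc \) -prune \
    -o -type f -print > "$demo.out"
cat "$demo.out"
```

Only the file under `etc` is listed; `find` never reads the pruned subtrees, which is cheaper than listing them and removing the lines with `grep -v` afterwards.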

then it still looks rather inefficient to scan the whole filesystem tree twice before you even start to scan the first file
sunhux

ASKER
Hi SkullNB,
I'm mandated to use the "endorsed" AV scanner though must say the party who endorsed
it failed to test it thoroughly & do the due diligence.

Hi Simon,
Just discovered a new option in the scanner which will speed further (though still not fast
enough).  Other than the 7 folders ie /usr, /Documents, /sbin, which I'll process using the much
faster "-s /folder", perhaps we can process those non-7 folders using option below:

echo "Scan started on " `date` > update.log
echo "[FILE]" > FileList.txt
find /non-7_folders -print -type f -size -52428800c |grep -v ".sock" |grep -v "/cdrom" |grep
 -v "/boot" |grep -v "/platfo" |grep -v "/proc" |grep -v mnttab |grep -v "/sys"
|grep -v "/dev" |grep -v "/net" |grep -v ".dbf" |grep -v ".arc" |grep -v ".lock"
 |grep -v ".ctl" |grep -v ".rdo" |grep -v "racle" |grep -v "ysql" |grep -v sulog
 |grep -v syslog |grep -v messages >> FileList.txt
nice -12 ./vscantmsol64 -p=/opt/uvscan @FileList.txt -NC  >> update.log
echo "Scan completed on " `date` >> update.log
rm -f FileList.txt
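
One thing worth checking in the command above: `find` evaluates its tests left to right, so with `-print` placed before `-type f` every path is printed before the type and size tests are applied, which could let device files into the list. An unescaped `.` in a grep pattern also matches any character. Below is a sketch with the tests placed first and the exclusions kept in one pattern file; the scratch tree and `skip.pat` are made-up names, and the scanner call itself is left out.

```shell
#!/bin/sh
# Sketch: build the scanner's file list with the tests before -print.
demo=${TMPDIR:-/tmp}/filelist_demo.$$
mkdir -p "$demo/tree/sub"
echo x > "$demo/tree/sub/plain.txt"
echo y > "$demo/tree/sub/data.dbf"
mkfifo "$demo/tree/sub/pipe"
if [ -x /usr/xpg4/bin/grep ]; then GREP=/usr/xpg4/bin/grep; else GREP=grep; fi

# exclusion patterns kept in one file instead of a chain of "grep -v"
printf '%s\n' '\.sock$' '\.dbf$' '[Oo]racle' '[Mm]ysql' > "$demo/skip.pat"

echo "[FILE]" > "$demo/FileList.txt"
# -type f and -size are tested first; -print only fires for paths that pass
find "$demo/tree" -type f -size -52428800c -print |
    $GREP -v -f "$demo/skip.pat" >> "$demo/FileList.txt"
cat "$demo/FileList.txt"
```

The Fifo never reaches the list because `-type f` rejects it, and `data.dbf` is dropped by the pattern file, leaving the `[FILE]` header and `plain.txt`.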
sunhux

ASKER
Hi Simon

Can you point to me which line(s) of the codes
"skips block special files, character special files and symbolic links" ?
sunhux

ASKER
What's the syntax to do "not of the type   block, symlink, special files"

Is the following syntax correct?  If not, what would be a good syntax?
find /folder -type f   -type -not b   -type -not c  -type -not l   -print
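
For reference, `find` negates a test with `!` placed in front of it; `-type -not b` is not a valid form. A minimal sketch on a made-up scratch tree, comparing the negated form with plain `-type f`:

```shell
#!/bin/sh
# Sketch: exclude file types with "!" in front of each test.
demo=${TMPDIR:-/tmp}/nottype_demo.$$
mkdir -p "$demo/d"
echo x > "$demo/d/regular"
ln -s regular "$demo/d/link"
mkfifo "$demo/d/fifo"

# everything that is not a directory, symlink, block or character device
find "$demo/d" ! -type d ! -type l ! -type b ! -type c -print | sort > "$demo.neg"

# -type f on its own: regular files only
find "$demo/d" -type f -print > "$demo.reg"

cat "$demo.neg"
cat "$demo.reg"
```

The negated form still lets the Fifo through, while `-type f` alone lists only the regular file, so for this purpose the extra negations add nothing once `-type f` is present.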
ASKER CERTIFIED SOLUTION
simon3270

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
sunhux

ASKER
Ran into syntax errors:

-bash-3.2# find /* -size -52428800c  -a ! \( -type b -o -type c -o -type s -o -type d\)
find: unmatched '('
-bash-3.2# find /* -size -52428800c  ! \( -type b -o -type c -o -type s -o -type d\)
find: unmatched '('

-bash-3.2# find /* -size -52428800c  -a ! \( -type b -o -type c -o -type s -o -type d)\
-bash: syntax error near unexpected token `)'
sunhux

ASKER
-bash-3.2# find /opt/* -size -52428800c  ! (-type b -o -type c -o -type s -o -type d)
-bash: syntax error near unexpected token `('
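
The errors above look like a spacing and escaping issue: `\(` and `\)` each have to reach `find` as a separate argument, so `d\)` is taken as the type name and the group is never closed, while an unescaped `(` or `)` is intercepted by the shell. A sketch of the grouped form with the same size limit and type list, run on a made-up scratch tree:

```shell
#!/bin/sh
# Sketch: negate a group of -type tests; note the spaces around \( and \).
demo=${TMPDIR:-/tmp}/group_demo.$$
mkdir -p "$demo/opt/sub"
echo x > "$demo/opt/sub/file"

# smaller than 50 MB and not a block/char device, socket or directory
find "$demo/opt" -size -52428800c \
    ! \( -type b -o -type c -o -type s -o -type d \) \
    -print > "$demo.lst"
cat "$demo.lst"
```

Fifos (`-type p`) and symlinks (`-type l`) are not in that group, so they would still be listed by this expression.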
SOLUTION
simon3270

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
skullnobrains

I'm mandated to use the "endorsed" AV scanner though must say the party who endorsed
it failed to test it thoroughly & do the due diligence.
are you not allowed to name the endorsed scanner ? i'm interested so i know i might run into the same kind of trouble if i ever were to use it... thanks

did you give a try to my approach ? i honestly believe it would save you LOTS of time both by not writing a crazily complicated script that will most likely require permanent maintenance and while scanning... anything wrong with it ?