Search for files by Content

Hi all,

I work for a university, and we have frequent problems with students downloading illegal music files. When we get a DMCA complaint, we slap the student on the wrist and revoke their internet access until we can verify that all illegal content has been removed from their machine. It's kind of hopeless, given the infinite ways to save their content before we come to inspect their system, but hey, what can you do.

So here's my question:

I noticed that MP3 files begin with the two ASCII characters ÿû (at least those that I've inspected) when viewed in Notepad. Is there a way to search all files on the system to see if any begin with those two ASCII characters? The idea is to find an ILLEGAL_SONG.MP3 that has been renamed to HARMLESS_PHOTO.JPG.

Any raw ASCII/binary search utility that can do this for me....  or suggestions on some code I could compile (I imagine it would be a lot)?
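
For what it's worth, the check described above does not need much code. Below is a minimal Python sketch, not a finished tool. It assumes that ÿû as shown in Notepad corresponds to the raw bytes 0xFF 0xFB (an MPEG-1 Layer III frame header without CRC), and it also accepts the other Layer III header variants and a leading "ID3" tag, since many MP3s start with a tag rather than an audio frame. The function names `looks_like_mp3` and `find_mp3_like` are made up for this example.

```python
import os

# Second header byte for MPEG-1/2/2.5 Layer III, with or without CRC.
# 0xFB is the "û" from the question; the others are sibling variants.
LAYER3_SECOND_BYTES = {0xFB, 0xFA, 0xF3, 0xF2, 0xE3, 0xE2}


def looks_like_mp3(path):
    """Return True if the file starts with an ID3 tag or a Layer III frame header."""
    try:
        with open(path, "rb") as f:
            head = f.read(3)
    except OSError:
        return False  # unreadable file: skip it rather than abort the scan
    if head == b"ID3":
        return True
    return len(head) >= 2 and head[0] == 0xFF and head[1] in LAYER3_SECOND_BYTES


def find_mp3_like(root):
    """Yield the path of every file under root whose first bytes look like MP3 data."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if looks_like_mp3(path):
                yield path
```

Calling `list(find_mp3_like("D:\\"))` (the drive letter is only an example) would list candidates regardless of their extension. Two or three bytes are a weak fingerprint, so treat every hit as something to inspect, not as proof.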

I'd set up the firewall so that P2P software can't get data from the internet. Also make sure ordinary users can't install software (use Group Policy if you're using a Win2k server environment); that will prevent them from installing P2P software.
Your best bet is to go to any Unix system, preferably Linux, and look at the /etc/magic file.
There are dozens of MPx file formats defined there.
Well, in a university environment you usually can't keep users (the students) from installing software or being admins on their machines, because the students own the machines. At least that's the case at the university I work for :)

I would research the firewall option, as you have more control over your network and what goes through it than you have over the students' actual computers.
a (traditional) firewall does not stop any downloads
If the PCs are the personal property of those students, I'd say what they do with them, and whether they download illegal content or not, is their responsibility, not yours.
ahoffmann, true, but you can prevent P2P software from becoming active, and that is where most "illegal" MP3 files are downloaded from. Websites that provide illegal MP3s won't last long, as they can be traced easily. On a true P2P network, most users aren't even aware they are providing files for download, and it is also more difficult to get hold of the providers because many users share the same file.
.. and how many do not know how to tunnel p2p over http or https?
mistagitar (Author) commented:
Thanks for all the replies, but prevention is not my worry.

We don't like to close ports (and that's not my department anyway)... we know students download from P2P. BitTorrent accounts for over 60% of our inbound pipe traffic! We know about it, but we let the students do it at their own risk. We then get about 30-40 DMCA (Digital Millennium Copyright Act) violation complaints per year. Fortunately, no students have been prosecuted.

I just have to go in and make sure they're all clean.  Like I said, 95% of them probably burn their music before deleting it, but we have to at least look like we're making the effort.

I'm interested in ahoffmann's comment about "the /etc/magic file." Is this a map of common file types or something? We use CD-bootable Knoppix for data retrieval on systems with a botched OS (Knoppix can read but not write to NTFS partitions), so that could be a solution somehow....

Boot Knoppix, then mount your Windows partition(s) and run something like:

    find /mounted_windoze_partition -type f -exec file {} \; |grep -i mp

Be prepared for a huge amount of output if you wait long enough ;-)
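
To illustrate what a magic file is, here is a rough Python sketch of the same idea: a table that maps leading bytes to file types, used to flag files whose extension disagrees with their content. The table is a tiny hand-picked assumption for this example (the real /etc/magic holds far more entries and finer rules), and the names `SIGNATURES`, `EXTENSIONS`, `sniff` and `mismatches` are invented here, not part of any tool.

```python
import os

# (leading bytes, type) pairs; a hypothetical miniature of a magic file.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF8", "gif"),
    (b"BM", "bmp"),
    (b"ID3", "mp3"),        # MP3 that starts with an ID3 tag
    (b"\xff\xfb", "mp3"),   # MP3 that starts directly with an audio frame
]

# Type each extension claims to be.
EXTENSIONS = {
    ".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png",
    ".gif": "gif", ".bmp": "bmp", ".mp3": "mp3",
}


def sniff(path):
    """Return the type suggested by the file's first bytes, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, kind in SIGNATURES:
        if head.startswith(magic):
            return kind
    return None


def mismatches(root):
    """Yield (path, claimed_type, sniffed_type) where extension and content disagree."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            claimed = EXTENSIONS.get(os.path.splitext(name)[1].lower())
            if claimed is None:
                continue  # extension not in the table: nothing to compare against
            path = os.path.join(dirpath, name)
            try:
                actual = sniff(path)
            except OSError:
                continue  # unreadable file
            if actual != claimed:
                yield path, claimed, actual
```

Run against the mounted partition, this reports far fewer lines than the find command above, because it only prints files whose name and content disagree. It will miss any type that is not in the table.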

I don't know how to search for ASCII codes in files that have been, or might have been, renamed. But there is a faster way to do things if you suspect this is going on and there is no other solution for you.

The main file types one would rename an MP3 to would be doc, bmp, avi, asf, mpg, mpeg: anything whose typical size would make the file look legitimate. You would be stupid to rename it to jpg, seeing as most MP3s are many times the size of a JPG. If you suspect this is happening, the student is probably not stupid enough to rename it to a jpg file.


Click Start > Search > For Files or Folders

Click pictures, music or video

Tick the box for pictures and photos

Click advanced search options

Scroll down to find "what size is it"

Click specify size (in kB)

Select "at least" from the drop down menu

Enter this amount: 2,700

Don't enter a name to search for, leave it blank, and click search

Once you have finished searching, right click in that window and click View > Thumbnails. If any file is not showing a picture, check its file size by hovering your mouse over it (if that option is enabled), or if too many files come up, do it this way:

Right click in that window again and select View > Details

Then right click again and select Arrange Icons By > Size

Click the Size column header at the top to order everything by size, and work from the biggest files down to the smallest. Genuine pictures should be under around 20 megs; if anything is over this and is a jpg, bmp or png, then it's suspect (video or something else), so investigate it.

If any files come up as jpg, then you can pretty much suspect that the file is not a jpg and has been renamed, is a corrupted file, or is even an unfinished download.

Find all the files that are over the size limit you specified, open a music player that plays MP3s, and simply grab all those files, drop them into it and play each one. You don't need to rename a file; it will play if it's a song. If it doesn't play, then it's not a song, or you may need the right codec or extension; it might be a wav file or something like that.

BMPs are harder to detect, since BMPs are normally over 2 megs and an MP3 can be as small as that too. But do the same thing for everything you find.

Repeat the search with the music and video options, doing each one separately, not all at once.

I'm sure you know what I mean now: we're searching for files whose type and size are suspect compared with the normal standards for that file type, so we don't have to open every file to find its ASCII code and verify it as an MP3.

Good Luck
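
If clicking through the search dialog on every machine gets tedious, the size check in the steps above can be approximated with a short Python sketch. It assumes the same 2,700 KB threshold and a guessed list of picture extensions; `IMAGE_EXTS` and `oversized_images` are names invented for this example.

```python
import os

# Extensions treated as "pictures"; adjust to taste (this list is an assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def oversized_images(root, min_kb=2700):
    """Return (size_in_bytes, path) for picture files of at least min_kb, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTS:
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size >= min_kb * 1024:
                hits.append((size, path))
    hits.sort(reverse=True)  # biggest files first, as in the manual steps
    return hits
```

The result is only a shortlist of suspiciously large pictures. Each file still has to be opened or played to see what it really is.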

Question has a verified solution.