FartingUncle asked:

*nix quickie - du command - how to compare *nix/Windows directory sizes

I uploaded a set of files to my Linux server[1] via FTP, from my Win2K machine.

If I look at the file properties of the parent directory (call it 'Files') under Windows, I get the following info:

  Size: 12,147,595 bytes
  Contains: 3,099 files, 277 folders.

After running the upload (which was unattended) I wanted to check that everything was present and correct, so I ran the following command from the remote 'Files' directory to see the total number of files:

  $ find . -type f | wc -l
  3099

As you can see, the file count was correct.  I then ran the following command (again, from the 'Files' directory) to get the total size of the uploaded files:

  $ du -b | tail -n1
  12466972

This size differs from the size reported by Windows.  I expect part of the difference is because each directory entry counts as 1024 bytes, but even after subtracting 1024 * 277 folders (283,648 bytes) I get a total of 12,183,324 - still more than the 12,147,595 reported by Windows.

So therefore, 2 questions:

1) Why is the total file size reported by these two methods different?
2) What is the correct *nix command to get the actual size of the files only (i.e. the size reported by Windows)?

Thanks

- Mark

[1] Slackware 11.0.0, Linux version 2.6.17.13
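
For anyone wanting to reproduce the discrepancy, here is a minimal sketch in a scratch directory (the file names and sizes are invented for illustration):

```shell
#!/bin/sh
# Build a tiny tree and compare du's total with a files-only sum.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'hello' > "$tmp/a.txt"            # 5 bytes
printf '1234567890' > "$tmp/sub/b.txt"   # 10 bytes

# Apparent size of everything, directory entries included (GNU du):
du_total=$(du -sb "$tmp" | awk '{print $1}')

# Sum of regular-file sizes only (the figure Windows reports):
file_total=$(find "$tmp" -type f -exec ls -l {} + | awk '{s += $5} END {print s}')

echo "du -sb total:   $du_total"
echo "files-only sum: $file_total"
```

The du total comes out larger because the directory files themselves are counted.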
markpalinux replied:

1) If you use Windows Explorer and right-click a file, you will see two sizes: 'size' and 'size on disk'.  The Windows command line (a dir command) gives you the logical size of each file - but I am not sure how this works out when you run dir over a whole directory tree.
2) ?????

There are some FTP clients that compare directories.  I think WinSCP (GPL) also has a compare, but just for the files within a directory - and I think WinSCP is limited to comparing the time stamps on the files.

rsync (GPL) can also synchronise directory trees (over SSH or its own daemon, rather than FTP).


http://linux.about.com/od/commands/l/blcmdl1_find.htm
-size n[bckw]
    File uses n units of space. The units are 512-byte blocks by default or if `b' follows n, bytes if `c' follows n, kilobytes if `k' follows n, or 2-byte words if `w' follows n. The size does not count indirect blocks, but it does count blocks in sparse files that are not actually allocated.
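
As the quoted page says, the `c' suffix makes -size work in exact bytes; a quick sketch (scratch files invented for the example):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '123456789012' > "$tmp/big.txt"   # 12 bytes
printf 'ab' > "$tmp/small.txt"           # 2 bytes

# -size +10c matches files whose size is greater than 10 bytes:
found=$(find "$tmp" -type f -size +10c)
echo "$found"
```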

If you are going to repeat this in an automated way, I would suggest zipping the ~13 MB, transferring the single file, then unzipping on the server.

Sorry, I could not directly answer your two questions - I thought you might find something useful here.  I did genuinely search for an answer.

Mark

P.S.
I think mc - Midnight Commander may also have a directory compare function.
I would try to see if there is a size difference on one or various files.

Redimido replied:

You need to generate a file on your Windows box listing every file with its size, and then run find on the server to produce the same listing, like this:

  find . -type f | xargs ls -l | awk '{print $5, $9}'

(with a standard ls -l layout, field 5 is the size and field 9 is the name).  Then run a diff on those two files to see whether any file could not be transferred correctly - maybe because it was open and still in use - or just to see where the two methods of measuring space disagree.
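
A sketch of the Unix half of that comparison (the windows_sizes.txt name is hypothetical; you would produce the matching list separately on the Win2K box):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'abc' > "$tmp/one.txt"    # 3 bytes
printf 'defgh' > "$tmp/two.txt"  # 5 bytes

# One "size name" line per file, sorted so two such lists can be diffed:
out=$(mktemp)
(cd "$tmp" && find . -type f | xargs ls -l | awk '{print $5, $9}' | sort) > "$out"
cat "$out"
# Later: diff "$out" windows_sizes.txt
```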
FartingUncle (asker) replied:

Hi - thanks for the replies.

markpalinux:

Yes, Windows reports two sizes, and the size I have listed here is the logical file size (not the size on disk, which is about 53MB).

I am not sure what you are referring to with your link to 'find'.  

I have re-uploaded the files as a zip file and unzipped onto the server.  I still get a difference, but this time the uploaded size is 12,469,131 bytes, which gives a difference of 321,536.  This is 1024 * 314, so the figure is a lot less random and so there is probably a 'neat' answer to q1, but even if you take into account 1024 bytes per directory, there are still 37 * 1024 bytes unaccounted for.

Redimido:

I think it's unlikely that there is a file difference in my newly uploaded set of files, and it would be quite a bit of work to generate matching outputs from both OSes, but thanks for the idea.



In general, I am surprised that there isn't a simple *nix command that gives the logical size of all files under a sub-directory and matches the results given by Windows.  Surely this kind of check is commonly required?
Redimido replied:

Well, du is the command you need.  If there is a difference, then:

a) there was a problem in the transfer
b) it is counting the space used by the directories themselves
c) there is a problem on the Windows side

I'm pretty sure du has enough options to give the file size you are looking for.
FartingUncle (asker) replied:

Well,

...it's not (a): I did the second transfer as a zipped file, which would not have unzipped cleanly if there had been transfer errors.

...it's not wholly explained by (b): directories are listed as taking 1024 bytes, and there are 277 of them.  That still leaves about 38 KB unaccounted for.

...I very much doubt (c): Google doesn't throw anything up, and I severely doubt that an error like this would not have been spotted and publicised by now!

As far as I can see, *nix is clearly counting something in this total that Windows is not.  My original questions can therefore be rephrased as follows:

1) What, apart from the logical file size, does the du command count?  1024 bytes per directory entry is one of the things, but is not the whole story.
2) How can I get an output from du (or some other command) that omits these extra items?
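
With GNU find (standard on a Slackware/Linux box), the per-file apparent sizes can be printed and summed directly, so directory entries never enter the total; a sketch:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf '12345' > "$tmp/sub/f.txt"   # 5 bytes

# -type f skips directories entirely; %s is each file's logical size:
total=$(find "$tmp" -type f -printf '%s\n' | awk '{s += $1} END {print s}')
echo "$total"
```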
Redimido replied:

When a directory contains more entries, the directory file itself takes up more bytes.  Did you check that?
Avatech replied:

Would the current/parent entries (. and ..) have any representation in the extra results from du for each folder and subfolder?
FartingUncle (asker) replied:

Redimido: sorry, I'm not sure what you mean by that.  Can you expand, please?

Avatech: A single directory entry is 1024 bytes and there are 277 sub-directories (so 278 including the root).  The total discrepancy is 314 * 1024 bytes, so I don't think this is the whole story.

It appears this question isn't as straightforward as it first appeared, which is a surprise, to be honest!  I thought it would be one of those things that experienced *nix people would know straight away.  Obviously it's not so simple, so I'm upping the points.
ASKER CERTIFIED SOLUTION from Gabriel Orozco:

[solution text available to members only]

FartingUncle (asker) replied:

The required command was:

find . -type d | xargs ls -l -d

...the -d switch ensuring that directory entries were listed rather than their contents.

However, you are right!  There are directory entries of higher multiples of 1024... and they account for all the extra space.  So that's q1 answered - the extra space is purely from the directory entries after all.
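
That growth is easy to demonstrate: a directory's own entry starts at one block on many filesystems, but expands as names are added. A sketch:

```shell
#!/bin/sh
tmp=$(mktemp -d)
empty=$(mktemp -d)
mkdir "$tmp/big"

# Pack in enough longish filenames that the directory file itself
# must grow beyond its initial size:
i=0
while [ "$i" -lt 500 ]; do
    : > "$tmp/big/some_reasonably_long_filename_$i"
    i=$((i + 1))
done

big_size=$(ls -ld "$tmp/big" | awk '{print $5}')
empty_size=$(ls -ld "$empty" | awk '{print $5}')
echo "empty dir: $empty_size bytes, 500-entry dir: $big_size bytes"
```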

Now the second part - is there a command that can give me the disk usage of just the files, not including any directory entries themselves?
SOLUTION:

[solution text available to members only]

FartingUncle (asker) replied:

Redimido: That script doesn't work, I'm afraid.  It gives an even higher number than "du -b"!  I guess this is because "ls -lR" shows all files and all directories (including both "." and ".."), so it counts every directory twice - the opposite of what I need!

However, the "awk" part of your script was enough for me to figure out a command line that works:

  find . -type f | xargs ls -l | awk '{sum = sum + $5} END {print sum}'

It basically uses 'find' to get all the files, passes them to 'ls' to get the details, and then 'awk' sums the sizes.  I've added this as an alias in my startup script, so now I can just type 'filesize' to get the result I am after.
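
One caveat with that pipeline: xargs splits on whitespace, so filenames containing spaces would be mangled.  A null-delimited variant of the same idea (a sketch, using GNU find/xargs):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'abcd' > "$tmp/file with spaces.txt"   # 4 bytes
printf 'xyz' > "$tmp/plain.txt"               # 3 bytes

# -print0 / -0 keep awkward filenames intact; awk still sums field 5:
total=$(find "$tmp" -type f -print0 | xargs -0 ls -l | awk '{s += $5} END {print s}')
echo "$total"
```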

Thanks for your help.  For solving q1 and for providing enough info to figure out q2 I will be awarding you the points.

Cheers,

- Mark
OK - thanks.  If possible, it might be worth removing these last few posts, as they clutter the answer for any future readers.