*nix quickie - du command - how to compare *nix/Windows directory sizes

Posted on 2007-09-28
Last Modified: 2008-01-09
I uploaded a set of files to my Linux server[1] via FTP, from my Win2K machine.

If I look at the file properties of the parent directory (call it 'Files') under Windows, I get the following info:

  Size: 12,147,595 bytes
  Contains: 3,099 files, 277 folders.

After running the upload (which was unattended) I wanted to check that everything was present and correct, so I ran the following command from the remote 'Files' directory to see the total number of files:

  $ find . -type f | wc -l
  3099

As you can see, the file count was correct.  I then ran the following command (again, from the 'Files' directory) to get the total size of the uploaded files:

  $ du -b | tail -n1
  12466972

This size is different to the size reported by Windows.  I expect part of this is due to the fact that directory entries count as 1024 bytes, but even removing 1024 * 277 folders (283,648) I get a total of 12,183,324 - still more than the expected 12,147,595 reported by Windows.
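
The subtraction above can be reproduced with shell arithmetic. This is only a sketch using the figures quoted in this question; the variable names are made up for illustration, and the 1024-bytes-per-directory figure is the assumption being tested, not an established fact.

```shell
#!/bin/sh
# Figures quoted in the question; variable names are illustrative only.
du_total=12466972        # reported by: du -b | tail -n1
windows_total=12147595   # "Size" in the Windows properties dialog
dir_count=277            # folders reported by Windows
dir_size=1024            # ASSUMED size of each directory entry

# Remove the assumed directory overhead, then compare with Windows.
remaining=$(( du_total - dir_count * dir_size ))
unexplained=$(( remaining - windows_total ))

echo "after removing directories: $remaining"
echo "still unexplained:          $unexplained"
```

Any non-zero "unexplained" figure means that a flat 1024 bytes per directory cannot be the whole explanation.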

So therefore, 2 questions:

1) Why is the total file size reported by these two methods different?
2) What is the correct *nix command that I should use to get the actual file size of the files only (i.e. the size reported by Windows).

Thanks

- Mark

[1] Slackware 11.0.0, Linux version 2.6.17.13
Question by:FartingUncle
13 Comments
 
LVL 15

Expert Comment

by:markpalinux
ID: 19978468

1) If you right-click a file in Windows Explorer you will see two sizes, "Size" and "Size on disk". The Windows command-line dir command gives you the size of each file, but I am not sure how that works out when you run dir on a directory.
2) ?????

Some FTP clients can compare directories. I think WinSCP (GPL) also has a compare function, but only for the files in a single directory, and I think it is limited to looking at the time stamps on the files.

rsync (GPL) can also compare and synchronise directory trees, although as far as I know it runs over SSH or its own protocol rather than over FTP.


http://linux.about.com/od/commands/l/blcmdl1_find.htm
-size n[bckw]
    File uses n units of space. The units are 512-byte blocks by default or if `b' follows n, bytes if `c' follows n, kilobytes if `k' follows n, or 2-byte words if `w' follows n. The size does not count indirect blocks, but it does count blocks in sparse files that are not actually allocated.

If you are going to repeat this in an automated way, I would suggest zipping the 13 MB of files, transferring the single zip file, and then unzipping it on the server.

Sorry, I could not directly answer your two questions  - thought you may find something useful here - I did really search for an answer.

Mark

P.S.
I think mc - Midnight Commander may also have a directory compare function.
 
LVL 19

Expert Comment

by:Gabriel Orozco
ID: 19985140
I would check whether there is a size difference in one or more individual files.

You need to produce a file listing every file on your Windows box with its size,
and then you can run find on the server to get the same list of files and sizes, like this:
find . -type f | xargs ls -l | awk '{print $5,$8}'

Then run a diff on those two files and see whether any file was not transferred correctly (perhaps because it was open and still in use), or whether the difference simply comes from the way the two methods measure the space used by these files.
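
The find | xargs ls -l pipeline above splits file names on whitespace, and which ls column holds the name depends on the date format in use. A possible alternative, assuming GNU find (for -printf), prints the size and path directly. The helper name list_sizes is made up for this sketch.

```shell
#!/bin/sh
# list_sizes DIR: print "<size in bytes> <relative path>" for every regular
# file under DIR, sorted by path. Hypothetical helper; needs GNU find.
list_sizes() {
    # Run in a subshell so the caller's working directory is not changed.
    ( cd "${1:-.}" && find . -type f -printf '%s %p\n' | LC_ALL=C sort -k2 )
}
```

Something like "list_sizes Files > linux.txt" followed by "diff linux.txt windows.txt" would then show any mismatched files, where windows.txt is a listing in the same format produced on the Windows side (how that listing is generated is left open here).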
 
LVL 1

Author Comment

by:FartingUncle
ID: 19990563
Hi - thanks for the replies.

markpalinux:

Yes, Windows reports two sizes, and the size I have listed here is the logical file size (not the size on disk, which is about 53MB).

I am not sure what you are referring to with your link to 'find'.  

I have re-uploaded the files as a zip file and unzipped onto the server.  I still get a difference, but this time the uploaded size is 12,469,131 bytes, which gives a difference of 321,536.  This is 1024 * 314, so the figure is a lot less random and so there is probably a 'neat' answer to q1, but even if you take into account 1024 bytes per directory, there are still 37 * 1024 bytes unaccounted for.
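
The figures in the paragraph above can be checked with shell arithmetic. This is a sketch only; the variable names are illustrative and the numbers are the ones quoted in this comment.

```shell
#!/bin/sh
# Figures quoted in this comment; variable names are illustrative only.
uploaded=12469131        # du total after unzipping on the server
windows_total=12147595   # logical size reported by Windows
dir_count=277            # folders reported by Windows

diff_bytes=$(( uploaded - windows_total ))
blocks=$(( diff_bytes / 1024 ))        # whole 1024-byte units in the difference
leftover=$(( diff_bytes % 1024 ))      # 0 means an exact multiple of 1024
extra_blocks=$(( blocks - dir_count )) # units beyond one per directory

echo "difference: $diff_bytes bytes = $blocks x 1024 + $leftover"
echo "units beyond one per directory: $extra_blocks"
```

A leftover of zero is what suggests the whole difference is made up of 1024-byte units rather than stray bytes in individual files.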

Redimido:

I think it's unlikely that there is a file difference in my newly uploaded set of files, and it would be quite a bit of work to generate matching outputs from both OSes, but thanks for the idea.



In general, I am surprised that there isn't a simple *nix command that can give the logical size of all files in a sub-directory and which can match the results given by Windows.  I'm sure this kind of check is something that is commonly required?
LVL 19

Expert Comment

by:Gabriel Orozco
ID: 19991341
Well, du is the command you need.
If there is a difference, then either
a) there was a problem during the transfer,
b) it is counting the space used by the directories, or
c) there is a problem on the Windows side.

I'm pretty sure du has enough options to get the file size you are looking for.
 
LVL 1

Author Comment

by:FartingUncle
ID: 19991574
Well,

...it's not (a) as I did a second transfer as a zipped file, which wouldn't have unzipped if there were transfer errors.

...it's not wholly explained by (b) - directories are listed as taking 1024 bytes, and there are 277 directories.  That still leaves about 38 KB unaccounted for.

...I very much doubt (c), as Google doesn't throw anything up and I severely doubt that an error like this would not have been spotted and publicised by now!

As far as I can see, *nix is clearly counting something in this total that Windows is not.  My original questions can therefore be rephrased as follows:

1) What, apart from the logical file size, does the du command count?  1024 bytes per directory entry is one of the things, but is not the whole story.
2) How can I get an output from du (or some other command) that omits these extra items?
 
LVL 19

Expert Comment

by:Gabriel Orozco
ID: 19994114
When a directory contains more files, the directory entry itself takes up more bytes. Did you check that?
 
LVL 4

Expert Comment

by:avatech
ID: 19994150
Would the current/parent entries (. and ..) in each folder and subfolder account for any of the extra space reported by du?
 
LVL 1

Author Comment

by:FartingUncle
ID: 19994802
Redimido: sorry, I'm not sure what you mean by that.  Can you expand please?

Avatech: A single directory entry is 1024 bytes and there are 277 sub-directories (so 278 including the root).  The total discrepancy is 314 * 1024 bytes, so I don't think this is the whole story.

It appears this question isn't as straight-forward as it first appeared, which is a surprise, to be honest!  I thought it would be one of those things that experienced *nix people would know straight away.  Obviously it's not quite so simple, so I'm upping the points.
 
LVL 19

Accepted Solution

by:
Gabriel Orozco earned 300 total points
ID: 19994900
What I mean is that if a directory has many entries, it will take up more than 1024 bytes: 2048, 4096 bytes, and so on.
That could be the difference.
You can see it with
find . -type d | xargs ls -l
 
LVL 1

Author Comment

by:FartingUncle
ID: 19995755
The required command was

find . -type d | xargs ls -l -d

...the -d switch ensuring that directory entries were listed rather than their contents.

However, you are right!  There are directory entries of higher multiples of 1024... and they account for all the extra space.  So that's q1 answered - the extra space is purely from the directory entries after all.
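
Rather than reading the ls output by eye, the directory sizes can be totalled directly. The following is a sketch assuming GNU find (for -printf); the helper names dir_bytes and big_dirs are made up for illustration.

```shell
#!/bin/sh
# dir_bytes DIR: total of the sizes of the directory entries themselves
# (DIR included), i.e. the sizes that ls -l -d shows. Hypothetical helper.
dir_bytes() {
    find "${1:-.}" -type d -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# big_dirs DIR: list "<size> <path>" for directories whose own size is
# larger than 1024 bytes. Hypothetical helper.
big_dirs() {
    find "${1:-.}" -type d -size +1024c -printf '%s %p\n'
}
```

If the explanation above is right, the dir_bytes total for the upload directory should match the gap between the du -b figure and the size reported by Windows.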

Now the second part: is there a command that can give me the disk usage of just the files, not including the directory entries themselves?
 
LVL 19

Assisted Solution

by:Gabriel Orozco
Gabriel Orozco earned 300 total points
ID: 19996176
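If du does not have this option (and I don't think it should, since the directories are part of the total), I would use a small script like the one below.

If you do not specify a directory, it uses the current one.

dufiles.sh:
#!/bin/bash
# Use the directory given as the first argument, or the current one by default.
DIR=${1:-.}
printf "Size: "
# Quote the path so names with spaces work; stop if the directory is missing.
cd "$DIR" || exit 1
# Sum the size column (field 5) of every entry that ls -lR lists.
ls -lR | awk '{sum = sum + $5} END {print sum}'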
 
LVL 1

Author Comment

by:FartingUncle
ID: 19997708
Redimido: That script doesn't work, I'm afraid.  It gives an even higher number than "du -b"!  I guess this is because "ls -lR" is showing all files and all directories (including both the "." and "..") so it counts every directory twice - the opposite of what I need!

However, the "awk" part of your script was enough for me to figure out a command line that works:

  find . -type f | xargs ls -l | awk '{sum = sum + $5} END {print sum}'

It basically uses 'find' to get all files (not directories), passes them to 'ls' to get the details and then 'awk' sums the sizes.  I've added this as an alias in my startup script, so now I can just type 'filesize' to get the result I am after.
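
One caveat with the pipeline above is that xargs splits on whitespace, so file names containing spaces would be mis-parsed. A variant that avoids ls altogether is sketched below as a shell function, assuming GNU find (for -printf). The function is named filesize here only to mirror the alias mentioned above; it is not the exact alias used.

```shell
#!/bin/sh
# filesize DIR: sum of the logical sizes (in bytes) of all regular files
# under DIR, ignoring directory entries. Needs GNU find for -printf.
filesize() {
    find "${1:-.}" -type f -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

Called with no argument it totals the current directory, so running "filesize" from the 'Files' directory should give a figure comparable to the logical size reported by Windows.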

Thanks for your help.  For solving q1 and for providing enough info to figure out q2 I will be awarding you the points.

Cheers,

- Mark
 
LVL 1

Author Comment

by:FartingUncle
ID: 19997951
OK - thanks.  If possible, it might be worth removing these last few posts, as they clutter the answer for any future readers.