Total file size is way off on Vista partition. Any ideas?

smartins asked
Last Modified: 2010-04-05
I have an application that displays a progress bar while it goes through all files on a specific partition.

I've just noticed that when summing up the sizes of all files on Vista's partition, the total is way, way off the actual used space.

For example, on my Vista partition (40 GB), I have around 30 GB used, but I get a total file size of 257 GB when recursively summing all the file sizes!

Used Disk Space: 30119 MBytes
Used Disk Space #2: 257920 MBytes

If I change the partition to anything other than the one where Vista is installed the issue disappears.
I've tried and duplicated this issue on two different Vista machines.

Does anyone have any idea why this happens and how to fix it?

Thanks!
function GetDirSize (dir: string; subdir: boolean): Int64;
var
  rec: TSearchRec;
  found: integer;
begin
  result:=0;
  if dir[length(dir)]<>'\' then dir:=dir+'\';
  found:= findfirst(dir+'*.*', faAnyFile, rec);
  while found=0 do
  begin
    Inc(result, Int64(rec.findData.nFileSizeHigh) shl 32 + rec.findData.nFileSizeLow);
    if (rec.Attr and faDirectory > 0) and (rec.Name[1]<>'.') and (subdir=true) then
      inc(result, getdirsize(dir+rec.Name, true));
    found:=findnext(rec);
  end;
  findclose(rec);
end;
 
[...]
 
var
  FreeAvailable, TotalSpace, TotalFree: TLargeInteger;
  UsedSize: Extended;
  Drive: PChar;
begin
  Drive := 'c:\';
  Windows.GetDiskFreeSpaceEx(Drive, FreeAvailable, TotalSpace, @TotalFree);
  UsedSize := TotalSpace-TotalFree;
  Memo1.Lines.Add('Used Disk Space: ' + FloatToStr(Round(UsedSize/sqr(1024))) + ' MBytes');
  Memo1.Lines.Add('Used Disk Space #2: ' + FloatToStr(Round(GetDirSize(Drive, True)/sqr(1024))) + ' MBytes');
end;


Top Expert 2007

Commented:
just a hunch.

add in front of
    Inc(result, Int64(rec.findData.nFileSizeHigh) shl 32 + rec.findData.nFileSizeLow);

if rec.attr and fadirectory <> fadirectory then

so that the size is only added when the entry is an actual file.

if that is not it, then I would start by using small directories to find which types of files/directories are causing the problem.
Top Expert 2007

Commented:
also,
if (rec.Attr and faDirectory > 0) and (rec.Name[1]<>'.') and (subdir=true) then
should be
if (subdir=true) and (rec.Attr and faDirectory > 0) and (rec.Name<>'.') and (rec.name<>'..') then

why? because:
1) you only want those comparisons made if subdir is true, so checking it first is faster
2) you can have a valid directory named ".test". try it :) your code will not count that, which is a bug
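
A minimal sketch of the name check from point 2, written as a standalone console program so it can be tried without touching the disk. `IsDotEntry` is a hypothetical helper name, not something from the thread; it only illustrates why comparing the whole name is safer than testing `rec.Name[1]`.

```pascal
program DotEntries;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not from the thread): True only for the two
// pseudo-entries that FindFirst/FindNext report in a directory listing.
function IsDotEntry(const Name: string): Boolean;
begin
  Result := (Name = '.') or (Name = '..');
end;

begin
  WriteLn(IsDotEntry('.'));      // current-directory entry, must be skipped
  WriteLn(IsDotEntry('..'));     // parent-directory entry, must be skipped
  WriteLn(IsDotEntry('.test'));  // a real directory, must be recursed into
end.
```

With the original `rec.Name[1]<>'.'` test, the third case would be skipped along with the first two.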

Author

Commented:
Thanks for the reply. The modifications you suggested did not result in any changes. Like I said before, when the code runs on a partition without Vista installed it returns the correct values. It's only on a partition where Vista is installed that this problem happens :/

Author

Commented:
Update. It seems that the recursive function is going through many more files and folders than really exist on the drive.

FilesCount: 5269072
FoldersCount: 303454

But getting the drive details through windows properties I get:
224,252 files, 16,600 folders

It seems the function is possibly following symbolic-link files and folder shortcuts, which ends up reading duplicate files and folders.

Any ideas on how to avoid this? It's only happening on the disk where Vista is installed.

Author

Commented:
I seem to have found the problem and a solution. I need to filter out virtual folders that link to real ones (they have the attribute value 9238); otherwise the function treats them as real folders and inflates the count and size, as I experienced.

If possible please close this question.
Top Expert 2007

Commented:
my tests on my vista home:

using your version i get 17565 MB of used space
using my version I get 17566 MB of used space
using total commander I get 17566 MB of used space
(and as I said, at least on my drive there ARE directories that start with a dot; for example pidgin (ex-gaim) has some directories like that, and so do other linux-ish programs)

for your convenience, if you don't use a file manager like total commander, then set your system to show hidden files. then select all files and directories from your root vista drive, right click and select properties. you will see 2 sizes there:
- size on disk
- size of all files

the size of all files will be almost exactly the size you counted with your version and exactly the size counted with my version of the code
the size on disk will be the one reported by Windows.GetDiskFreeSpaceEx.

why the difference? because vista has some directories that have the compress bit set, so the folders (and the files in them) are compressed and thus occupy far less space on disk.
if I run this program on my winxp box, I get the same kind of result because the program files and documents and settings directories, and some others, are set to be compressed.
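
To compare the two figures for a single file, one option is the Win32 call `GetCompressedFileSize`, which reports the bytes actually stored on disk rather than the logical file size. The sketch below is Windows-only and assumes a Delphi with the standard `Windows` unit; `GetAllocatedSize` is a hypothetical helper name, not part of the thread's code.

```pascal
program SizeOnDisk;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Hypothetical helper: bytes actually stored on disk for one file
// (smaller than the logical size when NTFS compression is on),
// or -1 if the call fails.
function GetAllocatedSize(const FileName: string): Int64;
var
  SizeLow, SizeHigh: DWORD;
begin
  SizeHigh := 0;
  SetLastError(NO_ERROR);
  SizeLow := GetCompressedFileSize(PChar(FileName), @SizeHigh);
  // $FFFFFFFF is also a legal low dword, so GetLastError must be checked too.
  if (SizeLow = DWORD($FFFFFFFF)) and (GetLastError <> NO_ERROR) then
    Result := -1
  else
    Result := (Int64(SizeHigh) shl 32) or SizeLow;
end;

begin
  // Usage: SizeOnDisk.exe <path to a file>
  if ParamCount >= 1 then
    WriteLn(GetAllocatedSize(ParamStr(1)));
end.
```

Summing this value instead of `nFileSizeLow`/`nFileSizeHigh` would make the walk costlier (one extra call per file), so it is probably only worth it when compressed folders are known to be present.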

I have tested with your new "finding" of that 9238 attribute, but, you are not totally right.
there are still some files/folders that are being counted by the function and not by the OS.
unit Unit1;
 
interface
 
uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls;
 
type
  TForm1 = class(TForm)
    Memo1: TMemo;
    procedure FormCreate(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;
 
var
  Form1: TForm1;
 
implementation
 
{$R *.dfm}
 
var ff:int64=0;
    dd:int64=0;
    ee:int64=0;
    ss:int64=0;
 
function GetDirSize (dir: string; subdir: boolean): Int64;
var
  rec: TSearchRec;
  found: integer;
begin
  inc(dd);
  result:=0;
  if dir[length(dir)]<>'\' then dir:=dir+'\';
  found:= findfirst(dir+'*.*', faAnyFile-fasymlink-favolumeid, rec);
  while found=0 do
  begin
    if (rec.attr and fadirectory <> fadirectory) and (rec.attr and 9238 <> 9238) then
    begin
      Inc(result, Int64(rec.findData.nFileSizeHigh) shl 32 + rec.findData.nFileSizeLow);
      inc(ff);
    end;
    if rec.attr and 9238=9238 then
    begin
      inc(ee);
      Inc(ss, Int64(rec.findData.nFileSizeHigh) shl 32 + rec.findData.nFileSizeLow);
    end;
    if (subdir=true) and (rec.Attr and faDirectory = fadirectory) and (rec.Name<>'.') and (rec.name<>'..')
      and (rec.attr and 9238 <>9238) then
//    if (rec.Attr and faDirectory > 0) and (rec.Name[1]<>'.') and (subdir=true) then
      inc(result, getdirsize(dir+rec.Name, true));
    found:=findnext(rec);
  end;
  findclose(rec);
end;
 
function MyGetDirSize (dir: string; subdir: boolean): Int64;
var
  rec: TSearchRec;
  found: integer;
begin
  result:=0;
  if dir[length(dir)]<>'\' then dir:=dir+'\';
  found:= findfirst(dir+'*.*', faAnyFile-fasymlink-favolumeid, rec);
  while found=0 do
  begin
    if rec.attr and fadirectory <> fadirectory then
      Inc(result, Int64(rec.findData.nFileSizeHigh) shl 32 + rec.findData.nFileSizeLow);
    if (subdir=true) and (rec.Attr and faDirectory = fadirectory) and (rec.Name<>'.') and (rec.name<>'..') then
      inc(result, MyGetDirSize(dir+rec.Name, true));
    found:=findnext(rec);
  end;
  findclose(rec);
end;
 
procedure TForm1.FormCreate(Sender: TObject);
var
  FreeAvailable, TotalSpace, TotalFree: TLargeInteger;
  UsedSize: Extended;
  Drive: PChar;
begin
  Drive := 'c:\';
  Windows.GetDiskFreeSpaceEx(Drive, FreeAvailable, TotalSpace, @TotalFree);
  UsedSize := TotalSpace-TotalFree;
  Memo1.Lines.Add('Used Disk Space: ' + FloatToStr(Round(UsedSize/sqr(1024))) + ' MBytes');
  Memo1.Lines.Add('Used Disk Space #2: ' + FloatToStr({Round(}GetDirSize(Drive, True){/sqr(1024))}) + ' Bytes');
  Memo1.Lines.Add('Used Disk Space #3: ' + FloatToStr({Round(}MyGetDirSize(Drive, True){/sqr(1024))}) + ' Bytes');
  memo1.lines.add('files: '+inttostr(ff));
  memo1.lines.add('directories: '+inttostr(dd));
  memo1.lines.add('9238: '+inttostr(ee));
  memo1.lines.add('total size of 9238: '+inttostr(ss));
end;
 
end.


Author

Commented:
Just tried your code but I get very different results:

Used Disk Space: 31563268096 MBytes
Used Disk Space #2: 32111407982 Bytes
Used Disk Space #3: 269169765138 Bytes
files: 225164
directories: 15831
9238: 44
total size of 9238: 0

If I filter out the 9238 folder attributes in your code I get the exact same results as in mine.
Top Expert 2007

Commented:
>> If I filter out the 9238 folder attributes on your code I get the exact same results as in mine.
that only means that you don't have directories starting with a dot. but that is not a rule :)

also, you might want to consider again the possibility of you having more than the usual number of compressed folders which gives you this huge difference.

so, I took a little time and researched it where it all starts: MSDN: http://msdn2.microsoft.com/en-us/library/aa365740.aspx

so, your 9238 (not sure where you got this value from) is actually:
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED (8192) +
FILE_ATTRIBUTE_REPARSE_POINT (1024) +
FILE_ATTRIBUTE_DIRECTORY (16) +
FILE_ATTRIBUTE_SYSTEM (4) +
FILE_ATTRIBUTE_HIDDEN (2)

as you can see, this is just one type of file/directory. the question is, which ones are actually the ones that should not be counted again?

we can solve the dilemma by using brute force :) but maybe it's not necessary.
I would suggest opening another question in some windows-specific technical TA to ask which of the files/directories should not be counted; once you have the attribute values, it will be a piece of cake to make it work correctly.

whichever way you prefer ;)
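
The breakdown of 9238 listed above can be checked with a few lines of arithmetic. This is only a sketch: the constants are declared locally with the values quoted in the comment, so the program does not depend on the `Windows` unit.

```pascal
program Attr9238;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // Values as listed in the comment above (taken from the MSDN page).
  FILE_ATTRIBUTE_HIDDEN              = $0002;  // 2
  FILE_ATTRIBUTE_SYSTEM              = $0004;  // 4
  FILE_ATTRIBUTE_DIRECTORY           = $0010;  // 16
  FILE_ATTRIBUTE_REPARSE_POINT       = $0400;  // 1024
  FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = $2000;  // 8192

var
  Combined: Integer;
begin
  // OR-ing the five flags together gives the value the author observed.
  Combined := FILE_ATTRIBUTE_HIDDEN or FILE_ATTRIBUTE_SYSTEM or
              FILE_ATTRIBUTE_DIRECTORY or FILE_ATTRIBUTE_REPARSE_POINT or
              FILE_ATTRIBUTE_NOT_CONTENT_INDEXED;
  WriteLn(Combined);  // 9238
  // The reparse-point bit is one of the bits set in 9238.
  WriteLn((9238 and FILE_ATTRIBUTE_REPARSE_POINT) <> 0);
end.
```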

Author

Commented:
I arrived at the 9238 value by breaking the recursive cycle where I thought things were going haywire, namely at the virtual folders (the ones with an arrow, for example C:\Documents and Settings) that are in fact "links" to different locations, and taking note of the attr value.

My progress bar is not critical so if it's off by a few dozen MB it's no problem. It was just that without filtering those 9238 folders I was getting total values way way bigger than the partition itself.
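
One way the 9238 filter could be generalised, offered as a sketch rather than as anything posted in this thread: test only the reparse-point bit, so that link folders with a different mix of hidden/system/indexing flags are skipped too. `ShouldDescend` is a hypothetical helper, and the assumption that every reparse-point directory should be skipped is mine (it would also skip mounted volumes, which seems acceptable for a per-partition total).

```pascal
program SkipReparse;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  FILE_ATTRIBUTE_DIRECTORY     = $0010;
  FILE_ATTRIBUTE_REPARSE_POINT = $0400;

// Hypothetical helper: should a recursive size walk descend into this entry?
// Assumption: any directory carrying the reparse-point bit is a link and is
// skipped, whatever other attribute bits it has.
function ShouldDescend(Attr: Integer; const Name: string): Boolean;
begin
  Result := ((Attr and FILE_ATTRIBUTE_DIRECTORY) <> 0)
        and ((Attr and FILE_ATTRIBUTE_REPARSE_POINT) = 0)
        and (Name <> '.') and (Name <> '..');
end;

begin
  WriteLn(ShouldDescend(16, 'Windows'));                   // plain directory
  WriteLn(ShouldDescend(9238, 'Documents and Settings'));  // link folder
  WriteLn(ShouldDescend(16, '..'));                        // parent entry
end.
```

Inside `GetDirSize` the same test would be applied to `rec.Attr` and `rec.Name` before the recursive call.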

Top Expert 2007
Commented:

Author

Commented:
You are correct, your approach is a much more elegant solution for this problem. Thanks for taking the time to further look into this.
Top Expert 2007

Commented:
you're welcome. I like puzzling problems :)
