benjilloyd


Calculate folder structure so that only 127 documents get stored per folder

Hi,

I have been trying to create a function that takes the ID of a document's database record (starting at ID 1) and stores the physical document in a folder based on this ID, so that we never store more than 127 files in any one folder.

This is for a very basic document management system I want to look at writing.

So, with ID 1 the file would be stored in folder disk:\0\
With ID 2 the same would be the case.
When we get to the 128th document (having already stored 127 files in folder disk:\0\), the 128th file should be stored in folder "1" (disk:\1\).
This should continue until we fill the folder disk:\126\ (folders are numbered from 0) with 127 files. When this occurs, the next file (the 16130th) should be stored in the folder disk:\0\0\ (we move to a new level). When that folder is filled with 127 files we move to disk:\0\1\, and once we fill folder disk:\0\126\ we move to the folder disk:\1\0\, and the cycle continues.

I am pretty sure this is a 'common approach' (or close to what is used with DM systems) for storing documents but I am having difficulty creating the recursive function to calculate the folder a file should be stored in based on its ID.

Hope someone can help, thank you.
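The numbering described above can be sketched as follows. This is a minimal Python sketch rather than code from the thread: `folder_path` and `block` are hypothetical names, and it assumes IDs start at 1 and that once every folder at one depth is full, the next depth starts again at folder 0. No recursion is needed, only a loop that finds the depth and then writes the folder index in base 127.

```python
def folder_path(doc_id, block=127):
    """Return the relative folder path for a 1-based document ID.

    Assumption: each folder holds `block` files, each depth has
    block**depth folders, and depths are filled one after another.
    """
    if doc_id < 1:
        raise ValueError("document IDs start at 1")

    n = (doc_id - 1) // block   # zero-based index of the folder in fill order
    depth = 1
    span = block                # number of folders available at this depth

    # Skip over all the folders that belong to shallower depths.
    while n >= span:
        n -= span
        depth += 1
        span *= block

    # Write the remaining index as `depth` base-`block` digits.
    digits = []
    for _ in range(depth):
        n, d = divmod(n, block)
        digits.append(str(d))
    return "\\".join(reversed(digits))


print(folder_path(1))      # 0
print(folder_path(128))    # 1
print(folder_path(16130))  # 0\0
```

Under these assumptions document 100000 lands in `5\25`, because all 127 single-level folders are used up first and the rest are counted in two base-127 digits.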
ADSaunders

Hi benjilloyd,
How about recursively dividing your document ID by 128?
f = int(id / 128)
if f < 128 then f is the folder no
else l = int(r / 128) where l is the level no.
...

Regards .. Alan
ADSaunders,
> else l = int(r / 128) where l is the level no.
sorry should be
else l = int(f / 128) where l is the level no.

.. Alan
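One reading of the "divide by 128 repeatedly" idea, as a small Python sketch. The name `split_by_base` is hypothetical and this is an interpretation of the suggestion, not Alan's code: the first division drops the position inside the folder, and each further division peels off one folder level.

```python
def split_by_base(doc_id, base=128):
    """Split a document ID into folder numbers by repeated division.

    Returns the folder numbers from the top level down. The remainder of
    the first division (the slot inside the folder) is discarded.
    """
    if doc_id < 0:
        raise ValueError("document IDs must not be negative")

    n = doc_id // base          # f = int(id / 128)
    parts = []
    while True:
        n, r = divmod(n, base)  # r is the folder number at this level
        parts.append(r)
        if n == 0:
            break
    return list(reversed(parts))


print(split_by_base(1))          # [0]
print(split_by_base(128))        # [1]
print(split_by_base(128 * 128))  # [1, 0]
```

Note that with plain division the depth grows with the size of the ID, so small IDs get short paths and large IDs get longer ones.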

ASKER

Not quite sure how that works.
The result should leave me with a folder path something like \0\0\0\2\
It's a bit of a headwrecker!!
Have a play with Windows calculator in scientific mode (view->scientific), you'll see it better.
Click the decimal toggle, then Enter the id number of the document, and click the Oct toggle.

From the right, the first three digits are ignored, the next three digits are the folder, and the next three are the level
Note these values are octal where 177 == 127Dec
Just Disk:\0-127\0-127\ ... 127 documents will store 2048383 documents.

.. Alan
Nope, you've completely lost me there!  I don't see how this would work at all.
I may be a lost cause :-)
Thanks anyway
I'm off home now, I'll try to explain further in the morning. Meanwhile could you please post an example of the largest document ID that you would expect.

.. Alan
Hi,
Using 127 documents per folder seems to be a bit confusing, so let's try with 100 documents per folder (file IDs ending with '00' to '99'). This can be done easily by splitting the formatted number into two-digit groups from the left (each split is equivalent to recursively dividing by 100).
Now, the format string needs to be adjusted depending on the highest file id that you expect, in the example code I've used a format big enough to handle an eight-digit file id.
NOTE - I don't know any VB.NET, so you'll have to translate yourself. The following code works in VB6.

Private Function GetPath(FileID As Long, FileName As String) As String
    Dim fPath As String
    ' "\\" in a Format string emits one literal backslash.
    ' The format string must match the max expected length of the file ID.
    fPath = Format(FileID, "00\\00\\00\\00")
    ' Truncate the last two digits; the file name takes their place.
    fPath = Left(fPath, Len(fPath) - 2)
    GetPath = "Disk:\" & fPath & FileName
End Function

this is called by:
Mystr = getpath(n, s)

For example
Mystr = GetPath(1234, "Manual.pdf")  ' returns 'Disk:\00\00\12\Manual.pdf' in the variable Mystr
Mystr = GetPath(87654321, "Document.doc") ' returns 'Disk:\87\65\43\Document.doc' in the variable Mystr
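For anyone translating the VB6 function, here is a rough Python equivalent of the same two-digit split. `get_path`, `levels` and `root` are hypothetical names chosen for this sketch, and the zero-padded width plays the role of the Format string.

```python
def get_path(file_id, file_name, levels=3, root="Disk:"):
    """Build root\\aa\\bb\\cc\\file_name from a numeric file ID.

    The ID is zero-padded to 2 * (levels + 1) digits; the last two digits
    (the position inside the folder) are dropped, as in the VB6 version.
    """
    width = 2 * (levels + 1)
    if file_id < 0:
        raise ValueError("file IDs must not be negative")
    digits = f"{file_id:0{width}d}"
    if len(digits) > width:
        raise ValueError("file ID has too many digits for this many levels")

    # Take two digits per folder level, from the left.
    folders = [digits[i:i + 2] for i in range(0, 2 * levels, 2)]
    return "\\".join([root, *folders, file_name])


print(get_path(1234, "Manual.pdf"))        # Disk:\00\00\12\Manual.pdf
print(get_path(87654321, "Document.doc"))  # Disk:\87\65\43\Document.doc
```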

Is this any clearer?
It's certainly simpler using 100 instead of 127, although the principle of recursively dividing by the no. of documents required still holds.
In this case, you'll have 4 levels of folders, the first three of which each contain only folders, and the last level which actually contains the documents:
Level 1, 100 folders (00 to 99)
Level 2, 10,000 folders (100 * 00 to 99)
Level 3, 1,000,000 folders (10,000 * 00 to 99)
Level 4, 100,000,000 documents (1,000,000 * documents whose ID ends in 00 to 99)

Should be enough for anyone. As with all recipes, adjust for quantity.

Regards .. Alan
There is no limit on the file ID.  Thanks for the post, I will digest...
Hi,
You're actually limited by the maximum value that can be stored in whatever data type is used for your file ID. As a matter of interest, if this is a long integer (VB), according to MS the maximum positive value is 2,147,483,647, or 10 digits.
So modifying the format string in the above code to "00\\00\\00\\00\\00" will handle anything that will fit into a Long.
This gives 4 levels of folders, and the 5th level is the documents themselves.

As a bonus, the following vbscript will create the (empty) folder structure for you to the original 3 levels of folders.
Warning! Run it over the weekend if you intend to increase the levels. It took over an hour on a P4 3000 with 512 MB RAM for just the three!

.. Alan

Dim fso, f
Sub createpaths(f, l)
    Dim i, g
    If l > 3 Then Exit Sub ' Change the 3 to the number of folder levels required.
    For i = 0 To 99
        f.SubFolders.Add Right("00" & Trim(CStr(i)), 2) ' Can't get vbscript to format with leading zeros!!
    Next
    For Each g In f.SubFolders
        createpaths g, l + 1
    Next
End Sub

Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.GetFolder("D:\Test") ' *** NOTE *** Modify your root path appropriately before running!!
createpaths f, 1
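An alternative to pre-creating every folder is to create each one only when a document is first stored in it. This is a different technique from the script above, sketched in Python under the same two-digit scheme; `ensure_folder` and `levels` are hypothetical names.

```python
import os


def ensure_folder(root, file_id, levels=3):
    """Create (if needed) and return the folder for a given file ID.

    Uses the same split as the two-digit scheme: the ID is zero-padded,
    cut into pairs from the left, and the last pair is ignored.
    """
    if file_id < 0:
        raise ValueError("file IDs must not be negative")
    digits = f"{file_id:0{2 * (levels + 1)}d}"
    parts = [digits[i:i + 2] for i in range(0, 2 * levels, 2)]
    folder = os.path.join(root, *parts)
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    return folder
```

Calling `ensure_folder(r"D:\Test", 1234)` would create `D:\Test\00\00\12` on first use, so only folders that actually hold documents ever exist on disk.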
I don't think your solution really works for what I want to do.

I don't want to have to know the top level.  I understand what you are saying about the max it can go up to is the max of the format of the ID.

OK, I have revisited this myself and come up with the following, I arrived at this by simply trying to better describe to you how I want to store my files and I think I have the solution... Let me know what you think...

Block_size = 127
Num_folders_per_level = 16129 (i.e. Block_size * Block_size)

Num_folders_per_level shows us that we have 16129 total files per level, e.g.:

ID=Folder
1=Disk:\0
128=Disk:\1
...
16129=Disk:\126
the next file will exist in the next level, eg:
16130=Disk:\0\0

Based on this here is how I think you would calculate the folder location of file ID=100000

How many levels deep is this file?
100000 / 16129 = 6.2000124000248000496000992001984 (so it'll be 6 levels deep)

We now need to know the remainder from the calc above so we can work out the folder this file should be in at that level:
100000 - (16129 * 6) = 3226
3226 / 127 = 25.401574803149606299212598425197

So file location= Disk:\0\0\0\0\0\0\25\100000.file

Hi, I don't think your calculations are correct for the number of files at each level.
At level 1 yes, there are 16129 files (127 * 127)
At level 2 however, there will be (if the folder tree is to remain balanced) 2048383 files (127 * 127 * 127)
and at level 3 (127 * 127 * 127 * 127) 260144641
etc. ...
And, according to your explanation above, each level can have 127 documents _and_ 127 subfolders ...?

In my scenario, 99,999,999 documents will fit into a tree only 4 levels deep, and that with documents only at the lowest level.
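The figures quoted above can be checked with a couple of lines of Python. `files_at_level` is a hypothetical helper for this check; it assumes a balanced tree in which level 1 already accounts for block * block files, as in the numbers above.

```python
def files_at_level(level, block=127):
    """Number of files a balanced tree holds at the given level.

    Level 1 is block * block files (block folders of block files each),
    and every further level multiplies that by block again.
    """
    if level < 1:
        raise ValueError("levels are counted from 1")
    return block ** (level + 1)


for level in (1, 2, 3):
    print(level, files_at_level(level))
# 1 16129
# 2 2048383
# 3 260144641

# Three folder levels of 100 plus the two-digit document slot:
print(files_at_level(3, block=100))  # 100000000, i.e. IDs 0 to 99,999,999
```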

I must admit, I tend to go for simplicity, and easy to follow logic and code. I've been in the game far too long (and had to analyse someone else's code too often) to write code myself that someone else can't pick it up quickly for maintenance. In fact at my last job, part of coding standards insisted that code was easy to follow, and properly documented.

Anyway, it's your call.

.. Alan
I thought my method offered quite a simple solution.

I'm a little concerned that it wouldn't leave the folder balanced as you say - if I look at the calc to me it looks like it will fill the structure fully.  Perhaps I have it wrong.

ASKER CERTIFIED SOLUTION
ADSaunders

This solution is only available to members.
Sorry for the delay in replying. I did test the function extensively and noticed some oddities with it, mainly that the initial folder contained only 126 files, while subsequent folders contained 127.  I will try to find time to establish why this is.  I also noticed that the folder numbering didn't start at 0 when going into a new level, instead starting at 1.

This is one of those things that is really quite simple, and I know the Algorithm is real simple, it's just getting to the point of realising it!
Hi, That's because there is no document 0. So the first folder only contains 1 to 127.

.. Alan
Very good point!
What about the second level not starting at 0, but at 1 instead, whereas the first level starts at 0?
cheers
Sorry, that sounds daft! of course 1 to 127 has 127 files, first folder should contain 0 to 126, but there is no document 0.

.. Alan

viz: any number from 0 to 126 will give an integer quotient of 0 and a mod of itself, e.g.
Number      quotient     mod
  0                   0            0   <- This document probably does not exist
  1                   0            1
  2                   0            2
. . .
125                  0         125
126                  0         126
127                  1            0
128                  1            1
. . .
16129            127           0  <- But this does.
16130            127           1

.. Alan
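The table above can be reproduced with `divmod`, which also shows one possible way to get a full first folder: subtract the first ID before dividing. This is only a sketch in Python; `folder_and_slot` and `first_id` are hypothetical names, and the offset is a suggestion rather than something taken from the accepted solution.

```python
def folder_and_slot(doc_id, block=127, first_id=0):
    """Return (folder number, position inside the folder) for a document ID.

    first_id=0 reproduces the quotient/mod table (slot 0 of folder 0 is the
    non-existent document 0); first_id=1 shifts IDs so folder 0 is full.
    """
    return divmod(doc_id - first_id, block)


# As in the table: IDs 1..126 share folder 0, ID 127 starts folder 1.
print(folder_and_slot(126))               # (0, 126)
print(folder_and_slot(127))               # (1, 0)

# With the offset, IDs 1..127 share folder 0 and ID 128 starts folder 1.
print(folder_and_slot(127, first_id=1))   # (0, 126)
print(folder_and_slot(128, first_id=1))   # (1, 0)
```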
TheLearnedOne
All valid (and tested) code.

.. Alan