Mamata_gd


Accessing a directory or file very fast

Hello All,

This question is related to a C algorithm and not to any OS as such.

My structure looks something like this:

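typedef unsigned char byte;   /* 'byte' is not a built-in C type */

typedef struct file_or_directory
{
      void    **handle;
      char       file_or_Directory_name[256];
      int       date_of_creation;
      int      date_of_modification;
      long      size_in_bytes;
      long      size_in_sector;
      byte      access_proviledges;
      byte      type_of_file;
} file_or_directory;              /* 'typedef' needs a name for the new type */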

My application creates new directories and files. Users will then look up a file by name, and that lookup needs to be very fast. How would one do this?

Set aside how the file is created, how it is opened, and all that.

My first question is: how would you find it fast?
I will ask my next question after this.

Thanks,
M
Jaime Olivares

Consider this first:
The first member of the struct is 'handle'. Whatever OS you are using, you cannot keep many file handles open at a time.

It is difficult to separate the OS from file manipulation, because the faster methods are OS-specific; I don't think there is a "universal" formula for it.
'Portability' sometimes compromises 'high performance'.
Mamata_gd

ASKER

Please forget about the OS. I will have to change the question itself.

I have a database of students.
typedef struct student
{
      int      student_id;
      char       name[256];
      int      age;
      char      address[1024];     /* semicolon was missing here */
      int       date_of_creation;
      int      date_of_modification;
      int      maths_marks;
      int      physic_marks;
      int      chemistry_marks;
} student;                         /* name for the typedef */

I want fast access to a student by ID. Student IDs are not contiguous, because they are generated by the accounts department as it sees fit, so please do not suggest an 'array of structures' indexed by ID.

There's not much you can do to speed up opening a file. Most OSes have just ONE "open me this file" call. Only very rarely is there a faster alternative.

There are some things you can do to speed up the code, though:

(1)  Depending on the OS, you may be allowed to keep 10 to 1000 files open at once. So if your user is likely to use a file more than once, leave the file open so you don't have to open it again the next time. That saves 100% of the open time. Do find out what the limit on open files is; it varies widely by OS and I/O library.

(2)  If you know roughly what path the user is exploring, for example they're browsing through the file system tree, it helps to pre-touch the directories on the next level down. This loads them into the OS's disk cache, so when you do go to one of those directories, the info will tend to be in memory already.

(3)  If the files are not too large, you could have a separate thread that reads them into memory before the user has even started picking files. That hides essentially all of the file opening/reading delay from the user.
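Point (1) can be sketched in portable C: a small table that remembers which file names are already open and hands back the existing FILE* instead of calling fopen again. This is a minimal sketch under assumptions: the names cached_fopen and close_all_cached are hypothetical, the cache is capped using the standard FOPEN_MAX macro (the minimum number of streams the implementation guarantees can be open at once, counting stdin, stdout and stderr; real systems usually allow more), and a name reopened with a different mode still gets the stream that was opened first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical cache of open streams, keyed by file name.
   FOPEN_MAX includes stdin, stdout and stderr, so leave room for them. */
#define CACHE_SLOTS (FOPEN_MAX - 3)

struct cache_entry {
    char  name[256];
    FILE *fp;
};

static struct cache_entry cache[CACHE_SLOTS];
static int cache_used = 0;

/* Return the already-open stream for 'name' if there is one; otherwise
   open it and remember it. Returns NULL if fopen fails, the name is too
   long, or the cache is full. The mode is only used on the first open;
   positioning the stream (e.g. rewind) is left to the caller. */
FILE *cached_fopen(const char *name, const char *mode)
{
    for (int i = 0; i < cache_used; i++)
        if (strcmp(cache[i].name, name) == 0)
            return cache[i].fp;           /* reuse: no second open call */

    if (cache_used == CACHE_SLOTS || strlen(name) >= sizeof cache[0].name)
        return NULL;                      /* caller must handle "no room" */

    FILE *fp = fopen(name, mode);
    if (fp == NULL)
        return NULL;

    strcpy(cache[cache_used].name, name); /* length checked above */
    cache[cache_used].fp = fp;
    cache_used++;
    return fp;
}

/* Close every cached stream, e.g. at program exit. */
void close_all_cached(void)
{
    for (int i = 0; i < cache_used; i++)
        fclose(cache[i].fp);
    cache_used = 0;
}
```

A real program would also need an eviction policy (for example, close the least recently used stream) once the cache is full, instead of returning NULL.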

I repeat,

"This question is related to a C algorithm and not to any OS as such"

M
OK, I guess filename was not the best example.
From what you describe, you have a list of data in tabular format, where each record is identified by a unique_id field.
The records are fixed-length and you want random access to them.
I will assume all the information is kept in memory, but the same approach is valid for a file too.

I think one of the best methods is:
Store all your data in a contiguous array that can grow as needed.
Index the data with a second array of pointers to the real records, sorted by the unique_id field.
Every time you add an element, just add it **at the end** of the main array.
In the index, insert the new entry at the position that keeps the array sorted by the new ID.
Whenever you need fast access to a record, use a binary search, since the index is sorted by ID.

I hope this is a good starting point.
Jaime.
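The growable-array-plus-sorted-index scheme described above can be sketched in C as follows. This is a minimal sketch, not Jaime's code: the names student_table, table_add and table_find are hypothetical, the record is trimmed to three fields, IDs are assumed unique, and the index stores array positions rather than pointers, because realloc may move the main array and leave raw pointers dangling.

```c
#include <stdlib.h>
#include <string.h>

/* Trimmed-down record; the real struct has more fields. */
typedef struct student {
    int student_id;
    int age;
    int chemistry_marks;
} student;

/* Hypothetical table: records in insertion order, plus an index of
   positions into 'rec' kept sorted by student_id. */
typedef struct student_table {
    student *rec;      /* main array, grows with realloc            */
    size_t  *by_id;    /* positions into rec, ordered by student_id */
    size_t   count;
    size_t   capacity;
} student_table;

/* Append a record at the end of the main array and insert its position
   into the index so the index stays sorted. Returns 0 on success. */
int table_add(student_table *t, student s)
{
    if (t->count == t->capacity) {            /* grow both arrays by doubling */
        size_t cap = t->capacity ? t->capacity * 2 : 8;
        student *r = realloc(t->rec, cap * sizeof *r);
        if (r == NULL) return -1;
        t->rec = r;
        size_t *ix = realloc(t->by_id, cap * sizeof *ix);
        if (ix == NULL) return -1;
        t->by_id = ix;
        t->capacity = cap;
    }
    t->rec[t->count] = s;                     /* add at the end of the main array */

    /* Binary search for the first index slot whose id is >= s.student_id. */
    size_t lo = 0, hi = t->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t->rec[t->by_id[mid]].student_id < s.student_id) lo = mid + 1;
        else hi = mid;
    }
    /* Shift the tail of the index right by one and drop the new position in. */
    memmove(&t->by_id[lo + 1], &t->by_id[lo], (t->count - lo) * sizeof *t->by_id);
    t->by_id[lo] = t->count;
    t->count++;
    return 0;
}

/* Binary search on the index; returns NULL if the id is not present.
   The returned pointer is valid only until the next table_add. */
student *table_find(student_table *t, int id)
{
    size_t lo = 0, hi = t->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int mid_id = t->rec[t->by_id[mid]].student_id;
        if (mid_id == id) return &t->rec[t->by_id[mid]];
        if (mid_id < id) lo = mid + 1;
        else hi = mid;
    }
    return NULL;
}
```

Start from `student_table t = {0};`, call table_add for each new student and table_find to look one up, then free(t.rec) and free(t.by_id) when done. Doubling the capacity means the number of students does not have to be known in advance. Lookup is O(log n); each insertion costs O(n) in the worst case for the memmove, which is usually cheap for arrays of plain integers.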
ASKER CERTIFIED SOLUTION
grg99

Let me tell you exactly what I need to do. When I add a new student, it will have a student ID greater than that of the last one added.

Now suppose I want to sort the list by Physics marks, highest first. Later I will ask for the list sorted by maths marks, lowest to highest, and later still by date of modification.

If this is the case, what data structure is best and how do I achieve this?

Let me know…
M
SOLUTION
ozo

Ozo,

Please explain...
SOLUTION
Say I use a binary tree and add the students keyed by their IDs as they are created.

Now somebody asks which student has the highest marks in chemistry.
Later, somebody asks who is the youngest in the school.

How do I re-sort the binary tree? Do I break the tree up and rebuild it? How do I go about this?

M
A binary tree is not efficient for multiple indexes.
You just need to build several index arrays, as I suggested.
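A sketch of "several index arrays": one array of pointers per key, each filled with pointers to the same records and sorted once with the standard qsort using its own comparison function. Assumptions: the helper names (build_index, cmp_chemistry_desc, cmp_age_asc) are hypothetical, the record is trimmed to a few fields, and the indexes are rebuilt after the set of students changes rather than maintained incrementally. The records themselves never move; only the pointers are sorted.

```c
#include <stdlib.h>

/* Trimmed-down record; the real struct has more fields. */
typedef struct student {
    int student_id;
    int age;
    int chemistry_marks;
    int physic_marks;
} student;

/* qsort passes pointers to the array elements, and each element here is
   itself a 'student *', hence the double indirection. */
static int cmp_chemistry_desc(const void *a, const void *b)
{
    const student *x = *(const student *const *)a;
    const student *y = *(const student *const *)b;
    /* Highest marks first: negative when x has more marks than y. */
    return (y->chemistry_marks > x->chemistry_marks) -
           (y->chemistry_marks < x->chemistry_marks);
}

static int cmp_age_asc(const void *a, const void *b)
{
    const student *x = *(const student *const *)a;
    const student *y = *(const student *const *)b;
    return (x->age > y->age) - (x->age < y->age);   /* youngest first */
}

/* Fill 'index' with pointers to every record and sort it with 'cmp'.
   The pointers stay valid only while 'rec' is not moved or freed. */
void build_index(student *rec, size_t n, student **index,
                 int (*cmp)(const void *, const void *))
{
    for (size_t i = 0; i < n; i++)
        index[i] = &rec[i];
    qsort(index, n, sizeof *index, cmp);
}
```

With a chemistry index and an age index built this way, the best chemistry student is the first entry of the chemistry index and the youngest student is the first entry of the age index; no tree has to be torn down. For a single one-off question like these, a plain linear scan over the records would also do.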
The requirement is that I should be able to add and delete students and, at the same time, sort by any of the keys.
If add/delete operations are very frequent, then you need a linked list, but an index array is still a good idea on top of it.
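One way to read the suggestion above: keep the students in a doubly linked list, so adding and deleting are O(1) and never move other records, and build a pointer index from the list whenever a sorted view is needed. This is a minimal sketch; the node layout and the function names (list_push, list_remove, list_to_index) are hypothetical, and only maths marks are used as a sort key.

```c
#include <stdlib.h>

/* Trimmed-down record stored directly in a list node. */
typedef struct student_node {
    int student_id;
    int maths_marks;
    struct student_node *prev, *next;
} student_node;

typedef struct student_list {
    student_node *head;
    size_t count;
} student_list;

/* Add a student at the front of the list: O(1). Returns the node or NULL. */
student_node *list_push(student_list *l, int id, int maths)
{
    student_node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->student_id = id;
    n->maths_marks = maths;
    n->prev = NULL;
    n->next = l->head;
    if (l->head) l->head->prev = n;
    l->head = n;
    l->count++;
    return n;
}

/* Unlink and free a node: O(1), no other record moves. */
void list_remove(student_list *l, student_node *n)
{
    if (n->prev) n->prev->next = n->next;
    else l->head = n->next;
    if (n->next) n->next->prev = n->prev;
    free(n);
    l->count--;
}

static int cmp_maths_asc(const void *a, const void *b)
{
    const student_node *x = *(const student_node *const *)a;
    const student_node *y = *(const student_node *const *)b;
    return (x->maths_marks > y->maths_marks) - (x->maths_marks < y->maths_marks);
}

/* Build a freshly allocated array of node pointers sorted by maths marks,
   lowest first. The caller frees it; it goes stale after the next push or
   remove, so rebuild it when the list changes. */
student_node **list_to_index(const student_list *l)
{
    student_node **ix = malloc((l->count ? l->count : 1) * sizeof *ix);
    if (ix == NULL) return NULL;
    size_t i = 0;
    for (student_node *n = l->head; n != NULL; n = n->next)
        ix[i++] = n;
    qsort(ix, l->count, sizeof *ix, cmp_maths_asc);
    return ix;
}
```

Because the list never needs to know the number of students in advance, it also avoids sizing a fixed array up front; the index array is allocated at exactly the current count each time it is rebuilt.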

I'm not answering your question, but I'm curious why you don't want to use an existing database. I'm not familiar with any algorithm for multi-indexing either, but it seems that any data structure proposed so far has to be rebuilt for each particular query, unless you want to keep a separate data structure open for every field.
If you **HAVE** to process everything in memory, that may justify it, but if you want to work with a file, it is really unnecessary.
If you explain your motives, maybe we can help you better.
Sorry for responding to your questions a bit late.

I want to build a system where I can add and delete student structures. For now I do not want to store these structures on, or retrieve them from, any secondary storage. It is a plain application that creates a student as the user enters the data.

But I want to sort these structures and access them efficiently by the different keys. How do I do that?

Jaime suggested building the mapping table with an array. But I cannot build this mapping table unless I know the number of students.


Please let me know...

SOLUTION