We help IT Professionals succeed at work.

how lseek work ?

yairy
yairy asked
on
Hi,

I want to build a very fast DB.
I want to fetch records fast by using an entry table at the start of the file, that will give me offsets of all entries ahead.

My question is:
How lseek work ? (not the syntex..., the algorithm behind)
Does lseek work very fast ?


Yair
Comment
Watch Question

CERTIFIED EXPERT
Author of the Year 2009

Commented:
lseek moves the file pointer to a particular location in the file -- a byte offset, typically from the beginning of the file.  Then the next time you read from the file, the data that was at that location is read from disk into memory.

To satisfy your goal, you will need to maintain a sort of directory of information at the front of the file.  You will read that directory into memory.  Then when you want a particular piece of data, you will look into your "directory" to learn the byte offset where the actual data is stored.  Then you will use lseek() to position the file pointer to that location, then you will use read() to read the data into meory.

>>Does lseek work very fast ?

lseek takes almost exactly 0 time to execute.

-- Dan
>> lseek takes almost exactly 0 time to execute.
however, you can expect the next read() to take longer because of the lseek
Commented:
>> (not the syntex..., the algorithm behind)
First of all, you should really be using fstream objects, not FILE *.  They are safer and easier to use.

No one can say how lseek of basic_istream::seekg() works since there are 100s, maybe 1000s, of versions of these functions.  Its up to the implimenter (compiler/library author) to decide on the algorithms they use.   However, its alsmost certianly going to be a reasonably fast operator--much faster than reading the entire from from the start to the location you seek too.

Most likely, the function will be an almost a straight call into an OS function that performs a seek on an OS file.  For example in Windows, the function will probably obtain a windows file handle and then call the windows SetFilePointer() function.  

Most likely all the "real" work is done by the OS and the nature of that work will depend on which OS you are using and possibly other factors, like what file system it is using.  But most likely the process involves searching a file allocation table to find out what sector on disk stores the specified offset to the file and then figure out what offset within that sector you tried to seek to and then stores that information so the next read/write occurs at that location.  This process is slow in a sense as it make involve looping through a table, but the process is much faster than a typical disk read/write operation so its overhead in disk intensive code is negligable.


For this project you might consider looking at more advanced file organizations.  Like using external indexes to speed access, for example.

Commented:
>> >> lseek takes almost exactly 0 time to execute.
>> however, you can expect the next read() to
>> take longer because of the lseek
You can't be sure of either of these statements.  it depends on how the implimenter designed things.  If the implimenter makes lseek() perform the work, then it will take time, otherwise the work will be delayed until the next read/write operation.  (on DOS/Windows its very likely that the work will be delayed until the next read/write).  but no matter what, you will eventually have to "pay the price".

Author

Commented:
>For this project you might consider looking at more advanced file organizations.  Like using external
indexes to speed access, for example.

Can you expand a few words on that.

Yair

Commented:
>> Can you expand a few words on that.
Unfortunatley the expanded words are called "books".

There are many good books dealing with algorithms, especially sorting and especially external sorting (meaning the sorted material is stored on disk) as well as books on DB design.

I like using seperate index and database files.  The database files stores the actual data, sort of like a disk-based array but in random order.  Well basically in the order in which the records were added as each record is added at the end.  Then a seperate index file is used to track the sort keys and the offsets or record nubmers to the records they sort for.  So the index files is the only file that needs to be kept "sorted"  (its only sorted in a sense) and since it is comparitibly small it can be kept sorted more quickly than the larger Database file.  You can access the database records in sorted order by going through the index in sorted order and for each key in the index, looking up (a quick direct seek) the record in the database file.  There are many different types of indexing algorithms, I like knuth's N-way trees.  They are extradonarily fast for, adding, deleting, exact and near searching,  and sequential operations, they are relatively small, having no more than 50% wasted space.  
CERTIFIED EXPERT
Author of the Year 2009

Commented:
>>it depends on how the implimenter designed things.
>> If the implimenter makes lseek() perform the work, then it will take time...

I'm trying to imagine a professional programmer sitting down to write an operating system and or to implement  stream handling and deciding to position the drive head and do other time-consuming operations in the lseek call.  Tyying.... trying... trying... Nope, I just can't imagine it.

-- Dan

Commented:
>> deciding to position the drive head and do other >> time-consuming operations in the lseek call
Where did I say that fseek() woudl reposition the drive head?  fseek() would never position the drive head.  That makes no sense since there can be multiple files open.  However it certainly might (very likely will) reposition the OS file pointer and that is not likely to be a trivial operation.  It will usually involve searching through file allocation tables and other data structures in order to calculate the location on disk cooresponding to the requested file offset.

Author

Commented:
I'll take nietod answer, and I want to thank DanRollins.

Yair