Posted on 2006-10-28
In my adventures with disk-based data structures in C, I'm looking into understanding b-trees.
I've read a lot about how they work, but I have a question about how they're actually stored on disk. A typical b-tree diagram is very structured looking, consisting of multiple levels of nodes and branches, originating from a single root node. But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.
My question is, when a new node is created because of a split, where is the new node actually stored? Is it just appended to the end of the file? If so, wouldn't this cause the b-tree to lose locality of reference after a while? Although a typical diagram of a b-tree shows sibling nodes all grouped together, in reality any two sibling nodes may be at totally different ends of the file, right?
Also, if a chain of splits happens all the way up to the root node, a new root node needs to be created. Is this new root node also just appended at the end of the file?