Solved

b-tree implementation

Posted on 2006-10-28
9
1,001 Views
Last Modified: 2008-01-09
In my adventures with disk-based data structures in C, I'm looking into understanding b-trees.  

I've read a lot about how they work, but I have a question about how they're actually stored on disk.  A typical b-tree diagram is very structured looking, consisting of multiple levels of nodes and branches, originating from a single root node.  But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

My question is, when a new node is created because of a split, where is the new node actually stored?  Is it just appended to the end of the file?  If so, wouldn't this cause the b-tree to lose locality of reference after a while?  Although a typical diagram of a b-tree shows sibling nodes all grouped together, in reality any two sibling nodes may be at totally different ends of the file, right?

Also, if a chain of splits happens all the way up to the root node, a new root node needs to be created.  Is this new root node also just appended at the end of the file?  
0
Comment
Question by:chsalvia
  • 5
  • 3
9 Comments
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17827343
Hi chsalvia,

1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

2. Split affects at most 2 levels

Cheers!
sunnycoder
0
 

Author Comment

by:chsalvia
ID: 17828002
>> 1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

Yes, but in a standard b-tree, wouldn't it be the case that splits would often destroy locality of reference?  If for example, you have a b-tree with say 1,000 nodes, and then you do an insert.  The insert requires, say, the 12th node to be split.  So a new node has to be written to disk now.  Where do we store the new node?  We have to store it at the end of the file.  So now it is very far from it's sibling node, the 12th node.

Isn't this something that would typically happen with a b-tree?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17840824
No .. you would have to re-organize some elements towards end of the array but that would be all ... Array representations do not write to end of array ... Every node has a fixed index depending on its position in the hierarchy, if the node is missing, that index is left blank.

Split in this case would mean some reorg in affected level ... moving around data in arrays is expensive, but it wont be very frequent.
0
 
LVL 8

Expert Comment

by:adg080898
ID: 17846184
You should use "file offset" pointers for your tree. Have a root pointer at a known location, then just follow file offsets for walking. This way, the nodes never need to be moved around in bulk.

Locality of reference is not a big deal, because the depth of a b-tree is so shallow, you are not likely to touch a lot of nodes when searching.

0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 45

Expert Comment

by:sunnycoder
ID: 17847624
>But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

Range searches on a B-tree can indeed touch a lot of nodes. That is where locality of reference is important.
To search a single node, you need to access only one node per level. This holds true not only for B tree but any tree which supports guided search - even binary search tree. So locality of reference does not really count unless you are thinking of range searches. Btw, B+ trees are better suited for range search.
0
 

Author Comment

by:chsalvia
ID: 17848600
>> Btw, B+ trees are better suited for range search.

Yes, I've read about that as well.  My understanding is that B+ trees only store keys on the bottom level, and the higher level nodes just contain pointers.

My question is, in a B+ tree, how would you direct the program from one pointer to another, without doing a key comparison?  Since the keys are stored in the bottom nodes, how does the program get from the root to the bottom node in the first place?  What does it compare to decide which node to go to next?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17848614
It has some values in the internal nodes for guiding the search. However, those values are not (neccessarily) keys.e.g.
                         25
                      /       \
                   12        30
                 /    \      /    \
                6    14   27    35

Keys are at leaf ... values in the non-leaf node are used only to guide the search. Leaves are connected using a bi-directional liked list.
0
 

Author Comment

by:chsalvia
ID: 17848794
Hmm...if the non-leaf nodes just contain small values, it would seem that the fixed-size of leaf nodes would be different than the fixed size of non-leaf nodes.  Also, it seems that the non-leaf nodes would be relatively small.  Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 250 total points
ID: 17848815
Value + couple of pointers for forming a doubly linked list. Typically smaller than non-leaf node.

>Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
Number of non-leaf nodes could still be large depending on how much data you have. Loading first few levels - depending of resource capability of your system - might be a better idea.
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Suggested Solutions

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
Preface I don't like visual development tools that are supposed to write a program for me. Even if it is Xcode and I can use Interface Builder. Yes, it is a perfect tool and has helped me a lot, mainly, in the beginning, when my programs were small…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use while-loops in the C programming language.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now