Solved

b-tree implementation

Posted on 2006-10-28
9
1,019 Views
Last Modified: 2008-01-09
In my adventures with disk-based data structures in C, I'm looking into understanding b-trees.  

I've read a lot about how they work, but I have a question about how they're actually stored on disk.  A typical b-tree diagram is very structured looking, consisting of multiple levels of nodes and branches, originating from a single root node.  But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

My question is, when a new node is created because of a split, where is the new node actually stored?  Is it just appended to the end of the file?  If so, wouldn't this cause the b-tree to lose locality of reference after a while?  Although a typical diagram of a b-tree shows sibling nodes all grouped together, in reality any two sibling nodes may be at totally different ends of the file, right?

Also, if a chain of splits happens all the way up to the root node, a new root node needs to be created.  Is this new root node also just appended at the end of the file?  
0
Comment
Question by:chsalvia
  • 5
  • 3
9 Comments
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17827343
Hi chsalvia,

1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

2. Split affects at most 2 levels

Cheers!
sunnycoder
0
 

Author Comment

by:chsalvia
ID: 17828002
>> 1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

Yes, but in a standard b-tree, wouldn't it be the case that splits would often destroy locality of reference?  If for example, you have a b-tree with say 1,000 nodes, and then you do an insert.  The insert requires, say, the 12th node to be split.  So a new node has to be written to disk now.  Where do we store the new node?  We have to store it at the end of the file.  So now it is very far from it's sibling node, the 12th node.

Isn't this something that would typically happen with a b-tree?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17840824
No .. you would have to re-organize some elements towards end of the array but that would be all ... Array representations do not write to end of array ... Every node has a fixed index depending on its position in the hierarchy, if the node is missing, that index is left blank.

Split in this case would mean some reorg in affected level ... moving around data in arrays is expensive, but it wont be very frequent.
0
Connect further...control easier

With the ATEN CE624, you can now enjoy a high-quality visual experience powered by HDBaseT technology and the convenience of a single Cat6 cable to transmit uncompressed video with zero latency and multi-streaming for dual-view applications where remote access is required.

 
LVL 8

Expert Comment

by:adg080898
ID: 17846184
You should use "file offset" pointers for your tree. Have a root pointer at a known location, then just follow file offsets for walking. This way, the nodes never need to be moved around in bulk.

Locality of reference is not a big deal, because the depth of a b-tree is so shallow, you are not likely to touch a lot of nodes when searching.

0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17847624
>But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

Range searches on a B-tree can indeed touch a lot of nodes. That is where locality of reference is important.
To search a single node, you need to access only one node per level. This holds true not only for B tree but any tree which supports guided search - even binary search tree. So locality of reference does not really count unless you are thinking of range searches. Btw, B+ trees are better suited for range search.
0
 

Author Comment

by:chsalvia
ID: 17848600
>> Btw, B+ trees are better suited for range search.

Yes, I've read about that as well.  My understanding is that B+ trees only store keys on the bottom level, and the higher level nodes just contain pointers.

My question is, in a B+ tree, how would you direct the program from one pointer to another, without doing a key comparison?  Since the keys are stored in the bottom nodes, how does the program get from the root to the bottom node in the first place?  What does it compare to decide which node to go to next?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17848614
It has some values in the internal nodes for guiding the search. However, those values are not (neccessarily) keys.e.g.
                         25
                      /       \
                   12        30
                 /    \      /    \
                6    14   27    35

Keys are at leaf ... values in the non-leaf node are used only to guide the search. Leaves are connected using a bi-directional liked list.
0
 

Author Comment

by:chsalvia
ID: 17848794
Hmm...if the non-leaf nodes just contain small values, it would seem that the fixed-size of leaf nodes would be different than the fixed size of non-leaf nodes.  Also, it seems that the non-leaf nodes would be relatively small.  Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 250 total points
ID: 17848815
Value + couple of pointers for forming a doubly linked list. Typically smaller than non-leaf node.

>Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
Number of non-leaf nodes could still be large depending on how much data you have. Loading first few levels - depending of resource capability of your system - might be a better idea.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Have you thought about creating an iPhone application (app), but didn't even know where to get started? Here's how: ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ Important pre-programming comments: I’ve never tri…
This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.

808 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question