[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

b-tree implementation

Posted on 2006-10-28
9
Medium Priority
?
1,044 Views
Last Modified: 2008-01-09
In my adventures with disk-based data structures in C, I'm looking into understanding b-trees.  

I've read a lot about how they work, but I have a question about how they're actually stored on disk.  A typical b-tree diagram is very structured looking, consisting of multiple levels of nodes and branches, originating from a single root node.  But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

My question is, when a new node is created because of a split, where is the new node actually stored?  Is it just appended to the end of the file?  If so, wouldn't this cause the b-tree to lose locality of reference after a while?  Although a typical diagram of a b-tree shows sibling nodes all grouped together, in reality any two sibling nodes may be at totally different ends of the file, right?

Also, if a chain of splits happens all the way up to the root node, a new root node needs to be created.  Is this new root node also just appended at the end of the file?  
0
Comment
Question by:chsalvia
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
9 Comments
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17827343
Hi chsalvia,

1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

2. Split affects at most 2 levels

Cheers!
sunnycoder
0
 

Author Comment

by:chsalvia
ID: 17828002
>> 1. It is possible to implement trees in array to maintain locality of reference ... e.g. a simple binary tree with children of node i located at 2i+1 and 2i+2

Yes, but in a standard b-tree, wouldn't it be the case that splits would often destroy locality of reference?  If for example, you have a b-tree with say 1,000 nodes, and then you do an insert.  The insert requires, say, the 12th node to be split.  So a new node has to be written to disk now.  Where do we store the new node?  We have to store it at the end of the file.  So now it is very far from it's sibling node, the 12th node.

Isn't this something that would typically happen with a b-tree?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17840824
No .. you would have to re-organize some elements towards end of the array but that would be all ... Array representations do not write to end of array ... Every node has a fixed index depending on its position in the hierarchy, if the node is missing, that index is left blank.

Split in this case would mean some reorg in affected level ... moving around data in arrays is expensive, but it wont be very frequent.
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 8

Expert Comment

by:adg080898
ID: 17846184
You should use "file offset" pointers for your tree. Have a root pointer at a known location, then just follow file offsets for walking. This way, the nodes never need to be moved around in bulk.

Locality of reference is not a big deal, because the depth of a b-tree is so shallow, you are not likely to touch a lot of nodes when searching.

0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17847624
>But the actual reality seems to be that a b-tree is just a linear sequence of nodes and offset pointers stored one after the other in a file.  

Range searches on a B-tree can indeed touch a lot of nodes. That is where locality of reference is important.
To search a single node, you need to access only one node per level. This holds true not only for B tree but any tree which supports guided search - even binary search tree. So locality of reference does not really count unless you are thinking of range searches. Btw, B+ trees are better suited for range search.
0
 

Author Comment

by:chsalvia
ID: 17848600
>> Btw, B+ trees are better suited for range search.

Yes, I've read about that as well.  My understanding is that B+ trees only store keys on the bottom level, and the higher level nodes just contain pointers.

My question is, in a B+ tree, how would you direct the program from one pointer to another, without doing a key comparison?  Since the keys are stored in the bottom nodes, how does the program get from the root to the bottom node in the first place?  What does it compare to decide which node to go to next?
0
 
LVL 45

Expert Comment

by:sunnycoder
ID: 17848614
It has some values in the internal nodes for guiding the search. However, those values are not (neccessarily) keys.e.g.
                         25
                      /       \
                   12        30
                 /    \      /    \
                6    14   27    35

Keys are at leaf ... values in the non-leaf node are used only to guide the search. Leaves are connected using a bi-directional liked list.
0
 

Author Comment

by:chsalvia
ID: 17848794
Hmm...if the non-leaf nodes just contain small values, it would seem that the fixed-size of leaf nodes would be different than the fixed size of non-leaf nodes.  Also, it seems that the non-leaf nodes would be relatively small.  Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
0
 
LVL 45

Accepted Solution

by:
sunnycoder earned 1000 total points
ID: 17848815
Value + couple of pointers for forming a doubly linked list. Typically smaller than non-leaf node.

>Perhaps all non-leaf nodes could be loaded into memory, so that only leaf-nodes would require disk reads.
Number of non-leaf nodes could still be large depending on how much data you have. Loading first few levels - depending of resource capability of your system - might be a better idea.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and inf…
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use for-loops in the C programming language.
The goal of this video is to provide viewers with basic examples to understand opening and reading files in the C programming language.
Suggested Courses

656 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question