Solved

# How do I delete and then search more than 20000 strings from Binary Search Tree?

Posted on 2008-10-25
352 Views
I have an assignment where I will have to read 88800 strings and store it in a vector. Then insert all the strings in the Binary Search tree. After which I will have to Random shuffle the vector and pick top 20000 strings subset. delete those strings from BST. Then do a search of the same subset in the BST. The same deletion and search I have to repeat for 40000, 60000 and 80000 respectively after random shuffling the vector each time.

I have implemented the basic BST code. I have problems when  delete and search a large number of data.
Say sometimes even 10000.... I am attaching the code. Pls look at it let me know how to solve....
BinarySearchTree.zip
0
Question by:Beebutter
• 19
• 17

LVL 13

Expert Comment

ID: 22805563
>>I have problems when  delete
Is the problem an exception being thrown in this loop?
for (unsigned int i = 0; i < 20000; i++) {
b.remove(lastNames[i]);
}

I can see in BinarySearchTree::remove, for a leaf node, that for
delete curr;
curr is an invalid pointer.
0

LVL 13

Expert Comment

ID: 22805584
This looks like one problem...see <<<< marker
} else // left child present, no right child
{
if (parent->left == curr) {
parent->left = curr->left;
delete curr;
cout << d << " DELETED!!!!" << endl;
} else {
parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
delete curr;
cout << d << " DELETED!!!!" << endl;
}
}

Doesn't fix it, though.
0

LVL 13

Accepted Solution

josgood earned 500 total points
ID: 22805620
Also in BinarySearchTree::remove, note the <<<< marker which shows the necessary change
//Node with 2 children
// replace node with smallest value in right subtree
if (curr->left != NULL && curr->right != NULL) {
tree_node* chkr;
chkr = curr->right;
if ((chkr->left == NULL) && (chkr->right == NULL)) {
//---------------------------------------------
//curr = chkr; // this copies the right & left ptrs, as well as the data
// must preserve the left ptr
curr->data = chkr->data; // <<<<<<<<<<<<<<<<<<<
//---------------------------------------------

This fixes the crash.
0

Author Comment

ID: 22805776
Hi Josgood

Thank you very much. I did the change you told me. Actually I am a java programmer and very new to C++. I am finding it dificult to debug the problem. Now the program works without crashing. But still errors do occur. In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
0

LVL 13

Expert Comment

ID: 22805839
The first example I see where remove does not find a string is "CHANOINE".

Are you seeing the same thing?
0

Author Comment

ID: 22805851
No Josgood. I tried deleting 20000 strings. For me the first one that displayed found was "CHERCHIO". The CHANOINE doesnt appear in my deletion list at all.
0

LVL 13

Expert Comment

ID: 22805887
Well, I guess that makes sense...random_shuffle is random, after all.  If this gets difficult, we should switch to the version that takes a seed value so we can stay in sync.

It looks to me as though there is an issue in the way the BST is being built.

There's no good way to show this in a text window, but let me try.  You could have a BST that looks like this
B
A<   >C
>D
where is A is B->left, C is B->right, and D is C->right.  Using the same notation, watching remove work its way down the tree, I'm seeing (in this run) the following tree...This isn't complete, it is just the branches the code is following
Johnson
Brown<
>Davis     (Davis is Brown->right)
Clark<                           (Clark is Davis->left)
Carter<                         (Carter is Clark->right)
>Chavez
Castillo<
>Chapman
Castro<
and so on
0

Author Comment

ID: 22805897
Thanks for explaining that to me. But is there any way to fix this problem?
0

LVL 13

Expert Comment

ID: 22805909
>>an issue in the way the BST is being built
I take that back.  I just went to refresh my BST knowledge a bit and I think I was wrong.

>>is there any way to fix this problem
Of course.  I need a few minutes to look at this.  I won't go away without telling you.
0

LVL 13

Expert Comment

ID: 22805956
OK, I think I see what is going on here.

The BST code is working properly...it is being tested incorrectly.

Using my example of "CHANOINE", "CHANOINE" exists in the lastNames array and is copied into the BST.  Then "CHANOINE" gets removed from the BST because it is one of the first 20,000 in lastNames.  It actually is one of the first 1000.

So when you
for (unsigned int i = 0; i < 1000; i++) {
b.search(lastNames[i]);
}
"CHANOINE" exists in lastNames but not in the BST because it was deleted from the BST and *not from lastNames*.

Does that make sense?
0

Author Comment

ID: 22805980
In that case if I try deleting 20000 strings and then search 20000 strings, the same problem persists.

I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted. Then why it rreturns "FOUND" after deleting?
Also there are some string which are unable to be found in the remove().
0

LVL 13

Expert Comment

ID: 22805994
>>I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted
For my "CHANOINE" case, that is exactly what is happening.  I thought that was the question you were asking.

>>why it rreturns "FOUND" after deleting?
This is what you're really asking...For my "CHANOINE" case, you would be seeing that search() finds "CHANOINE" in the BST even though it was removed from the BST.  Am I tracking with you?

>>Also there are some string which are unable to be found in the remove()
I will look into this while you answer whether I am tracking with you
0

Author Comment

ID: 22805998
Yes you are tracking with me.
As I told you, when a string from the lastnames vector is removed, why the search() function still finds the string?
0

LVL 13

Expert Comment

ID: 22806071
I'm still here.

I haven't yet duplicated the problem where a string has been removed from the BST and yet search finds it.  Can you tell me a string for which you have this problem?  I recorded all deleted strings in a vector.  For me, every string that search finds is not a member of the vector.

Flipping the question around, is "ZIBERT" an example of a string that should be found by search and isn't?
0

Author Comment

ID: 22806088
For me "ZILBERT" is not a problem.

For the string STANCILL , the remove () displays
STANCILL DELETED!!!!

the search () displays
STANCILL FOUND!!!!!
This is the problem...
0

Author Comment

ID: 22806090
Try deleting and searching 20000 strings then you might replicate the same error I got.
0

Author Comment

ID: 22806193
Hi

Were you able to figure it out?
0

LVL 13

Expert Comment

ID: 22806200
I have now duplicated the problem -- I have it with "REBEIRO" -- the string is confirmed as a string that should have been deleted from the BST, but search finds it anyway.

I need to shut down for the evening, but I will look at this in the morning.  I'm on Pacific time and should look at it around 9AM.

Sorry I couldn't get it fixed tonight, but we should be able to handle it tomorrow.
0

Author Comment

ID: 22806206
Definitely.. it would be very helpful if thats solved.
Thank you
0

Author Comment

ID: 22807201
Is there any one who could help me in the program attached?

In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
0

LVL 13

Expert Comment

ID: 22807636
Good morning.  I'm looking at it now.
0

Author Comment

ID: 22807654
Thank you josgood
0

LVL 13

Expert Comment

ID: 22807773
Well!  It looks like this bug is *my* fault!

In my second response, I suggested
parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
and that suggestion was wrong.  I didn't understand your code well enough at that time.

Restore it to the way you originally had it
parent->right = curr->left; // my bad
and the BST code works fine.

0

Author Comment

ID: 22807809
That piece of block you have mentioned, I do have the same way you have suggested now but still the problem persists. i.e.
else // left child present, no right child
{
if (parent->left == curr) {
parent->left = curr->left;
delete curr;
cout << d << " DELETED!!!!" << endl;
} else {
parent->right = curr->left; <<<<<<<<<<<<<<<<<< // I have it the same way you mentioned
delete curr;
cout << d << " DELETED!!!!" << endl;
}
}
0

LVL 13

Expert Comment

ID: 22807831
It must be that our source is out of sync.  Mine seems to be working -- I've modified the main to check every name -- removed names are really removed from the BST and names not removed are still in the BST.

I've made some other changes and something in them must be the relevant difference.

My code is attached.  Would you please post yours?

MyCode.txt
0

Author Comment

ID: 22807929
Look like you have done many changes to the program which I didnt do. I have included my program, after doing the changes you have done in your code.
Seems to me more modifications have to be done. Pls look through this and let me know.
BinarySearchTree.cpp.txt.txt
0

LVL 13

Expert Comment

ID: 22808017
A lot of those changes are just experiments...

I have your code and will start looking.  I'll be a few minutes.
0

Author Comment

ID: 22808084
Ok Josgood. I will keep waiting. Thank you
0

LVL 13

Expert Comment

ID: 22808241
I think I'm on the track of the problem.  Am taking a lunch break and will be back.

I believe that we need to replace the current node's data with that of the greatest (rightmost) name on the left subtree, and then get the pointers right.

Back in a half hour or so.
0

Author Comment

ID: 22808257
Thank you Josgood. I shall definitely wait for that.
0

LVL 13

Expert Comment

ID: 22808595
Are you constrained to an iterative solution or is a recursive solution acceptable?

I have a pair of problem situations fixed and now I see a third.
0

Author Comment

ID: 22808603
I have to do an experiment by trying to search and delete batch of strings.
The experiment should be run iteratively.
But removing and searching each string can be done recursively also.
0

LVL 13

Expert Comment

ID: 22808641
It's a funny thing...sometimes when you give up then the answer comes to you.

I think I have it working...running a lengthy test...guessing 10-15 min.
0

LVL 13

Expert Comment

ID: 22808734
Well the test is still running and it is slower than I expected.  While we're waiting I thought I'd send you the corrected area of the code.

I commented out the printf's just to make my job easier...

Tell me if these changes make sense to you or you would like some explanation.

} else // right child has children
{
//if the node's right child has a left child
// Move all the way down left to locate smallest element

if ((curr->left)->/*left*/right != NULL) {
tree_node* lcurrp = curr;
tree_node* lcurr  = lcurrp->left;
while (lcurr->/*left*/right != NULL) {
lcurrp = lcurr;
lcurr = lcurr->/*left*/right;
}
curr->data = lcurr->data;
lcurrp->right = lcurr->left;
delete lcurr;
//cout << d << " DELETED!!!!" << endl;
} else {
// Current node's left child doesn't have a right child and so is greatest on the left branch
tree_node* tmp;
tmp = curr->/*right*/left;
curr->data = tmp->data;
curr->/*right*/left = tmp->/*right*/left;
delete tmp;
//cout << d << " DELETED!!!!" << endl;
}

}
0

LVL 13

Expert Comment

ID: 22808770
The iterative test
bCopy = new BinarySearchTree(*b);
for (batchSize = 20000; batchSize < lastNames.size(); (batchSize += 20000)) {
// processing
b = new BinarySearchTree(*bCopy);
}
will fail on the second iteration because BinarySearchTree doesn't have a copy constructor that performs a deep copy.

For your purposes, I suggest running the test with batchSize = 20000, running it again with  batchSize = 40000, and so on.  Alternatively, build the tree newly for each test iteration.
0

Author Comment

ID: 22809013
Ok Josgood I shall do that.
0

## Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

### Suggested Solutions

Many modern programming languages support the concept of a property -- a class member that combines characteristics of both a data member and a method.  These are sometimes called "smart fields" because you can add logic that is applied automaticallâ€¦
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilationâ€¦
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.