Link to home
Start Free TrialLog in
Avatar of Beebutter
Beebutter

asked on

How do I delete and then search more than 20000 strings from Binary Search Tree?

I have an assignment where I will have to read 88800 strings and store it in a vector. Then insert all the strings in the Binary Search tree. After which I will have to Random shuffle the vector and pick top 20000 strings subset. delete those strings from BST. Then do a search of the same subset in the BST. The same deletion and search I have to repeat for 40000, 60000 and 80000 respectively after random shuffling the vector each time.

I have implemented the basic BST code. I have problems when  delete and search a large number of data.
Say sometimes even 10000.... I am attaching the code. Pls look at it let me know how to solve....
BinarySearchTree.zip
Avatar of josgood
josgood
Flag of United States of America image

>>I have problems when  delete
Is the problem an exception being thrown in this loop?
      for (unsigned int i = 0; i < 20000; i++) {
            b.remove(lastNames[i]);
      }

I can see in BinarySearchTree::remove, for a leaf node, that for
            delete curr;
curr is an invalid pointer.
This looks like one problem...see <<<< marker
            } else // left child present, no right child
            {
                  if (parent->left == curr) {
                        parent->left = curr->left;
                        delete curr;
                        cout << d << " DELETED!!!!" << endl;
                  } else {
                        parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
                        delete curr;
                        cout << d << " DELETED!!!!" << endl;
                  }
            }

Doesn't fix it, though.
ASKER CERTIFIED SOLUTION
Avatar of josgood
josgood
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Beebutter
Beebutter

ASKER

Hi Josgood

Thank you very much. I did the change you told me. Actually I am a java programmer and very new to C++. I am finding it dificult to debug the problem. Now the program works without crashing. But still errors do occur. In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
The first example I see where remove does not find a string is "CHANOINE".

Are you seeing the same thing?
No Josgood. I tried deleting 20000 strings. For me the first one that displayed found was "CHERCHIO". The CHANOINE doesnt appear in my deletion list at all.
Well, I guess that makes sense...random_shuffle is random, after all.  If this gets difficult, we should switch to the version that takes a seed value so we can stay in sync.

It looks to me as though there is an issue in the way the BST is being built.  

There's no good way to show this in a text window, but let me try.  You could have a BST that looks like this
      B
 A<   >C
        >D
where is A is B->left, C is B->right, and D is C->right.  Using the same notation, watching remove work its way down the tree, I'm seeing (in this run) the following tree...This isn't complete, it is just the branches the code is following
          Johnson
Brown<
                     >Davis     (Davis is Brown->right)
Clark<                           (Clark is Davis->left)
Carter<                         (Carter is Clark->right)
                     >Chavez
Castillo<
                     >Chapman
Castro<
and so on
Thanks for explaining that to me. But is there any way to fix this problem?
>>an issue in the way the BST is being built
I take that back.  I just went to refresh my BST knowledge a bit and I think I was wrong.

>>is there any way to fix this problem
Of course.  I need a few minutes to look at this.  I won't go away without telling you.
OK, I think I see what is going on here.

The BST code is working properly...it is being tested incorrectly.

Using my example of "CHANOINE", "CHANOINE" exists in the lastNames array and is copied into the BST.  Then "CHANOINE" gets removed from the BST because it is one of the first 20,000 in lastNames.  It actually is one of the first 1000.

So when you
      for (unsigned int i = 0; i < 1000; i++) {
            b.search(lastNames[i]);
      }
"CHANOINE" exists in lastNames but not in the BST because it was deleted from the BST and *not from lastNames*.

Does that make sense?
In that case if I try deleting 20000 strings and then search 20000 strings, the same problem persists.

I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted. Then why it rreturns "FOUND" after deleting?
Also there are some string which are unable to be found in the remove().
>>I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted
For my "CHANOINE" case, that is exactly what is happening.  I thought that was the question you were asking.

>>why it rreturns "FOUND" after deleting?
This is what you're really asking...For my "CHANOINE" case, you would be seeing that search() finds "CHANOINE" in the BST even though it was removed from the BST.  Am I tracking with you?

>>Also there are some string which are unable to be found in the remove()
I will look into this while you answer whether I am tracking with you
Yes you are tracking with me.
As I told you, when a string from the lastnames vector is removed, why the search() function still finds the string?
I'm still here.  

I haven't yet duplicated the problem where a string has been removed from the BST and yet search finds it.  Can you tell me a string for which you have this problem?  I recorded all deleted strings in a vector.  For me, every string that search finds is not a member of the vector.

Flipping the question around, is "ZIBERT" an example of a string that should be found by search and isn't?
For me "ZILBERT" is not a problem.

For the string STANCILL , the remove () displays
STANCILL DELETED!!!!

the search () displays
STANCILL FOUND!!!!!      
This is the problem...
Try deleting and searching 20000 strings then you might replicate the same error I got.
Hi

Were you able to figure it out?
I have now duplicated the problem -- I have it with "REBEIRO" -- the string is confirmed as a string that should have been deleted from the BST, but search finds it anyway.

I need to shut down for the evening, but I will look at this in the morning.  I'm on Pacific time and should look at it around 9AM.

Sorry I couldn't get it fixed tonight, but we should be able to handle it tomorrow.
Definitely.. it would be very helpful if thats solved.
Thank you
Is there any one who could help me in the program attached?

In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
Good morning.  I'm looking at it now.
Thank you josgood
Well!  It looks like this bug is *my* fault!

In my second response, I suggested
                        parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
and that suggestion was wrong.  I didn't understand your code well enough at that time.

Restore it to the way you originally had it
                        parent->right = curr->left; // my bad
and the BST code works fine.

Sorry about that.

That piece of block you have mentioned, I do have the same way you have suggested now but still the problem persists. i.e.
else // left child present, no right child
{
if (parent->left == curr) {
parent->left = curr->left;
delete curr;
cout << d << " DELETED!!!!" << endl;
} else {
parent->right = curr->left; <<<<<<<<<<<<<<<<<< // I have it the same way you mentioned
delete curr;
cout << d << " DELETED!!!!" << endl;
}
}
It must be that our source is out of sync.  Mine seems to be working -- I've modified the main to check every name -- removed names are really removed from the BST and names not removed are still in the BST.

I've made some other changes and something in them must be the relevant difference.

My code is attached.  Would you please post yours?

MyCode.txt
Look like you have done many changes to the program which I didnt do. I have included my program, after doing the changes you have done in your code.
Seems to me more modifications have to be done. Pls look through this and let me know.
BinarySearchTree.cpp.txt.txt
A lot of those changes are just experiments...

I have your code and will start looking.  I'll be a few minutes.
Ok Josgood. I will keep waiting. Thank you
I think I'm on the track of the problem.  Am taking a lunch break and will be back.

I believe that we need to replace the current node's data with that of the greatest (rightmost) name on the left subtree, and then get the pointers right.

Back in a half hour or so.
Thank you Josgood. I shall definitely wait for that.
Are you constrained to an iterative solution or is a recursive solution acceptable?

I have a pair of problem situations fixed and now I see a third.
I have to do an experiment by trying to search and delete batch of strings.
The experiment should be run iteratively.
But removing and searching each string can be done recursively also.
It's a funny thing...sometimes when you give up then the answer comes to you.

I think I have it working...running a lengthy test...guessing 10-15 min.
Well the test is still running and it is slower than I expected.  While we're waiting I thought I'd send you the corrected area of the code.

I commented out the printf's just to make my job easier...

Tell me if these changes make sense to you or you would like some explanation.

            } else // right child has children
            {
                  //if the node's right child has a left child
                  // Move all the way down left to locate smallest element

                  if ((curr->left)->/*left*/right != NULL) {
                        tree_node* lcurrp = curr;
                        tree_node* lcurr  = lcurrp->left;
                        while (lcurr->/*left*/right != NULL) {
                              lcurrp = lcurr;
                              lcurr = lcurr->/*left*/right;
                        }
                        curr->data = lcurr->data;
                        lcurrp->right = lcurr->left;
                        delete lcurr;
                        //cout << d << " DELETED!!!!" << endl;
                  } else {
                        // Current node's left child doesn't have a right child and so is greatest on the left branch
                        tree_node* tmp;
                        tmp = curr->/*right*/left;
                        curr->data = tmp->data;
                        curr->/*right*/left = tmp->/*right*/left;
                        delete tmp;
                        //cout << d << " DELETED!!!!" << endl;
                  }

            }
The iterative test
      bCopy = new BinarySearchTree(*b);
      for (batchSize = 20000; batchSize < lastNames.size(); (batchSize += 20000)) {
            // processing
                  b = new BinarySearchTree(*bCopy);
      }
will fail on the second iteration because BinarySearchTree doesn't have a copy constructor that performs a deep copy.

For your purposes, I suggest running the test with batchSize = 20000, running it again with  batchSize = 40000, and so on.  Alternatively, build the tree newly for each test iteration.
Ok Josgood I shall do that.