Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

How do I delete and then search more than 20000 strings from Binary Search Tree?

Posted on 2008-10-25
36
353 Views
Last Modified: 2011-10-19
I have an assignment where I will have to read 88800 strings and store it in a vector. Then insert all the strings in the Binary Search tree. After which I will have to Random shuffle the vector and pick top 20000 strings subset. delete those strings from BST. Then do a search of the same subset in the BST. The same deletion and search I have to repeat for 40000, 60000 and 80000 respectively after random shuffling the vector each time.

I have implemented the basic BST code. I have problems when  delete and search a large number of data.
Say sometimes even 10000.... I am attaching the code. Pls look at it let me know how to solve....
BinarySearchTree.zip
0
Comment
Question by:Beebutter
  • 19
  • 17
36 Comments
 
LVL 13

Expert Comment

by:josgood
ID: 22805563
>>I have problems when  delete
Is the problem an exception being thrown in this loop?
      for (unsigned int i = 0; i < 20000; i++) {
            b.remove(lastNames[i]);
      }

I can see in BinarySearchTree::remove, for a leaf node, that for
            delete curr;
curr is an invalid pointer.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805584
This looks like one problem...see <<<< marker
            } else // left child present, no right child
            {
                  if (parent->left == curr) {
                        parent->left = curr->left;
                        delete curr;
                        cout << d << " DELETED!!!!" << endl;
                  } else {
                        parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
                        delete curr;
                        cout << d << " DELETED!!!!" << endl;
                  }
            }

Doesn't fix it, though.
0
 
LVL 13

Accepted Solution

by:
josgood earned 500 total points
ID: 22805620
Also in BinarySearchTree::remove, note the <<<< marker which shows the necessary change
      //Node with 2 children
      // replace node with smallest value in right subtree
      if (curr->left != NULL && curr->right != NULL) {
            tree_node* chkr;
            chkr = curr->right;
            if ((chkr->left == NULL) && (chkr->right == NULL)) {
                  //---------------------------------------------
                  //curr = chkr; // this copies the right & left ptrs, as well as the data
                                 // must preserve the left ptr
                  curr->data = chkr->data; // <<<<<<<<<<<<<<<<<<<
                  //---------------------------------------------

This fixes the crash.  
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 

Author Comment

by:Beebutter
ID: 22805776
Hi Josgood

Thank you very much. I did the change you told me. Actually I am a java programmer and very new to C++. I am finding it dificult to debug the problem. Now the program works without crashing. But still errors do occur. In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805839
The first example I see where remove does not find a string is "CHANOINE".

Are you seeing the same thing?
0
 

Author Comment

by:Beebutter
ID: 22805851
No Josgood. I tried deleting 20000 strings. For me the first one that displayed found was "CHERCHIO". The CHANOINE doesnt appear in my deletion list at all.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805887
Well, I guess that makes sense...random_shuffle is random, after all.  If this gets difficult, we should switch to the version that takes a seed value so we can stay in sync.

It looks to me as though there is an issue in the way the BST is being built.  

There's no good way to show this in a text window, but let me try.  You could have a BST that looks like this
      B
 A<   >C
        >D
where is A is B->left, C is B->right, and D is C->right.  Using the same notation, watching remove work its way down the tree, I'm seeing (in this run) the following tree...This isn't complete, it is just the branches the code is following
          Johnson
Brown<
                     >Davis     (Davis is Brown->right)
Clark<                           (Clark is Davis->left)
Carter<                         (Carter is Clark->right)
                     >Chavez
Castillo<
                     >Chapman
Castro<
and so on
0
 

Author Comment

by:Beebutter
ID: 22805897
Thanks for explaining that to me. But is there any way to fix this problem?
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805909
>>an issue in the way the BST is being built
I take that back.  I just went to refresh my BST knowledge a bit and I think I was wrong.

>>is there any way to fix this problem
Of course.  I need a few minutes to look at this.  I won't go away without telling you.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805956
OK, I think I see what is going on here.

The BST code is working properly...it is being tested incorrectly.

Using my example of "CHANOINE", "CHANOINE" exists in the lastNames array and is copied into the BST.  Then "CHANOINE" gets removed from the BST because it is one of the first 20,000 in lastNames.  It actually is one of the first 1000.

So when you
      for (unsigned int i = 0; i < 1000; i++) {
            b.search(lastNames[i]);
      }
"CHANOINE" exists in lastNames but not in the BST because it was deleted from the BST and *not from lastNames*.

Does that make sense?
0
 

Author Comment

by:Beebutter
ID: 22805980
In that case if I try deleting 20000 strings and then search 20000 strings, the same problem persists.

I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted. Then why it rreturns "FOUND" after deleting?
Also there are some string which are unable to be found in the remove().
0
 
LVL 13

Expert Comment

by:josgood
ID: 22805994
>>I understand that the "lastnames" still has that CHANOINE string but it should not be present in BST as it was deleted
For my "CHANOINE" case, that is exactly what is happening.  I thought that was the question you were asking.

>>why it rreturns "FOUND" after deleting?
This is what you're really asking...For my "CHANOINE" case, you would be seeing that search() finds "CHANOINE" in the BST even though it was removed from the BST.  Am I tracking with you?

>>Also there are some string which are unable to be found in the remove()
I will look into this while you answer whether I am tracking with you
0
 

Author Comment

by:Beebutter
ID: 22805998
Yes you are tracking with me.
As I told you, when a string from the lastnames vector is removed, why the search() function still finds the string?
0
 
LVL 13

Expert Comment

by:josgood
ID: 22806071
I'm still here.  

I haven't yet duplicated the problem where a string has been removed from the BST and yet search finds it.  Can you tell me a string for which you have this problem?  I recorded all deleted strings in a vector.  For me, every string that search finds is not a member of the vector.

Flipping the question around, is "ZIBERT" an example of a string that should be found by search and isn't?
0
 

Author Comment

by:Beebutter
ID: 22806088
For me "ZILBERT" is not a problem.

For the string STANCILL , the remove () displays
STANCILL DELETED!!!!

the search () displays
STANCILL FOUND!!!!!      
This is the problem...
0
 

Author Comment

by:Beebutter
ID: 22806090
Try deleting and searching 20000 strings then you might replicate the same error I got.
0
 

Author Comment

by:Beebutter
ID: 22806193
Hi

Were you able to figure it out?
0
 
LVL 13

Expert Comment

by:josgood
ID: 22806200
I have now duplicated the problem -- I have it with "REBEIRO" -- the string is confirmed as a string that should have been deleted from the BST, but search finds it anyway.

I need to shut down for the evening, but I will look at this in the morning.  I'm on Pacific time and should look at it around 9AM.

Sorry I couldn't get it fixed tonight, but we should be able to handle it tomorrow.
0
 

Author Comment

by:Beebutter
ID: 22806206
Definitely.. it would be very helpful if thats solved.
Thank you
0
 

Author Comment

by:Beebutter
ID: 22807201
Is there any one who could help me in the program attached?

In remove() , I have first done a search and only if the string is found, I do the delete. If string is not found , I display "not found". As I have inserted all the 88800 strings, deleting any number of strings below that number should be done. But for some strings, the remove() displays "NOT FOUND" in deletion.

Also the same error occurs in search() also. After deleting the strings, some strings display as "found". This should not happen. Pls help me fixing the problem.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22807636
Good morning.  I'm looking at it now.
0
 

Author Comment

by:Beebutter
ID: 22807654
Thank you josgood
0
 
LVL 13

Expert Comment

by:josgood
ID: 22807773
Well!  It looks like this bug is *my* fault!

In my second response, I suggested
                        parent->right = curr->right/*left*/; //<<<<<<<<<<<<<<
and that suggestion was wrong.  I didn't understand your code well enough at that time.

Restore it to the way you originally had it
                        parent->right = curr->left; // my bad
and the BST code works fine.

Sorry about that.

0
 

Author Comment

by:Beebutter
ID: 22807809
That piece of block you have mentioned, I do have the same way you have suggested now but still the problem persists. i.e.
else // left child present, no right child
{
if (parent->left == curr) {
parent->left = curr->left;
delete curr;
cout << d << " DELETED!!!!" << endl;
} else {
parent->right = curr->left; <<<<<<<<<<<<<<<<<< // I have it the same way you mentioned
delete curr;
cout << d << " DELETED!!!!" << endl;
}
}
0
 
LVL 13

Expert Comment

by:josgood
ID: 22807831
It must be that our source is out of sync.  Mine seems to be working -- I've modified the main to check every name -- removed names are really removed from the BST and names not removed are still in the BST.

I've made some other changes and something in them must be the relevant difference.

My code is attached.  Would you please post yours?

MyCode.txt
0
 

Author Comment

by:Beebutter
ID: 22807929
Look like you have done many changes to the program which I didnt do. I have included my program, after doing the changes you have done in your code.
Seems to me more modifications have to be done. Pls look through this and let me know.
BinarySearchTree.cpp.txt.txt
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808017
A lot of those changes are just experiments...

I have your code and will start looking.  I'll be a few minutes.
0
 

Author Comment

by:Beebutter
ID: 22808084
Ok Josgood. I will keep waiting. Thank you
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808241
I think I'm on the track of the problem.  Am taking a lunch break and will be back.

I believe that we need to replace the current node's data with that of the greatest (rightmost) name on the left subtree, and then get the pointers right.

Back in a half hour or so.
0
 

Author Comment

by:Beebutter
ID: 22808257
Thank you Josgood. I shall definitely wait for that.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808595
Are you constrained to an iterative solution or is a recursive solution acceptable?

I have a pair of problem situations fixed and now I see a third.
0
 

Author Comment

by:Beebutter
ID: 22808603
I have to do an experiment by trying to search and delete batch of strings.
The experiment should be run iteratively.
But removing and searching each string can be done recursively also.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808641
It's a funny thing...sometimes when you give up then the answer comes to you.

I think I have it working...running a lengthy test...guessing 10-15 min.
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808734
Well the test is still running and it is slower than I expected.  While we're waiting I thought I'd send you the corrected area of the code.

I commented out the printf's just to make my job easier...

Tell me if these changes make sense to you or you would like some explanation.

            } else // right child has children
            {
                  //if the node's right child has a left child
                  // Move all the way down left to locate smallest element

                  if ((curr->left)->/*left*/right != NULL) {
                        tree_node* lcurrp = curr;
                        tree_node* lcurr  = lcurrp->left;
                        while (lcurr->/*left*/right != NULL) {
                              lcurrp = lcurr;
                              lcurr = lcurr->/*left*/right;
                        }
                        curr->data = lcurr->data;
                        lcurrp->right = lcurr->left;
                        delete lcurr;
                        //cout << d << " DELETED!!!!" << endl;
                  } else {
                        // Current node's left child doesn't have a right child and so is greatest on the left branch
                        tree_node* tmp;
                        tmp = curr->/*right*/left;
                        curr->data = tmp->data;
                        curr->/*right*/left = tmp->/*right*/left;
                        delete tmp;
                        //cout << d << " DELETED!!!!" << endl;
                  }

            }
0
 
LVL 13

Expert Comment

by:josgood
ID: 22808770
The iterative test
      bCopy = new BinarySearchTree(*b);
      for (batchSize = 20000; batchSize < lastNames.size(); (batchSize += 20000)) {
            // processing
                  b = new BinarySearchTree(*bCopy);
      }
will fail on the second iteration because BinarySearchTree doesn't have a copy constructor that performs a deep copy.

For your purposes, I suggest running the test with batchSize = 20000, running it again with  batchSize = 40000, and so on.  Alternatively, build the tree newly for each test iteration.
0
 

Author Comment

by:Beebutter
ID: 22809013
Ok Josgood I shall do that.
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Often, when implementing a feature, you won't know how certain events should be handled at the point where they occur and you'd rather defer to the user of your function or class. For example, a XML parser will extract a tag from the source code, wh…
This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a G…
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

791 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question