Link to home
Start Free TrialLog in
Avatar of killer455
killer455

asked on

Sorting, Integers/Int Strings

This is basically what im working with.  

So far:
It takes a file name in from the command line.
Inserts it into an array.
Runs the array through various sorts.

End Result:
I'm wanting to be able to count the number of comparisons that take place in each sort function and output them to the user.  Some functions I have tried to implement a comparison counter but I dont think im doing this right??

I also want to output the sorted array to a file.  However im wanting to use the input filename + a suffix.  Could this be done in a seperate output() function by passing the sorted array, I guess i dont know how to keep track of the input filename to use it in the output name??

I am planning on using integer data files but inserting them into an array once as integers and then a second time as integers as strings.  Then performing sort on both, I guess this mean I need to change the sorting algorithms and use compare() where needed??

Here is what I have so far... any design tips, hints, points, anything would be appreciated.  There is no specific question here, just a little collaboration on how I am doing so far. Thanks.

#include <iostream>
#include <string.h>
#include <fstream>
using namespace std;

void compare(int a, int b) {
  // will compare the integers here and return result
}

void compare(char* a, char* b) {
  // will compare here also, except as integer strings
}

template <class T>
void InsertionSort(T A[], int size)
{
  int compares = 0;
  for(int j=1; j<size; j++)
    {
      T key = A[j];
      int i = j - 1;
      while ((i >= 0) && (A[i] > key))
      {
        A[i+1] = A[i];
        i -= 1;
        compares++;
      }
      A[i+1] = key;
    }
}

template <class T>
void mergesort(T A[], int left, int right)
{
  if(left<right)
    {
      int mid=(left + right)/2;
      mergesort(A, left, mid);
      mergesort(A, mid+1, right);
      merge(A,left,right);
    }
}

template <class T>
void merge(T A[],int left, int right)
{
  int size=right-left+1,
  mid=(left+right)/2;

  T *b=new T[size];
  int count=0;
  for(int x=left;x<=mid;x++,count++)
    {
      b[count]=A[x];
    }
  for(int x=right;x>=mid+1;x--,count++)
    {
      b[count]=A[x];
    }
  for(int x=0,y=size-1,put=left;x<=y;put++)
    {
      if(b[x]<=b[y])
      {
        A[put]=b[x];
        x++;
      }
      else
      {
        A[put]=b[y];
        y--;
      }
    }
  delete[] b;
}

template <class T>
void quicksort(T A[], int left, int right, int pivot)
{
  int left_side, right_side;

  left_side = left;
  right_side = right;
 
  do
    {
      while (A[right_side] > pivot)
      right_side--;
      while (A[left_side] < pivot)
      left_side++;
      if (left_side <= right_side)
      {
        swap(A[left_side], A[right_side]);
        left_side++;
        right_side--;
      }
    }
  while (right_arrow >= left_arrow);
 
  if (left < right_arrow)
    quick_sort(A, left, right_arrow);
  if (left_arrow < right)
    quick_sort(A, left_arrow, right);
}  

template <class T>
void Heapsort(T A[], int size)
{
  BuildHeap(A, size);
  for(i=size; i>1; i--)
    {
      int tmp = A[0];
      A[0] = A[i-1];
      A[i-1] = tmp;
      size--;
      Heapify(A, 1, size);
    }
}

template <class T>
void BuildHeap(T A[], int size)
{
  for(i=size/2; i>0; i--)
    Heapify(A, i, size);
}

template <class T>
void Heapify(T A[], int i, int size)
{      
  int  l = 2 * i;
  int  r = 2 * i + 1;
  int largest = 0;

  if(l <= heapsize && A[l-1] > A[i-1])
    largest = l;
  else
    largest = i;

  if(r <= heapsize && A[r-1] > A[largest-1])
    largest = r;
  if(largest != i)
    {
      int tmp = A[i-1];
      A[i-1] = A[largest-1];
      A[largest-1] = tmp;
      Heapify(A, largest, heapsize);
    }
}

int main(int argc, char* argv[])
{
  ifstream input;  // initializes input
  int dataLength;

  if(argc > 1)
    {
      string inData = argv[1];
      input.open(inData);

      if (!input) {
      cerr << "Unable to open file." << endl;
      exit (1);
      }

      input >> itemCount;
      arraySize=itemCount-1;
      int Array[arraySize];
      int arrayCount = 0;

      while(input >> x) {
      Array[arrayCount] = x;
      arrayCount++;
      }

      intertionsort(Array, arraySize);
      mergesort(Array, 0, arraySize);

      int lastPivot = Array[arraySize];
      int middlePivot = Array[arraySize]/2;

      quicksort(Array, 0, arraySize, lastPivot);
      quicksort(Array, 0, arraySize, middlePivot);
      heapsort(Array, arraySize);
    }
}
Avatar of bcladd
bcladd

(1) Input file name: You have it in inData in the scope of the if statement in main. If you want to add ".out" to the file and use it for output you could use (just before end of if statement body)
Ooops.

You could try

    string outData = inData + ".out";
    output(Array, outData);

Assuming that output takes an array and the name of a file and uses an ofstream to print the data.

(2) My architecture hint from your other question pertains here even more so: write the comparison function and replace all your comparisons with calls to that function. It may need to be a templated function so that you can compare arbitrary types. Include inside the function the incrementing of the counter. You will still have to reset the comparison counter between sorts (if it is global). Alternatively you could make the comparison function a comparison functor that has the count contained in it.

(3) Your calculation of middlePivot is wrong. I think you want the /2 INSIDE the square brackets.

-bcl
Avatar of killer455

ASKER

Won't i need to output the array into the file with the suffix name added?  If so shouldnt this be done in a seperate output().  How would i get the input file name there?
Ok, when u say it may need to be templated.  Are you saying the compare function?  Well I want the program to be able to sort files (containing integers) as either integers or integer strings.  To do this wouldnt i need to use the function overloading (as above).??

The signature for the output function I dreamed up in my previous post was

    void output(int A[], const string & ofName);

thus you can open ofName for output.

Note: I just looked at the declaration of Array. C++ does not support dynamically sized arrays declared like that. You must either declare the array on the heap (using new[]) or use some dynamically sized container (might I suggest std::vector).

-bcl
Suggestion: You might want to comment out most of your program and see if you can read the input file and write the exact same array to the output file. That gives you a chance to play with C++'s syntax for declaring arrays and leads you to a working program sooner. Then uncomment one sort (one you think works) and work the kinks out of it. That will give you a feel for the syntax of using your comparison functions and making sure your templates work in one case.

I think you will find a templated compare function will make your life easier. Also, don't use char * if you can avoid it: use std::string.

-bcl
Overloading should work. I just thought it through and it should work fine. I still suggest trading char * for string. Building the integer strings will be easier, comparing them will be much easier,  populating the arrays will be much, much easier, etc.

-bcl
killer445, when I suggested that you use function overloading I had something else in mind (that you weren't gonna use templates) and I guess you misunderstood my last comment from the previous post. Here since you're using a function template, my suggestion would be to have 2 different arrays if you were going to do this. An array of type int, and the other one can be of a type you define to hold char*.
class myString {
   char* num;
   public:
   myString() { }
   setVal(int number) {
     //convert from your number into the array num
   }
   bool operator == (const myString& s) const {
      return strcmp(num, s.num) == 0;
   }
   bool operator > (const myString& s) const {
      return strcmp(s.num, num) < 0;
   }
   bool operator < (const myString& s) const {
      return strcmp(s.num, num) > 0;
   }
};

and in your main create an array of this:
  myString  mS[/*Same size as your int array*/];
  and when you get the data from file, make sure you call
  mS[/*index*/].setVal(/*the_int_data*/);

This class overloads your compare operators, so they'll work fine with the sort functions you're having now.
>>I think you will find a templated compare function will make your life easier. Also, don't use char * if you can
>> avoid it: use std::string.

I totally agree, there are >, <, == operators already defined for std::string so you can just make you compare function a function template and that should work. I think bcladd's got your answer for you there.
Ok those last couple posts kinda got me confused.  Here is some updated code, just to show u what im working on.  

Basically im confused about the whole templates thing now.  By a templated compare function.. do you mean.

template <class T>
void compare(T a, T b) {
  // will compare the integers here and return result
compares++;
}

instead of the two seperate ones I have below?

#include <iostream>
#include <string.h>
#include <fstream>
using namespace std;

void compare(int a, int b) {
  // will compare the integers here and return result
compares++;
}

void compare(string a, string b) {
  // will compare here also, except as integer strings
compares++;
}

template <class T>
void InsertionSort(T A[], int size)
{
  for(int j=1; compare(j, size); j++)     // compare j < size
    {
      T key = A[j];
      int i = j - 1;
      while ((compare(i, 0) && (compare(A[i], key))  // compare i>=0 and then A[i] > key
     {
       A[i+1] = A[i];
       i -= 1;
     }
      A[i+1] = key;
    }
}

template <class T>
void mergesort(T A[], int left, int right)
{
  if(compare(left, right))   // compare left<right
    {
      int mid=(left + right)/2;
      mergesort(A, left, mid);
      mergesort(A, mid+1, right);
      merge(A,left,right);
    }
}

template <class T>
void merge(T A[],int left, int right)
{
  int size=right-left+1,
  mid=(left+right)/2;

  T *b=new T[size];
  int count=0;
  for(int x=left;compare(x,mid);x++,count++)     // compare x<=mid
    {
      b[count]=A[x];
    }
  for(int x=right;compare(x, mid+1);x--,count++)      // compare x>=mid+1
    {
      b[count]=A[x];
    }
  for(int x=0,y=size-1,put=left;compare(x,y);put++)      // compare x<=y
    {
      if(compare(b[x],b[y]))        // compare b[x]<=b[y]
     {
       A[put]=b[x];
       x++;
     }
      else
     {
       A[put]=b[y];
       y--;
     }
    }
  delete[] b;
}

template <class T>
void quicksort(T A[], int left, int right, int pivot)
{
  int left_side, right_side;

  left_side = left;
  right_side = right;
 
  do
    {
      while (compare(A[right_side],pivot))    // compare A[right_side] > pivot
     right_side--;
      while (compare(A[left_side],pivot))     // compare A[left_side] < pivot
     left_side++;
      if (compare(left_side,right_side))      // compare left_side<=right_side
     {
       swap(A[left_side], A[right_side]);
       left_side++;
       right_side--;
     }
    }
  while (compare(right_side,left_side));   // compare right_side>=left_sude
 
  if (compare(left,right_side))                        // compare left < right_side
    quick_sort(A, left, right_side);
  if (compare(left_side,right))                     // compare left_side < right
    quick_sort(A, left_side, right);
}  

template <class T>
void Heapsort(T A[], int size)
{
  BuildHeap(A, size);
  for(i=size; compare(i,1); i--)           // compare i>1
    {
      int tmp = A[0];
      A[0] = A[i-1];
      A[i-1] = tmp;
      size--;
      Heapify(A, 1, size);
    }
}

template <class T>
void BuildHeap(T A[], int size)
{
  for(i=size/2; compare(i,0); i--)                 // compare i > 0
    Heapify(A, i, size);
}

template <class T>
void Heapify(T A[], int i, int size)
{      
  int  l = 2 * i;
  int  r = 2 * i + 1;
  int largest = 0;

  if(compare(l,heapsize) && compare(A[l-1],A[i-1]))  // l <= heapsize and A[l-1] > A[i-1]
    largest = l;
  else
    largest = i;

  if(compare(r,heapsize) && compare(A[r-1],A[largest-1]))   //  (r <= heap : A[r-1] > A
        largest = r;

  if(compare(largest,i)      // compare largest != i
    {
      int tmp = A[i-1];
      A[i-1] = A[largest-1];
      A[largest-1] = tmp;
      Heapify(A, largest, heapsize);
    }
}

int main(int argc, char* argv[])
{
  ifstream input;  // initializes input
  int dataLength;

  if(argc > 1)
    {
      string inData = argv[1];
      input.open(inData);

      if (!input) {
     cerr << "Unable to open file." << endl;
     exit (1);
      }

      input >> itemCount;
      arraySize=itemCount-1;
      int Array[arraySize];
      int arrayCount = 0;

      while(input >> x) {
     Array[arrayCount] = x;
     arrayCount++;
      }

      intertionsort(Array, arraySize);
      mergesort(Array, 0, arraySize);

      int lastPivot = Array[arraySize];
      int middlePivot = Array[arraySize/2];

      quicksort(Array, 0, arraySize, lastPivot);
      quicksort(Array, 0, arraySize, middlePivot);
      heapsort(Array, arraySize);
    }
}
(1) Yes, a single templated compare function will suffice.
(2) That way the templated compare function works with any class that supports your method of comparison (since you're using strings, you can use < inside compare and have it work).

-bcl
I'm sorry I didn't mention this sooner but "compare" is a terrible name for the function. When it is called, how can you tell when it should be true and when it should be false? Calling it "lessThan" or "less" or even "LT" (Fortran flashbacks for some here) makes it clear what you are comparing in the if statements (and while loops) in the sort functions.

-bcl
bcladd,

> void output(int A[], const string & ofName);
> thus you can open ofName for output.

If this is called from each sort seperately, how do i pass ofName?? since its not used in the sorts but in the main()

> Note: I just looked at the declaration of Array. C++ does not support dynamically sized >arrays declared like that. You must either declare the array on the heap (using new[]) or >use some dynamically sized container (might I suggest std::vector).

Don't quite know what u mean by this?  Is this a problem in another area of my program?
(1) Taking your questions in reverse order:

The following code WILL NOT COMPILE IN C++.  It is almost identical to the code you have allocating Array in your main function.

      int someSize;
      someSize = 19;
      int Array[someSize];

So you need to declare Array in some other way. Either declare it to be a huge, huge array that you know you will never overflow (this is a bad choice; ask Microsoft about the concept of "buffer overruns") or, having read the size from the file, allocate space to hold the array on the heap using an int * and new or use std::vector (a templated, dynamically sized array replacement in the STL).

This is a problem you are going to face.

(2) You are planning on each sort printing the array to a file? If you are only printing the results (the sorted array), then you should be able to do the printing in main() AFTER calling the sort. If you are going to print intermediate results during processing, you can pass in the name (outData) as a string parameter to each of the sort routines. They can then open the file and print their results.

An alternative would be to have a global ofstream, fout, that you open in main() before calling any of the sort routines and that all of your sort routines use for output. Note that most people frown on global variables but this one is very, very close to the global cout stream used for standard output.

So you need to declare Array in some other way. Either declare it to be a huge, huge array that you know you will never overflow (this is a bad choice; ask Microsoft about the concept of "buffer overruns") or, having read the size from the file, allocate space to hold the array on the heap using an int * and new or use std::vector (a templated, dynamically sized array replacement in the STL).

Ok i might need some help with this, havent worked with it before.  I could have sworn i did a past program using:

     int someSize;
     someSize = 19;
     int Array[someSize];

But i will take your word for it obviously :)  I dont have my files on this comp so i cant check.

Maybe this will change things, probably not but the first number in each file im going to be used is the size of the array, like this: (file)

3
123
234
234

It will take input from the file and use that as the input, but this still wont work will it?
No. The error in Borland's C++ compiler is "Constant expression required in function main(int,char * *)" at the declaration of Array. The problem is that C++ sets aside space in the function activation record for all local variables AT COMPILE TIME so it needs to know the size of every one of them at compile time. The code will work if you declare someSize to be a const int and initialize it on the same line:

    const int someSize = 19;
    int Array[someSize];

This works because the compiler knows the size at compile time and you cannot change the size at run time. Exactly what you don't want.

My suggestion:

#include<vector>

(you already are using namespace std to get string so that means std::vector is just vector)

vector<int> Array;

Then in your loop you can use the push_back method (that puts a new element at the back of the vector). You don't need to include the size in the file (vectors grow as needed) and you don't have to count the nubmer of elements because vectors have a size() method that returns the number of elements in the vector. vectors also have a [] operator that is efficient and works just like an array (okay, almost exactly like an array).

    while(input >> x) {
         Array.push_back(x);
      }

There is a way to dynamically declare an array but I don't want to get into it until you ask.

This means the signature of each of your sort functions will change to be vector<T> A rather than T A[]. You don't have to pass in the size unless you need it explicitly and if you need it from main you can refer to it as Array.size()

-bcl
I'm sorry, but why don't you simply time those operations and report the time it tooks ?

It's obvious that what is faster does less or faster operations...

have a look around QueryPerformanceCounter() if interested.
EarthQuaker-

Exact operation timing is often worthwhile but it is also the case that understanding the big-O running time of various algorithms can be very useful. When studying big-O running times it is annoying to have to take into account whether the user was watching streaming video or had an IM session in the background or...you get the idea. That sort of noise can be filtered out of the data by simply counting operations of interest (as you suggest with the performance counter) and those operations are typically at a high level of abstraction (is it more important to know the number of comparisons charged to the sort at the C++ level or the number of times the CMP machine instruction was executed? What percentage of the CMP should be charged to the sort and which were set-up/tear-down code emitted by the compiler that would be different on a different compiler/architecture).

Just some thoughts on why you might want to count comparisons (or swaps; even harder to track at the machine language level) rather than time or machine instructions.

-bcl
blcadd,

In some of my sorts... what is the different between doing:
void insertionSort(int A[], int size);
Than doing:
void insertionSort(int *A, int size);

Is one better to use or...???







Both are fundamentally the same.

The difference is that you could change the value of A in the second example (You could use ++A or A += 2 or similar), that is, when it is a pointer.
You CANNOT change A in that way when it is passed as an unsized array.

The fact that using the two is all but the same indicates that C/C++ treats arrays as pointers at sequences of objects of the same type (this is under the hood) and explains why address arithmetic works with pointers and with arrays.

Is one better than the other? The [] notation indicates that you really expect to be working with sequences of items rather than a single item. Thus as a documentary device it can be useful. Otherwise no difference in performance so just pick one or the other (to be consistent) and get comfortable with reading it.

-bcl
Also the comparison count is going to differ depending on whether the array is full of integers or integer strings.

The number of comparisons of integers in an array in quicksort will differ from the number of comparisons on the array if the numbers were treated as integer strings.

How can you use only one compare function for both integers and integer strings? doesnt there need to be 2?
ASKER CERTIFIED SOLUTION
Avatar of bcladd
bcladd

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Ok, but the "comparison" count that is incremented inside that function will not be the same for integers and integer strings.

For example

template <class T>
bool lessThan(int a, int b) {

// Whatever is in here wont work for both integers and integer strings correct?
// The count that is incremented is going to be different also.
// I'm wanting to compare how many comparisons take place for regular integers
// and then for integer strings.

compares++;
}
Here is an outline of what im trying to say i guess:

1.  An array is pulled in from a file, the numbers stored as regular integers.
    - This array is run through all of the sorts.
    - Each different sort produces a comparison count that is displayed.
    - Then the final sorted array it sent to an output file.

2. A second array is then built with the same file, except stored as integer strings.
    - Same as above

This should allow comparison between the two different comparison counts that are produced.  So shouldnt  this mean that integer strings should be compared differently than regular integers?  If not then the count would be the same and there wouldnt be a reason for observation of the difference in comparisons.
Also in BuildHeap

  if(compare(largest,i)      // compare largest != i
    {
      int tmp = A[i-1];
      A[i-1] = A[largest-1];
      A[largest-1] = tmp;
      Heapify(A, largest, heapsize);
    }

I have to compare largest and i   --> largest != i

How can I do this with only one comparison function, I guess I dont see how to implement the function and be able to make all the comparisons possible.
I guess I don't understand what you mean by "integer string". It was my assumption that you would read the following file:

23
456
2200
213
3456

and produce the following sorted files:

integers
23
213
456
2200
3456

strings
213
2200
23
3456
456

(the strings are in lexicographic order).

If this is what you mean to do with your sort, then the comparison will be counted correctly if you use < in the your lessThan function. That is, one string comparison will be counted as one comparison. If you want to count how many character comparisons are done when comparing those strings then you will need two different versions of lessThan so you can write your own string compare and hook a count to the character comparisons (could be done using an int/char version of lessThan, actually).

Since the sort orders are different, it stands to reason that the amount of shuffling required to go from the original order to the sorted order will be different in the two scenarios so you will have something to compare.

If I am misunderstanding what you want to do, please let me know.

-bcl
You understand me correct, the sorted lists you have above are what im looking to do with this program.  However wont running the program on an array of ints and array of strings produce the exact same comparison count?  If so this isnt what im looking for.  I'm looking to observe the differences in the comparison count in each situation.  So I guess this means I want to count the character comparisons correct?

Then this leads me to my next question.  If this is the case, then how do I use two seperate functions and still call a compare within my sort functions?

If im getting ahead of myself, sorry, trying to grasp this.
Well nevermind i read your last statement again.  You're right the sort orders will be different.  Yes this is what i want, forget what i said about character comparisons.
I guess I dont understand how it can be done with one function.  How you "tell" the function that the incoming item is an integer or a string, and then performing the correct comparison depending on that.
Cool. The big win in using std::string to hold the string versions of the integers is that operator< is defined for std::string and is defined to order the strings lexicographically. So a template that uses < inside it will work with int, string, float, etc. It would not work (without a specialization) for char * because < would compare the addresses of the strings in memory, not the contents of the strings.

-bcl
Does this work?  Trying to get it to work for all types of comparisons.  Obviously if
!(a<b) then it has to be (a>=b).  However some of the comparisons require !=

template <class T>
void compare(T a, T b) {

if(a < b) { return 1; }
if(a > b) { return 0; }
if(a==b) { return 1; }

compares++;
}

If you need/want me to start a new thread let me know.  But your continued help/tips are very appreciated and having it all together helps also.
(1) Yes it works.

(2) Does it count as one comparison?

The second question is for you to answer.

Remember that in an ordered set, a != b is equavlent to ((a < b) || (b < a)); two comparisons but only using <

-bcl
(2) Does it count as one comparison?

Well inside the compare functions there may be more than one comparison being made.  But ultimately shouldnt it only be incremented once because the two items are being compared "once" in the sort function.  Since compare i only called once

template <class T>
void compare(T a, T b) {

  if(a < b) {
   compares++;
   return 1;
  }

  if(a > b) {
    compares++;
    return 0;
  }

  if(a==b) {
    compares++;
    return 1;
  }
}
Is that what you meant above  ^?
Also since the there has to be two arrays created (one for ints and one for strings) and then either create output in the sorts, or do it in the main (which would require making more copies of the arrays), isnt there some way to get around such repeated code?? Perhaps writing a function that builds the array and then that function calls the sorts?

int main()
{
...
      buildArray <int> (input);
      buildArray <string> (input);
...
}

template <class T>
buildArray(ifstream input)
{
      T* Array = NULL;  
      int itemCount;      
      input >> itemCount;
      int arraySize = itemCount-1;    
      a = new T[arraySize];
      int arrayCount = 0;

      while(input >> x) {
        Array[arrayCount] = x;
        arrayCount++;
      }

      intertionsort(Array, arraySize);
      mergesort(Array, 0, arraySize);

      int lastPivot = Array[arraySize];
      int middlePivot = Array[arraySize/2];
      quicksort(Array, 0, arraySize, lastPivot);
      quicksort(Array, 0, arraySize, middlePivot);

      heapsort(Array, arraySize);
}
Yep. You can make it a little simpler (and closer to your original) with

    template <class T>
    void compare(T a, T b) {    
        compares++;

        if (a < b) { return 1; }
        if (a > b) { return 0; }
        if (a==b) { return 1; }
    }

-bcl
Sorry, I didn't see your second posting above. It is a nice templated function.  
    Why is ArraySize itemCount - 1?
    You never declare a variable a so assigning to it is an error.
    How many output files will you have?
    When does compares get set to zero?

Also, you asked (a few back) about setting up another question. If you want to ask a different, specific, question or just get a broader perspective, feel free. If you asked I could even refrain from chiming in. I am not so much of a point-monger to need you to set up another question (not saying I won't accept the points for this one or even more than shown above; I just want to keep some dignity while grovelling...).  I like the teaching so I will give you what guidance you ask for.

-bcl
> Why is ArraySize itemCount - 1?

Because itemcount is equal to the number of items in the array.
Shouldnt the first item be in A[0].  So for 80 items, itemCount = 80.  Making ArraySize=79
Inputing items in A[0]....A[79]  which is 80 items.

> You never declare a variable a so assigning to it is an error.

Oops, thanks for the correction.

> How many output files will you have?

After one run of the program.  There will be 10 output files.  Five, one from each sort, pertaining to the array of ints.  Five more, one from each sort, pertaining to the array of strings.

> When does compares get set to zero?

I was unsure about this one.  I guess it depends on how I handle the output?? I could set compares to 0 at the beginning of each sort and then passing it to the compare function, but it seems to me that setting it up there would kinda be confusing.  I guess I don't know where I want to do the output at, or where is the most logical
Pertaining to the output, one reason I dont know where to do it is because I want to add a suffix to the input filename.  Like DataFile.inths.

Input filename = DataFile
Type of data = int
Type of sort = heapsort or "hs"
ArraySize is 80, not 79.  Yes, the indices run from 0 - 79 but the number in the square brackets after a type in new (or in-line when declaring an arry in C/C++) is the NUMBER OF ELEMENTS. So if you want to store 80 elements you had best declare 80 elements, not 79. C/C++ declares ranges (in arrays/vectors/strings) with the first element's index and ONE PAST the last element's index. So, with an implicit start of 0, you use the number of slots to declare. It makes for loops easy to write for counting up:

  for (int i = 0; i != ArraySize; ++i) {
    // do something with Array[i] here
  }

On output: You want output with the number of comparisons and the final sorted array generated for each sort, right? Assuming you don't care about intermediate results, it makes sense to alternate calls to sorts with output statements in main(). Why there? Because it leaves the sorts as general purpose as possible (they just sort their input and are not tied to this particular program except through their need for the lessThan function) and they can be reused. Because main() already has the name of the input file so it can generate related output file names without passing the file name around. Because one global output stream would make sense; ten global output streams makes no sense.

This is also where you should reset compares. That way if you wanted to count two different sorts together you could skip initialization between the two (compares continues counting across the two). Also keeps program specifics out of the sort routines.

-bcl
Yes im wanting to display the comparisons.  The sorted array should not be displayed, only sent to the file as output.  Is this what you meant for the main() dealing with output?

int main(int argc, char* argv[])
{
  ifstream input;
  ofstream output;
  int dataLength;

  if(argc > 1)
    {
      string inData = argv[1];
      input.open(inData);

      if (!input) {
        cerr << "Unable to open file." << endl;
        exit (1);
      }


      int lastPivot = Array[arraySize];
      int middlePivot = Array[arraySize/2];
     
      buildArray <int> (input);

      intertionsort(Array, arraySize);
      output.open(inData+.intis);
      // output sorted array here

      mergesort(Array, 0, arraySize);
      output.open(inData+.intms);
      // output sorted array here

      quicksort(Array, 0, arraySize, lastPivot);
      output.open(inData+.intqsl);
      // output sorted array here

      quicksort(Array, 0, arraySize, middlePivot);
      output.open(inData+.intqsm);
      // output sorted array here

      heapsort(Array, arraySize);
      output.open(inData+.inths);
      // output sorted array here

      buildArray <string> (input);

      intertionsort(Array, arraySize);
      output.open(inData+.stris);
      // output sorted array here

      mergesort(Array, 0, arraySize);
      output.open(inData+.strms);
      // output sorted array here

      quicksort(Array, 0, arraySize, lastPivot);
      output.open(inData+.strqsl);
      // output sorted array here

      quicksort(Array, 0, arraySize, middlePivot);
      output.open(inData+.strqsm);
      // output sorted array here

      heapsort(Array, arraySize);
      output.open(inData+.strhs);
      // output sorted array here

    }
}
Something like that. You need to put the extentions in quotes and make sure you close output when you are done writing the array.

To me that seems like a lot of code, but i guess its necessary right?  I guess I can always clean it up later after i get the general program working.
Also if i use the buildArray function, how do i pass the array back to the main for use in the sort function calls?
(1) How are you going to pass the Array into the sort routines? It needs to be modified inside the sort function, right? That makes it very similar to a newly created array. I suggest you use a reference to a pointer of the appropriate type (for readability you should declare a new type for the pointers so you can pass a reference to the new type).

(2) A lot of code? I act surprised because I wonder where you planned on having less code. You can simplify main() if you want to use function pointers (and you make all of your sort functions have exactly the same signature), construct an array, and cycle through the array. You should probably write an output routine (similar to the input routine) that prints an entire array. That way you just pass the file name, the array, and the length of the array and it prints it all out. Makes each call to sort look something like this:

    init_sort(); // initializes any counters and stuff like that (set compares to 0)
    intArrayType heapArray ;
    copy(heapArray, Array, arraySize);
    heapsort(heapArray, arraySize);
    saveArray(inData+".ihs", heapArray, arraySize);
    show_sort_count(); // displays the statistics

Notice that I used a bunch of functions you actually haven't written but the should all be short and easy to write.

-bcl
(1) How are you going to pass the Array into the sort routines? It needs to be modified inside the sort function, right? That makes it very similar to a newly created array. I suggest you use a reference to a pointer of the appropriate type (for readability you should declare a new type for the pointers so you can pass a reference to the new type).

Yes it needs to be modified inside the sort function.  Do you mean something like this?


template <class T>
void InsertionSort(T* A, int size)
{
  A = new T[]
  for(int j=1; compare(j, size); j++)     // compare j < size
    {
      T key = A[j];
      int i = j - 1;
      while ((compare(i, 0) && (compare(A[i], key))  // compare i>=0 and then A[i] > key
           {
             A[i+1] = A[i];
             i -= 1;
           }
      A[i+1] = key;
    }
}
template <class T>
buildArray(ifstream input)
{
      T* Array = NULL;  
      int itemCount;      
      input >> itemCount;
      int arraySize = itemCount-1;    
      a = new T[arraySize];
      int arrayCount = 0;

      while(input >> x) {
        Array[arrayCount] = x;
        arrayCount++;
      }
}

Pasted from one of your earlier posts. This will not work for several reasons: you cannot pass ifstream by value (MUST be by reference; there is no public copy constructor for ifstreams); you called this function twice with the same stream (that will read the file the first time and then be at the end of the file when the second call BEGINS); you don't (as you realized above) pass any information back to main().

Let's address these in reverse order:

(1) Passing information back: what information do we generate in the function? The array and the size of the array. Multiple data items going back implies using reference parameters:

    template <class T>
    buildArray(ifstream input, T*& Array, int & arraySize)
    { ... }

You don't want to declare them as local variables anymore, either.

(2) Actually, we'll address both of the other problems by NOT passing an ifstream at all. Instead the function will open (and close) and ifstream internally and we will just provide the name of the file as the first parameter to the function:

    template <class T>
    bool buildArray(const string & inFName, T*& Array, int & arraySize)
    {
        ifstream input(inFName);
        if (!input) {
             arraySize = 0;
             return false;
        }
        input >> arraySize;
        Array = new T[arraySize];
        int arrayCount = 0;
        T x;
        while(input >> x) {
            Array[arrayCount] = x;
            ++arrayCount;
         }
        input.close();
        return true;
    } // BuildArray

   
Your saying this should be in the main? or in an output function?

    init_sort(); // initializes any counters and stuff like that (set compares to 0)
    intArrayType heapArray ;
    copy(heapArray, Array, arraySize);
    heapsort(heapArray, arraySize);
    saveArray(inData+".ihs", heapArray, arraySize);
    show_sort_count(); // displays the statistics
The grouping of code is repeated over and over in main(), once for each kind of sort.
In some of your code above, what is:    
Dont understand where this is coming from?

intArrayType heapArray ;
Assume there is a typedef for the intArrayType.

It is equivalent to int * in your work. I was just trying to make it easier for me to understand by assuming a typedef:

typedef int * intArrayType;

-bcl
I think im pretty lost.  I'll work on what you have said and then make another post later.  Thanks for the continued help.
Ok instead of posting the code here.
Its here: http://www.geocities.com/bjohnson045/program.txt

Just to show you how im coming along on the coding.
Thanks again for your help.
Relax. Take a deep breath and think about what you want to do. If I understand you correctly you want to do the following:

Load an array of integers.
For each type of sort
    Convert an array of integers in the order of the original file to sorted order using the sort, counting comparisons
    Save the sorted array to a specially named file
Then do it with strings (I am ignoring that for the moment)

Okay. There are several "gotcha" moments in the above:
    Each sort should start with the SAME ELEMENTS in the SAME ORDER. This means repeatedly reading the file or sorting copies of the original array.
    The comparison count must be reset (if done using a global variable) between sorts
    The output file name needs to be known at the same time as there is a sorted array...either passed into sorts or handled in main()

Do these "high level" concerns make sense? The code I entered earlier, for heapsorting an array of integers, was designed to address these problems:

    init_sort();  // resets compares
    int* heapArray ;
    copy(heapArray, Array, arraySize); // copies the original data (you need to write this routine)
    heapsort(heapArray, arraySize);
    saveArray(inData+".ihs", heapArray, arraySize); // save the sorted contents of heapArray in the named file
    show_sort_count(); // displays the statistics

    init_sort(); // reset compares
    int* insertionArray ;
    copy(insertionArray, Array, arraySize);
    insertionsort(insertionArray, arraySize);
    saveArray(inData+".iis", insertionheapArray, arraySize); // save the sorted contents of insertionArray in the named file
    show_sort_count(); // displays the statistics

Hope this makes what I was trying to say clearer.

-bcl
Ok thats helps, updated here: http://www.geocities.com/bjohnson045/program.txt

Im a little unsure if I am passing everything right.
Also how with the sequence above differ with strings? if i should even ask that yet :)

(1) I'll take a look at the code after work (in a couple of hours).

(2) With templated functions the only thing that really changes with strings is the declaration of the "local" array. Of course you want to use different extensions for the names of the files and all of that, too, but the functions themselves will be instantiated and called by the compiler.

-bcl
(1) Your comparison counter, comparisons, is static in init_sort and is initialized to zero. There are two problems here:
    (a) A variable declared inside of a function is only in scope within that function. Thus no other function can see comparisons so you wil not be able to increment it and you will not be able to print it out.
    (b) A static variable is intialized only once. No matter how many times you call init_sort, the value of comparisons will only be set to 0 once (unless the code itself sets it to something else).
(1.5) You have an extra ; at the end of the function declaration

(2) copy needs to take T*&, not T&[], at least for the first parameter. You also need to allocate space (just like in BuildArray) for the new array.

(3) missing parameter type in the declaration of saveArray.

There are a whole lot of additional errors when those are fixed. I think most of them come from comarisons being static and locally scoped. Try moving it out to file scope (if you want you can declare is static there, though static at the file scope is a linkage directive and doesn't mean exactly the same thing as it does inside of a function).

Make sure you declare your variables before using them (in main there are problems).

Let me know if you have any more questions.

-bcl
In regards to the static comparison count.  This variable should be global right?  I guess i thought static meant "global", am i wrong in this?  If not, should it be declared static in the main and then set to 0 inside the init() function??


Note: its not letting me increment points anymore? because its at 500 or something?
(1) I know nothing about points. I would guess the max is 500 (since that is the highest I have seen on any single question). I think the standard is to post another question with the title "Points for bcladd" and then award them when I respond to that question. Haven't seen so much of that recently so I don't really know the etiquette.
 
(2)  You have aroused the lecturer within me. Be warned.

A variable declared outside of all functions in a C/C++ source file is a _global_ variable. This means that it is _in scope_ (visible) from the point of declaration to the end of the file.  This means that any function can refer to it by name. It also means that the variable is global to the _linker_ as well as the compiler and that if the variable is properly declared in another C/C++ source file (using the extern declaration modifier) it can even be used in other source files. To prevent such external linking a global variable declaration can be modifed to be static which means that it is not externally linkable. Since the variable is still global in scope in the current source file, nothing in the current file is changed by modifying a global variable to have static linkage.

Now, you knew most of that (and probably had little interest in the strange, fine points of C/C++ linkage). It is unfortunate but the term static is used when discussing the lifetime of variables as well as their linkage. You know what scope is, right? A given variable is only "visible" from some fraction of the program and the further nested in {} a declaration is, the smaller that fraction (typically) is. Have you ever considered the _duration_ of a given variable? That is, how long does the memory holding the value remain valid and contain the last value stored in it. In C/C++ there are three storage durations: _static_, _automatic_, and _allocated_.

Static storage duration means the storage is allocated by the program and last the duration of the execution. Variables declared outside of any block (global variables) and variables declared within a block using the static keyword have static duration. (Completely separate from the other use of static: global variables have static duration with or without static linkage.) Note that a variable declared inside a block has limited scope but "unlimited" lifetime.

Automatic storage duration applies to any variable declared inside a block without the static keyword and means that the variable's lifetime is approximately equal to the time when the enclosing block is executing. That is, variables declared within a function "come into being" when the function is entered and "go out of being" when the flow of control leaves the function's block.

Allocated storage duration applies to explicitly allocated storage (dynamically allocated storage is another way to say that; storage returned from new, malloc(), etc.) and means that the lifetime of the variable continues until it is explicitly released (delete, free()) or the program terminates.

Hope this helps.
-bcl
Ok. Well I updated the code again.
Another question:

I'm passing "Array" to buildArray in the main().  So this needs to be declared first. However, do i need to declare it twice for both buildArray <int> () and buildArray <string> ().  Such as:

int intArray[];
string strArray[];

buildArray <int> (inData, intArray, arraySize);
buildArray <string> (inData, strArray, arraySize);
Declare two of them but declare them to be pointers, not []. Also, there should be no need to use buildArray<int>; the compiler should be able to determine the template types from the types of the parameters.

-bcl
Do I need to delete anything throughout my program or is it done automatically once execution of the program has ceased?
Also how am i suppose to pass the correct, LAST and MIDDLE pivot values to the two different quicksorts.  Because the array isnt constructed until the buildArray function.  Simply place them after this function call then correct?

(1) You SHOULD clean up after yourself. After printing the local copy of the sorted array you can call delete[] on the pointer. Or you can let the system recover the memory when your program finishes.

(2) You seem to have a problem with calling quicksort. You have 4 parameters in the formal parameter list but your recursive calls only pass 3 parameters. Why do you pass the pivot in? And why do you only pass one pivot in? Quicksort typically calculates the pivot from the input range (either taking the first element, taking a random element, or calculating some approximation of the median value in the range (there is an efficient way to pick the median of 5 values so 5 values are chosen from the range and the median of the 5 values is used as the pivot)). That way you only need to pass in 3 parameters and the pivot is calculated inside EACH recursive call.

-bcl
I'm wanting to calculate the pivot because I am doing two different versions of quicksort.  One which uses the last element as the pivot and one that used the middle element as the pivot.

But you don't want to calculate the pivot once; the pivot location will be different for each recursive call (whether it is the middle or last element). You may want to pass a flag to quicksort indicating HOW to calculate the pivot but each recursive call will have to calculate its own pivot index.,

-bcl
So the actual quicksort function needs to be recoded to calculate the pivot?

Might it be better to use a version of quicksort that uses the partition function?

template <class T>
void quickSort(T *a, int first, int last) {
   if(first < last) {
      int loc = partition(a, first, last);
      quickSort(a, first, loc - 1);
      quickSort(a, loc + 1, last);
   }
}

template <class T>
int partition(T *a, int first, int last) {
   int i      = first;
   int loc    = last + 1;
   T pivot    = a[first];

   while(i < loc) {
      do {
         ++i;
      } while((a[i] < pivot) && (i < last));
      do {
         --loc;
      } while(a[loc] > pivot);

      if(i < loc)
         swap(a[i], a[loc]);
   }
   swap(a[first], a[loc]);
   return(loc);
}
Sure, use the partition function. Note that the pivot is ALWAYS the first element in the range in the version of partition you show.

If you want the pivot to differ, you will need to have some way of indicating the pivot to the partition function. That can be as a parameter (which would have to be passed in through quickSort) or a global flag (set in main() before each call to quickSort).

Only important if you really want to select different pivots on different calls to THE SAME function (yep, a third solution is to have quickSortFirst and quickSortMiddle (and partitionFirst and partitionMiddle) and call one or the other from main()).

-bcl
If i use this version that means i am going to pass the first and last values also.  I guess im unsure how I attain these values considering the array is built inside the buildArray function.
You tell buildArray the size of array you want. In C++, all array indices begin with 0 so first (to the top-most call of quickSort) is 0, last is sizeOfArray-1 (the value passed back from buildArray).

-bcl
buildArray determines the size of the array on its own by taking in the first value from the file.  Do I have to declare arraySize=0; in main() and then pass this to buildArray, and then determine the first and last values, and then call the sorts with those values?

After this question ill post a new thread, this one is huge, plus i can do the points easier.
You pass in a reference to an integer. That means that you get the size of the array BACK from buildArray. So you know how big it is.

-bcl