Link to home
Start Free TrialLog in
Avatar of nir_sh
nir_sh

asked on

Sorting file.

Hello people.
I have to sort a huge (about 30MB) file of integers.
Please send me an efficient algorithm or program in C++.
Thanks to ozo I've decided to use qsort. However, I still have a problem. In order to use qsort I have to put my data in some kind of array, but it's impossible to create an array that size (my array should contain at least 800000 integers). What can I do?

Please answer quickly.
Thanks.
Avatar of snoegler
snoegler

Beyond this link:

  http://now.cs.berkeley.edu/NowSort

you can find the fastest(not sure about that) algorithm for disk based sorting.
I am not sure if you can get the algorithm there, but i think so. I would try it, because the
algorithm works even for 6.4 GB(In the benchmark you can see there, in less than 1 hour with
32 workstations).

Another conventional sorting algorithm for disk files is the bucket sort. But i don't have a good
link, try to search for it.
Hope this helped...
Avatar of ozo
30MB isn't that large.
If it fits in memory you might just
#include <stdlib.h>
int compatr(const void *a, const void *b){
      return *((int *)a)-*((int *)b);
}

qsort(base, nel, sizeof(int), &compar);


What about calling an already existing sort program? On UX just call "sort -n < in  > out"
I would think that 30mb would fit in memory, so if I were you I'd use ozo's method.
Avatar of nir_sh

ASKER

Edited text of question
Avatar of nir_sh

ASKER

Edited text of question
Avatar of nir_sh

ASKER

Adjusted points to 400
C++ can easily declare an array containing 800000 integers,
but perhaps your system doesn't have the memory to load it after all.
So, I would either pipe it to sort, if you have it,
or split it into pieces that you can sort in memory, and then merge the pieces.
what do you know about the data?
If these are integers, aren't there going to be a lot of duplicates?

Can you then use a map like map<val, count>?  Read it all in, insert into/increment the map as you go, and then traverse the map on output, writing count copies of val.  

Hi, some times ago I have same problem(but data in file).
My way is very close to "ozo" solution:
1. Define number of parts that can be in memory: N
2. Loop (for i = 0 ; i < N; i++)
{
   read i-th part of file. Sort it by qsort and save in temp file
}
3. Union temp files: (this is problem with max.number of open files  and i make union by 10 files ).
  for union I open up 10 files, read 1-st line from all files,
  qsort them, save min, read line from that file, where is min
and ..., untill end of all files.
Can be this help, Alex
Get a bat and beat it with a stick.
That should make it work.

demoneye - If you have nothing of any value to add to this site, please refrain from commenting or answering.  Experts exchange is a professional site.  Wasting room and people's time is usually not accepted very well by the general populous.  If you have other questions, please refer to the FAQ page and HELP page.
Avatar of nir_sh

ASKER

I've managed to create a very big array using the farmalloc function. However, the array I'm using should be bigger, my pointer should be huge and not far. This is my last problem, if you solve this problem for me, everything is solved.
when you talk about "huge" pointers, did you check the memory model in which you are compiling?
ASKER CERTIFIED SOLUTION
Avatar of msa092298
msa092298

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial