Solved

Need a powerful multi-purpose sort utility

Posted on 2013-05-13
12
427 Views
Last Modified: 2013-09-07
Hi gurus, I currently call a 3rd party sort utility within my Delphi app called "Opt-tech Sort". It's very good and does *almost* everything I need, including:

- handling files of unlimited size
- has a facility to Include/Omit records based on user-specified criteria
- has a facility to modify/rearrange the format of the output record any way you like with regards to field order.

The only drawback to this, and it's a big one for me, is that Opt-tech cannot handle fields from a CSV file that have multiple quotes (""") back to back like that. For instance, here is a legit field from a CSV file according to CSV convention:

,"""+971502347269"" <0971502347269>",

Opt-Tech sort will see those first two back-to-back quotes and then disregard the whole rest of the field - thinking it has encountered an "empty" field with back-to-back quotes. This situation, naturally, will have disastrous results in my application. Does anyone know of a sort utility (commercial or free) that will handle the above case and tick all my boxes?

Thanks!
    Shawn
0
Comment
Question by:shawn857
12 Comments
 
LVL 25

Expert Comment

by:Sinisa Vuk
Comment Utility
Try to use kbmMemTable + kbmMemCSVStreamFormat component. There should be working this and it is quite powerful.
Go to http://www.torry.net/pages.php?id=572 and find TkbmMem Table v.5.52.
0
 

Author Comment

by:shawn857
Comment Utility
Thanks Sinisav... this component will be able to handle and sort a file if many gigabytes? I don't see how it can, if it has to bring all data into memory...

Thanks
    Shawn
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
Comment Utility
Have you looked at Syncsort?

Work-arounds
* you might use your Delphi application as a pre-processor for the Opt-Tech code, calling in Record mode, rather than File mode.  Your Delphi code could remove the doubled quotes that hang-up the sorting code (replace three consecutive quote characters with a single quote character).

* Similar to above, except you read/write each record before calling the Opt-Tech code in File mode as you currently do.

* Ask the Opt-Tech folks if they support exit routines.  If so, you might create a small program that cleans up the record before Opt-Tech code begins its processing.

* Look at PowerShell

* It might be possible to use ADO with a defining schema.ini file to accomplish the filtering, sorting, column reordering.

===========
You might use TSMImportFromText library to treat the file as a data source and use the data routines to perform the functions of the Opt-Tech code.
http://www.scalabium.com/smi/

Co-sort
http://www.iri.com/solutions/Data_Transformation/Sort_Merge

Sort Solution
http://www.mwlabs.de/psortsol.htm

If you have SAS on your system, you might use its Proc Sort
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
If you have a SQL server instance, you might kick off an SSIS task to import the data and then save the sorted/massaged data.
0
 

Author Comment

by:shawn857
Comment Utility
Hi Aikimark, thanks for all your suggestions... here's my thoughts:


Have you looked at Syncsort?

>> Mega-expensive!! Same with Co-sort


Work-arounds
* you might use your Delphi application as a pre-processor for the Opt-Tech code, calling in Record mode, rather than File mode.  Your Delphi code could remove the doubled quotes that hang-up the sorting code (replace three consecutive quote characters with a single quote character).

* Similar to above, except you read/write each record before calling the Opt-Tech code in File mode as you currently do.


>> Do-able... but this is kind of my last resort as this would require 3 passes through the >>data instead of just one. Speed of processing is going to be a big selling point for my >>application and this would pretty much defeat that.


* Ask the Opt-Tech folks if they support exit routines.  If so, you might create a small program that cleans up the record before Opt-Tech code begins its processing.


>> Again, this would require 3 complete passes through the data.


* Look at PowerShell


>> I had a brief look at it, embarrassed to say I had never heard of it before! it seems like it would be similar to calling a DOS routine from my application? Not so sure if I like this idea...


* It might be possible to use ADO with a defining schema.ini file to accomplish the filtering, sorting, column reordering.

>> Not familiar at all with ADO Mark... what is it?


You might use TSMImportFromText library to treat the file as a data source and use the data routines to perform the functions of the Opt-Tech code.
http://www.scalabium.com/smi/


>> Would I be able to import - into memory - files of many many gigabytes??? Doesn't sound feasible, or possible.


Co-sort
http://www.iri.com/solutions/Data_Transformation/Sort_Merge

Sort Solution
http://www.mwlabs.de/psortsol.htm


>> I tried Sort Solution in the past, it doesn't have the features I require.


If you have SAS on your system, you might use its Proc Sort


>> Once again, I have to say I'm not familiar with SAS Mark. What does that stand for?


If you have a SQL server instance, you might kick off an SSIS task to import the data and then save the sorted/massaged data.

>> I know nothing about SQL, sounds like it would be a whole new can of works... plus can >>it handle gigs and gigs of data in one shot?
>> The way I look at it is I have 2 basic alternatives: (1) Find another sort routine that will do everything I need, or (2) Stick with Opt-tech sort and if I encounter the problematic double-quote situation, then do the pre and post processing as you recommend.
  I have sent out emails to a few other sort utility firms - Ahlsort (but they seem expensive too, and are not royalty-free for distribution), and XSM sort (www.hhns.fr). I'm still waiting to hear back from the XSM sort people.

Thanks
    Shawn
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 45

Expert Comment

by:aikimark
Comment Utility
What are your budgetary constraints?  You didn't mention that money was a consideration in your question text.

>>3 passes
If you can use record mode, then it only requires two passes.

>>speed of processing as a selling point
Then you had better describe what you are doing in more detail.  What kind of data is in these files?  What does your application do?

>>Again, (exit routines) would require 3 passes
No.  Opt-Tech is doing the same reading/writing, but would pass the record to your code (in memory) for processing.

>>PS -- similar to calling a DOS routine
How are you calling Opt-Tech sort now?
What don't you like about this idea?  Requirements and problem constraint details should be supplied to experts.

>>ADO
ActiveX Data Object -- It is a (DLL) library of Microsoft database components moved forward from the Jet/Access database.

>>files of many gigabytes?
Your processing space in a 32-bit environment is limited to 2GB.  I thought you might use the import to roll-your-own external sort routine, processing chunks of the huge file into sorted temp files that are then merged.

>>SAS
Statistical Analysis System from a company named SAS in Cary, NC

>>SQL server
Yes.  It is one of the large scale relational databases.
0
 

Author Comment

by:shawn857
Comment Utility
Hi Aikimark... sorry for the long absence. To reply to your comments:


>What are your budgetary constraints?  You didn't mention that money was a >consideration in your question text.


Well, ideally I'd like to find a free solution, but if I had to spend, I would say a few hundred dollars.



>>3 passes
>If you can use record mode, then it only requires two passes.

>>speed of processing as a selling point
>Then you had better describe what you are doing in more detail.  What kind of data is in >these files?  What does your application do?



My data files are basically CSV files containing information about VOIP telephone calls. My app reads these files and prepares reports. A typical record might look like this:

10548509,9.71502E+11,4.43452E+11,routing,"""+971502347269"" <0971502347269>",SIP/217.168.44.132-000907e4,Local/10548509@voice-6ed4;1,Dial,"Local/10548509@voice,45,gL(3599000)",10/5/2012 11:25,10/5/2012 11:25,10/5/2012 11:25,5,5,ANSWERED,1349436313



>>Again, (exit routines) would require 3 passes
>No.  Opt-Tech is doing the same reading/writing, but would pass the record to your code >(in memory) for processing.



This could be the best option. I am currently just now discovering and investigating the "Record Sort" option of OptTech
sort and have emailed the author about it - it could be the answer for me. I've tried finding an alternative sort routine
and it doesn't look like one exists that will do all I need. I've talked with the people at Co-Sort, Ahlsort, XSM Sort, etc.
The answer *could* be right under my nose woth the Record Sort option of Opttech sort.



>>PS -- similar to calling a DOS routine
>How are you calling Opt-Tech sort now?


Programmatically from my Delphi app via DLL... nice and seamless.



>What don't you like about this idea?  Requirements and problem constraint details >should be supplied to experts.


Well, some years ago I made an app that called a DOS-based sort called "RPsort". It was pretty cumbersome and "ugly" - a big
black DOS window would appear on top of my app when the sort was being called. Not very seamless.


Will update you when I learn more about the OptTech "Record Sort" option... thanks!


Cheers
  Shawn
0
 
LVL 36

Expert Comment

by:Geert Gruwez
Comment Utility
I've requested that this question be deleted for the following reason:

The question has either no comments or not enough useful information to be called an "answer".
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@shawn857

Thanks for the points.  What utility/software/configuration did you finally use to solve your problem?
0
 

Author Comment

by:shawn857
Comment Utility
Hi Aikimark, I stuck with my Opt-Tech sort and decided to write a pre-processing routine in my app to check and modify problematic records.

Thanks!
   Shawn
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

The uses clause is one of those things that just tends to grow and grow. Most of the time this is in the main form, as it's from this form that all others are called. If you have a big application (including many forms), the uses clause in the in…
Introduction The parallel port is a very commonly known port, it was widely used to connect a printer to the PC, if you look at the back of your computer, for those who don't have newer computers, there will be a port with 25 pins and a small print…
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now