How can I process these files?

I have some text files to process. Their structure is:

jason_one 1999-2-2 jason@hotmail.com 26
jane  2000-3-32 jane@msn.com 24
jason_two 1998-12-22 jason@hotmail.com 26
..........

Each file has about 30,000 records or more. I want to:

1. Sort the records in a file by email domain, such as "hotmail.com".
2. Delete duplicate records (records with a repeated email) in a file.
3. Merge two files into a new file, ignoring the duplicate records.

How can I do that efficiently? TStringList? TList? TMemo?
Asked by: GanQuan
 
JAPerrett commented:
My preference would be a TList, assuming that the data is not persistent. If the data were persistent, which you do not suggest, then storing it in a database would be more efficient, as duplicate keys etc. can be handled by the data structure.

If the process is to merge files to create a new text file as output, with no need to keep the data, then I would use a TList.
As a TList is just a list of pointers, it can point to any data type. By defining a record type, you can break the data into fields at the point of insertion. This means that a line only has to be broken into fields when it is read from the file, and reassembled when it is written to the file. As you are using only one sort order, a binary search will find the insertion point, at which point you can check for a duplicate record and either skip the new record or overwrite the existing one.
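
A minimal sketch of the "break the data into fields" step, assuming the four fields are separated by whitespace as in the sample lines. The names TPerson, NextToken and ParseLine are hypothetical and do not come from the thread; the date is kept as text because the sample data contains invalid dates.

```pascal
program ParseDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  // Hypothetical record layout for one input line.
  TPerson = record
    Name: string;
    BirthDate: string;  // kept as text, not validated
    Email: string;
    Domain: string;     // part of Email after '@', lower-cased for sorting
    Age: Integer;
  end;

// Returns the next whitespace-delimited token of S, starting at P,
// and advances P past it. Returns '' when no token is left.
function NextToken(const S: string; var P: Integer): string;
var
  Start: Integer;
begin
  while (P <= Length(S)) and (S[P] <= ' ') do Inc(P);
  Start := P;
  while (P <= Length(S)) and (S[P] > ' ') do Inc(P);
  Result := Copy(S, Start, P - Start);
end;

// Splits one line into fields. Returns False if the line is malformed
// (no '@' in the email field, or a missing / non-numeric age).
function ParseLine(const Line: string; out Rec: TPerson): Boolean;
var
  P, AtPos: Integer;
begin
  P := 1;
  Rec.Name := NextToken(Line, P);
  Rec.BirthDate := NextToken(Line, P);
  Rec.Email := NextToken(Line, P);
  Rec.Age := StrToIntDef(NextToken(Line, P), -1);
  Rec.Domain := '';
  AtPos := Pos('@', Rec.Email);
  Result := (AtPos > 1) and (Rec.Age >= 0);
  if Result then
    Rec.Domain := LowerCase(Copy(Rec.Email, AtPos + 1, MaxInt));
end;

var
  R: TPerson;
begin
  if ParseLine('jane  2000-3-32 jane@msn.com 24', R) then
    WriteLn(R.Name, ' | ', R.Domain, ' | ', R.Age);  // prints: jane | msn.com | 24
end.
```

Storing the lower-cased domain once at parse time means the compare function used for sorting does not have to search for '@' on every comparison.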
 
robert_marquardt commented:
TStringList.
Have a look at TStringList.CustomSort in the help.
Write a procedure which gives you the elements of an address line.

Read from file with List.LoadFromFile('filename');

1. Call CustomSort with a compare function which sorts by the domain part first.
   The compare function should only return 0 if two entries are fully the same.

2. On a sorted list (see 1):
  for I := List.Count - 1 downto 1 do
    if CompareFunction(List, I, I - 1) = 0 then
      List.Delete(I);

3. Merge the two string lists into a new list, then apply 1 followed by 2.

Write to file with List.SaveToFile('filename');

The above is not as efficient as it could be, but 30,000 entries do not take enough time to make efficiency a worry.
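
The three steps above can be sketched as a small console program. This is only an illustration under stated assumptions: the helper names EmailOf, SortKey, CompareByDomain and SortAndDedupe are hypothetical, the email is assumed to be the third whitespace-separated field, and, following the original question, two lines count as duplicates when their email addresses match (not only when the whole lines match).

```pascal
program SortByDomain;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Returns the third whitespace-separated field (the email), lower-cased.
function EmailOf(const Line: string): string;
var
  P, Start, Field: Integer;
begin
  Result := '';
  P := 1;
  for Field := 1 to 3 do
  begin
    while (P <= Length(Line)) and (Line[P] <= ' ') do Inc(P);
    Start := P;
    while (P <= Length(Line)) and (Line[P] > ' ') do Inc(P);
    Result := LowerCase(Copy(Line, Start, P - Start));
  end;
end;

// Sort key: domain first, then the full address,
// so lines with the same email end up next to each other.
function SortKey(const Line: string): string;
var
  Email: string;
begin
  Email := EmailOf(Line);
  Result := Copy(Email, Pos('@', Email) + 1, MaxInt) + #9 + Email;
end;

// Matches the TStringListSortCompare signature expected by CustomSort.
function CompareByDomain(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareStr(SortKey(List[Index1]), SortKey(List[Index2]));
end;

// Steps 1 and 2: sort, then walk backwards deleting adjacent duplicates.
procedure SortAndDedupe(List: TStringList);
var
  I: Integer;
begin
  List.CustomSort(CompareByDomain);
  for I := List.Count - 1 downto 1 do
    if CompareByDomain(List, I, I - 1) = 0 then
      List.Delete(I);
end;

var
  A, B: TStringList;
begin
  A := TStringList.Create;
  B := TStringList.Create;
  try
    // In real use: A.LoadFromFile(...) and B.LoadFromFile(...)
    A.Add('jason_one 1999-2-2 jason@hotmail.com 26');
    A.Add('jane  2000-3-32 jane@msn.com 24');
    B.Add('jason_two 1998-12-22 jason@hotmail.com 26');
    A.AddStrings(B);      // step 3: merge, then sort and de-duplicate
    SortAndDedupe(A);
    Write(A.Text);        // in real use: A.SaveToFile(...)
  finally
    B.Free;
    A.Free;
  end;
end.
```

Which of two lines with the same email survives is not defined here, because the sort is not guaranteed to be stable. The key is also rebuilt on every comparison, which is simple but wasteful; caching the keys would be the first thing to change if speed mattered.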
 
JAPerrett commented:
TStringList is OK, but a TList allows you to add records to a list, and not just strings.
The sort procedure of a list is pretty straightforward.

1:
Define a function that takes two pointers, e.g.

function customsort(item1, item2: Pointer): integer;
var
  x: Tmyrec absolute item1;   // Tmyrec is taken to be a pointer to a record type
  y: Tmyrec absolute item2;
begin
  result := comparestr(x.domain, y.domain)
end;

You need a compare function anyway to check for duplicate records, so there is no extra programming overhead.

You can either sort the list after you have added all the records and then delete the duplicates, or use an insert procedure that checks at the time of insertion.

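e.g., comparing on a key field s1:
function customsort(item1, item2: Pointer): integer;
var
  x: Tmyrec absolute item1;
  y: Tmyrec absolute item2;
begin
  // s1 is the sort key. A key holding only the domain would make every
  // record from the same domain look like a duplicate, so it should also
  // include the rest of the email address.
  result := comparestr(x.s1, y.s1)
end;
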
procedure insertrec;
var
  top, bottom, current: integer;
  newrec: Tmyrec;
  cmp: integer;
begin
  new(newrec);
  // read data from file into newrec

  // binary search over the half-open range [bottom, top)
  bottom := 0;
  top := mylist.count;
  cmp := -1;

  while (top > bottom) and (cmp <> 0) do
  begin
    current := ((top - bottom) div 2) + bottom;
    cmp := customsort(newrec, mylist[current]);
    if cmp < 0 then
      top := current
    else if cmp > 0 then
      bottom := current + 1   // skip past current, so index 0 is also tested
  end;

  // no match found: top is now the insertion point
  if cmp <> 0 then
  begin
    current := top;
    mylist.insert(current, newrec)
  end
end;

procedure buildlist;
begin
mylist:=tlist.Create;
insertrec;
end;

 
robert_marquardt commented:
The problem with a TList is that you have to implement the allocation and the deallocation (which is missing from your sample) of the elements yourself.
If anyone now calls List.Clear or List.Free you will have orphaned all allocated list elements.
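
A minimal sketch of the deallocation that a TList of record pointers needs, assuming the items were allocated with New. The names PMyRec, TMyRec and ClearRecords are hypothetical; the record layout is only a stand-in for the one used in the thread.

```pascal
program FreeListDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

type
  PMyRec = ^TMyRec;
  TMyRec = record
    Email: string;
  end;

// Disposes every record the list points to, then empties the list.
// Call this instead of List.Clear, and before List.Free.
procedure ClearRecords(List: TList);
var
  I: Integer;
  Rec: PMyRec;
begin
  for I := 0 to List.Count - 1 do
  begin
    Rec := PMyRec(List[I]);
    Dispose(Rec);   // typed Dispose also releases the string field
  end;
  List.Clear;
end;

var
  MyList: TList;
  Rec: PMyRec;
begin
  MyList := TList.Create;
  try
    New(Rec);
    Rec^.Email := 'jane@msn.com';
    MyList.Add(Rec);
    WriteLn(MyList.Count);
  finally
    ClearRecords(MyList);
    MyList.Free;
  end;
end.
```

The cast to PMyRec matters: disposing through an untyped Pointer would free the memory block without finalizing the string fields inside the record.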
 
JAPerrett commented:
if cmp<>0 then
 begin
   current:=top;
   mylist.insert(current,newrec)
 end
else
  dispose(newrec)

should have been there.
And at the end of the routine, once the list has been written to file, it needs freeing.

I agree, I have not included deallocation in this example, but the points were:

1] Sort the list as you build it, not afterwards. That way you're doing half the work.

2] Don't delete duplicates, just omit adding them.

3] By using a TList, you are not restricted to just strings; you can add record structures, objects, or anything else, and sort it according to your own rules in a simple compare function.

As for someone else calling Clear?
Well, I read what I wrote and take your point.

The procedure buildlist should obviously be a loop

similar to

mylist:=tlist.create;
while not eof(inputfile) do
  insertrec;


I have not included writing out this list to a new file and freeing up the used memory, but it's a small overhead in time.

I suppose the argument is:

a] do you want the flexibility of a TList, and the easier association of data,

or

b] do you want the simplicity of being able to add and delete items in a string list with less concern over memory management.

Each to their own :)
 
wildhorselei commented:
You can define a class, such as
   type
     TLetter = class
     private
       FTitle: string;
       FSendName: string;
       FLetterBody: TStrings;
       ....
       function GetTitle: string;
       procedure SetTitle(const Value: string);
       ....
     public
       constructor Create(Value: string);
       destructor Destroy; override;
     published
       property Title: string read GetTitle write SetTitle;
       ....
     end;

   Then you can define another class:
     TLetterList = class
       FYourLetters: array of TLetter;
     end;

I think you can see what I mean. Good luck!
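
One way to combine the class idea with the memory-management concern raised earlier is a list that owns its objects. The sketch below assumes TObjectList from the Contnrs unit (newer Delphi versions may need the System.Contnrs unit scope); the TPerson class and the CompareEmail function are hypothetical names used only for illustration.

```pascal
program ObjectListDemo;
{$IFDEF FPC}{$mode delphi}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes, Contnrs;

type
  // Hypothetical class holding the fields of one input line.
  TPerson = class
  public
    Name: string;
    Email: string;
    constructor Create(const AName, AEmail: string);
  end;

constructor TPerson.Create(const AName, AEmail: string);
begin
  inherited Create;
  Name := AName;
  Email := AEmail;
end;

// Compare function in the shape TList.Sort expects.
function CompareEmail(Item1, Item2: Pointer): Integer;
begin
  Result := CompareStr(TPerson(Item1).Email, TPerson(Item2).Email);
end;

var
  People: TObjectList;
begin
  People := TObjectList.Create(True);  // True: the list frees its items
  try
    People.Add(TPerson.Create('jason_one', 'jason@hotmail.com'));
    People.Add(TPerson.Create('jane', 'jane@msn.com'));
    People.Sort(CompareEmail);
    WriteLn(TPerson(People[0]).Name);  // prints: jane
  finally
    People.Free;                       // also frees every TPerson
  end;
end.
```

With an owning list, Clear, Delete and Free release the objects themselves, so the orphaned-element problem of a plain TList does not arise.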
 
GanQuan (author) commented:
Thanks, all!