Solved

How can I process these files?

Posted on 2003-03-31
Last Modified: 2010-04-04
I have some text files to process. Their structure is:

jason_one 1999-2-2 jason@hotmail.com 26
jane  2000-3-32 jane@msn.com 24
jason_two 1998-12-22 jason@hotmail.com 26
..........

Each file has about 30,000 records or more. I want to:

1. Sort the records in a file by email domain, such as "hotmail.com".
2. Delete repeated records (records with a repeated email) in a file.
3. Merge two files into a new file, ignoring the repeated records.

How can I do that efficiently? TStringList? TList? TMemo?
Question by:GanQuan
8 Comments
 

Expert Comment

by:JAPerrett
ID: 8237271
My preference would be a TList, assuming that the data is not persistent. If the data were more persistent, which you do not suggest, then storing it in a database would be more efficient, as duplicate keys etc. can be handled by the data structure.

If the process is to merge files to create a new text file as output, with no need to keep the data, then I would use a TList.
As a TList is just a list of pointers, you can point to any data type. By defining a record type, you can break the data into fields at the point of insertion, so a line only has to be split into fields when it is read from the file and joined again when it is written out. As you are using only one sort order, a binary search will find the insertion point, at which point you can check for a duplicate record and either skip the new record or overwrite the existing one.
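
To make the "break the data into fields at the point of insertion" step concrete, here is a minimal sketch. The names PPersonRec, TPersonRec, NextToken and ParseLine are hypothetical, and the four-field layout (name, date, email, age) is only assumed from the sample lines in the question. It is meant to be dropped into a unit, not taken as a finished parser.

```pascal
uses
  SysUtils;

type
  // Hypothetical record layout, assumed from the sample lines in the question.
  PPersonRec = ^TPersonRec;
  TPersonRec = record
    Name: string;
    DateText: string;  // kept as text: the sample contains "2000-3-32"
    Email: string;
    Domain: string;    // lower-cased part of Email after '@'
    Age: Integer;
  end;

// Returns the next whitespace-delimited token of S starting at Idx and
// advances Idx past it. Returns '' when no token is left.
function NextToken(const S: string; var Idx: Integer): string;
var
  Start: Integer;
begin
  while (Idx <= Length(S)) and (S[Idx] <= ' ') do Inc(Idx);
  Start := Idx;
  while (Idx <= Length(S)) and (S[Idx] > ' ') do Inc(Idx);
  Result := Copy(S, Start, Idx - Start);
end;

// Splits one line into a newly allocated record. The caller owns the
// result and must Dispose it. Returns nil for a malformed line
// (fewer than four fields, or no '@' in the third field).
function ParseLine(const Line: string): PPersonRec;
var
  Idx, AtPos: Integer;
  NameText, DateText, Email, AgeText: string;
begin
  Result := nil;
  Idx := 1;
  NameText := NextToken(Line, Idx);
  DateText := NextToken(Line, Idx);
  Email := NextToken(Line, Idx);
  AgeText := NextToken(Line, Idx);
  AtPos := Pos('@', Email);
  if (AgeText = '') or (AtPos = 0) then Exit;
  New(Result);
  Result^.Name := NameText;
  Result^.DateText := DateText;
  Result^.Email := Email;
  Result^.Domain := LowerCase(Copy(Email, AtPos + 1, MaxInt));
  Result^.Age := StrToIntDef(AgeText, 0);  // 0 if the age is not a number
end;
```

Storing the domain once, when the line is read, means the compare function used for sorting does not have to search for the '@' again on every comparison.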
 
LVL 11

Expert Comment

by:robert_marquardt
ID: 8237549
TStringList
Have a look at TStringList.CustomSort in the help.
Write a procedure that gives you the elements of an address line.

Read from file with List.LoadFromFile('filename');

1. Call CustomSort with a compare function that sorts by the domain part first.
   The compare function should only return 0 if two entries are fully the same.

2. On a sorted list (see 1):
  for I := List.Count-1 downto 1 do
    if CompareFunction(List, I, I-1) = 0 then
      List.Delete(I);

3. Merge the two string lists into a new list, then do 1 followed by 2.

Write to file with List.SaveToFile('filename');

The above is not as efficient as it could be, but 30,000 entries do not take enough time to worry about efficiency.
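
The three steps above could be sketched roughly as follows. EmailOf, DomainOf, CompareByDomain, SortAndDedupe and MergeFiles are hypothetical names. The sketch assumes the only '@' in a line belongs to the email field, and it treats two lines as duplicates when their lower-cased emails match (the question's definition), which is looser than "fully the same".

```pascal
uses
  SysUtils, Classes;

// Returns the lower-cased e-mail of a line: the whitespace-delimited word
// around the first '@'. Returns '' if the line contains no '@'.
function EmailOf(const Line: string): string;
var
  StartIdx, EndIdx: Integer;
begin
  Result := '';
  StartIdx := Pos('@', Line);
  if StartIdx = 0 then Exit;
  EndIdx := StartIdx;
  while (StartIdx > 1) and (Line[StartIdx - 1] > ' ') do Dec(StartIdx);
  while (EndIdx < Length(Line)) and (Line[EndIdx + 1] > ' ') do Inc(EndIdx);
  Result := LowerCase(Copy(Line, StartIdx, EndIdx - StartIdx + 1));
end;

// Returns the part of an e-mail address after the '@'.
function DomainOf(const Email: string): string;
begin
  Result := Copy(Email, Pos('@', Email) + 1, MaxInt);
end;

// Compare function in the shape TStringList.CustomSort expects:
// domain first, then the whole e-mail. 0 means "same e-mail".
function CompareByDomain(List: TStringList; Index1, Index2: Integer): Integer;
var
  E1, E2: string;
begin
  E1 := EmailOf(List[Index1]);
  E2 := EmailOf(List[Index2]);
  Result := CompareStr(DomainOf(E1), DomainOf(E2));
  if Result = 0 then
    Result := CompareStr(E1, E2);
end;

// Steps 1 and 2: sort, then delete each entry that equals its predecessor.
procedure SortAndDedupe(List: TStringList);
var
  I: Integer;
begin
  List.CustomSort(CompareByDomain);
  for I := List.Count - 1 downto 1 do
    if CompareByDomain(List, I, I - 1) = 0 then
      List.Delete(I);
end;

// Step 3: load both files into one list, then sort and dedupe it.
procedure MergeFiles(const FileA, FileB, OutFile: string);
var
  Merged, Second: TStringList;
begin
  Merged := TStringList.Create;
  Second := TStringList.Create;
  try
    Merged.LoadFromFile(FileA);
    Second.LoadFromFile(FileB);
    Merged.AddStrings(Second);
    SortAndDedupe(Merged);
    Merged.SaveToFile(OutFile);
  finally
    Second.Free;
    Merged.Free;
  end;
end;
```

Note that lines without an '@' (blank lines, for instance) all compare equal here, so only one of them survives, and which of two duplicate records is kept is not defined because the sort is not guaranteed to be stable.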
 

Accepted Solution

by:
JAPerrett earned 300 total points
ID: 8238034
TStringList is OK, but a TList allows you to add records to a list, not just strings.
The sort procedure for a list is pretty straightforward.

1:
Define a function that takes two pointers, e.g.

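function customsort(item1,item2:Pointer):integer;
var
  x:Tmyrec absolute item1;  // Tmyrec is assumed to be a pointer-to-record type
  y:Tmyrec absolute item2;
begin
  // sort by domain first, then by the full email (an assumed field), so that
  // 0 means "same email" (a duplicate) and not merely "same domain"
  result:=comparestr(x^.domain,y^.domain);
  if result=0 then
    result:=comparestr(x^.email,y^.email)
end;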

You need a compare function anyway to check for duplicate records, so there is no extra programming overhead.

You can either sort the list after you have added all the records and then delete the duplicates, or use an insert procedure that checks at the time of insertion.

e.g.
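function customsort(item1,item2:Pointer):integer;
var
  x:Tmyrec absolute item1;  // Tmyrec is assumed to be a pointer-to-record type
  y:Tmyrec absolute item2;
begin
  // domain first, then the full email (an assumed field), so that
  // 0 means "same email" and not merely "same domain"
  result:=comparestr(x^.domain,y^.domain);
  if result=0 then
    result:=comparestr(x^.email,y^.email)
end;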



procedure insertrec;
var
  top,bottom,current:integer;
  newrec:Tmyrec;
  cmp:integer;

begin
  // binary search over the half-open range [bottom, top)
  top:=mylist.count;bottom:=0;

  current:=0;
  cmp:=-1;

  new(newrec);
  //read data from file into newrec

  while (top>bottom) and (cmp<>0) do
  begin
    current:=((top-bottom) div 2) + bottom;
    cmp:=customsort(newrec,mylist[current]);
    if cmp<0 then top:=current
    else if cmp>0 then bottom:=current+1  // current itself is already ruled out
  end;

  // cmp=0 means a record with the same key is already in the list
  if cmp<>0 then
  begin
    current:=top;  // here top=bottom, which is the insertion point
    mylist.insert(current,newrec)
  end

end;

procedure buildlist;
begin
mylist:=tlist.Create;
insertrec;
end;


 
LVL 11

Expert Comment

by:robert_marquardt
ID: 8238221
The problem with a TList is that you have to implement the allocation and the deallocation (which is missing from your sample) of the elements yourself.
If anyone now calls List.Clear or List.Free, all allocated list elements will be orphaned.
 

Expert Comment

by:JAPerrett
ID: 8238700
if cmp<>0 then
 begin
   current:=top;
   mylist.insert(current,newrec)
 end
else
  dispose(newrec)

should have been there.
And at the end of the routine, once the list has been written to file, it needs freeing.

I agree: I have not included deallocation in this example, but the points were

1] Sort the list as you build it, not after. That way you're doing half the work.

2] Don't delete duplicates; just omit adding them.

3] By using a TList, you are not restricted to just strings. You can add record structures, objects, well, anything, and sort it according to your own rules in a simple sort procedure.

As far as someone else calling Clear?
Well, I read what I wrote and take your point.

The procedure buildlist should obviously be a loop

similar to

mylist:=tlist.create;
while not eof(inputfile) do
  insertrec;


I have not included writing out this list to a new file and freeing the used memory, but it's a small overhead in time.

I suppose the argument is

a] do you want the flexibility of a TList, and the easier association of data,

or

b] do you want the simplicity of being able to add and delete items in a string list with less concern over memory management.

Each to their own :)
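
For completeness, the TList variant with the missing pieces (the build loop, writing the result out, and freeing the records) might look something like the sketch below. PRec, TRec, EmailOf, CompareRecs, InsertSorted and BuildUnique are hypothetical names, and it assumes the only '@' in a line belongs to the email field. It works on TStrings rather than on a file handle so the same routine serves one file or two merged ones.

```pascal
uses
  SysUtils, Classes;

type
  PRec = ^TRec;
  TRec = record
    Line: string;    // original text line
    Email: string;   // lower-cased e-mail, used as the duplicate key
    Domain: string;  // part of Email after '@', the primary sort key
  end;

// Returns the lower-cased whitespace-delimited word around the first '@',
// or '' if the line contains no '@'.
function EmailOf(const Line: string): string;
var
  StartIdx, EndIdx: Integer;
begin
  Result := '';
  StartIdx := Pos('@', Line);
  if StartIdx = 0 then Exit;
  EndIdx := StartIdx;
  while (StartIdx > 1) and (Line[StartIdx - 1] > ' ') do Dec(StartIdx);
  while (EndIdx < Length(Line)) and (Line[EndIdx + 1] > ' ') do Inc(EndIdx);
  Result := LowerCase(Copy(Line, StartIdx, EndIdx - StartIdx + 1));
end;

// Domain first, then e-mail; 0 means "same e-mail".
function CompareRecs(A, B: PRec): Integer;
begin
  Result := CompareStr(A^.Domain, B^.Domain);
  if Result = 0 then
    Result := CompareStr(A^.Email, B^.Email);
end;

// Binary search over [Bottom, Top). Inserts Rec at its sorted position,
// or returns False without inserting when its e-mail is already present.
function InsertSorted(List: TList; Rec: PRec): Boolean;
var
  Bottom, Top, Current, Cmp: Integer;
begin
  Bottom := 0;
  Top := List.Count;
  Cmp := -1;
  while (Bottom < Top) and (Cmp <> 0) do
  begin
    Current := Bottom + (Top - Bottom) div 2;
    Cmp := CompareRecs(Rec, PRec(List[Current]));
    if Cmp < 0 then Top := Current
    else if Cmp > 0 then Bottom := Current + 1;
  end;
  Result := Cmp <> 0;
  if Result then
    List.Insert(Top, Rec);
end;

// Keeps one record per e-mail from Source, appends the kept lines to Dest
// in sorted order, and returns how many were kept. Every record is
// disposed before the list is freed, so nothing is orphaned.
function BuildUnique(Source, Dest: TStrings): Integer;
var
  List: TList;
  Rec: PRec;
  I: Integer;
begin
  List := TList.Create;
  try
    for I := 0 to Source.Count - 1 do
    begin
      New(Rec);
      Rec^.Line := Source[I];
      Rec^.Email := EmailOf(Source[I]);
      Rec^.Domain := Copy(Rec^.Email, Pos('@', Rec^.Email) + 1, MaxInt);
      if not InsertSorted(List, Rec) then
        Dispose(Rec);  // duplicate e-mail: omit it instead of adding it
    end;
    for I := 0 to List.Count - 1 do
      Dest.Add(PRec(List[I])^.Line);
    Result := List.Count;
  finally
    for I := 0 to List.Count - 1 do
    begin
      Rec := List[I];
      Dispose(Rec);
    end;
    List.Free;
  end;
end;
```

To merge two files, one option is to load the first into a TStringList, append the second with AddStrings, pass that as Source, and call SaveToFile on Dest afterwards. Lines with no '@' all share the empty key, so only the first of them is kept.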
 

Expert Comment

by:wildhorselei
ID: 8242569
You can define a class, such as
   type
     TLetter=class
       private
         FTitle:string;
         FSendName:string;
         FLetterBody:TStrings;
         ....
         function GetTitle:string;
         procedure SetTitle(const Value:string);
         ....
       public
         constructor Create(Value: string);
         destructor Destroy; override;
       published
         property Title:string read GetTitle write SetTitle;
         ....
     end;

     Then you can define another class:
       TLetterList=class
         FYourLetters:array of TLetter;
       end;

    I think you can see what I mean. Good luck!
 

Author Comment

by:GanQuan
ID: 8339099
thanx all!