• Status: Solved

WORKING WITH VERY LARGE WORD LISTS

Dear Delphi Experts,

I am starting to write a program that compares two very large word lists (about 40,000 and 70,000 words) and finds out which words
are on one list and not on the other (and/or vice versa).

Each word list must be a text file (.txt), and the results must be
presented in a third text file or a list box.

Many Thanks

LeTchev
 
Smortex commented:
Try this:
Load each word list into a TStringList (one word per line) and sort them.

Create a function that searches for an item in a list.

Here is an example. To make it easier to read, I used two TListBox controls. When an item is clicked, a TLabel's caption is set to "True" if the selected item is found in the second TListBox, or "False" if it is not:

procedure TForm1.ListBox1Click(Sender: TObject);
  function FindTheOther(AWord: string; AList: TStringList): Boolean;
  var
    Offset, Step, CompResult: integer;
  begin
    Offset := AList.Count div 2;
    Step   := AList.Count;
    while Step <> 0 do
    begin
      Step := Step div 2;
      if Offset + Step >= AList.Count then
        Step := AList.Count - Offset - 1;
      CompResult := CompareText(AWord,AList[offset]);
      if CompResult = 0 then
      begin
        Result := True;
        Exit;
      end
      else
        if CompResult > 0 then
          Offset := Offset + Step
        else
          Offset := Offset - Step;
    end;
    Result := False;
  end;
begin
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.ItemIndex],TStringList(ListBox2.Items)),True);
end;

Hope that helps :)

Regards
 
Smortex commented:
Ooops....

This function should work better ;)

procedure TForm1.ListBox1Click(Sender: TObject);

  // Binary search for AWord in AList. AList must already be sorted in the
  // order this comparison uses; AnsiCompareText matches the default
  // (case-insensitive) TStringList.Sort order.
  function FindTheOther(const AWord: string; AList: TStrings): Boolean;
  var
    LowIdx, HighIdx, MidIdx, CompResult: Integer;
  begin
    Result := False;
    LowIdx := 0;
    HighIdx := AList.Count - 1;  // an empty list skips the loop entirely
    while LowIdx <= HighIdx do
    begin
      // Probe the middle of the range that can still contain AWord.
      MidIdx := LowIdx + (HighIdx - LowIdx) div 2;
      CompResult := AnsiCompareText(AWord, AList[MidIdx]);
      if CompResult = 0 then
      begin
        Result := True;
        Exit;
      end
      else
        if CompResult > 0 then
          LowIdx := MidIdx + 1   // AWord sorts after the probe: drop the lower half
        else
          HighIdx := MidIdx - 1; // AWord sorts before the probe: drop the upper half
    end;
  end;

begin
  // ListBox2.Items is a TStrings, so it is passed as-is instead of being cast to TStringList.
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.ItemIndex], ListBox2.Items), True);
end;
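
For comparison, TStringList already ships with a binary search of its own: the Find method, which works on a sorted list and compares strings the same way the list sorts them. The following is only a minimal console sketch of that call; the three sample words are made up and stand in for a file loaded with LoadFromFile.

```pascal
program FindSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Words: TStringList;
  Index: Integer;
begin
  Words := TStringList.Create;
  try
    // Made-up sample data standing in for a word list read from a file.
    Words.Add('pear');
    Words.Add('apple');
    Words.Add('Cherry');

    // Sorted := True sorts the list and keeps it sorted afterwards.
    Words.Sorted := True;

    // Find does a binary search; it is case-insensitive unless
    // CaseSensitive has been set to True.
    if Words.Find('cherry', Index) then
      WriteLn('found at index ', Index)
    else
      WriteLn('not found');
  finally
    Words.Free;
  end;
end.
```

With Sorted set to True, IndexOf also goes through Find, so either call avoids a linear scan.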

Sorry....

Regards
 
letchev (author) commented:
Dear Smortex,

Sorry, but that is not quite what I want. First, I need a routine that reads the word lists from .txt or .csv files, for example:

List1.LoadFromFile('c:\1.txt');
List2.LoadFromFile('c:\2.txt');

ListBox3.Items.Add (here the common words found in List1 and List2)

Is this possible?

Thank you for your patience.

Letchev
 
Smortex commented:
You can do this very easily using my function:

  List1 := TStringList.Create;
  List2 := TStringList.Create;
  try
    List1.LoadFromFile('c:\1.txt');
    List2.LoadFromFile('c:\2.txt');

    List1.Sort;
    List2.Sort;

    for i := 0 to Pred(List1.Count) do
      if FindTheOther(List1[i],List2) then
        ListBox1.Items.Add(List1[i]);

  finally
    List1.Free;
    List2.Free;
  end;
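
The loop above collects the words that the two lists have in common. The original question also asked for the words that are on one list but not on the other, written to a third text file. A minimal sketch of that variant could look like the following. It is an assumption-laden example, not the code from this thread: it uses the built-in TStringList.Find in place of FindTheOther so that the program is self-contained, and the output path c:\only_in_1.txt is a hypothetical name.

```pascal
program OnlyInFirst;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  List1, List2, OnlyIn1: TStringList;
  i, FoundAt: Integer;
begin
  List1 := TStringList.Create;
  List2 := TStringList.Create;
  OnlyIn1 := TStringList.Create;
  try
    List1.LoadFromFile('c:\1.txt');
    List2.LoadFromFile('c:\2.txt');

    // Only the list that is searched has to be sorted.
    List2.Sorted := True;

    // Keep every word of List1 that the binary search cannot find in List2.
    for i := 0 to List1.Count - 1 do
      if not List2.Find(List1[i], FoundAt) then
        OnlyIn1.Add(List1[i]);

    // Hypothetical output file holding the words unique to the first list.
    OnlyIn1.SaveToFile('c:\only_in_1.txt');
  finally
    OnlyIn1.Free;
    List2.Free;
    List1.Free;
  end;
end.
```

Swapping the roles of List1 and List2 gives the words that appear only in the second file.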

Regards
 
Smortex commented:
Note that if you do not run any searches (with my "FindTheOther" function) on the first list (List1), you do not have to sort it :)
So this line can be removed:
    List1.Sort;
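
If both directions are needed (words only in the first list and words only in the second), an alternative to repeated searches is to sort both lists and walk them side by side in a single merge-style pass. This is a different technique from the FindTheOther approach, shown here only as a sketch. CompareWords and SkipCopies are hypothetical helper names, the output paths are made up, and CompareText folds case for ASCII letters only, so word lists with accented characters would need a locale-aware comparison in both the sort and the merge.

```pascal
program MergeCompare;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: the one comparison used for sorting, so that the
// sort order and the merge below always agree.
function CompareWords(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareText(List[Index1], List[Index2]);
end;

// Hypothetical helper: moves Index past every entry equal to AWord.
procedure SkipCopies(List: TStringList; const AWord: string; var Index: Integer);
begin
  while (Index < List.Count) and (CompareText(List[Index], AWord) = 0) do
    Inc(Index);
end;

var
  List1, List2, OnlyIn1, OnlyIn2: TStringList;
  i, j, Cmp: Integer;
  Current: string;
begin
  List1 := TStringList.Create;
  List2 := TStringList.Create;
  OnlyIn1 := TStringList.Create;
  OnlyIn2 := TStringList.Create;
  try
    List1.LoadFromFile('c:\1.txt');
    List2.LoadFromFile('c:\2.txt');
    List1.CustomSort(CompareWords);
    List2.CustomSort(CompareWords);

    i := 0;
    j := 0;
    while (i < List1.Count) and (j < List2.Count) do
    begin
      Cmp := CompareText(List1[i], List2[j]);
      if Cmp < 0 then
      begin
        OnlyIn1.Add(List1[i]);  // List2 has already moved past this word
        Inc(i);
      end
      else if Cmp > 0 then
      begin
        OnlyIn2.Add(List2[j]);  // List1 has already moved past this word
        Inc(j);
      end
      else
      begin
        // Same word on both sides: skip all of its copies in both lists.
        Current := List1[i];
        SkipCopies(List1, Current, i);
        SkipCopies(List2, Current, j);
      end;
    end;

    // Whatever is left on either side has no partner in the other list.
    while i < List1.Count do
    begin
      OnlyIn1.Add(List1[i]);
      Inc(i);
    end;
    while j < List2.Count do
    begin
      OnlyIn2.Add(List2[j]);
      Inc(j);
    end;

    OnlyIn1.SaveToFile('c:\only_in_1.txt');  // hypothetical output paths
    OnlyIn2.SaveToFile('c:\only_in_2.txt');
  finally
    OnlyIn2.Free;
    OnlyIn1.Free;
    List2.Free;
    List1.Free;
  end;
end.
```

A word that occurs several times in one file and never in the other is written out once per occurrence; deduplicating the output lists is left out to keep the sketch short.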

Regards
Question has a verified solution.
