Solved

WORKING WITH VERY LARGE WORD LISTS

Posted on 2003-11-16
5
211 Views
Last Modified: 2010-04-05
Dear Delphi Experts,

How I starting to do a program that involves comparing two very large word lists (~40.000 and 70.000 words) and finding out which words
are on one list and not on the other (and/or vice versa).

Each large word lists must be textfiles (.txt) and the results must be
presetend in a third textfile or listbox.

Many Thanks

LeTchev
0
Comment
Question by:letchev
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
5 Comments
 

Expert Comment

by:Smortex
ID: 9759832
Try this :
Load each word list in a TStringList (One word per line) avec sort them.

Create a function that search an item in a list.

Here is an example. In order to make it easier to read, I used 2 TListBox. When an item is clicked, a TLabel get the caption "True" if the item selected is found in the second TListBox, "False" if it was not found :

procedure TForm1.ListBox1Click(Sender: TObject);
  function FindTheOther(AWord: string; AList: TStringList): Boolean;
  var
    Offset, Step, CompResult: integer;
  begin
    Offset := AList.Count div 2;
    Step   := AList.Count;
    while Step <> 0 do
    begin
      Step := Step div 2;
      if Offset + Step >= AList.Count then
        Step := AList.Count - Offset - 1;
      CompResult := CompareText(AWord,AList[offset]);
      if CompResult = 0 then
      begin
        Result := True;
        Exit;
      end
      else
        if CompResult > 0 then
          Offset := Offset + Step
        else
          Offset := Offset - Step;
    end;
    Result := False;
  end;
begin
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.ItemIndex],TStringList(ListBox2.Items)),True);
end;

Hope that help :)

Regards
0
 

Accepted Solution

by:
Smortex earned 125 total points
ID: 9760080
Ooops....

This function should work better ;)

procedure TForm1.ListBox1Click(Sender: TObject);
  function FindTheOther(AWord: string; AList: TStringList): Boolean;
  var
    Offset, Step, CompResult: integer;
    LastChance : Integer;
  begin
    Offset := Ceil(AList.Count / 2);
    Step   := Ceil(AList.Count / 2);
    LastChance := 2;
    while LastChance <> 0 do
    begin

      Step := Ceil(Step / 2);

      if Step = 1 then
        Dec(LastChance);

      if Offset < 0 then
        Offset := 0;
      if Offset >= AList.Count then
        Offset := AList.Count - 1;

      CompResult := CompareText(AWord,AList[offset]);
      if CompResult = 0 then
      begin
        Result := True;
        Exit;
      end
      else
        if CompResult > 0 then
        begin
          Offset := Offset + Step;
        end
        else
        begin
          Offset := Offset - Step;
        end;
    end;
    Result := False;
  end;
begin
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.itemindex],TStringList(ListBox2.Items)),True);
end;

Sorry....

Regards
0
 

Author Comment

by:letchev
ID: 9791630
Dear Smortex,

Sorry, but it is not I want. Firstly I would need a routine for reading wordlists from .txt or .csv files. for example

List1.LoadFromFile('c:\1.txt');
List2.LoadFromFile('c:\2.txt');

ListBox3.Items.Add (here the common words found in list1 and list2

it is possible?

Thank you for your patience.

Letchev
0
 

Expert Comment

by:Smortex
ID: 9799850
Won can do this very easely using my function :

  List1 := TStringList.Create;
  List2 := TStringList.Create;
  try
    List1.LoadFromFile('c:\1.txt');
    List2.LoadFromFile('c:\2.txt');

    List1.Sort;
    List2.Sort;

    for i := 0 to Pred(List1.Count) do
      if FindTheOther(List1[i],List2) then
        ListBox1.Items.Add(List1[i]);

  finally
    List1.Free;
    List2.Free;
  end;

Regards
0
 

Expert Comment

by:Smortex
ID: 9844803
Note that if you dont make any search (with my "FindTheOther" function) on the first list (List1) you do not have to sort it :)
This line can so be removed :
    List1.Sort;

Regards
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Convert a string into a TDateTime 5 79
update joined tables 2 72
Delphi: making a BW image transparent 10 136
Delphi Firemonkey send email on Android 1 76
This article explains how to create forms/units independent of other forms/units object names in a delphi project. Have you ever created a form for user input in a Delphi project and then had the need to have that same form in a other Delphi proj…
Hello everybody This Article will show you how to validate number with TEdit control, What's the TEdit control? TEdit is a standard Windows edit control on a form, it allows to user to write, read and copy/paste single line of text. Usua…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

737 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question