
WORKING WITH VERY LARGE WORD LISTS

Posted on 2003-11-16
Last Modified: 2010-04-05
Dear Delphi Experts,

I am starting to write a program that compares two very large word lists (~40,000 and ~70,000 words) and finds out which words
are in one list but not in the other (and/or vice versa).

Each word list is a text file (.txt), and the results must be
presented in a third text file or in a list box.

Many Thanks

LeTchev
Question by:letchev
5 Comments
 

Expert Comment

by:Smortex
ID: 9759832
Try this:
Load each word list into a TStringList (one word per line) and sort them.

Then write a function that searches for an item in a sorted list.

Here is an example. To make it easier to read, I used two TListBox controls (the second one must be sorted). When an item is clicked, a TLabel gets the caption "True" if the selected item is found in the second TListBox, or "False" if it is not:

procedure TForm1.ListBox1Click(Sender: TObject);
  function FindTheOther(AWord: string; AList: TStringList): Boolean;
  var
    Offset, Step, CompResult: integer;
  begin
    Offset := AList.Count div 2;
    Step   := AList.Count;
    while Step <> 0 do
    begin
      Step := Step div 2;
      if Offset + Step >= AList.Count then
        Step := AList.Count - Offset - 1;
      CompResult := CompareText(AWord,AList[offset]);
      if CompResult = 0 then
      begin
        Result := True;
        Exit;
      end
      else
        if CompResult > 0 then
          Offset := Offset + Step
        else
          Offset := Offset - Step;
    end;
    Result := False;
  end;
begin
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.ItemIndex],TStringList(ListBox2.Items)),True);
end;
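Delphi's TStringList can also perform this binary search itself. Below is a minimal sketch, assuming a console program and a hypothetical helper named InSortedList (neither is from the original post). Once Sorted is True, TStringList.Find does a binary search using the same comparison the list was sorted with (case-insensitive, because CaseSensitive defaults to False), so no hand-written search loop is needed.

```pascal
program FindDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: True if AWord is in AList.
// AList must have Sorted = True; Find then performs a binary search
// with the same comparison that was used to sort the list.
function InSortedList(const AWord: string; AList: TStringList): Boolean;
var
  Index: Integer;
begin
  Result := AList.Find(AWord, Index);
end;

var
  Words: TStringList;
begin
  Words := TStringList.Create;
  try
    Words.Add('delta');
    Words.Add('alpha');
    Words.Add('charlie');
    Words.Add('bravo');
    Words.Sorted := True; // sorts now and keeps the list sorted on later Adds
    WriteLn(BoolToStr(InSortedList('alpha', Words), True)); // found
    WriteLn(BoolToStr(InSortedList('Delta', Words), True)); // found: case is ignored
    WriteLn(BoolToStr(InSortedList('echo', Words), True));  // not found
  finally
    Words.Free;
  end;
end.
```

Setting Sorted := True (rather than only calling Sort) matters here, because some versions of TStringList.Find refuse to run on a list whose Sorted property is False.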

Hope that helps :)

Regards
 

Accepted Solution

by:Smortex (earned 500 total points)
ID: 9760080
Ooops....

This function should work better ;)

procedure TForm1.ListBox1Click(Sender: TObject);
  // Binary search for AWord in AList. AList must already be sorted in the
  // same case-insensitive order (e.g. with TStringList.Sort).
  function FindTheOther(const AWord: string; AList: TStrings): Boolean;
  var
    Lo, Hi, Mid, CompResult: Integer;
  begin
    Result := False;
    Lo := 0;
    Hi := AList.Count - 1;   // an empty list gives Hi = -1, so the loop is skipped
    while Lo <= Hi do
    begin
      Mid := (Lo + Hi) div 2;
      // AnsiCompareText matches the default order used by TStringList.Sort
      CompResult := AnsiCompareText(AWord, AList[Mid]);
      if CompResult = 0 then
      begin
        Result := True;      // found
        Exit;
      end
      else if CompResult > 0 then
        Lo := Mid + 1        // AWord sorts after AList[Mid]: search the upper half
      else
        Hi := Mid - 1;       // AWord sorts before AList[Mid]: search the lower half
    end;
  end;
begin
  if ListBox1.ItemIndex < 0 then
    Exit;                    // nothing selected
  // ListBox2 must be sorted for the binary search to work;
  // Items is a TStrings, so no TStringList typecast is needed
  Label1.Caption := BoolToStr(FindTheOther(ListBox1.Items[ListBox1.ItemIndex], ListBox2.Items), True);
end;

Sorry....

Regards
 

Author Comment

by:letchev
ID: 9791630
Dear Smortex,

Sorry, but this is not what I want. First, I need a routine that reads the word lists from .txt or .csv files, for example:

List1.LoadFromFile('c:\1.txt');
List2.LoadFromFile('c:\2.txt');

ListBox3.Items.Add(...); // here: the common words found in List1 and List2

Is it possible?

Thank you for your patience.

Letchev
 

Expert Comment

by:Smortex
ID: 9799850
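One can do this very easily using my function. For another method to call it, FindTheOther must be declared as a standalone function instead of being nested inside ListBox1Click (Button1Click below is just a hypothetical handler name):

procedure TForm1.Button1Click(Sender: TObject); // hypothetical handler
var
  List1, List2: TStringList;
  i: Integer;
begin
  List1 := TStringList.Create;
  List2 := TStringList.Create;
  try
    List1.LoadFromFile('c:\1.txt');
    List2.LoadFromFile('c:\2.txt');

    List1.Sort;
    List2.Sort; // FindTheOther needs List2 sorted

    ListBox1.Items.BeginUpdate; // avoid repainting the list box for every word
    try
      for i := 0 to Pred(List1.Count) do
        if FindTheOther(List1[i], List2) then // word is in both lists
          ListBox1.Items.Add(List1[i]);
    finally
      ListBox1.Items.EndUpdate;
    end;
  finally
    List1.Free;
    List2.Free;
  end;
end;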
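The original question asked for the words that appear in one list but not in the other, written to a third text file. Below is a minimal sketch of that, assuming a console program, a hypothetical helper WordsNotIn, and hypothetical output file names only_in_1.txt and only_in_2.txt. It uses TStringList.Find with Sorted = True in place of FindTheOther, and inline sample data in place of the files so it runs on its own.

```pascal
program WordDiff;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: appends to Output every word of Source that is not in
// Other. Other must have Sorted = True so that Find can binary-search it.
procedure WordsNotIn(Source, Other: TStringList; Output: TStrings);
var
  i, Index: Integer;
begin
  for i := 0 to Source.Count - 1 do
    if not Other.Find(Source[i], Index) then
      Output.Add(Source[i]);
end;

var
  List1, List2, Only1, Only2: TStringList;
begin
  List1 := TStringList.Create;
  List2 := TStringList.Create;
  Only1 := TStringList.Create;
  Only2 := TStringList.Create;
  try
    // In the real program these would come from files, e.g.
    // List1.LoadFromFile('c:\1.txt'); List2.LoadFromFile('c:\2.txt');
    List1.CommaText := 'apple,banana,cherry';
    List2.CommaText := 'banana,cherry,date';
    List1.Sorted := True; // both are searched, so both must be sorted
    List2.Sorted := True;

    WordsNotIn(List1, List2, Only1); // words only in List1
    WordsNotIn(List2, List1, Only2); // words only in List2

    Only1.SaveToFile('only_in_1.txt'); // hypothetical output file names
    Only2.SaveToFile('only_in_2.txt');
    WriteLn('Only in list 1: ', Only1.CommaText);
    WriteLn('Only in list 2: ', Only2.CommaText);
  finally
    Only2.Free;
    Only1.Free;
    List2.Free;
    List1.Free;
  end;
end.
```

With the sample data, only_in_1.txt ends up holding apple and only_in_2.txt holds date. To show the results in a list box instead, pass ListBox1.Items as the Output parameter.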

Regards
 

Expert Comment

by:Smortex
ID: 9844803
Note that since you never search the first list (List1) with my FindTheOther function, you do not have to sort it :)
So this line can be removed:
    List1.Sort;
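If List1.Sort is kept anyway, a different technique becomes possible: a single merge pass over the two sorted lists (the merge step of merge sort) finds the common words and the words unique to each list in one sweep of O(n + m) comparisons, instead of one binary search per word. Below is a minimal sketch, assuming both lists are sorted with AnsiCompareText (the default case-insensitive order of TStringList.Sort) and that no list contains the same word twice; the procedure name CompareSortedLists is hypothetical.

```pascal
program MergeCompare;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: walks two sorted lists once, like the merge step of
// merge sort. Both lists must be sorted with AnsiCompareText.
procedure CompareSortedLists(A, B: TStrings; OnlyA, OnlyB, Both: TStrings);
var
  i, j, C: Integer;
begin
  i := 0;
  j := 0;
  while (i < A.Count) and (j < B.Count) do
  begin
    C := AnsiCompareText(A[i], B[j]);
    if C = 0 then
    begin
      Both.Add(A[i]);   // word is in both lists
      Inc(i);
      Inc(j);
    end
    else if C < 0 then
    begin
      OnlyA.Add(A[i]);  // A[i] sorts before B[j], so it cannot be in B
      Inc(i);
    end
    else
    begin
      OnlyB.Add(B[j]);  // B[j] sorts before A[i], so it cannot be in A
      Inc(j);
    end;
  end;
  // Whatever remains in either list has no partner in the other one
  while i < A.Count do
  begin
    OnlyA.Add(A[i]);
    Inc(i);
  end;
  while j < B.Count do
  begin
    OnlyB.Add(B[j]);
    Inc(j);
  end;
end;

var
  List1, List2, Only1, Only2, Common: TStringList;
begin
  List1 := TStringList.Create;
  List2 := TStringList.Create;
  Only1 := TStringList.Create;
  Only2 := TStringList.Create;
  Common := TStringList.Create;
  try
    List1.CommaText := 'cherry,apple,banana';
    List2.CommaText := 'date,Banana,cherry';
    List1.Sort;
    List2.Sort;
    CompareSortedLists(List1, List2, Only1, Only2, Common);
    WriteLn('Only in list 1: ', Only1.CommaText);
    WriteLn('Only in list 2: ', Only2.CommaText);
    WriteLn('In both:        ', Common.CommaText);
  finally
    Common.Free;
    Only2.Free;
    Only1.Free;
    List2.Free;
    List1.Free;
  end;
end.
```

Here apple lands in Only1, date in Only2, and banana and cherry in Common (Banana matches banana because the comparison ignores case). Each word is compared a bounded number of times, so the pass stays linear even for lists of 40,000 and 70,000 words.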

Regards
