Solved

routine to import from a txt file

Posted on 2006-11-30
14
237 Views
Last Modified: 2010-04-05
Hello guys,

I need to create a routine to import this list.
I need to get the words after "s.' and before any ".", see the this line

House (English-Portugese) s. casa v. abrigar

SomeTimes I will have lines like this:

House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

So, I will need to separate each option of my line in different variable, I will need to get:

 if there is (uma casa) or any meaning between ( ) different (English-Portugues) I will need to get this too
 When there is ( ) I will have note about the word between the brackets

 ( s. meanings until v.)
 ( v. meanings until adj.)
 (adj. meanings until pron.)
 (pron. meanings until "if doesn't have any . further, so it must to get everything)


any doubt, let me know
thanks
0
Comment
Question by:hidrau
  • 9
  • 5
14 Comments
 
LVL 1

Author Comment

by:hidrau
ID: 18046676
If you want you can get a small txt list here

http://www.infosoftlanguages.com.br/arquivos/list.txt
0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18047520

What are you looking to split each line up into (variable wise)?  Individual words? groups of words? eg:

>> House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

House (English-Portugese)
------------------------------
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais (uma casa)

And should each option be a list of words, or a single string, eg: "casa, assembléia"

---
Rusell

0
 
LVL 1

Author Comment

by:hidrau
ID: 18047871
I want to split the list into Group of words.

V. = verbs
S. = noun
Adj. = adjectives
Conj. = conjunctions

After all group splitted, I will record into a table

House is the head word

when you have a word between brackets, this will be a note of the word.

For each group I will have a field to record.


(English-Portugese) = this need to be off
 
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais
note = (uma casa)


You see!
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 26

Expert Comment

by:Russell Libby
ID: 18048328

I see *almost*....
What should happen if a a line contains more than one *note*? eg:

Abilene  s. cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

word = Abilene
s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

Should the notes be left in? should they be removed? If you want them removed and kept for the note, how should they be merged?

eg:
word = Abilene
s.=  cidade no Texas; cidade em Kansas (E.U.A.)
note=(E.U.A.) (E.U.A.) <-- what to do here.

In many cases there are multiple notes, and the notes are NOT the same text.

Russell

0
 
LVL 1

Author Comment

by:hidrau
ID: 18048368
I take a look in the list and I noteced that note belongs a specific subject.

 s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

so, this is the correct, leave the note in the subject, as you showed me.

0
 
LVL 1

Author Comment

by:hidrau
ID: 18048377
you don't need to take it to place it in another variable.

you see?
0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18048737
Yeah, I think I got it.

Russell

----

//   Include the RegExprEx unit in the uses clause, the source can be downloaded
//   from my site @:
//
//   http://users.adelphia.net/~rllibby/downloads/regexprex.zip

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpS] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpS do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpS do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_VALUES[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpS do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the list
     listFile.Free;
  end;


end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052627
rllibby , I forgot this grammar class 'interj'

I tried to add in the list

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, Wpinterj);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.','interj.'
                       );

But I am having a problem

Number of elements differs from declaration, why?
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052629
Ah, it was very cool your code
thanks for your big help

:))
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052765
I tried to change the name of WORDPART_VALUES but I am having an error when I run the code,

:(

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo'
                       );


I changed because each name will correspond the name of field in the table: See below



       // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           Word.Append;
           Word.FieldByName('Palavra').AsString := lpValue.Name;
           Word.FieldByName('Obs').AsString     := lpValue.Description;
           for wpIndex:=wpAdj to wpS do
           Begin
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
              Word.FieldByName(WORDPART_VALUES[wpIndex]).AsString := lpValue.Parts[wpIndex];
             // if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           End;
           Word.Post;
        end;
     end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18053311
Please, the lasp post was a mistake mine, ignore the last post.

Thanks
0
 
LVL 26

Accepted Solution

by:
Russell Libby earned 500 total points
ID: 18053813

You need to make a couple of changes to add the new item in (also to handle in the regex parsing). Also, define a second array to use for the record filling, because it needs to match the exact value in the string, eg "v." in order to perform the record part assignment. Then use the other array as the descriptive. Full example follows:


type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  WORDPART_COMP:    Array [wpAdj..wpInterj] of String =
                       (
                          // These need to be the actual match values (all lowercase)
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.'
                       );
  WORDPART_VALUES:  Array [wpAdj..wpInterj] of String =
                       (
                          // These can be used as descriptives. Sorry, dont know
                          // Portuguese for interjection
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjection'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpInterj] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpInterj do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpInterj do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_COMP[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpInterj do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the list
     listFile.Free;
  end;

end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18053931
Now I understood your code better, thanks very much.

0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18053959
No problem,

Russell
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

A lot of questions regard threads in Delphi.   One of the more specific questions is how to show progress of the thread.   Updating a progressbar from inside a thread is a mistake. A solution to this would be to send a synchronized message to the…
Creating an auto free TStringList The TStringList is a basic and frequently used object in Delphi. On many occasions, you may want to create a temporary list, process some items in the list and be done with the list. In such cases, you have to…
Finds all prime numbers in a range requested and places them in a public primes() array. I've demostrated a template size of 30 (2 * 3 * 5) but larger templates can be built such 210  (2 * 3 * 5 * 7) or 2310  (2 * 3 * 5 * 7 * 11). The larger templa…

679 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question