Routine to import from a txt file

Hello guys,

I need to create a routine to import this list.
I need to get the words after "s." and before the next "." marker. See this line:

House (English-Portugese) s. casa v. abrigar

Sometimes I will have lines like this:

House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

So I need to separate each option in the line into a different variable. I need to get:

 If there is (uma casa), or any text between ( ) other than (English-Portugese), I need to get that too.
 When there is ( ), the text between the brackets is a note about the word.

 ( s. meanings until v.)
 ( v. meanings until adj.)
 (adj. meanings until pron.)
 (pron. meanings until the next marker; if there is no further "." marker, it must get everything)


If you have any doubts, let me know.
Thanks
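
The rules above can also be sketched without a regular-expression library, by walking the line token by token. This is only a minimal illustration under some assumptions: the head word is a single token, the markers are exactly the lowercase abbreviations listed in `MARKERS` (a name invented for this sketch), and anything between the head word and the first marker is kept as a description. It is not the approach used in the answers in this thread, which rely on a regex unit.

```pascal
program SplitWordParts;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  // Hypothetical marker table: the abbreviations as they appear in the file
  MARKERS: array [TWordPart] of String =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.');

var
  Parts: array [TWordPart] of String;
  Rest, Token, HeadWord, Description: String;
  Wp, Current: TWordPart;
  InPart, IsMarker: Boolean;
  P: Integer;
begin
  // Sample line taken from the question
  Rest := 'House (English-Portugese) s. casa v. abrigar';
  HeadWord := '';
  Description := '';
  InPart := False;
  Current := Low(TWordPart);
  for Wp := Low(TWordPart) to High(TWordPart) do
    Parts[Wp] := '';

  while Rest <> '' do
  begin
    // Cut the next space-delimited token off the front of the line
    P := Pos(' ', Rest);
    if P = 0 then
    begin
      Token := Rest;
      Rest := '';
    end
    else
    begin
      Token := Copy(Rest, 1, P - 1);
      Rest := TrimLeft(Copy(Rest, P + 1, MaxInt));
    end;

    // A marker token switches the group that later tokens are appended to
    IsMarker := False;
    for Wp := Low(TWordPart) to High(TWordPart) do
      if LowerCase(Token) = MARKERS[Wp] then
      begin
        Current := Wp;
        InPart := True;
        IsMarker := True;
        Break;
      end;
    if IsMarker then
      Continue;

    if InPart then
      Parts[Current] := Trim(Parts[Current] + ' ' + Token)
    else if HeadWord = '' then
      HeadWord := Token
    else
      Description := Trim(Description + ' ' + Token);
  end;

  WriteLn('word = ', HeadWord);
  WriteLn('description = ', Description);
  for Wp := Low(TWordPart) to High(TWordPart) do
    if Parts[Wp] <> '' then
      WriteLn(MARKERS[Wp], ' = ', Parts[Wp]);
end.
```

Because the groups are printed in enum order, "v." is listed before "s." for the sample line. Head words made of several tokens would need extra handling, and a meaning that happens to equal a marker (for example a literal "v.") would be treated as a new group.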
hidrau Asked:
 
Russell Libby (Software Engineer, Advisory) Commented:

You need to make a couple of changes to add the new item (and to handle it in the regex pattern). Also define a second array to use when filling the record, because its entries must match the exact text in the string, e.g. "v.", for the part assignment to work. The other array can then hold the descriptive names. Full example follows:


type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  WORDPART_COMP:    Array [wpAdj..wpInterj] of String =
                       (
                          // These need to be the actual match values (all lowercase)
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.'
                       );
  WORDPART_VALUES:  Array [wpAdj..wpInterj] of String =
                       (
                          // These can be used as descriptives. Sorry, I don't know
                          // the Portuguese word for interjection
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjection'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpInterj] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpInterj do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpInterj do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_COMP[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpInterj do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
     // Free the parser; it was created above and is not released anywhere else
     reParse.Free;
  finally
     // Free the list
     listFile.Free;
  end;

end;

hidrau (Author) Commented:
If you want, you can get a small txt list here:

http://www.infosoftlanguages.com.br/arquivos/list.txt

Russell Libby (Software Engineer, Advisory) Commented:

What are you looking to split each line up into (variable wise)?  Individual words? groups of words? eg:

>> House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

House (English-Portugese)
------------------------------
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais (uma casa)

And should each option be a list of words, or a single string, eg: "casa, assembléia"?

---
Russell

 
hidrau (Author) Commented:
I want to split the list into groups of words.

V. = verbs
S. = nouns
Adj. = adjectives
Conj. = conjunctions

After all the groups are split, I will record them in a table.

House is the head word.

When there is a word between brackets, it is a note about the word.

For each group I will have a field to record.


(English-Portugese) = this needs to be left out
 
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais
note = (uma casa)


You see?

Russell Libby (Software Engineer, Advisory) Commented:

I see *almost*....
What should happen if a line contains more than one *note*? eg:

Abilene  s. cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

word = Abilene
s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

Should the notes be left in? should they be removed? If you want them removed and kept for the note, how should they be merged?

eg:
word = Abilene
s.=  cidade no Texas; cidade em Kansas (E.U.A.)
note=(E.U.A.) (E.U.A.) <-- what to do here.

In many cases there are multiple notes, and the notes are NOT the same text.

Russell


hidrau (Author) Commented:
I took a look at the list and noticed that each note belongs to a specific subject.

 s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

So this is correct: leave the note in the subject, as you showed me.


hidrau (Author) Commented:
You don't need to take it out and place it in another variable.

You see?

Russell Libby (Software Engineer, Advisory) Commented:
Yeah, I think I got it.

Russell

----

//   Include the RegExprEx unit in the uses clause, the source can be downloaded
//   from my site @:
//
//   http://users.adelphia.net/~rllibby/downloads/regexprex.zip

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpS] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpS do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpS do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_VALUES[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpS do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
     // Free the parser; it was created above and is not released anywhere else
     reParse.Free;
  finally
     // Free the list
     listFile.Free;
  end;


end;

hidrau (Author) Commented:
rllibby, I forgot the grammar class 'interj'.

I tried to add it to the list:

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, Wpinterj);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.','interj.'
                       );

But I am getting a compiler error:

"Number of elements differs from declaration". Why?
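
For context, the array above is still declared as `[wpAdj..wpS]`, which is six elements, while seven values are supplied, so the compiler rejects it. The accepted answer widens the range to `[wpAdj..wpInterj]`. A minimal sketch of an alternative, assuming nothing beyond the enum from this thread, is to index the array by the enum type itself so the bounds follow the type whenever a member is added:

```pascal
program WordPartBounds;
{$APPTYPE CONSOLE}

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  // Indexed by the whole enum type, so the element count must match it
  WORDPART_VALUES: array [TWordPart] of String =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');

var
  Wp: TWordPart;
begin
  // Low/High keep the loop bounds in step with the enum as well
  for Wp := Low(TWordPart) to High(TWordPart) do
    WriteLn(Ord(Wp), ': ', WORDPART_VALUES[Wp]);
end.
```

The loops written as `for wpIndex:=wpAdj to wpS`, the `Parts` array in the record, and the regex pattern would also need the new member, as the accepted answer shows.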

hidrau (Author) Commented:
Ah, your code is very cool.
Thanks for your big help

:))

hidrau (Author) Commented:
I tried to change the values in WORDPART_VALUES, but I get an error when I run the code,

:(

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo'
                       );


I changed them because each name will correspond to the name of a field in the table. See below:



       // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           Word.Append;
           Word.FieldByName('Palavra').AsString := lpValue.Name;
           Word.FieldByName('Obs').AsString     := lpValue.Description;
           for wpIndex:=wpAdj to wpS do
           Begin
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
              Word.FieldByName(WORDPART_VALUES[wpIndex]).AsString := lpValue.Parts[wpIndex];
             // if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           End;
           Word.Post;
        end;
     end;

hidrau (Author) Commented:
Please ignore my last post; it was my mistake.

Thanks

hidrau (Author) Commented:
Now I understand your code better, thanks very much.


Russell Libby (Software Engineer, Advisory) Commented:
No problem,

Russell
Question has a verified solution.
