Solved

Routine to import from a txt file

Posted on 2006-11-30
Last Modified: 2010-04-05
Hello guys,

I need to create a routine to import this list.
I need to get the words after "s." and before the next marker ending in ".", as in this line:

House (English-Portugese) s. casa v. abrigar

Sometimes I will have lines like this:

House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

So I need to separate each option of the line into a different variable. I need to get:

 If there is "(uma casa)", or any other text between ( ) apart from "(English-Portugese)", I need to get it too.
 When there is text between ( ), it is a note about the word.

 (s. meanings until v.)
 (v. meanings until adj.)
 (adj. meanings until pron.)
 (pron. meanings until the next marker; if there is no further ".", take everything up to the end of the line)


If anything is unclear, let me know.
Thanks
Question by:hidrau
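Because the regex answers in this thread depend on a separately downloaded RegExprEx unit, here is a dependency-free sketch of the same split in plain Delphi (it should also compile with Free Pascal in Delphi mode). It assumes, as the regex answer does, that the head word is the first whitespace-separated token and that the markers (s., v., adj., ...) appear as standalone tokens. Text before the first marker, such as "(English-Portugese)", goes into Description so it can be discarded, and notes in parentheses stay attached to their meaning. The names ParseEntry, IsMarker and AppendWord are hypothetical.

```pascal
program SplitDictLine;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);
  TDictWord = record
    Name: string;         // head word: the first token on the line
    Description: string;  // text before the first marker, e.g. (English-Portugese)
    Parts: array [TWordPart] of string;
  end;

const
  // Markers as written in the file; tokens are lowercased before comparing
  PART_MARKERS: array [TWordPart] of string =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');

// True if AToken is exactly one of the markers; APart then says which one.
function IsMarker(const AToken: string; out APart: TWordPart): Boolean;
var
  P: TWordPart;
begin
  Result := False;
  for P := Low(TWordPart) to High(TWordPart) do
    if LowerCase(AToken) = PART_MARKERS[P] then
    begin
      APart := P;
      Result := True;
      Exit;
    end;
end;

// Appends AWord to S, separated by a single space.
procedure AppendWord(var S: string; const AWord: string);
begin
  if S = '' then
    S := AWord
  else
    S := S + ' ' + AWord;
end;

function ParseEntry(const ALine: string): TDictWord;
var
  I, Start: Integer;
  Token: string;
  Current, Found, P: TWordPart;
  InPart: Boolean;
begin
  Result.Name := '';
  Result.Description := '';
  for P := Low(TWordPart) to High(TWordPart) do
    Result.Parts[P] := '';
  InPart := False;
  Current := wpS;
  I := 1;
  while I <= Length(ALine) do
  begin
    // Skip spaces, tabs and other control characters (runs of them included)
    while (I <= Length(ALine)) and (ALine[I] <= ' ') do
      Inc(I);
    if I > Length(ALine) then
      Break;
    // Collect one token up to the next whitespace character
    Start := I;
    while (I <= Length(ALine)) and (ALine[I] > ' ') do
      Inc(I);
    Token := Copy(ALine, Start, I - Start);

    if Result.Name = '' then
      Result.Name := Token
    else if IsMarker(Token, Found) then
    begin
      Current := Found;  // a marker starts a new group
      InPart := True;
    end
    else if InPart then
      AppendWord(Result.Parts[Current], Token)  // notes in ( ) stay with their meaning
    else
      AppendWord(Result.Description, Token);
  end;
end;

var
  Entry: TDictWord;
  P: TWordPart;
begin
  Entry := ParseEntry('House (English-Portugese) s. casa, assembléia v. abrigar, hospedar ' +
    'adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)');
  WriteLn('Name = ', Entry.Name);
  WriteLn('Description = ', Entry.Description);
  for P := Low(TWordPart) to High(TWordPart) do
    WriteLn(PART_MARKERS[P], ' = ', Entry.Parts[P]);
end.
```

Markers that do not occur on a line are left as empty strings. A plain token scan like this is easier to debug than a pattern, but it will misfire if a meaning ever contains a standalone word such as "v." that is not meant as a marker.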

Author Comment

by:hidrau
ID: 18046676
If you want, you can get a small sample list here:

http://www.infosoftlanguages.com.br/arquivos/list.txt

Expert Comment

by:Russell Libby
ID: 18047520

What do you want to split each line into (variable-wise)? Individual words? Groups of words? E.g.:

>> House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

House (English-Portugese)
------------------------------
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais (uma casa)

And should each option be a list of words or a single string, e.g. "casa, assembléia"?

---
Russell


Author Comment

by:hidrau
ID: 18047871
I want to split each line into groups of words.

v. = verbs
s. = nouns
adj. = adjectives
conj. = conjunctions

After all the groups are split, I will record them into a table.

House is the head word.

When a word is between parentheses, it is a note about the word.

Each group will be recorded in its own field.


(English-Portugese) = this needs to be dropped
 
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais
note = (uma casa)


You see!

Expert Comment

by:Russell Libby
ID: 18048328

I *almost* see....
What should happen if a line contains more than one *note*? E.g.:

Abilene  s. cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

word = Abilene
s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

Should the notes be left in, or should they be removed? If you want them removed and kept in the note field, how should they be merged?

E.g.:
word = Abilene
s.=  cidade no Texas; cidade em Kansas (E.U.A.)
note=(E.U.A.) (E.U.A.) <-- what to do here.

In many cases there are multiple notes, and the notes are NOT the same text.

Russell


Author Comment

by:hidrau
ID: 18048368
I took a look at the list and noticed that each note belongs to a specific meaning.

 s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

So this is correct: leave the note in the meaning, as you showed me.


Author Comment

by:hidrau
ID: 18048377
You don't need to extract it into another variable.

You see?

Expert Comment

by:Russell Libby
ID: 18048737
Yeah, I think I got it.

Russell

----

//   Include the RegExprEx unit in the uses clause; the source can be downloaded
//   from my site at:
//
//   http://users.adelphia.net/~rllibby/downloads/regexprex.zip

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpS] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // No parser yet; Free on nil is safe if an exception is raised before it is created
  reParse:=nil;
  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpS do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpS do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_VALUES[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpS do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the parser and the list
     reParse.Free;
     listFile.Free;
  end;


end;

Author Comment

by:hidrau
ID: 18052627
rllibby, I forgot this grammatical class: 'interj'

I tried to add it to the list:

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, Wpinterj);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.','interj.'
                       );

But I get a compiler error:

Number of elements differs from declaration. Why?

Author Comment

by:hidrau
ID: 18052629
Ah, your code is very cool.
Thanks for your big help.

:))

Author Comment

by:hidrau
ID: 18052765
I tried to change the values in WORDPART_VALUES, but I get an error when I run the code.

:(

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo'
                       );


I changed them because each name will correspond to the name of a field in the table. See below:



       // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           Word.Append;
           Word.FieldByName('Palavra').AsString := lpValue.Name;
           Word.FieldByName('Obs').AsString     := lpValue.Description;
           for wpIndex:=wpAdj to wpS do
           Begin
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
              Word.FieldByName(WORDPART_VALUES[wpIndex]).AsString := lpValue.Parts[wpIndex];
             // if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           End;
           Word.Post;
        end;
     end;

Author Comment

by:hidrau
ID: 18053311
Please ignore my last post; it was my mistake.

Thanks

Accepted Solution

by:
Russell Libby earned 500 total points
ID: 18053813

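You need to make a couple of changes to add the new item (including handling it in the regex pattern). The compiler error comes from the array bounds: the arrays are still declared as [wpAdj..wpS], which has six slots, while seven values are supplied, so the upper bound has to become wpInterj. Also, define a second array to use for filling the record, because it needs to match the exact value in the string (e.g. "v.") in order to perform the record part assignment. Then use the other array for the descriptive names. Full example follows: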


type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  WORDPART_COMP:    Array [wpAdj..wpInterj] of String =
                       (
                          // These need to be the actual match values (all lowercase)
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.'
                       );
  WORDPART_VALUES:  Array [wpAdj..wpInterj] of String =
                       (
                          // These can be used as descriptives. Sorry, don't know
                          // Portuguese for interjection
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjection'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpInterj] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // No parser yet; Free on nil is safe if an exception is raised before it is created
  reParse:=nil;
  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpInterj do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpInterj do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_COMP[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpInterj do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the parser and the list
     reParse.Free;
     listFile.Free;
  end;

end;
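One further refinement, offered as a sketch rather than as part of the accepted answer. The error in the question came from supplying seven values to arrays still bounded by wpAdj..wpS. Declaring the arrays as array [TWordPart] ties their size to the enumeration itself, and looping over Low(TWordPart)..High(TWordPart) means the for loops need no edits when a part of speech is added. The compiler still rejects the constant arrays until a value is supplied for the new member. The helper FindPart and the field name 'interjeicao' (Portuguese "interjeição", written without accents like the other names) are illustrative assumptions.

```pascal
program WordPartTables;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  // Indexed by the whole enumeration, so the array size always follows TWordPart;
  // adding a member makes the compiler reject these constants until a value is added.
  WORDPART_COMP: array [TWordPart] of string =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');
  // Hypothetical table field names (accent-free, like the others in the thread)
  WORDPART_VALUES: array [TWordPart] of string =
    ('adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjeicao');

// Hypothetical helper: maps a marker such as 'V.' or 'interj.' to its TWordPart.
function FindPart(const AMarker: string; out APart: TWordPart): Boolean;
var
  P: TWordPart;
begin
  Result := False;
  for P := Low(TWordPart) to High(TWordPart) do
    if CompareStr(LowerCase(AMarker), WORDPART_COMP[P]) = 0 then
    begin
      APart := P;
      Result := True;
      Exit;
    end;
end;

var
  P, Found: TWordPart;
begin
  // The loop bounds come from the type, so this needs no change for new members
  for P := Low(TWordPart) to High(TWordPart) do
    WriteLn(WORDPART_COMP[P], ' -> ', WORDPART_VALUES[P]);
  if FindPart('Interj.', Found) then
    WriteLn('Interj. maps to field ', WORDPART_VALUES[Found]);
end.
```

With this layout, adding another part of speech comes down to one new enumeration member, one new string in each array and, in the accepted answer's code, one new alternative in the regex pattern.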

Author Comment

by:hidrau
ID: 18053931
Now I understood your code better, thanks very much.


Expert Comment

by:Russell Libby
ID: 18053959
No problem,

Russell