Solved

Routine to import from a txt file

Posted on 2006-11-30
Last Modified: 2010-04-05
Hello guys,

I need to create a routine to import this list.
I need to get the words after "s." and before the next marker ending in "."; see this line:

House (English-Portugese) s. casa v. abrigar

Sometimes I will have lines like this:

House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

So I need to separate each part of the line into a different variable. I need to get:

 If there is (uma casa), or any other text between ( ) other than (English-Portugese), I need to get that too.
 Text between brackets is a note about the word.

 ( s. meanings up to v.)
 ( v. meanings up to adj.)
 (adj. meanings up to pron.)
 (pron. meanings up to the next marker; if there is no further marker, everything to the end of the line)


If anything is unclear, let me know.
Thanks
Question by:hidrau
 
LVL 1

Author Comment

by:hidrau
If you want, you can get a small sample of the list here:

http://www.infosoftlanguages.com.br/arquivos/list.txt
 
LVL 26

Expert Comment

by:Russell Libby

What are you looking to split each line up into (variable wise)?  Individual words? groups of words? eg:

>> House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

House (English-Portugese)
------------------------------
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais (uma casa)

And should each option be a list of words, or a single string, eg: "casa, assembléia"

---
Russell

 
LVL 1

Author Comment

by:hidrau
I want to split the line into groups of words.

V. = verbs
S. = nouns
Adj. = adjectives
Conj. = conjunctions

After all the groups are split, I will record them in a table.

House is the headword.

When there is a word between brackets, it is a note about the word.

Each group will have its own field in the record.


(English-Portugese) = this needs to be dropped
 
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais
note = (uma casa)


You see?
 
LVL 26

Expert Comment

by:Russell Libby

I see *almost*....
What should happen if a line contains more than one *note*? eg:

Abilene  s. cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

word = Abilene
s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

Should the notes be left in? Should they be removed? If you want them removed and kept for the note, how should they be merged?

eg:
word = Abilene
s.=  cidade no Texas; cidade em Kansas (E.U.A.)
note=(E.U.A.) (E.U.A.) <-- what to do here.

In many cases there are multiple notes, and the notes are NOT the same text.

Russell

 
LVL 1

Author Comment

by:hidrau
I took a look at the list and noticed that each note belongs to a specific group.

 s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

So this is correct: leave the note in the group, as you showed me.

 
LVL 1

Author Comment

by:hidrau
You don't need to extract the note into another variable.

You see?
 
LVL 26

Expert Comment

by:Russell Libby
Yeah, I think I got it.

Russell

----

//   Include the RegExprEx unit in the uses clause, the source can be downloaded
//   from my site @:
//
//   http://users.adelphia.net/~rllibby/downloads/regexprex.zip

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpS] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;
  // The parser is created inside the protected block; start from nil so Free is safe
  reParse:=nil;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpS do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpS do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_VALUES[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpS do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the parser and the list
     reParse.Free;
     listFile.Free;
  end;


end;
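
For readers who cannot get hold of the RegExprEx unit, the same split can be sketched with nothing but SysUtils. This is only a rough illustration under stated assumptions, not the code from the answer above: the names NextToken, FindMarker, AppendToken, ParseLine and MARKERS are made up for this sketch, it assumes tokens are separated by whitespace, and it treats any token equal to a marker such as "s." or "v." as the start of a new group. As agreed earlier in the thread, notes in brackets stay inside the group they belong to.

```pascal
program SplitDictLine;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

  TDictWord = record
    Name: String;
    Description: String;
    Parts: array [TWordPart] of String;
  end;

const
  // Markers as they appear in the list (hypothetical table for this sketch)
  MARKERS: array [TWordPart] of String =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');

// Returns the next whitespace-delimited token at or after Cursor and moves
// Cursor past it; returns an empty string at the end of the line.
function NextToken(const Line: String; var Cursor: Integer): String;
var
  start: Integer;
begin
  while (Cursor <= Length(Line)) and (Line[Cursor] <= ' ') do Inc(Cursor);
  start := Cursor;
  while (Cursor <= Length(Line)) and (Line[Cursor] > ' ') do Inc(Cursor);
  Result := Copy(Line, start, Cursor - start);
end;

// Returns True and sets Part when Token is one of the markers (case-insensitive).
function FindMarker(const Token: String; var Part: TWordPart): Boolean;
var
  wp: TWordPart;
begin
  Result := False;
  for wp := Low(TWordPart) to High(TWordPart) do
    if CompareText(Token, MARKERS[wp]) = 0 then
    begin
      Part := wp;
      Result := True;
      Exit;
    end;
end;

// Appends Token to Target, separated by a single space.
procedure AppendToken(var Target: String; const Token: String);
begin
  if Target = '' then
    Target := Token
  else
    Target := Target + ' ' + Token;
end;

// Splits one line of the list into headword, description and groups.
function ParseLine(const Line: String): TDictWord;
var
  cursor: Integer;
  token: String;
  wp, current: TWordPart;
  inPart: Boolean;
begin
  Result.Description := '';
  for wp := Low(TWordPart) to High(TWordPart) do Result.Parts[wp] := '';
  cursor := 1;
  inPart := False;
  current := Low(TWordPart);
  // The first token is the headword
  Result.Name := NextToken(Line, cursor);
  token := NextToken(Line, cursor);
  while token <> '' do
  begin
    if FindMarker(token, current) then
      inPart := True                            // a marker starts a new group
    else if inPart then
      AppendToken(Result.Parts[current], token) // text and notes stay in the group
    else
      AppendToken(Result.Description, token);   // e.g. (English-Portugese)
    token := NextToken(Line, cursor);
  end;
end;

var
  w: TDictWord;
  part: TWordPart;
begin
  w := ParseLine('House (English-Portugese) s. casa v. abrigar, hospedar adj. caseiro');
  WriteLn('Name = ', w.Name);
  WriteLn('Description = ', w.Description);
  for part := Low(TWordPart) to High(TWordPart) do
    if w.Parts[part] <> '' then
      WriteLn(MARKERS[part], ' = ', w.Parts[part]);
end.
```

The Description field ends up holding "(English-Portugese)", so it can simply be ignored when writing the record. One known limitation of this sketch: a meaning that happens to contain a bare token such as "s." would be mistaken for a marker.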
LVL 1

Author Comment

by:hidrau
rllibby, I forgot this grammatical class: 'interj'

I tried to add it to the list:

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, Wpinterj);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.','interj.'
                       );

But I am getting an error:

Number of elements differs from declaration. Why?
 
LVL 1

Author Comment

by:hidrau
Ah, your code was very cool.
Thanks for your big help.

:))
 
LVL 1

Author Comment

by:hidrau
I tried to change the values in WORDPART_VALUES, but I get an error when I run the code.

:(

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo'
                       );


I changed them because each name will correspond to the name of a field in the table. See below:



       // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           Word.Append;
           Word.FieldByName('Palavra').AsString := lpValue.Name;
           Word.FieldByName('Obs').AsString     := lpValue.Description;
           for wpIndex:=wpAdj to wpS do
           Begin
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
              Word.FieldByName(WORDPART_VALUES[wpIndex]).AsString := lpValue.Parts[wpIndex];
             // if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           End;
           Word.Post;
        end;
     end;
 
LVL 1

Author Comment

by:hidrau
Please ignore my last post; it was my mistake.

Thanks
 
LVL 26

Accepted Solution

by:
Russell Libby earned 500 total points

You need to make a couple of changes to add the new item (and to handle it in the regex parsing). Also, define a second array to use for filling the record, because that array needs to match the exact value in the string, eg "v.", in order to perform the record part assignment. Then use the other array for the descriptive names. Full example follows:


type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  WORDPART_COMP:    Array [wpAdj..wpInterj] of String =
                       (
                          // These need to be the actual match values (all lowercase)
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.'
                       );
  WORDPART_VALUES:  Array [wpAdj..wpInterj] of String =
                       (
                          // These can be used as descriptives. Sorry, don't know
                          // Portuguese for interjection
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjection'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpInterj] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;
  // The parser is created inside the protected block; start from nil so Free is safe
  reParse:=nil;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpInterj do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpInterj do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_COMP[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpInterj do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
  finally
     // Free the parser and the list
     reParse.Free;
     listFile.Free;
  end;

end;
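
On the "Number of elements differs from declaration" error reported earlier: the enumeration had gained a seventh value, but the array was still declared as [wpAdj..wpS], which has only six slots, so the seven initializers did not fit. One way to keep the two in step, shown here only as a small sketch (the program name is made up for illustration), is to index the array by the enumeration type itself, so the bounds follow the type whenever a value is added.

```pascal
program WordPartTable;

{$APPTYPE CONSOLE}

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  // Indexing by TWordPart gives one slot per enumeration value, so the
  // compiler demands exactly as many initializers as there are values.
  WORDPART_COMP: array [TWordPart] of String =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');

var
  wpIndex: TWordPart;
begin
  // Low/High avoid hard-coding the first and last enumeration values
  for wpIndex := Low(TWordPart) to High(TWordPart) do
    WriteLn(Ord(wpIndex), ' = ', WORDPART_COMP[wpIndex]);
end.
```

With this form, adding another grammatical class means adding one enumeration value and one string; the loops written with Low(TWordPart) and High(TWordPart) need no change.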
 
LVL 1

Author Comment

by:hidrau
Now I understand your code better. Thanks very much.

 
LVL 26

Expert Comment

by:Russell Libby
No problem,

Russell