Solved

routine to import from a txt file

Posted on 2006-11-30
Last Modified: 2010-04-05
Hello guys,

I need to create a routine to import this list.
I need to get the words after "s." and before the next grammar marker (an abbreviation ending in "."). See this line:

House (English-Portugese) s. casa v. abrigar

Sometimes I will have lines like this:

House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

So I need to separate each group in the line into its own variable. I need to get:

 If there is (uma casa), or any text between ( ) other than (English-Portugese), I need to get that too.
 Text between brackets is a note about the word.

 (s. meanings up to v.)
 (v. meanings up to adj.)
 (adj. meanings up to pron.)
 (pron. meanings up to the next marker; if there is no further marker, it must get everything to the end of the line)


If anything is unclear, let me know.
Thanks
Question by:hidrau
14 Comments
 
LVL 1

Author Comment

by:hidrau
ID: 18046676
If you want you can get a small txt list here

http://www.infosoftlanguages.com.br/arquivos/list.txt
0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18047520

What are you looking to split each line up into (variable wise)?  Individual words? groups of words? eg:

>> House (English-Portugese) s. casa, assembléia v. abrigar, hospedar adj. caseiro  pron. mais do que um conj. onde, alem do mais (uma casa)

House (English-Portugese)
------------------------------
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais (uma casa)

And should each option be a list of words, or a single string, eg: "casa, assembléia"

---
Russell

0
 
LVL 1

Author Comment

by:hidrau
ID: 18047871
I want to split the list into groups of words.

V. = verbs
S. = nouns
Adj. = adjectives
Conj. = conjunctions

After all the groups are split, I will record them in a table.

House is the headword.

A word between brackets is a note about the word.

Each group will have its own field in the table.


(English-Portugese) = this needs to be dropped
 
s. = casa, assembléia
v. = abrigar, hospedar
adj. = caseiro
pron. = mais do que um
conj. = onde, alem do mais
note = (uma casa)


You see?
0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18048328

I see *almost*....
What should happen if a line contains more than one *note*? eg:

Abilene  s. cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

word = Abilene
s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

Should the notes be left in? should they be removed? If you want them removed and kept for the note, how should they be merged?

eg:
word = Abilene
s.=  cidade no Texas; cidade em Kansas (E.U.A.)
note=(E.U.A.) (E.U.A.) <-- what to do here.

In many cases there are multiple notes, and the notes are NOT the same text.

Russell

0
 
LVL 1

Author Comment

by:hidrau
ID: 18048368
I took a look at the list and noticed that each note belongs to a specific meaning.

 s.=  cidade no Texas (E.U.A.); cidade em Kansas (E.U.A.)

So this is correct: leave the note inside its group, as you showed me.
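
For readers who do not have the RegExprEx unit at hand, the split agreed on above (head word, one group per marker, notes left inside their group) can be sketched with plain string routines. This is only an illustration, not the code posted in this thread: the names MARKERS, IsMarker and SplitEntry are made up, and it assumes that tokens are separated by spaces, that a marker always stands alone as a token, and that the head word itself contains no "(".

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, Classes;

const
  // Hypothetical marker list taken from the thread; extend as needed
  MARKERS: array[0..6] of string =
    ('adj.', 'adv.', 'pron.', 'conj.', 'interj.', 'v.', 's.');

// True when Token is exactly one of the grammar markers (case-insensitive)
function IsMarker(const Token: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := Low(MARKERS) to High(MARKERS) do
    if CompareText(Token, MARKERS[i]) = 0 then
    begin
      Result := True;
      Exit;
    end;
end;

// Splits one line into the head word and "marker=meanings" pairs in Groups.
// Notes in brackets stay inside the group they appear in.
procedure SplitEntry(const Line: string; out Head: string; Groups: TStrings);
var
  Rest, Token, Current: string;
  p: Integer;
begin
  Head := '';
  Current := '';
  Groups.Clear;
  Rest := Trim(Line);
  while Rest <> '' do
  begin
    // Cut the next space-delimited token off the front of Rest
    p := Pos(' ', Rest);
    if p = 0 then
    begin
      Token := Rest;
      Rest := '';
    end
    else
    begin
      Token := Copy(Rest, 1, p - 1);
      Rest := TrimLeft(Copy(Rest, p + 1, MaxInt));
    end;
    if IsMarker(Token) then
      Current := LowerCase(Token)          // a new group starts here
    else if Current = '' then
      Head := Trim(Head + ' ' + Token)     // still before the first marker
    else
      Groups.Values[Current] := Trim(Groups.Values[Current] + ' ' + Token);
  end;
  // Drop a tag such as "(English-Portugese)" from the head word
  p := Pos('(', Head);
  if p > 0 then
    Head := Trim(Copy(Head, 1, p - 1));
end;
```

Called with a TStringList for Groups, the meanings of a group can then be read back with Groups.Values['s.'], Groups.Values['v.'] and so on. A meaning that happens to contain a bare "v." or "s." token would be misread as a marker, so check the real list for such cases before relying on this.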

0
 
LVL 1

Author Comment

by:hidrau
ID: 18048377
You don't need to extract the note into another variable.

You see?
0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18048737
Yeah, I think I got it.

Russell

----

//   Include the RegExprEx unit in the uses clause, the source can be downloaded
//   from my site @:
//
//   http://users.adelphia.net/~rllibby/downloads/regexprex.zip

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpS] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpS do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpS do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_VALUES[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpS do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
     // Release the parser (it was created inside the try block above)
     reParse.Free;
  finally
     // Free the list
     listFile.Free;
  end;


end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052627
rllibby, I forgot this grammar class: 'interj.'

I tried to add it to the list:

type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, Wpinterj);

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.','interj.'
                       );

But I get this error:

"Number of elements differs from declaration". Why?
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052629
Ah, your code was very cool.
Thanks for your big help.

:))
0
 
LVL 1

Author Comment

by:hidrau
ID: 18052765
I tried to change the values in WORDPART_VALUES, but I get an error when I run the code.

:(

const
  WORDPART_VALUES:  Array [wpAdj..wpS] of String =
                       (
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo'
                       );


I changed them because each name will correspond to the name of a field in the table. See below:



       // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           Word.Append;
           Word.FieldByName('Palavra').AsString := lpValue.Name;
           Word.FieldByName('Obs').AsString     := lpValue.Description;
           for wpIndex:=wpAdj to wpS do
           Begin
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
              Word.FieldByName(WORDPART_VALUES[wpIndex]).AsString := lpValue.Parts[wpIndex];
             // if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           End;
           Word.Post;
        end;
     end;
0
 
LVL 1

Author Comment

by:hidrau
ID: 18053311
Please ignore my last post; it was my mistake.

Thanks
0
 
LVL 26

Accepted Solution

by:
Russell Libby earned 2000 total points
ID: 18053813

You need to make a couple of changes to add the new item (and to handle it in the regex parsing). Also, define a second array to use for filling the record, because the comparison has to match the exact value in the string, eg "v.", in order to perform the record part assignment. Then use the other array for the descriptive names. Full example follows:


type
  TWordPart         =  (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  WORDPART_COMP:    Array [wpAdj..wpInterj] of String =
                       (
                          // These need to be the actual match values (all lowercase)
                          'adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.'
                       );
  WORDPART_VALUES:  Array [wpAdj..wpInterj] of String =
                       (
                          // These can be used as descriptives. Sorry, I don't know
                          // the Portuguese for interjection
                          'adjetivo', 'adverbio', 'pronome', 'conjuncao', 'verbo', 'substantivo', 'interjection'
                       );
type
  TDictWord         =  packed record
     Name:          String;
     Description:   String;
     Parts:         Array [wpAdj..wpInterj] of String;
  end;

procedure TForm1.Button2Click(Sender: TObject);
var  listFile:      TStringList;
     lpValue:       TDictWord;
     wpIndex:       TWordPart;
     szPart:        String;
     reParse:       TRegExpr;
     dwIndex:       Integer;
begin

  // Create list
  listFile:=TStringList.Create;

  // Resource protection
  try
     // Load from file
     listFile.LoadFromFile('c:\list.txt');  // You need to specify your file to process
     // Create regular expression parser
     reParse:=TRegExpr.Create;
     // Set the parser pattern
     reParse.Pattern:='(:<Part>(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.|[^\s]*))\s*(:<Data>!(adj\.|adv\.|s\.|v\.|pron\.|conj\.|interj\.)*)';
     // Parse each line
     for dwIndex:=0 to Pred(listFile.Count) do
     begin
        // Set source string to parse
        reParse.Source:=PChar(listFile[dwIndex]);
        // Match first
        if reParse.MatchFirst then
        begin
           // Set the word and description
           lpValue.Name:=reParse.NamedBackReference['Part'];
           lpValue.Description:=reParse.NamedBackReference['Data'];
           // Clear other part fields
           for wpIndex:=wpAdj to wpInterj do SetLength(lpValue.Parts[wpIndex], 0);
           // Repeat until no more matches
           while reParse.MatchNext do
           begin
              // Get the part name
              szPart:=Lowercase(reParse.NamedBackReference['Part']);
              // Loop the types
              for wpIndex:=wpAdj to wpInterj do
              begin
                 // Check the word part
                 if (CompareStr(szPart, WORDPART_COMP[wpIndex]) = 0) then
                 begin
                    // Set the data for the part
                    lpValue.Parts[wpIndex]:=reParse.NamedBackReference['Data'];
                    // Done processing part
                    break;
                 end;
              end;
           end;

           //
           // Do something here with the parsed dictionary word, eg:
           //
           // --- start of example ---
           szPart:=Format('Name = %s'#13#10'Description = %s'#13#10#13#10, [lpValue.Name, lpValue.Description]);
           for wpIndex:=wpAdj to wpInterj do
              szPart:=szPart + Format('%s = %s'#13#10, [WORDPART_VALUES[wpIndex], lpValue.Parts[wpIndex]]);
           if (MessageBox(Application.Handle, PChar(szPart), 'Output', MB_OKCANCEL or MB_ICONINFORMATION) = IDCANCEL) then break;
           // --- end of example ---

        end;
     end;
     // Release the parser (it was created inside the try block above)
     reParse.Free;
  finally
     // Free the list
     listFile.Free;
  end;

end;
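
A side note on the "Number of elements differs from declaration" error from earlier in the thread: it appears because Array [wpAdj..wpS] declares six slots while seven strings are listed. One way to avoid editing the range every time a grammar class is added is to index the array by the enumeration type itself. The sketch below only illustrates that idea; FindWordPart is a hypothetical helper, not part of the code above.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils;

type
  TWordPart = (wpAdj, wpAdv, wpPron, wpConj, wpV, wpS, wpInterj);

const
  // Indexed by the enum type: the array always has one slot per member,
  // so no range has to be updated when a member is appended. The compiler
  // still complains if the number of strings does not match.
  WORDPART_COMP: array[TWordPart] of string =
    ('adj.', 'adv.', 'pron.', 'conj.', 'v.', 's.', 'interj.');

// Hypothetical helper: finds the enum member for a marker such as 'v.'.
// Returns False (and leaves Part at the first member) if the marker is unknown.
function FindWordPart(const Marker: string; out Part: TWordPart): Boolean;
var
  wp: TWordPart;
begin
  Result := False;
  Part := Low(TWordPart);
  for wp := Low(TWordPart) to High(TWordPart) do
    if CompareText(Marker, WORDPART_COMP[wp]) = 0 then
    begin
      Part := wp;
      Result := True;
      Exit;
    end;
end;
```

The loops in the solution could then run from Low(TWordPart) to High(TWordPart) instead of naming wpAdj and wpInterj explicitly, so appending another class only touches the type and the string lists.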
0
 
LVL 1

Author Comment

by:hidrau
ID: 18053931
Now I understand your code better, thanks very much.

0
 
LVL 26

Expert Comment

by:Russell Libby
ID: 18053959
No problem,

Russell
0

Question has a verified solution.