mathes


manipulation of text files

Hi experts,

I have a problem concerning manipulation of ASCII text files.

With the help of an entry form, I collect cities:

collection of cities
--------------------

Europe:
Europe:
Europe:

North america:
North america:
North america:

South america:
South america:
South america:


After finishing the input, I save the filled-in data to a text file.
Difficulties arise when I want to go in the other
direction: I read a previously saved file and try to write its content
to the appropriate lines of the entry form.

The text file could look like this:

Europe Rome Paris
Europe Bucarest Prague
Europe Lissabon Frankfurt
North america San Franzisko New York
North america Los Angeles
North america Toronto Calgary
South america Caracas Lima La Paz Quito
South america Buenos Aires Rio de Janeiro
South america Sao Paolo


In this case, the task would be rather easy:
The user has filled in every line, so you can work with an index:

TEdit 1-3: Europe;
TEdit 4-6: North America
TEdit 7-9: South America

However, the user has the option to leave one or more lines blank, and this makes
things difficult.
(For several reasons, I don't want empty lines in my text file, so I save only
those Edits which contain relevant data.)
So, after saving the input data, the text file could
also look like this:

Europe Rome Paris South america Buenos Aires Rio de Janeiro South america Sao Paolo

(Here all the data are written to one single line.)

How can I now determine, in this case, where each continent starts and ends?
How can I write the data to the correct TEdit? Note: it does not matter much
whether I write a European city to TEdit1, TEdit2 or TEdit3, although it is
nice and logical to start by filling TEdit1. Note also that a single TEdit
must not exceed a maximum length of 100 characters.

Of course it is forbidden to split one city into 2 lines in the entry form:

I don't allow: TEdit7: Sao
               TEdit8: Paolo

and I don't allow TEdit7: Sao Pao
                  TEdit8: lo
either.

Any suggestions for a solution?

With kind regards

Christian
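For the last step (spreading one continent's cities over its edits without exceeding 100 characters and without splitting a city), a minimal sketch, assuming the cities are already available as separate strings; getting them out of the file is what the rest of this thread is about. `PackIntoLines` is a hypothetical helper, not part of Delphi, and the demo uses 30 instead of 100 characters so that the wrapping is visible.

```pascal
program PackCities;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Packs Items into lines of at most MaxLen characters, joining items with a
// single space and never splitting an item. Items keep their order.
// Returns False if a single item is longer than MaxLen (it can never fit).
function PackIntoLines(const Items: array of string; MaxLen: Integer;
  Lines: TStrings): Boolean;
var
  i: Integer;
  Cur: string;
begin
  Result := False;
  Lines.Clear;
  Cur := '';
  for i := Low(Items) to High(Items) do
  begin
    if Length(Items[i]) > MaxLen then
      Exit;
    if Cur = '' then
      Cur := Items[i]
    else if Length(Cur) + 1 + Length(Items[i]) <= MaxLen then
      Cur := Cur + ' ' + Items[i]         // still room on this line
    else
    begin
      Lines.Add(Cur);                     // line is full: start the next one
      Cur := Items[i];
    end;
  end;
  if Cur <> '' then
    Lines.Add(Cur);
  Result := True;
end;

var
  Lines: TStringList;
  i: Integer;
begin
  Lines := TStringList.Create;
  try
    // 30 instead of 100 characters, and at most 3 edits per continent
    if PackIntoLines(['Caracas', 'Lima', 'La Paz', 'Quito', 'Buenos Aires',
      'Rio de Janeiro', 'Sao Paolo'], 30, Lines) and (Lines.Count <= 3) then
    begin
      for i := 0 to Lines.Count - 1 do
        Writeln('TEdit', i + 1, ': ', Lines[i]);
    end
    else
      Writeln('The cities do not fit into three edits');
  finally
    Lines.Free;
  end;
end.
```

With these inputs the program should print `TEdit1: Caracas Lima La Paz Quito`, `TEdit2: Buenos Aires Rio de Janeiro` and `TEdit3: Sao Paolo`. For a fixed order of items, filling each line as far as possible uses the fewest lines, so if this needs more than three lines, the data genuinely does not fit the form.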

 
nrobin

Christian,

Is it possible to delimit the values in text file?  If the values can look like this:

"Europe" Rome, Paris "South america" Buenos Aires, Rio de Janeiro "South america" Sao Paolo

Then it is easy to load this in: parse for the double quotes and take the quoted text as the continent, then read the text between the commas as the separate cities.

Just an idea... there are quite a few ways of doing this. You could even go as far as enclosing each continent together with its cities within delimiters, so you just read between them and get a string which needs further parsing to extract each city.

Regards, Nicholas.
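Nicholas's delimited format could be read with a small character scanner. This is a sketch, assuming exactly the layout shown above (continent names in double quotes, cities separated by commas); `ParseQuoted` is a hypothetical name, and storing the result as `Continent=City` lines in a TStrings is just one convenient choice.

```pascal
program QuotedContinents;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Each "quoted" part names a continent; the text up to the next quote is a
// comma-separated list of its cities. Pairs receives "Continent=City" lines.
procedure ParseQuoted(const S: string; Pairs: TStrings);
var
  i, Start, p: Integer;
  Continent, Chunk, City: string;
begin
  Pairs.Clear;
  Continent := '';
  i := 1;
  while i <= Length(S) do
  begin
    Start := i;
    if S[i] = '"' then
    begin
      Inc(Start);                        // continent name between the quotes
      i := Start;
      while (i <= Length(S)) and (S[i] <> '"') do
        Inc(i);
      Continent := Copy(S, Start, i - Start);
      Inc(i);                            // skip the closing quote
    end
    else
    begin
      while (i <= Length(S)) and (S[i] <> '"') do
        Inc(i);                          // city list runs up to the next quote
      Chunk := Copy(S, Start, i - Start);
      while Chunk <> '' do
      begin
        p := Pos(',', Chunk);
        if p = 0 then
          p := Length(Chunk) + 1;        // last city has no trailing comma
        City := Trim(Copy(Chunk, 1, p - 1));
        Delete(Chunk, 1, p);
        if (City <> '') and (Continent <> '') then
          Pairs.Add(Continent + '=' + City);
      end;
    end;
  end;
end;

var
  Pairs: TStringList;
begin
  Pairs := TStringList.Create;
  try
    ParseQuoted('"Europe" Rome, Paris "South america" Buenos Aires, ' +
      'Rio de Janeiro "South america" Sao Paolo', Pairs);
    Write(Pairs.Text);
  finally
    Pairs.Free;
  end;
end.
```

Because every line has the `Name=Value` form, `Pairs.Names[i]` gives the continent of entry `i`.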
Why don't you use a special character, perhaps # or ;, at the
end of the towns of each continent?

rene
Hi,
I want to show you another direction which is very easy. I hope you know the concept of INI files or the registry. Such files have a layout like this:

[sectionx]
key0=valuex
key1=valuey
key2=valuez
[sectiony]
key0=any value
key1=any value
...
So for your program I propose giving each continent its own section, like this:

[Europe]
city1=Rome
city2=Paris
city3=Bucarest
city4=Prague
city5=Lissabon
city6=Frankfurt
[North america]
city1=San Franzisko
city2=New York
city3=Los Angeles
city4=Toronto
city5=Calgary
[South america]
city1=Caracas
city2=Lima
city3=La Paz
city4=Quito
city5=Buenos Aires
city6=Rio de Janeiro
city7=Sao Paolo

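What do you think of this kind of file? Building such a file is very easy, for example:

uses
  SysUtils, Dialogs, IniFiles; // TIniFile is declared in the IniFiles unit

procedure SaveAndShowCities; // hypothetical name; any procedure will do
var
  T : TIniFile;
begin
  // Without a path, Windows looks for the INI file in the Windows
  // directory, so build a full path next to the executable
  T := TIniFile.Create(ExtractFilePath(ParamStr(0)) + 'mydata.ini');
  try
    //to write something
    T.WriteString('Europe','city1','Rome');
    T.WriteString('Europe','city2','Paris');
    //to read something
    // the following returns 'Rome', or '' if the key city1 does not exist
    ShowMessage(T.ReadString('Europe','city1',''));
  finally
    T.Free;
  end;
end;
May this ease your work, Igor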
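Reading a section back does not require knowing how many `cityN` keys it has: `ReadSection` returns the key names of a section. A sketch assuming the `city1`, `city2`, ... naming above; `SaveContinent` and `LoadContinent` are hypothetical helpers. It uses `TMemIniFile` (also in the IniFiles unit), which keeps changes in memory and writes them in one go with `UpdateFile`.

```pascal
program IniCities;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes, IniFiles;

// Writes Cities as city1=..., city2=... into the section [Continent].
procedure SaveContinent(Ini: TCustomIniFile; const Continent: string;
  const Cities: array of string);
var
  i: Integer;
begin
  Ini.EraseSection(Continent);          // drop keys left over from an older save
  for i := Low(Cities) to High(Cities) do
    Ini.WriteString(Continent, 'city' + IntToStr(i + 1), Cities[i]);
end;

// Reads every value of [Continent] back in file order, however many keys
// the section has.
procedure LoadContinent(Ini: TCustomIniFile; const Continent: string;
  Cities: TStrings);
var
  Keys: TStringList;
  i: Integer;
begin
  Cities.Clear;
  Keys := TStringList.Create;
  try
    Ini.ReadSection(Continent, Keys);   // key names only: city1, city2, ...
    for i := 0 to Keys.Count - 1 do
      Cities.Add(Ini.ReadString(Continent, Keys[i], ''));
  finally
    Keys.Free;
  end;
end;

var
  Ini: TMemIniFile;
  Cities: TStringList;
begin
  Ini := TMemIniFile.Create(ExpandFileName('mydata.ini'));
  Cities := TStringList.Create;
  try
    SaveContinent(Ini, 'Europe', ['Rome', 'Paris', 'Bucarest']);
    LoadContinent(Ini, 'Europe', Cities);
    Writeln(Cities.CommaText);          // Rome,Paris,Bucarest
    Ini.UpdateFile;                     // write the cached changes to mydata.ini
  finally
    Cities.Free;
    Ini.Free;
  end;
end.
```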
ASKER CERTIFIED SOLUTION
Matvey

Damn, terribly sorry Igor - you are always one step ahead! :)

Reject...
Isn't it strange that we always have EXACTLY THE SAME IDEAS?
mathes

ASKER

Hi experts.

Thank you for your comments so far. You taught me some very useful concepts.
I want to give you some further details about my problem: the ASCII text file which contains
the data is intended to be passed to a program as a command-line parameter.
This executable reads and parses the text file. But this program cannot understand "artificial delimiters"
like #, @, $ or INI labels like [label], or whatever character you may use. It will complain about syntax errors when it reads disallowed
characters like #, $ ...
Moreover, the user might be confused if he loads the text file into an editor and finds my artificial delimiters.
So I am not too comfortable with the idea of adding "artificial delimiters".
BUT: you can rely on the fact that my text file already contains some delimiters/labels which you could use for parsing.
Words like "Europe" and "North america" are keywords for the executable. These keywords are part of the syntax which
the executable understands.
So: is it possible to leave the text file as it is and to use the "natural" keywords (Europe, North America...) as
delimiters?
The only problem that I see is that a keyword has 2 functions: it marks the end of the data belonging to the previous keyword, and at the same time it marks the start of the data belonging to this keyword. In my real application there are only 12 of these keywords, so there are not too many tokens to parse for.
Can you please give me a concrete example of how you would parse the file under these special circumstances? How can I filter the text between the 12 keywords?
In the entry form I implemented 3 lines for each keyword. How can I distribute the filtered text over the 3 lines without exceeding the limit of 100 characters for each TEdit and without breaking a word into 2 pieces?

With kind regards

Christian
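One way to use the "natural" keywords as delimiters, sketched under these assumptions: words are separated by whitespace, a multi-word keyword such as `North america` is written with single spaces, keywords match regardless of case, and no keyword phrase ever occurs inside the user data. All identifiers are hypothetical, and only 3 of the 12 keywords are listed. Each keyword closes the previous section and opens its own, which covers the two functions described above.

```pascal
program KeywordSplit;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses SysUtils, Classes;

const
  Keywords: array[0..2] of string = ('Europe', 'North america', 'South america');

// Splits S into words; spaces, tabs and line breaks all separate words.
procedure SplitWords(const S: string; Words: TStrings);
var
  i, Start: Integer;
begin
  Words.Clear;
  i := 1;
  while i <= Length(S) do
  begin
    while (i <= Length(S)) and (S[i] <= ' ') do Inc(i);
    Start := i;
    while (i <= Length(S)) and (S[i] > ' ') do Inc(i);
    if i > Start then
      Words.Add(Copy(S, Start, i - Start));
  end;
end;

// Returns how many words Keyword spans if it starts at Words[Index], else 0.
function MatchKeyword(Words: TStrings; Index: Integer; const Keyword: string): Integer;
var
  n, k: Integer;
  Joined: string;
begin
  Result := 0;
  n := 1;
  for k := 1 to Length(Keyword) do
    if Keyword[k] = ' ' then Inc(n);
  if Index + n > Words.Count then Exit;
  Joined := Words[Index];
  for k := 1 to n - 1 do
    Joined := Joined + ' ' + Words[Index + k];
  if SameText(Joined, Keyword) then   // 'North America' matches 'North america'
    Result := n;
end;

// Sections[k] collects the text after each occurrence of Kw[k] up to the
// next keyword; text before the first keyword is dropped.
procedure SplitByKeywords(const S: string; const Kw: array of string;
  Sections: TStrings);
var
  Words: TStringList;
  i, k, Cur, Len, BestLen: Integer;
begin
  Sections.Clear;
  for k := 0 to High(Kw) do
    Sections.Add('');
  Words := TStringList.Create;
  try
    SplitWords(S, Words);
    Cur := -1;
    i := 0;
    while i < Words.Count do
    begin
      BestLen := 0;                       // longest keyword starting at i
      for k := 0 to High(Kw) do
      begin
        Len := MatchKeyword(Words, i, Kw[k]);
        if Len > BestLen then
        begin
          BestLen := Len;
          Cur := k;                       // keyword switches the current section
        end;
      end;
      if BestLen > 0 then
        Inc(i, BestLen)
      else
      begin
        if Cur >= 0 then                  // user input: append to current section
          Sections[Cur] := Trim(Sections[Cur] + ' ' + Words[i]);
        Inc(i);
      end;
    end;
  finally
    Words.Free;
  end;
end;

var
  Sections: TStringList;
  k: Integer;
begin
  Sections := TStringList.Create;
  try
    SplitByKeywords('Europe Rome Paris South america Buenos Aires ' +
      'Rio de Janeiro South america Sao Paolo', Keywords, Sections);
    for k := 0 to High(Keywords) do
      Writeln(Keywords[k], ': ', Sections[k]);
  finally
    Sections.Free;
  end;
end.
```

This should print `Europe: Rome Paris`, an empty `North america:` line and `South america: Buenos Aires Rio de Janeiro Sao Paolo`. Runs of whitespace collapse to one space, which seems harmless if only keywords and user input need to be told apart. Each section string can then be wrapped word by word into the three edits.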



I don't get it...

>The ASCII text file which contains the data is intended to be sent to an program as an command line parameter

-Who wrote that executable? If it's you, then you had better change it; if it was someone else, then it probably has a defined structure for the files it parses.

>this program is not able to understand "artifical delimiters" like #,@,$ or TIni labels like [label] or whichever character you may use.

-So how do you want to recognize names with more than one word? You could write them in quotes, like "Buenos Aires", but will the program read that?

Here is a simple example of how to parse a text file using TParser (Classes unit):

// Needs Classes (TParser, TFileStream, token constants) and SysUtils in the uses clause
procedure ParseFile(const AFileName: string);
var
  FS: TFileStream;
  AP: TParser;
begin
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    AP := TParser.Create(FS); // the constructor already reads the first token
    try
      repeat
        case AP.Token of
          toSymbol:  ; // a Pascal identifier; retrieve with AP.TokenString
          toString:  ; // a Pascal string, ie. 'blah' or #<number>; retrieve with AP.TokenString
          toInteger: ; // a Pascal integer in decimal or hex; retrieve with AP.TokenInt
          toFloat:   ; // a Pascal floating point number; retrieve with AP.TokenFloat
        else
          ; // a single-character token, such as a comma; retrieve with AP.Token or AP.TokenString
        end;
      until AP.NextToken = toEOF; // toEOF (#0) means end of file
    finally
      AP.Free;
    end;
  finally
    FS.Free;
  end;
end;

So this should be enough for the simplest text parsing.

>Moreover the user might be confused, if he loads the textfile into an editor and finds my artificial delimiters. So I am not too comfortable with the idea to add "artifical delimiters".

-Why does he have to load it in another editor??? The user can't view the cars from Need For Speed in Paintbrush, so he probably won't be too disappointed if he can't understand this file.
Moreover, it doesn't have to be a file at all. You could write the data to the registry instead. It's as simple as an INI file (if you have already got scared of that idea...).

Please explain again...
mathes

ASKER

Hi experts,

I want to provide you with further details:

> You can easily read this info using TIniFile.
> Is it possible to delimit the values in text file?
> why don't you use a special char, perhaps # or ; at the end of the towns in a specific continent.

All your suggestions share one big problem: you assume that all input files for the external program are created with my entry-form program. But as my program is still under construction, this is obviously not the case. These days, users normally create the input files with a simple ASCII text editor. So if you try to parse such a file, you will never find delimiters like @, #, § in these user-created files. The only delimiters you will find are a few keywords like Europe, North America, Australia, Africa. So I need a way to parse a file without my own delimiters.

> Who wrote that executable? If it's you, then you better change it, if it was someone else, then propably it has a defined structure of how to parse the file.

It was someone else.  The executable indeed has a defined structure how to parse ASCII files.

> -So how do you want to determine names with more than one word?

This is not necessary in my specific case. Sorry, my example with the cities was not a good one. I think it will be sufficient if the parser can distinguish between <predefined keyword> and <user input>.

> you could write them in quotes, like "Buenos Aires", but will the program read it?

No, it won't. Quotes don't belong to the syntax which the external executable can understand.

> So I am not too comfortable with the idea to add "artifical delimiters".
> -Why does he have to load it in an other editor???

In my program, you will later find a RichEdit control which lets the user look at the input files. It is intended as a service to the user. I also need this file viewer so that the user can view the output files of the external program. The external program is basically a mathematical program which reads input from a text file and does some calculations with this input data.
After this, the external program writes the result of the calculation to a text file. The purpose of my program under construction is to provide the user with a graphical user interface for the mathematical software. My user interface will make the mathematical program much easier to handle.
Since I am writing a user interface for the math program, and not looking for a program to fit my interface, I have to follow the rules of the external program. It makes no sense to create input files according to my own rules that the mathematical software can't process.

To make a long story short: How would you parse the input files of the mathematical software ?

With kind regards

Christian


>How would you parse the input files of the mathematical software ?

OK, considering that names don't have spaces, you could easily use TParser:


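procedure ParseFile(const AFileName: string);
var
  FS: TFileStream;
  AP: TParser;
  slEurope, slSAmerica, slNAmerica, slCur: TStrings;
begin
  FS := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  AP := TParser.Create(FS);
  slEurope := TStringList.Create;
  slSAmerica := TStringList.Create;
  slNAmerica := TStringList.Create;
  slCur := nil; // no continent keyword seen yet
  try
    // TParser.Create has already read the first token, so test Token before
    // calling NextToken; otherwise the first keyword would be skipped
    while AP.Token <> toEOF do
    begin
      // no ';' before 'else' in Pascal; SameText also accepts 'america'
      if SameText(AP.TokenString, 'Europe') then
        slCur := slEurope
      else if SameText(AP.TokenString, 'South') then
        slCur := slSAmerica
      else if SameText(AP.TokenString, 'North') then
        slCur := slNAmerica
      else if not SameText(AP.TokenString, 'America') and (slCur <> nil) then
        slCur.Add(AP.TokenString);
      AP.NextToken;
    end;
    // use the three lists here: they are freed below
  finally
    slEurope.Free;
    slSAmerica.Free;
    slNAmerica.Free;
    AP.Free;
    FS.Free;
  end;
end;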

I didn't try it. See if this could solve your specific case...
--Matvey
mathes

ASKER


Hi experts,

> OK, considering that names don't have spaces, you could easily use TParser:

Meanwhile I carefully studied some valid input files. I found out that in some situations names do contain spaces, so an entry may even consist of 5 words separated by blanks. How would you parse such an input file?

With kind regards

Christian
 
OK, if you have strings that contain more than one word, you have to use some method to separate them from other strings.

For example: you want to run a certain application with several files as parameters. Win95 allows file names that contain spaces. So if you pass one of these files to the application, how will it know that it's one file and not two?

app.exe c:\temporary dir\my file.txt

-It will just consider the three sections as different files. So you have to let it know like this:

app.exe "c:\temporary dir\my file.txt" "c:\file 2.txt"

-There is probably no other way to solve this problem.
Your case has the same problem, so you have to find a way to delimit the names of the cities. One suggestion was to use quotes, and another was to use the INI file structure.

Europe Rome Paris South america "Buenos Aires" "Rio de Janeiro" South america "Sao Paolo"

-Names with just one word don't need the quotes.
Or the INI structure that Igor was the first to suggest.
Obviously, you can invent thousands of new ways to delimit the names. You don't have to use the methods suggested; you can, for example, separate them with '$', write them on different lines, use '_' instead of spaces in the names, or any other mad solution, but they all exist for just one reason: without separating the names somehow, the program won't know which words belong to which name.
There is also the REALLY mad solution: make a dictionary of all the possible cities, and then parse the words in the file according to this dictionary. This only makes sense when you give the user a list to select from, rather than letting him type the names himself.
But... what can I say, it's always good to search for new ways and new ideas. I hope you'll have success with them, and tell us what you found :-)

cheers, Matvey
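Matvey's quoting idea can be read with a scanner that splits at whitespace but keeps anything between double quotes together. As the asker said earlier, the external program does not accept quotes, so this sketch only applies to files one's own program reads; `SplitQuoted` is a hypothetical name.

```pascal
program QuotedTokens;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// Splits S at whitespace, but text inside double quotes stays one token
// (the quotes themselves are dropped).
procedure SplitQuoted(const S: string; Tokens: TStrings);
var
  i, Start: Integer;
begin
  Tokens.Clear;
  i := 1;
  while i <= Length(S) do
  begin
    if S[i] <= ' ' then
      Inc(i)                             // skip spaces and line breaks
    else if S[i] = '"' then
    begin
      Start := i + 1;                    // quoted, possibly multi-word name
      i := Start;
      while (i <= Length(S)) and (S[i] <> '"') do
        Inc(i);
      Tokens.Add(Copy(S, Start, i - Start));
      Inc(i);                            // skip the closing quote
    end
    else
    begin
      Start := i;                        // ordinary single word
      while (i <= Length(S)) and (S[i] > ' ') do
        Inc(i);
      Tokens.Add(Copy(S, Start, i - Start));
    end;
  end;
end;

var
  Tokens: TStringList;
  i: Integer;
begin
  Tokens := TStringList.Create;
  try
    SplitQuoted('Europe Rome Paris South america "Buenos Aires" ' +
      '"Rio de Janeiro" South america "Sao Paolo"', Tokens);
    for i := 0 to Tokens.Count - 1 do
      Writeln(i, ': ', Tokens[i]);
  finally
    Tokens.Free;
  end;
end.
```

The unquoted keyword `South america` still comes out as two tokens (`South`, `america`), so a parser would have to match such keywords as a pair.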
mathes

ASKER

Hi experts,

Due to technical problems I was offline for a few days, so I received this comment only a few minutes ago. I will study it this weekend.

With kind regards

Christian

mathes

ASKER

Hi Matvey,

meanwhile I tried to implement your parser routine in a
sample program.
Unfortunately Delphi complains about incompatible data
types:

result_europe:=slEurope;

Can you please tell me what I am doing wrong here?
I am sure that I can assign the parsing result to other strings.
Unfortunately, slEurope is not a global variable, so I need an extra string in the main program.

My source code is:

unit parsertest;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    Edit1: TEdit;
    Edit2: TEdit;
    Edit3: TEdit;
    Europa: TLabel;
    Label2: TLabel;
    Label3: TLabel;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;
  result_europe: string;
  result_north_america: string;
  result_south_america: string;

implementation

{$R *.DFM}

procedure ParseFile(const AFileName: string);
var
  FS: TFileStream;
  AP: TParser;
  slEurope, slSAmerica, slNAmerica, slCur: TStrings;
begin
  FS:=TFileStream.Create(AFileName, fmOpenRead);
  AP:=TParser.Create(FS);
  slEurope := TStringList.create;
  slSAmerica := TStringList.create;
  slNAmerica := TStringList.create;
  with AP do
    while NextToken <> #0 do begin
      if tokenstring = 'Europe' then
        slCur := slEurope
      else if tokenstring = 'South' then
        slCur := slSAmerica
      else if tokenstring = 'North' then
        slCur := slNAmerica
      else if not (tokenstring = 'America') then
        slCur.Add(TokenString);
    end;
  AP.Free;
  FS.Free;
  result_europe:=slEurope;
  result_north_america:=slNAmerica;
  result_south_america:=slSAmerica;
  slEurope.Free;
  slSAmerica.Free;
  slNAmerica.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
 ParseFile('test.txt');
 edit1.text:=result_europe;
 edit2.text:=result_north_america;
 edit3.text:= result_south_america;
end;

end.

The testfile test.txt looks as follows:

Europe Rome Paris
Europe Bucarest Prague
Europe Lissabon Frankfurt
North America San Franzisko New York
North America Los Angeles
North America Toronto Calgary
South America Caracas Lima La Paz Quito
South America Buenos Aires Rio de Janeiro
South America Sao Paolo

With kind regards

Christian
Now you really messed it all up! :)
How is it supposed to know that there is no such city as "Aires Rio"?
BUT, assuming that the format is a bit different, I rewrote the code so it sorts the cities into three listboxes. Put a button and three listboxes on a form, and copy the following code, ASSUMING THAT 'test.txt' LOOKS LIKE THIS:

________________________________________________________________
Europe Rome Paris Bucarest Prague Lissabon Frankfurt
North America 'San Franzisko' 'New York' 'Los Angeles' Toronto Calgary
South America Caracas Lima 'La Paz' Quito 'Buenos Aires' 'Rio de Janeiro' 'Sao Paolo'
________________________________________________________________

unit Unit1;


interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;
    ListBox1: TListBox;
    ListBox2: TListBox;
    ListBox3: TListBox;
    procedure Button1Click(Sender: TObject);
  private
    result_europe: TStringList;
    result_north_america: TStringList;
    result_south_america: TStringList;
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

procedure ParseFile(const AFileName: string;
                    slEurope, slSAmerica, slNAmerica: TStrings);
var
  FS: TFileStream;
  AP: TParser;
  slCur: TStrings;
  l: Integer;
begin
  slEurope.Clear;
  slSAmerica.Clear;
  slNAmerica.Clear;
  slCur := nil; // no continent keyword seen yet

  FS:=TFileStream.Create(AFileName, fmOpenRead);
  try
    AP:=TParser.Create(FS);
    try
      with AP do
        While Token <> toEOF do begin
          if tokenstring = 'Europe' then
            slCur := slEurope
          else if tokenstring = 'South' then
            slCur := slSAmerica
          else if tokenstring = 'North' then
            slCur := slNAmerica
          else if (tokenstring <> 'America') and (slCur <> nil) then begin
            // the rest of this source line belongs to slCur; also stop at
            // end of file, in case the last line has no line break
            l := SourceLine;
            While (Token <> toEOF) and (l = SourceLine) do begin
              slCur.Add(TokenString);
              NextToken;
            end;
            Continue;
          end;
          NextToken;
        end;
    finally
      AP.Free;
    end;
  finally
    FS.Free;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
  result_europe := TStringList.create;
  result_north_america := TStringList.create;
  result_south_america := TStringList.create;

  try
    ParseFile('test.txt', result_europe, result_north_america, result_south_america);

    ListBox1.Items.Assign(result_europe);
    ListBox2.Items.Assign(result_north_america);
    ListBox3.Items.Assign(result_south_america);
  finally
    result_europe.Free;
    result_north_america.Free;
    result_south_america.Free;
  end;
end;

end.
________________________________________________________________

If you click the button, you'll see the cities sorted in the listboxes.
--Matvey
mathes

ASKER


Dear Matvey,

meanwhile I changed the shape of the inputfile:

Europe Rome Paris
Europe Bucarest Prague
Europe Lissabon Frankfurt
North America Winnipeg Edmonton
North America Houston
North America Toronto Calgary
South America Caracas Lima Quito

Now this should work with the routine which
you sent me a few days ago.
However Delphi complains about string and TStrings.
Can you please show me, how I can fix this problem in my source code ?

With kind regards

Christian


I don't get it. What logic is there in this structure??? You said you have 3 edit boxes for Europe, NA and SA cities. Here you have 6 cities in Europe, 5 in NA and 3 in SA, AND IN ADDITION you have some very strange arrangement of them on different lines with... anyway, who came up with the idea of this structure, which looks stupid at first sight???
You better explain what this is for, so we have more understanding here.

The funniest thing is that the example I gave above will work with this structure too! I didn't intend it, but by accident it does. I see you haven't tested it, so please do. It will even work with the structure before the previous one, assuming that names of more than one word are enclosed in single quotes.

A TStrings object and the String type are different things. To be exact: the first one is a collection of the second ones. TStrings is used in memos to store lines, in listboxes to store items, etc. If you have a TStrings/TStringList object, you access its items by

  MyStrings[ Index ]: string

What ParseFile actually does is put all the city names from Europe, NA and SA into three different TStringList objects. After the procedure runs, the three lists contain names you can access in the form I mentioned above (look up TStringList in the help). Then you can do whatever you want with them; it's just a good way of storing and accessing them. If you have something different in mind, whatever, it doesn't matter much.

So as I said, the above example works fine now. Follow my previous comment to get it up and running quickly...
--Matvey
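To make the string/TStrings difference concrete, a small sketch that turns a TStrings into the single string a TEdit expects. `JoinWithSpaces` is a hypothetical helper, not a VCL function.

```pascal
program JoinList;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

// A TStrings holds many strings; TEdit.Text is one string. This joins the
// items into one line, separated by single spaces.
function JoinWithSpaces(List: TStrings): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to List.Count - 1 do        // List[i] is the i-th string item
    if Result = '' then
      Result := List[i]
    else
      Result := Result + ' ' + List[i];
end;

var
  Europe: TStringList;
begin
  Europe := TStringList.Create;
  try
    Europe.Add('Rome');
    Europe.Add('Paris');
    Europe.Add('Bucarest');
    Writeln(Europe.Count);               // 3 separate strings
    Writeln(Europe[1]);                  // Paris
    Writeln(JoinWithSpaces(Europe));     // Rome Paris Bucarest
  finally
    Europe.Free;
  end;
end.
```

In the asker's program, something like `Edit1.Text := JoinWithSpaces(slEurope);` would have to run inside `ParseFile` before `slEurope.Free`, or the lists could be passed in as parameters as in Matvey's listbox version. The built-in `Text` property also returns one string, but with a line break after every item.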
mathes

ASKER

Hi experts,

I want to thank you all for your help. Your comments helped me to solve my problem and you showed me some very interesting concepts.

With kind regards

Christian