Johnjces

Parse a .conf or similar file with different command line "switches" on each line.

I am a terrible parser! After many lines of redundant Delete(), Copy() and Pos() calls I have something sort of working, but there must be a cleaner, simpler way.

I have a .conf or a .cfg file that I need to parse. Each line in this file is a command line. At a minimum, the path and name of another file/executable is needed, followed by various switches which can be in any order. Example:

C:\MyWork\MyExecutable.exe                                                { Minimum item required }
C:\Delphi\MyThingy.exe -a c:\readme.txt                               { -a switch then the item follows }
C:\Programs\Stuff\Program.exe -s normal -a c:\readme.txt   { Added -s switch which could be before or after the -a switch and item }

Of course other switches could be added as needed.

The first part of the line must be the executable; the switches (-a, -s, -q, etc.) that tell me what follows can then be in any order.

Can anyone point me in the right direction, component or code?

Thanks
ThievingSix

So what do you want to parse it into? You want to know the executable file/path and whether a switch is set?

ASKER

ThievingSix

Thanks!

I forgot to add this, but ideally I would have three (3) or more separate stringlists, as needed, where I could do a for N := 0 to X-1 and get the stuff needed for any/all switches and do my code execution.

Attached is a screen shot of my sample exe showing what I mean.

John
sample.JPG
Let's try this again! Sorry... didn't have the data showing in the image.. I am turning into a blithering idiot today!
sample.JPG
SOLUTION
ThievingSix
This solution is only available to members.
ThievingSix

I was just going to load the file into a stringlist then parse... might be easier. I have just briefly looked it over and will look into it later. Have a server crash and will have to go back to work myself later tonight until tomorrow!

Thanks. I'll let you know.

John
1. I would recommend a grid control instead of trying to align multiple textbox/memo controls.

2. use two TStringList objects.  One to read the config file lines (.LoadFromFile) and the other to parse the lines (after setting the delimiter = ' -')

3. If you know all possible switch values ahead of time, assign these to different grid columns.  Otherwise, you may need two parsing passes to map the switch values to grid columns.

Note: you may choose to use more TStringList objects to contain the unique switch values and to parse the switch data.
Sorry for the delay. I will get back to this and close it out Monday.

My code is at work.

Thanks!

aikimark,

The sample I gave is just to show what the data should look like. In my code, no visual components are needed, as it will run in a service. I agree with loading the data into a stringlist, as I mentioned above. Thanks though.

John
Depending on how you need to use this, you have several options.
* treat the switches and their associated data as name/value pairs
* store the name/value pairs as a list of records (for each program)
* store the name/value pairs in hash table object (for each program)
* leave the data in the first TStringList until it is retrieved

Question...can there be multiple switches of the same type (e.g. multiple -a switches) for a program?  If so, is there some priority or ordering?
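
As a rough illustration of the "list of records" option above, a minimal sketch might look like this. The type and routine names (TSwitch, TProgramEntry, AddSwitch) are made up for the example and do not come from anyone's posted code. Because the switches are kept in an array, repeated switches of the same type would simply appear more than once, in file order.

```pascal
program SwitchRecords;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical types for illustration only.
  TSwitch = record
    Name: string;   // e.g. 'a'
    Value: string;  // e.g. 'c:\readme.txt'; empty for a bare switch such as -q
  end;

  TProgramEntry = record
    Executable: string;
    Switches: array of TSwitch;  // dynamic array, grows as switches are added
  end;

// Append one name/value pair to the entry.
procedure AddSwitch(var Entry: TProgramEntry; const AName, AValue: string);
var
  N: Integer;
begin
  N := Length(Entry.Switches);
  SetLength(Entry.Switches, N + 1);
  Entry.Switches[N].Name := AName;
  Entry.Switches[N].Value := AValue;
end;

var
  Entry: TProgramEntry;
  i: Integer;
begin
  Entry.Executable := 'C:\Programs\Stuff\Program.exe';
  AddSwitch(Entry, 's', 'normal');
  AddSwitch(Entry, 'a', 'c:\readme.txt');

  WriteLn(Entry.Executable);
  for i := 0 to High(Entry.Switches) do
    WriteLn('  -', Entry.Switches[i].Name, ' => ', Entry.Switches[i].Value);
end.
```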
"can there be multiple switches of the same type (e.g. multiple -a switches) for a program?"

Hmmm. Not at this time BUT I haven't thought about that possibility!

Thanks!

John
Here's my shot at this.
This code needs no knowledge of what the switches mean, so supporting a new switch requires no change to the parser.
One restriction, the switches can only be one char long (so -a works but -dosomething does not, tell me if this is a problem).

See below the parse code and some test code with explanation, tell me if it isn't clear please :)
procedure TForm2.Button1Click(Sender: TObject);
var Lines, Params: TStringList;
    i, j: integer;
    Executable: string;
    CurSwitch, CurSwitchValue: string;
begin
 Lines := TStringList.Create();
 
 // some test command lines
 Lines.Add('C:\MyWork\MyExecutable.exe');
 Lines.Add('C:\Delphi\MyThingy.exe -a c:\readme.txt');
 Lines.Add('C:\Programs\Stuff\Program.exe -s normal -a c:\readme.txt');
 Lines.Add('C:\Programs\Stuff\Program.exe -s normal -q -a c:\readme.txt');
 
 // dummy to hold params
 Params := TStringList.Create();
 
 for i := 0 to Lines.Count - 1 do
 begin
  // go through all the test lines
  Params.Clear();
  Parse(Lines[i], Executable, Params);
 
  // so now we have the executable in Executable
  // and all params in the Params stringlist, in a name value type of way
  // so you can go:
 
  for j := 0 to Params.Count - 1 do
  begin
   CurSwitch := Params.Names[j];
   CurSwitchValue := Params.ValueFromIndex[j];
 
   // now you have the switch of argument j in CurSwitch, and its value in CurSwitchValue
   if CurSwitch = '-a' then
    DoSwitchA(CurSwitchValue);
  end;
 end;
 
 Params.Free();
 Lines.Free();
end;
 
procedure TForm2.Parse(Line: string; out Executable: string; Params: TStringList);
const SEPERATOR = ' -';
var p: integer;
    CurParam: string;
begin
 // first find the executable (which is everything until the first ' -');
 p := Pos(SEPERATOR, Line);
 
 if p = 0 then
 begin
  // no arguments found, just return the executable
  Executable := Trim(Line);
  exit;
 end
 else
 begin
  // copy the executable from the line
  Executable := Trim(Copy(Line, 1, p - 1));
  // and remove it from the source
  Delete(Line, 1, p);
 end;
 
 // since we now have the executable, search for more arguments, all separated by ' -'
 while (Trim(Line) <> '') and (p <> 0) do
 begin
  // search for the next separator (PosEx is declared in StrUtils)
  p := PosEx(SEPERATOR, Line, Length(SEPERATOR));
 
  // if we didn't find a 2nd separator, copy all that's left
  if p = 0 then
   p := Length(Line);
 
  CurParam := Trim(Copy(Line, 1, p));
  Delete(Line, 1, p);
 
  // now we have a param in CurParam, split up the switch and the value and put it
  // in the stringlist, this piece of code assumes that the switch is only 1 char long
  Params.Add(Trim(Copy(CurParam, 1, 2)) + '=' + Trim(Copy(CurParam, 3, Length(CurParam))));
  Params.CommaText;
 end;
end;


oh, forgot to mention, there is no problem with multiple switches of the same type, simply because the parsing routine doesn't do anything with the actual data (which is the way it should work imho).
sorry, I left one test line, remove the Params.CommaText; at the bottom, it does no harm, but nothing useful either.
@MerijnB

It is an interesting approach (conversion to name=value pairs).

Why have all the manual parsing in the Parse() routine when you can use the parsing capabilities of a TStringList with much less code?

========================
Note to future readers: The Lines.Add statements (10-13) would be replaced by a Lines.LoadFromFile() in production mode.

========================
@Johnjces

Merijn raises another good question about the length of these switches.

I have a new question as well...
* Will the file path\name be quote delimited if they contain special characters, such as spaces or hyphens?

I ask that because I've encountered a (similar) problem with the old Setup program distributed with VB (classic).  The Setup parser is looking for switches in the Setup.LST file lines.
@aikimark

> Why have all the manual parsing in the Parse() routine when you can use the parsing capabilities of a TStringList with much less code?
You probably refer to your earlier post:
> 2. use two TStringList objects.  One to read the config file lines (.LoadFromFile) and the other to parse the lines (after setting the delimiter = ' -')

please keep in mind that the delimiter property of TStringList is a char, not a string.

I don't think you can use a TStringList to parse this much easier.
@Merijn

Thanks.  I'd never tried to use more than one character as a delimiter before, so I'd never run up against this.  I learned something.

While looking at the limitations and quirks of TStringList delimited text parsing, I saw this neat little trick:
SplitTS.Text := StringReplace(Line, Delimiter, #13#10, [rfReplaceAll]);
 
Where Delimiter is ' -'
=====================
Although it won't give the name/value pairs, this would seem to quickly parse the line.  
Executable := Trim(SplitTS[0]);
SplitTS.Delete(0);
Params := SplitTS;
 
Each switch/value pair would need to be parsed as already mentioned.
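
For anyone who wants to try the StringReplace trick end to end, here is a minimal sketch under a few assumptions: the routine names SplitLine and SplitItem are invented for the example, the switch name is taken to be everything up to the first space, and a path that itself contains ' -' will still be split in the wrong place.

```pascal
program SplitDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Break a line on ' -' by turning each delimiter into a line break and
// letting the string list split the text. Item 0 is the executable.
procedure SplitLine(const Line: string; out Executable: string; Items: TStrings);
const
  Delimiter = ' -';
begin
  Items.Text := StringReplace(Line, Delimiter, #13#10, [rfReplaceAll]);
  Executable := '';
  if Items.Count = 0 then
    Exit;
  Executable := Trim(Items[0]);
  Items.Delete(0);  // what remains is one 'switch value' item per entry
end;

// Split one item such as 's normal' at its first space.
procedure SplitItem(const Item: string; out SwitchName, SwitchValue: string);
var
  p: Integer;
begin
  p := Pos(' ', Item);
  if p = 0 then
  begin
    SwitchName := Trim(Item);
    SwitchValue := '';
  end
  else
  begin
    SwitchName := Copy(Item, 1, p - 1);
    SwitchValue := Trim(Copy(Item, p + 1, MaxInt));
  end;
end;

var
  Items: TStringList;
  Exe, N, V: string;
  i: Integer;
begin
  Items := TStringList.Create;
  try
    SplitLine('C:\Programs\Stuff\Program.exe -s normal -q -a c:\readme.txt', Exe, Items);
    WriteLn('Executable: ', Exe);
    for i := 0 to Items.Count - 1 do
    begin
      SplitItem(Items[i], N, V);
      WriteLn('  switch "', N, '" value "', V, '"');
    end;
  finally
    Items.Free;
  end;
end.
```

Note that the leading '-' is consumed by the split, so the switch names come back as 's', 'q' and 'a'.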


@MerijnB

Do we need to worry about file names containing an equals character (=)?  If so, does the value string need to be quote/apostrophe delimited?

If the switches are more than one character, we should be able to use the POS() function to find the space character separator.  Now it is just a matter of a loop through the switches to convert them into name=value pairs, replacing each item.
@aikimark
> While looking at the limitations and quirks of TStringList delimited text parsing, I saw this neat little trick:
> ...
> Although it won't give the name/value pairs, this would seem to quickly parse the line.

I agree that it would be less complex in code, but probably somewhat slower as well. I think it doesn't really matter; it is only a few lines now, so there is not much of a difference.

> Do we need to worry about file names containing an equals character (=)?  If so, does the value string need to be quote/apostrophe delimited?
No, = is not allowed in a filename. " -" is though, that might be a problem.

> the switches are more than one character, we should be able to use the POS() function to find the space character separator.
> Now it is just a matter of a loop through the switches to convert them into name=value pairs, replacing each item.
That's why I didn't really make a fuss about it; since the parser does not know what the values mean, it doesn't matter...

Just curious, you respond like you asked the original question, what's your place in this thread?
SOLUTION
> You are wrong about the '=' sign.

yep, you're right (I learn things as well :)). That might be a problem indeed.
If this is really a problem ('=' will be there in actual file names), something other than a TStringList should be used to hold the values; not really a big thing, I think.
@MerijnB

Here's an interesting idea: use Regex to parse the lines.  I realize Regex routines aren't usually the fastest, but it may prove the simplest and most reliable method.  What do you think?
@aikimark
regex could very well be an option, but what do you want to filter on?
The problem is that you can't really tell what the filename is exactly:

C:\Programs\Stuff\Program.exe -s normal -a c:\readme.txt

The file could be named "Program.exe -s normal" sitting in c:\Programs\Stuff\
It's not a matter of with what tool, but how to recognize. I don't think you can get it more reliable with regex.
Good day! Finally got to the site and WOW!!

I have a bunch to devour! Thanks to all of you for this great discussion. Admittedly I am a poor developer when it comes to parsing strings. Other stuff, better than some, not as good as others! I have a mental stumbling block in some areas, and parsing is one of them.

I really appreciate the added thoughts on the "stumbling block" as this little project is just a work in progress and a lot of stuff I have yet to foresee or think about.

I will let you all know how I make out later today, I hope and how I go about this.

John
I'd overlooked the "-s normal" switch. :-(  When I suggested using regex, I thought all the switch values would be file names.  My bad.

This is another instance where multiple experts can provide balance to the suggestions.
@Johnjces

Since this is a 'work in progress', maybe we should be helping you structure and format your file rather than concentrating on complex parsing issues.

What are your requirements and constraints?
aikimark

Parsing the conf file is my main focus, so that the service can do what I need it to do from a simple text file. The settings could be placed in the Registry instead, but that would be more difficult to edit.

I am messing with the code now. At work I am using D5, so I had some minor problems with StrUtils.pas and its PosEx function. I got that one sorted and am now working through ValueFromIndex[], which is not available to me in D5.

Thanks!

John
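
For older Delphi versions, one possible workaround is a pair of small helpers built only on Pos and Copy. This is a sketch, not a drop-in replacement: the names PosFrom and ValueAt are invented here, and no claim is made that they match PosEx or ValueFromIndex in every edge case.

```pascal
program D5Helpers;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Stand-in for PosEx: find SubStr in S, starting the search at Offset.
// Returns a 1-based index into S, or 0 when nothing is found.
function PosFrom(const SubStr, S: string; Offset: Integer): Integer;
begin
  if Offset < 1 then
    Offset := 1;
  Result := Pos(SubStr, Copy(S, Offset, MaxInt));
  if Result > 0 then
    Inc(Result, Offset - 1);
end;

// Stand-in for ValueFromIndex: everything after the first '=' of an entry.
function ValueAt(List: TStrings; Index: Integer): string;
var
  P: Integer;
begin
  P := Pos('=', List[Index]);
  if P > 0 then
    Result := Copy(List[Index], P + 1, MaxInt)
  else
    Result := '';
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('-a=c:\readme.txt');
    WriteLn(ValueAt(List, 0));                                // c:\readme.txt
    WriteLn(PosFrom(' -', '-s normal -a c:\readme.txt', 2));  // 10
  finally
    List.Free;
  end;
end.
```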



I could rewrite it somewhat so it doesn't use TStringList name / value stuff...
Here's a rendering of your sample file in XML.  I used the Microsoft XML Notepad to create it.  This changes the processing requirements towards the TXMLDocument class.

Also, you might render this as a TClientDataset table and persist it in XML or CDS (binary) format.
<Root_Element>
   <Program Name="C:\MyWork\MyExecutable.exe"/>
   <Program Name="C:\Delphi\MyThingy.exe">
      <Switch Name="a" Value="c:\readme.txt"/>
   </Program>
   <Program Name="C:\Programs\Stuff\Program.exe">
      <Switch Name="s" Value="normal"/>
      <Switch Name="a" Value="c:\readme.txt"/>
   </Program>
</Root_Element>


MerijnB,

If you have the time and desire... sure!

I have just started cracking through the ValueFromIndex[] to see if I can mod the D5 code to make something similar.

John
@MerijnB

I think the list of records idea might be worth revisiting for switches, with each record having Name and Value string fields.
ASKER CERTIFIED SOLUTION
@aikimark

could be an option, but looking at my local time, I choose for a quick hack :p
@aikimark

XML is intriguing but too difficult for a simple conf file for an end user. Trying to keep sort of a "Linux" "standard" in this file.

John
@John

This is all a part of the requirements and constraints I mentioned earlier.

1. Will the user edit the raw config file with Notepad or Word (or equivalent) or will you write a GUI applet for this editing?
2. What is the purpose of the config file?
3. How does the end user interact with the config file?
4. What type of validation should be done on this file after the user has interacted with the file?

In most of my applications, the user does NOT get to edit config files/tables directly.  It is too easy to mess up.
@aikimark

1. Will the user edit the raw config file with Notepad or Word (or equivalent) or will you write a GUI applet for this editing?

Initially the user (my MIS staff) will edit the conf file directly, as the service is for in-house use. I may do a control panel applet to edit this conf file down the road, depending on how often we need to change the file. It should pretty much remain intact "forever".

2. What is the purpose of the config file?

Starting specific applications "as a service", or more precisely from a service, at system boot-up, with the necessary command line parameters needed by those startup applications and whether to run hidden, shown normal or minimized.

3. How does the end user interact with the config file?

See #1 above.

4. What type of validation should be done on this file after the user has interacted with the file?

Ignore and throw a warning in the NT Application logs. All that is taken care of already.

Really just needed to parse a conf or cfg file at this stage.

What a great thread this turned out to be! I have learned a bunch!

John
SOLUTION
Alrighty...

I have yet to try TheRealLoki's solution, but so far, as I go through the conf file line by line, I must have in my for-do loop:

executable and path... done, and

the switch whether -a or -s or -q or whatever for each executable so I know what switch goes with which executable.

Like in my listbox example, way above, I have three executables listed in the Executable listbox and then the -a switches listed for each executable, one has no -a switch and then the last listbox for the -s switches and one there has no -s switch.

So by looking across you will know that the first executable has no -a option but will Show Normal. The next executable will autostart as a command line parameter but we don't care how it shows... no value. The last then is self explanatory.

In MerijnB's code, I get the lists of switches but have no idea for which executable that switch belongs.

John
@John

I think you are undervaluing your MIS staff skills.  At this point, you might consider using Excel as your config file interface.  See the uploaded XLS as an example.  View with Form or in spreadsheet.
Config.xls
@John

Once Windows states have been entered, subsequent lines behave in a type-ahead fashion.  As with normal Excel/Access processing, Ctrl+" copies that column's data from the prior record, which might be a time saver for your folks.

You can save this file as CSV or TXT, preferably quoting the columns.  Then you process the config data as comma-separated positional parameters, rather than named (switched) parameters.  Also, it is possible for your Delphi program to read the Excel data directly as a data source (DAO, maybe ADO) or import the Excel data (SMImport).
@aikimark

It is worth considering and could be easier than a bunch of parsing!

Thx!

John
hi,
my sample code shows the results like your screenshot example, so it should be very easy to see which command has which switch. It also takes care of quoted parameters
e.g.
C:\Programs\Stuff\Program.exe -s normal -a "c:\temp\some big path\readme.txt"
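
To show what quote handling involves, here is a minimal tokenizer sketch. It is not the code from the post above (which is not visible here); the name Tokenize and the rules it applies are assumptions: tokens are separated by spaces, double quotes group text containing spaces and are then dropped, and a token starting with '-' is treated as a switch whose value is the next token, if that token does not itself start with '-'.

```pascal
program QuoteDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Split Line on spaces, keeping text inside double quotes together.
procedure Tokenize(const Line: string; Tokens: TStrings);
var
  i: Integer;
  InQuotes: Boolean;
  Cur: string;
begin
  Tokens.Clear;
  Cur := '';
  InQuotes := False;
  for i := 1 to Length(Line) do
  begin
    if Line[i] = '"' then
      InQuotes := not InQuotes           // toggle; the quote itself is dropped
    else if (Line[i] = ' ') and not InQuotes then
    begin
      if Cur <> '' then
        Tokens.Add(Cur);                 // a space outside quotes ends a token
      Cur := '';
    end
    else
      Cur := Cur + Line[i];
  end;
  if Cur <> '' then
    Tokens.Add(Cur);
end;

var
  Tokens: TStringList;
  SwitchName, SwitchValue: string;
  i: Integer;
begin
  Tokens := TStringList.Create;
  try
    Tokenize('C:\Programs\Stuff\Program.exe -s normal -a "c:\temp\some big path\readme.txt"', Tokens);
    WriteLn('Executable: ', Tokens[0]);
    i := 1;
    while i < Tokens.Count do
    begin
      SwitchName := Tokens[i];
      SwitchValue := '';
      // the next token is this switch's value unless it is another switch
      if (i + 1 < Tokens.Count) and (Copy(Tokens[i + 1], 1, 1) <> '-') then
      begin
        SwitchValue := Tokens[i + 1];
        Inc(i);
      end;
      WriteLn('  ', SwitchName, ' => ', SwitchValue);
      Inc(i);
    end;
  finally
    Tokens.Free;
  end;
end.
```

With this approach an executable path containing spaces would also have to be quoted in the conf file, and a value that begins with '-' would be mistaken for a switch.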
> In MerijnB's code, I get the lists of switches but have no idea for which executable that switch belongs.

Just to clear up. My idea is that you go through all lines line by line.
For each line you call Parse(). This function will return the executable for that line, and all switches and their values in two separate stringlists (so no name / value stuff anymore).
This way you can handle the file line by line, and you know what executable it belongs to.

Just for my understanding, you want to go through this by executable, not by switch; is this correct?
In other words, you don't want to see a list of all executables which have switch -a, but you want to go through all executables to see what switches they use...
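
The "two separate stringlists" idea described above could be sketched roughly as follows. This is an illustration only, not the code that was posted: the name ParseLine is invented, switch names are stored without the leading '-', and the sketch uses only Pos, Copy and Delete so that it does not depend on PosEx or ValueFromIndex.

```pascal
program TwoListParse;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Parse one conf line. Switches[i] and Values[i] belong together;
// a switch without a value gets an empty string in Values.
procedure ParseLine(Line: string; out Executable: string; Switches, Values: TStrings);
var
  p: Integer;
  Item: string;
begin
  Switches.Clear;
  Values.Clear;
  Line := Trim(Line);
  p := Pos(' -', Line);
  if p = 0 then
  begin
    Executable := Line;                  // no switches on this line
    Exit;
  end;
  Executable := Trim(Copy(Line, 1, p - 1));
  Delete(Line, 1, p + 1);                // drop the executable and the first ' -'
  while Line <> '' do
  begin
    p := Pos(' -', Line);
    if p = 0 then
    begin
      Item := Line;                      // last item on the line
      Line := '';
    end
    else
    begin
      Item := Copy(Line, 1, p - 1);
      Delete(Line, 1, p + 1);
    end;
    Item := Trim(Item);
    if Item <> '' then
    begin
      p := Pos(' ', Item);               // switch name ends at the first space
      if p = 0 then
      begin
        Switches.Add(Item);
        Values.Add('');
      end
      else
      begin
        Switches.Add(Copy(Item, 1, p - 1));
        Values.Add(Trim(Copy(Item, p + 1, MaxInt)));
      end;
    end;
  end;
end;

var
  Switches, Values: TStringList;
  Exe: string;
  i: Integer;
begin
  Switches := TStringList.Create;
  Values := TStringList.Create;
  try
    ParseLine('C:\Programs\Stuff\Program.exe -s normal -q -a c:\readme.txt', Exe, Switches, Values);
    WriteLn(Exe);
    for i := 0 to Switches.Count - 1 do
      WriteLn('  -', Switches[i], ' => ', Values[i]);
  finally
    Values.Free;
    Switches.Free;
  end;
end.
```

Calling ParseLine once per line of the file keeps each set of switches tied to its own executable, which is the per-executable loop being discussed.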
@MerijnB,

"... you want to go through this by executable, not by switch; is this correct?"

Yes... as that is the only way I know what switch to apply to which executable. I need to do this in a loop, and I can use Parse() line by line as you suggest, but I also need to separate each individual switch for each executable.

TheRealLoki's code does exactly what I need... but for more switches would only have to add a few lines of code.

John
@John

The executable is in the EXECUTABLE variable passed to the Parse() function.  The function sets its value as well as the values for the two TStringList parameters.  Adding more switches should not require any change to MerijnB's code.

The advantage of Loki's code is that your switches can be more than one character.  But you need to tweak Loki's code more.  I would recommend that you read in the lists from an external file, rather than changing the code.  That way, your executable doesn't need to be recompiled. (as often)

Have your MIS staff looked at the Excel spreadsheet?  What did they think?

@aikimark

"Have your MIS staff looked at the Excel spreadsheet?"
Yes, they looked at using an XLS to enter the stuff needed.

"What did they think?"
They thought a text file (Notepad) would be quicker should a tweak be needed or a typo fixed, since most of the PCs this service would run on do not have any Office products.

"But you need to tweak Loki's code more".
Yep... I know his code is very advantageous. It just needs a couple tweaks so I can easily get the info I need, i.e. the individual switches each in their own string. No big deal though.

John
@John

I don't see where you can't just as easily get the individual switches with their own string with MerijnB's code.  It's all there for every line.
@John

I'm a bit confused.  I thought these config files were the same for all PCs in a group.  Why would an individual PC need a different config file than everyone else in the group?  This seems like a recipe for chaos.
I have to say, configuration in Excel sounds like a no-go to me:

http://thedailywtf.com/Articles/Waiting-to-Excel-.aspx
@MerijnB

I could certainly have used a Word table.  The basic idea was to provide grid/table access to the data, rather than keyword/markup editing.
@aikimark,

Yes, it is all there for every line. I just have to do some tweaks so that it is clear in my own mind (as I said, I am terrible with parsing and string stuff). I do have it working, so no biggie, as I mentioned.

And.. "...these config files were the same for all PCs in a group". Not certain how I gave that impression, but we have some unique "stuff" where I work that does all kinds of things. Each PC, at startup and when nobody is logged in, has varying tasks that need to get done or run no matter what.

There have been times after an upgrade, a routine reboot, a full system backup, etc., when a system is restarted and we then get sidetracked and forget to log in and start some of the various interfaces to other PCs that take care of other "stuff".

One example is a credit card auth system that interfaces with the restaurant and POS systems, and another that interfaces with that AND the hotel property management system, etc. It really ticks off my bosses when we (or I) forget to log in and start up needed applications, as many are not services. So, a lot of varying degrees of stuff. I mean, it generally doesn't happen that often, and when we forget it is always due to some other "urgent" matter, like a telephone call and then starting on project 'G', but it happens!

In any event, it is time to close this thread before it makes it into the EE history books for the longest one!

First, each one of you have helped me to learn! I am always amazed at how some of you know all that you know! Just when I think I know something pretty well, I realize that I was pretty stupid! When I use 100 lines of code to do something like parse a file, someone else knows how to do it in 10!

So, Each answer is correct! MerijnB's answer is top, seconded by aikimark's. I shall spread points. I do hope that is satisfactory.

In any event, I certainly might become a better "parser" as that seems to be an often asked for piece of advice on a lot of forums.

So, Thanks! You guys have been and are awesome!

John
@John

You're welcome.  Glad we could help.

This discussion thread isn't anywhere close to the largest on EE, whether measured by comment count or by length.

==================
If we'd gotten into the design discussion in depth, I would have suggested that you change your switch delimiters to characters that can not appear in a path/file name.  Maybe that could constitute a more general discussion in a future question that is code independent.
aikimark,

"... switch delimiters to characters that can not appear in a path/file name". That would still be a very good idea.

Thanks!

John
These guys were just great! It is hard to divvy up 500 points as each had good ideas, suggestions and code! I have learned a bunch!
Thanks to all of you.