trican

Python regular expression

Hi all,

I have data like the following which I wish to process using a regular expression:

KEYWORD1 VARIABLE_1;
KEYWORD1 VARIABLE_2;
  KEYWORD2 VARIABLE_3;
 KEYWORD2 VARIABLE_4;

So I use the following:

import re
temp1 = re.compile(r'\s*KEYWORD1\s+(\S+)')
temp2 = re.compile(r'\s*KEYWORD2\s+(\S+)')

Using this, it's straightforward to extract and store VARIABLE_1 to VARIABLE_4. However, I sometimes have the following structure:

KEYWORD1 [1_OR_MORE_NUMBERS:0] VARIABLE_5;
KEYWORD1 [1_OR_MORE_NUMBERS:0] VARIABLE_6;

I can't seem to devise a regular expression that nicely extracts and stores all the variables - any thoughts?

ozo

Do you want to skip over the [1_OR_MORE_NUMBERS:0], or do you want to extract it together with VARIABLE_5?
trican

ASKER

Hi ozo,

I want to skip over [1_OR_MORE_NUMBERS:0].

By 1 or more numbers, do you mean 1 or more digits?

temp1 = re.compile(r'\s*KEYWORD1\s+(\d+:0)?(\S+)')

trican

ASKER

Yes, one or more digits.

To skip over things in [] you might use
'\s*KEYWORD2(?:\[.*?\]|\s)+(\S+)'
To get the last \S+ before the ; you might use
'\s*KEYWORD2\s.*?(\S+);'
Are the square brackets actually there?
temp1 = re.compile(r'\s*KEYWORD1\s+(?:\[\d+:0\])?(\S+)')
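
For anyone following along: one way a pattern of this shape can fall short is that nothing consumes the whitespace between the closing ] and the variable name. Below is a minimal sketch of a variant that allows for it, assuming the bracket always looks like [digits:0] and the name ends at the semicolon. The helper name extract_names, the sample digits and the [^\s;]+ character class are illustrative choices, not something posted in this thread.

```python
import re


def extract_names(lines, keyword="KEYWORD1"):
    """Return the variable name from each line that starts with `keyword`."""
    # (?:...) is a non-capturing group, so the optional "[digits:0]" part
    # (plus any whitespace after it) never takes up a group number.
    # [^\s;]+ stops before whitespace or the trailing semicolon.
    pattern = re.compile(
        r'\s*' + re.escape(keyword) + r'\s+(?:\[\d+:0\]\s*)?([^\s;]+)'
    )
    names = []
    for line in lines:
        m = pattern.match(line)
        if m:
            names.append(m.group(1))
    return names


sample = [
    "KEYWORD1 VARIABLE_1;",
    "  KEYWORD1 [7:0] VARIABLE_5;",
    "KEYWORD2 VARIABLE_3;",
]
print(extract_names(sample))  # ['VARIABLE_1', 'VARIABLE_5']
```

Because the bracket sits in a non-capturing group, the name is always group(1) whether or not the bracket is present.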

trican

ASKER

Yes, the square brackets are there.
trican

ASKER

Still not quite working; I suspect it's very close.

What is not working, and in what way is it not working?
ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
SOLUTION
This solution is only available to members.

Also, to check for KEYWORD in a string, use the "in" operator:

with open("file") as f:
    for line in f:
        if "KEYWORD" in line:  # or: if line.lstrip().startswith("KEYWORD")
            pass  # do something with the line
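
If the lines really are as regular as the samples in the question, the same idea can be taken a step further with plain string methods. A minimal sketch, assuming the keyword is the first whitespace-separated token and the variable is the last token before the semicolon; extract_variables is a made-up helper name.

```python
def extract_variables(lines, keyword):
    """Return the last token of each line whose first token is `keyword`."""
    found = []
    for line in lines:
        # Drop surrounding whitespace and the trailing semicolon, then split
        # on whitespace; an optional "[7:0]" simply becomes a middle token.
        tokens = line.strip().rstrip(";").split()
        if len(tokens) >= 2 and tokens[0] == keyword:
            found.append(tokens[-1])
    return found


sample = [
    "KEYWORD1 VARIABLE_1;",
    "  KEYWORD2 VARIABLE_3;",
    "KEYWORD1 [7:0] VARIABLE_5;",
]
print(extract_variables(sample, "KEYWORD1"))  # ['VARIABLE_1', 'VARIABLE_5']
```

This trades flexibility for readability: it works only while the variable is the last token on the line.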

trican

ASKER

ozo,

I think it's working OK now. I was confused that I could always use group(2) to access the information that I needed. If you could explain why this works in this manner, all the points are yours.

Also, thanks ghostdog. Whilst I agree that not using regular expressions is more readable, I actually need the flexibility they give.
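
On the group(2) question: in Python's re module, groups are numbered by the position of their opening parenthesis in the pattern, not by which groups happened to take part in a match. An optional group that matches nothing still keeps its number and simply reports None. The accepted pattern is not visible here, so the sketch below only assumes a plausible shape for it (a capturing optional bracket followed by the variable); the pattern and sample lines are illustrative.

```python
import re

# Hypothetical pattern: group 1 is the optional bracket, group 2 the variable.
pattern = re.compile(r'\s*KEYWORD1\s+(\[\d+:0\]\s*)?([^\s;]+)')

with_bracket = pattern.match("KEYWORD1 [7:0] VARIABLE_5;")
without_bracket = pattern.match("KEYWORD1 VARIABLE_1;")

# The variable is group 2 in both cases; only group 1 changes.
print(with_bracket.groups())     # ('[7:0] ', 'VARIABLE_5')
print(without_bracket.groups())  # (None, 'VARIABLE_1')
print(pattern.groups)            # 2
```

Writing the bracket as (?:...) instead would make it non-capturing, and the variable would then be group(1).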
SOLUTION
This solution is only available to members.