trican
asked on
Python regular expression
Hi all,
I have data like the following which I wish to process using a regular expression:
KEYWORD1 VARIABLE_1;
KEYWORD1 VARIABLE_2;
KEYWORD2 VARIABLE_3;
KEYWORD2 VARIABLE_4;
So I use the following:
temp1 = re.compile(r'\s*KEYWORD1\s+(\S+)')
temp2 = re.compile(r'\s*KEYWORD2\s+(\S+)')
Using this, it's straightforward to extract and store VARIABLE_1 to VARIABLE_4. However, I sometimes have the following structure:
KEYWORD1 [1_OR_MORE_NUMBERS:0] VARIABLE_5;
KEYWORD1 [1_OR_MORE_NUMBERS:0] VARIABLE_6;
I can't seem to devise a regular expression that nicely extracts and stores all the variables - any thoughts?
Do you want to skip over the [1_OR_MORE_NUMBERS:0], or do you want to extract it together with VARIABLE_5?
ASKER
Hi ozo,
I want to skip over [1_OR_MORE_NUMBERS:0]
By 1 or more numbers, do you mean 1 or more digits?
temp1 = re.compile(r'\s*KEYWORD1\s+(\d+:0)?(\S+)')
ASKER
Yes, one or more digits.
To skip over things in [] you might use
'\s*KEYWORD2(?:\[.*?\]|\s)+(\S+)'
To get the last \S+ before the ; you might use
'\s*KEYWORD2\s.*?(\S+);'
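A minimal sketch of how the two suggestions behave on data shaped like the question's. The sample lines, the `[7:0]` stand-in for `[1_OR_MORE_NUMBERS:0]`, and the names `skip_brackets` and `last_token` are assumptions made for illustration, not part of the thread.

```python
import re

# Sample lines modelled on the question; "[7:0]" is an assumed stand-in
# for "[1_OR_MORE_NUMBERS:0]".
lines = [
    "KEYWORD2 VARIABLE_3;",
    "KEYWORD2 [7:0] VARIABLE_5;",
]

# Pattern 1: consume any mix of whitespace and bracketed text after the
# keyword, then capture the next run of non-whitespace.
skip_brackets = re.compile(r'\s*KEYWORD2(?:\[.*?\]|\s)+(\S+)')

# Pattern 2: capture the last run of non-whitespace before the semicolon.
last_token = re.compile(r'\s*KEYWORD2\s.*?(\S+);')

for line in lines:
    m1 = skip_brackets.match(line)
    m2 = last_token.match(line)
    print(repr(m1.group(1)), repr(m2.group(1)))
```

Note that the first pattern's `(\S+)` also captures the trailing `;`, while the second pattern leaves it outside the group.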
Are the square brackets actually there?
temp1 = re.compile(r'\s*KEYWORD1\s+(?:\[\d+:0\])?(\S+)')
ASKER
Yes, the square brackets are there.
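With the brackets confirmed and the range made of digits, one possible way to skip it is an optional non-capturing group that also swallows the whitespace after the closing bracket. This is only a sketch: the pattern, the name `keyword1`, and the sample lines are assumptions, not the solution accepted in the thread.

```python
import re

# Assumed pattern: (?:...)? makes the "[digits:0]" part optional without
# creating a capture group, and [^\s;]+ keeps the trailing semicolon out
# of the captured variable name.
keyword1 = re.compile(r'\s*KEYWORD1\s+(?:\[\d+:0\]\s*)?([^\s;]+)')

samples = [
    "KEYWORD1 VARIABLE_1;",
    "KEYWORD1 [31:0] VARIABLE_5;",
]

names = [keyword1.match(s).group(1) for s in samples]
print(names)
```

Because the optional part is non-capturing, the variable name is always in `group(1)`, whether or not the bracketed range is present.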
ASKER
Still not quite working. I suspect it's very close.
What is not working, and in what way is it not working?
Also, to check for KEYWORD in a string, use the "in" operator:
for line in open("file"):
    if "KEYWORD" in line:  # or: if line.startswith("KEYWORD")
        pass  # do something
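For comparison, the regex-free approach can be sketched as below. The helper name `names_for_keyword` is hypothetical, and the sketch assumes the variable name is always the last whitespace-separated token on the line and that no keyword is a prefix of another.

```python
def names_for_keyword(lines, keyword):
    """Return the last token of every line that starts with keyword."""
    names = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(keyword):
            # Drop the trailing ';' and take the last token, which skips
            # an optional "[digits:0]" range sitting before the name.
            names.append(stripped.rstrip(";").split()[-1])
    return names


data = [
    "KEYWORD1 VARIABLE_1;",
    "KEYWORD1 [7:0] VARIABLE_5;",
    "KEYWORD2 VARIABLE_3;",
]
print(names_for_keyword(data, "KEYWORD1"))
```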
ASKER
ozo,
I think it's working OK now. I was confused that I could always use group(2) to access the information that I needed. If you could explain why it works this way, all the points are yours.
Also, thanks ghostdog. While I agree that not using regular expressions is more readable, I actually need the flexibility they give.
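On the group(2) question: Python's `re` module numbers capturing groups by the position of their opening parenthesis, so an optional group keeps its number even when it matches nothing (it then returns `None`). The accepted pattern is not visible in the thread, so the one below is an assumed reconstruction with two capturing groups, used only to illustrate the numbering.

```python
import re

# Assumed pattern: group 1 is the optional "[digits:0]" part,
# group 2 is the variable name.
pattern = re.compile(r'\s*KEYWORD1\s+(\[\d+:0\]\s*)?([^\s;]+)')

plain = pattern.match("KEYWORD1 VARIABLE_1;")
ranged = pattern.match("KEYWORD1 [7:0] VARIABLE_5;")

# Group 1 did not participate in the first match, so it is None there,
# but the variable name is still group 2 in both cases.
print(repr(plain.group(1)), repr(plain.group(2)))
print(repr(ranged.group(1)), repr(ranged.group(2)))
```

Writing the optional part as a non-capturing group, `(?:...)?`, would instead make the variable name `group(1)` in both cases.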