dshrenik asked:

Extract patterns in order (Python)

I need to implement the following:

Input: type1("s df")  type2("df g")  type3("ds ff   d")  type4("fdgfg")

I need to extract the patterns type1(.*), type2(.*), type3(.*), and type4(.*), in the same order as they are found in the input, and store them in an array. (I have exactly 4 types.)

I read about re.split, re.match, re.search, etc. But, I don't think I can use any of them to achieve my goal.
Please tell me how I must approach this.
kaufmed

I'm not entirely clear on your input. If these "patterns" all occur in the same search string, then you should be able to use the following. You would inspect the groups of the match object returned by match(): group 1 would hold the contents of type1(...), group 2 the contents of type2(...), etc.
import re
m = re.match(r"type1\(([^)]*).*?type2\(([^)]*).*?type3\(([^)]*).*?type4\(([^)]*)", "type1(\"s df\")  type2(\"df g\")  type3(\"ds ff   d\")  type4(\"fdgfg\")")
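
To make the group inspection concrete, here is a minimal sketch that reads the four captured groups back out of the match object. It assumes single-quoted input (to avoid escapes), and the variable names are illustrative only. Note that this pattern only matches when all four types appear once each, in this fixed order.

```python
import re

# Same pattern as in the post above: one capture group per type, fixed order.
pattern = r"type1\(([^)]*).*?type2\(([^)]*).*?type3\(([^)]*).*?type4\(([^)]*)"
text = "type1('s df')  type2('df g')  type3('ds ff   d')  type4('fdgfg')"

m = re.match(pattern, text)

# groups() returns the text captured by groups 1..4 as a tuple;
# fall back to an empty list if the input did not match.
contents = list(m.groups()) if m else []
print(contents)
```

Each captured string is whatever sat between the parentheses, including the quote characters.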

dshrenik (ASKER)

I don't think I have made my requirement clear.

The output for the example input you considered should be: (Let's use single quotes to avoid escape sequences.)
type1('s df')
type2('df g')
type3('ds ff   d')
type4('fdgfg')

The input can have any of these 4 types appear in any order, any number of times.
For example, if the input is "type1('ds f') type3('dsf f') type1('sdf') type1('dsf')", then the output must be collected in an array in this order:
type1('ds f')
type3('dsf f')
type1('sdf')
type1('dsf')

So what I need is to collect each type (as a whole) in the order they appear.

Thanks for your efforts! Hope what I want to get isn't very difficult to implement.
Is this closer to the desired result?
re.findall(r"(type\d+\([^)]*\))", "type1('ds f') type3('dsf f') type1('sdf') type1('dsf')")
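
As a quick check of the ordering requirement, the sketch below runs the same call and prints one match per line. re.findall scans the string left to right, so the list comes back in input order; the variable names are only for illustration.

```python
import re

text = "type1('ds f') type3('dsf f') type1('sdf') type1('dsf')"

# With a single capture group, findall returns a list of the group's text,
# in the order the matches occur in the input.
matches = re.findall(r"(type\d+\([^)]*\))", text)

for item in matches:
    print(item)
```

This should print the four entries in the same order as listed in the question: type1, type3, type1, type1.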

That's great!
But it won't always work for me, since not all types are named type1, type2, etc.
I did that only for convenience. Sorry for not making myself clear.

The actual types are:
exa('dsfdsf'), almost('sdfdsf'), any('sdf sdf')
Please let me know if it is possible to customize this. Sorry for the trouble!
It's cool. Since regex is all about matching patterns, we need to come up with some pattern that will be common to all your strings. Is it safe to assume that a pseudo-pattern would be:

    a string of alphanumerics, followed by an opening parenthesis, followed by some string (possibly quoted), followed by a closing parenthesis

??
Sure. I guess that should work.
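
One way the pseudo-pattern above could be written is sketched below. This is an illustration under stated assumptions, not the accepted solution from this thread: it assumes the text inside the parentheses never contains a ")" itself, and the sample input and variable names are made up for the example. Note that \w also matches underscores, which is slightly broader than "alphanumerics".

```python
import re

# Hypothetical sample input using the names mentioned in the question.
text = "exa('dsfdsf') almost('sdfdsf') any('sdf sdf') exa('x y')"

# Generic form: a run of word characters, "(", anything except ")", then ")".
generic = re.findall(r"\w+\([^)]*\)", text)

# Stricter form: accept only the three names given in the question.
# \b keeps a longer word such as "many(...)" from matching as "any(...)".
known = re.findall(r"\b(?:exa|almost|any)\([^)]*\)", text)

print(generic)
print(known)
```

The generic form accepts any name, which is convenient if new types may appear later; the stricter form rejects anything outside the known list.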
ASKER CERTIFIED SOLUTION
kaufmed

This solution is only available to members.
Thanks. That works great. Needed just 1 modification - we need to remove \d+.

If possible, can you address this question?
https://www.experts-exchange.com/questions/26553421/Regular-Expression-Parser.html
>>  we need to remove \d+

True enough. I'm up past my bedtime and I think my brain is slowing down on me, trying to tell me to go to bed  ;)

Glad it worked for you!