whorsfall asked:

Regular expression with optional match

Hi,

I am trying to figure out a regular expression to parse the following lines of text line by line.

Send Request 208JRB03~    Job:        JQZ1881H  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.144-660><thread=7796 (0x1E74)>
Send Request 208JTB03~    Job:        JYKKQX84  Destination:  BB.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3246    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.157-660><thread=7796 (0x1E74)>
Send Request 208JUB03~    Job:        JBBYPU4Y  Destination:  CC.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 20      k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.178-660><thread=7796 (0x1E74)>
Send Request 208JWB03~    Job:        JXQT2T11  Destination:  DD.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2419040 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.192-660><thread=7796 (0x1E74)>
Send Request 208JXB03~    Job:        J2ODKBP0  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2375789 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.204-660><thread=7796 (0x1E74)>
Send Request 208K1B03~    Job:        JB1BVVWP  Destination:  GG.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3142207 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300055  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.217-660><thread=7796 (0x1E74)>
Send Request 208JYB03~    Job:        JXL6Q1VY  Destination:  TT.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.223-660><thread=7796 (0x1E74)>
Send Request 208JZB03~    Job:        J49K1PHU  Destination:  YY.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.229-660><thread=7796 (0x1E74)>
Send Request 208K0B03~    Job:        JUVML621  Destination:  ZZ.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.236-660><thread=7796 (0x1E74)>

Here is my regular expression:

Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)

The problem I am having is with the status field: as you can see above, status is an optional field. I still want to parse the last three lines, with status either not captured or returned as blank.
So how can I change my regular expression to deal with this?

PS. I have attached the question as a text file to make it easier for people to read.

Thanks,

Ward.
question.txt
Kent Dyer

Can't you simplify this down to the following?

\b.test.internal~    State:      Working   Status:    Active\b


I just checked that in EditPad and it works great!

Ref - http://www.regular-expressions.info/wordboundaries.html
Q-28332143-results.txt
whorsfall

ASKER

Kent,

Thanks for responding. Reading your response and my original question, I realized I did not state it correctly - my fault :)

What I wanted was for the regular expression to handle all the lines in the file and *optionally* match the word after "Status:" and before "Action:" if there is one. So for the first six lines it would capture "Active" into the named capture group "status".

For the last three lines, nothing (or a blank) would be captured in the capture group "status",
but all the other capture groups would still match. This is why I am calling it an optional capture: Status might or might not have data there.

Hope this makes sense :)

Thanks,

Ward
\S* seems to work fine for the optional <status>,
but then \d\d:\d\d fails for the missing <retry>.
So try making the <retry> group optional:
Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)?~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)
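A quick way to check the point above is to run both patterns against one "Working" and one "Pending" line. The sketch below is only an illustration and assumes Python's re module rather than .NET: Python spells named groups (?P<name>...), so a small hypothetical helper, compile_dotnet, rewrites the thread's (?<name>...) syntax before compiling. The two sample lines are copied from the question.

```python
import re

# Two sample lines from the question: one "Working", one "Pending".
WORKING = (
    "Send Request 208JRB03~"
    "    Job:        JQZ1881H  Destination:  AA.test.internal~"
    "    State:      Working   Status:    Active     Action:    None~"
    "    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~"
    "    Start:      12:00     Finish:    12:00      Retry:     17:46~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
    "  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.144-660><thread=7796 (0x1E74)>"
)
PENDING = (
    "Send Request 208JYB03~"
    "    Job:        JXL6Q1VY  Destination:  TT.test.internal~"
    "    State:      Pending   Status:               Action:    None~"
    "    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~"
    "    Start:      12:00     Finish:    12:00      Retry:          ~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
    "  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.223-660><thread=7796 (0x1E74)>"
)

# The asker's pattern, split around the <retry> group so both variants share it.
HEAD = (
    r"Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)"
    r"\s*Destination:\s*(?<destination>\S*)~"
    r"\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)"
    r"\s*Action:\s*(?<action>\S*)~"
    r"\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*)"
    r".*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~"
    r"\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)"
    r"\s*Retry:\s*"
)
TAIL = (
    r"~\s*SWD PkgID:\s*(?<package_id>\S*)"
    r"\s*SWD Pkg Version:\s*(?<package_version>\d*)"
)


def compile_dotnet(pattern):
    """Hypothetical helper: turn .NET-style (?<name>...) into Python's (?P<name>...)."""
    return re.compile(pattern.replace("(?<", "(?P<"))


original = compile_dotnet(HEAD + r"(?<retry>\d\d:\d\d)" + TAIL)   # required retry
fixed = compile_dotnet(HEAD + r"(?<retry>\d\d:\d\d)?" + TAIL)     # optional retry

for name, line in (("working", WORKING), ("pending", PENDING)):
    new = fixed.search(line)
    print(name, "original matched:", original.search(line) is not None)
    print(name, "status:", repr(new.group("status")), "retry:", repr(new.group("retry")))
```

In Python an optional group that did not take part in the match comes back as None, while the empty \S* match for status comes back as an empty string; .NET reports a non-participating group differently (an unsuccessful Group with an empty Value), so treat the exact "blank" representation as engine-specific.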
Hi whorsfall;

I think you will find that this pattern works for both cases. There is a second issue with the Retry field: some lines have digits there and others do not, so I took care of that as well.

string pattern = @"Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>.*)Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>(?:\d\d:\d\d~)|(?:~))\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)";

http:#a39763594 changes not only how your last 3 lines match, but also how your first 6 lines match.  
http:#a39761680 assumed that you were satisfied with how your original expression was matching the first 6  lines, and that you only wanted to change it to also match the last 3 lines.
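The difference between the two answers can be seen by comparing only their Status and Retry sub-patterns on short excerpts of the sample lines. This is a rough sketch under the same assumption as before (Python's re module, with a hypothetical compile_dotnet helper translating the named-group syntax); the excerpts are cut from the question's data and the "_a"/"_b" names are made up for illustration.

```python
import re


def compile_dotnet(pattern):
    """Hypothetical helper: turn .NET-style (?<name>...) into Python's (?P<name>...)."""
    return re.compile(pattern.replace("(?<", "(?P<"))


# Excerpts of a "Working" and a "Pending" line from the question.
STATUS_WORKING = "Status:    Active     Action:    None~"
STATUS_PENDING = "Status:               Action:    None~"
RETRY_WORKING = "Retry:     17:46~    SWD PkgID:"
RETRY_PENDING = "Retry:          ~    SWD PkgID:"

# Variant A: original status group, retry group made optional.
status_a = compile_dotnet(r"Status:\s*(?<status>\S*)\s*Action:")
retry_a = compile_dotnet(r"Retry:\s*(?<retry>\d\d:\d\d)?~\s*SWD PkgID:")

# Variant B: status captured with .*, retry alternation that includes the "~".
status_b = compile_dotnet(r"Status:\s*(?<status>.*)Action:")
retry_b = compile_dotnet(r"Retry:\s*(?<retry>(?:\d\d:\d\d~)|(?:~))\s*SWD PkgID:")

for label, status_re, retry_re in (("A", status_a, retry_a), ("B", status_b, retry_b)):
    for kind, s_text, r_text in (
        ("working", STATUS_WORKING, RETRY_WORKING),
        ("pending", STATUS_PENDING, RETRY_PENDING),
    ):
        status = status_re.search(s_text).group("status")
        retry = retry_re.search(r_text).group("retry")
        print(label, kind, "status:", repr(status), "retry:", repr(retry))
```

With variant B the status capture on a "Working" line keeps the padding before "Action:", and the retry capture includes the trailing "~" on every line, which is the change to the first six lines mentioned above.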
Wow, well...you've definitely got a head-scratcher there.
Let me go ahead and post what I was going to say before I tested my regex and found the problem much, much more difficult than I first anticipated (I could've sworn I've done this a dozen times before!). I will close with my proposed solution:


Apologies, I'm not a C# guru, but I do know regex, and if what I'm seeing is correct, the post at http:#a39763594, which changes
Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)
to:
Status:\s*(?<status>.*)Action:\s*(?<action>\S*)
still doesn't solve the problem.
The capture of \s* after "Status:" is still a greedy match, and so will still match up to but not including the first letter of "Action:", causing the entire regex to fail to match lines where status is not present.

However, I believe one of the following two changes should work:
Concerning the portion of relevant regex after "Status" to before "Action:",
\s*(?<status>\S*)\s*
should become:
\s*(?<status>.*)?\s*
This tries to make the entire named capture group <status> optional, meaning a variable named status may or may not exist for a given line; it could also break your regex entirely (preliminary research suggests it won't).

Alternatively:
(?<status>.*?)
This simply turns the capture regex for populating <status> into a non-greedy "match everything" search (.*?). This of course means it's going to capture all the spaces before/after the "Active" or whatever word might be there, so you'll have to strip those out separately.
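As a rough illustration of the non-greedy variant and the separate stripping step, here is a minimal sketch. It assumes Python's re module (so the group is written (?P<status>...)) and uses short excerpts of the sample lines; lazy_status and status_of are hypothetical names introduced only for this example.

```python
import re

# Excerpts of a "Working" and a "Pending" line from the question.
STATUS_WORKING = "Status:    Active     Action:    None~"
STATUS_PENDING = "Status:               Action:    None~"

# Non-greedy "match everything" between "Status:" and "Action:".
# The capture includes the padding spaces on both sides.
lazy_status = re.compile(r"Status:(?P<status>.*?)Action:")


def status_of(fragment):
    """Return the status word with surrounding whitespace stripped, or None if no match."""
    match = lazy_status.search(fragment)
    return match.group("status").strip() if match else None


for fragment in (STATUS_WORKING, STATUS_PENDING):
    raw = lazy_status.search(fragment).group("status")
    print("raw:", repr(raw), "stripped:", repr(status_of(fragment)))
```

The strip happens outside the regex, so a missing status simply ends up as an empty string.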



...Okay, I think I got it! :-D

So forget all the above regexes I said to try, and try this one out instead:
string pattern = @"Send Request\s*(?<send_request>\S+)~\s*Job:\s*(?<job>\S+)\s*Destination:\s*(?<destination>\S+)~\s*State:\s*(?<state>\S+)\s*Status:\s*(?!Action:)(?|(?<status>\S+)\s*?|(?<status>(\s|\S)*?))\s*Action:\s*(?<action>\S+)~\s*Total size:\s*(?<total_size>\d+)(\s|k)*Remaining:\s*(?<remaining>\d+)(\s|k)*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?!SWD)(?|(?<retry>\d\d:\d\d)~|((\s|\S)*?)~)\s*SWD PkgID:\s*(?<package_id>\S+)\s*SWD Pkg Version:\s*(?<package_version>\d+).*";

I also found you were having the same problem with Retry as you were with Status, as your .* after <total_size> was eating up the rest of the line...so I fixed the remaining regex also. :-)
ASKER CERTIFIED SOLUTION
skullnobrains
