whorsfall
asked on
Regular expression with optional match
Hi,
I am trying to figure out a regular expression to parse the following lines of text line by line.
Send Request 208JRB03~ Job: JQZ1881H Destination: AA.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 3668 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300053 SWD Pkg Version: 3~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.144-660><thread=7796 (0x1E74)>
Send Request 208JTB03~ Job: JYKKQX84 Destination: BB.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 3246 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300052 SWD Pkg Version: 2~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.157-660><thread=7796 (0x1E74)>
Send Request 208JUB03~ Job: JBBYPU4Y Destination: CC.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 20 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300052 SWD Pkg Version: 2~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.178-660><thread=7796 (0x1E74)>
Send Request 208JWB03~ Job: JXQT2T11 Destination: DD.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 2419040 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300054 SWD Pkg Version: 4~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.192-660><thread=7796 (0x1E74)>
Send Request 208JXB03~ Job: J2ODKBP0 Destination: AA.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 2375789 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300054 SWD Pkg Version: 4~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.204-660><thread=7796 (0x1E74)>
Send Request 208K1B03~ Job: JB1BVVWP Destination: GG.test.internal~ State: Working Status: Active Action: None~ Total size: 0 k Remaining: 3142207 k Heartbeat: 17:46~ Start: 12:00 Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300055 SWD Pkg Version: 4~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.217-660><thread=7796 (0x1E74)>
Send Request 208JYB03~ Job: JXL6Q1VY Destination: TT.test.internal~ State: Pending Status: Action: None~ Total size: 0 k Remaining: 0 k Heartbeat: 15:55~ Start: 12:00 Finish: 12:00 Retry: ~ SWD PkgID: X0300053 SWD Pkg Version: 3~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.223-660><thread=7796 (0x1E74)>
Send Request 208JZB03~ Job: J49K1PHU Destination: YY.test.internal~ State: Pending Status: Action: None~ Total size: 0 k Remaining: 0 k Heartbeat: 15:55~ Start: 12:00 Finish: 12:00 Retry: ~ SWD PkgID: X0300053 SWD Pkg Version: 3~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.229-660><thread=7796 (0x1E74)>
Send Request 208K0B03~ Job: JUVML621 Destination: ZZ.test.internal~ State: Pending Status: Action: None~ Total size: 0 k Remaining: 0 k Heartbeat: 15:55~ Start: 12:00 Finish: 12:00 Retry: ~ SWD PkgID: X0300054 SWD Pkg Version: 4~ $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.236-660><thread=7796 (0x1E74)>
Here is my regular expression:
Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)
The problem I am having is with the status field: as you can see above, status is optional. I still want to parse the last 3 lines, with status either not picked up or returned as blank.
So how can I change my regular expression to deal with this?
PS. I have attached the question as a text file to make it easier for people to read.
Thanks,
Ward.
question.txt
ASKER
Kent,
Thanks for responding. Reading your response and my original question, I realized I did not state it correctly - my fault :)
What I wanted was for the regular expression to handle all the lines in the file and *optionally* match the word after "Status:" and before "Action:" if there is one. So for the first six lines it would capture "Active" into the named capture group "status".
For the last three lines, nothing (or blank) would be captured in the capture group "status",
but all the other capture groups would still match. This is why I am calling it an optional capture: Status might or might not have data there.
Hope this makes sense :)
Thanks,
Ward
\S* seems to work fine for the optional <status>
but then \d\d:\d\d fails for the missing <retry>
So try
Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)?~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)
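A quick way to check this suggestion is to run it against one line with a status and one without. The sketch below is only an illustration in Python, not the asker's environment: Python spells named groups `(?P<name>...)`, so the groups are respelled, and `parse_line`, `ACTIVE_LINE` and `PENDING_LINE` are hypothetical names. The sample lines are the first and seventh lines from the question, with the trailing `$$<...>` part trimmed.

```python
import re

# The pattern suggested above, with (?<name>...) respelled as Python's (?P<name>...).
# The only functional change from the original is the "?" after the retry group.
PATTERN = re.compile(
    r"Send Request\s(?P<send_request>\S*)\s*Job:\s*(?P<job>\S*)\s*"
    r"Destination:\s*(?P<destination>\S*)~\s*State:\s*(?P<state>\S*)\s*"
    r"Status:\s*(?P<status>\S*)\s*Action:\s*(?P<action>\S*)~\s*"
    r"Total size:\s(?P<total_size>\d*).*Remaining:\s(?P<remaining>\d*).*"
    r"Heartbeat:\s*(?P<heartbeat>\d\d:\d\d)~\s*Start:\s*(?P<start>\d\d:\d\d)\s*"
    r"Finish:\s*(?P<finish>\d\d:\d\d)\s*Retry:\s*(?P<retry>\d\d:\d\d)?~\s*"
    r"SWD PkgID:\s*(?P<package_id>\S*)\s*SWD Pkg Version:\s*(?P<package_version>\d*)"
)

ACTIVE_LINE = (
    "Send Request 208JRB03~ Job: JQZ1881H Destination: AA.test.internal~ "
    "State: Working Status: Active Action: None~ "
    "Total size: 0 k Remaining: 3668 k Heartbeat: 17:46~ "
    "Start: 12:00 Finish: 12:00 Retry: 17:46~ "
    "SWD PkgID: X0300053 SWD Pkg Version: 3~"
)

PENDING_LINE = (
    "Send Request 208JYB03~ Job: JXL6Q1VY Destination: TT.test.internal~ "
    "State: Pending Status: Action: None~ "
    "Total size: 0 k Remaining: 0 k Heartbeat: 15:55~ "
    "Start: 12:00 Finish: 12:00 Retry: ~ "
    "SWD PkgID: X0300053 SWD Pkg Version: 3~"
)


def parse_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    # A group that did not take part in the match (retry on pending lines)
    # is reported as an empty string instead of None.
    return match.groupdict(default="")


for sample in (ACTIVE_LINE, PENDING_LINE):
    fields = parse_line(sample)
    print(repr(fields["status"]), repr(fields["retry"]))
```

On the pending line, `\S*` first grabs "Action:" and then backtracks to an empty match, which is why status comes back blank without any change to that part of the pattern. Note that in .NET an unmatched optional group is reported differently (an unsuccessful group with an empty value), so the `default=""` step is specific to this Python sketch.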
Hi whorsfall;
I think you will find that this pattern works for both cases. There is a second issue with the Retry field: some lines have digits there and others do not, so I took care of that as well.
string pattern = @"Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>.*)Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>(?:\d\d:\d\d~)|(?:~))\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)";
http:#a39763594 changes not only how your last 3 lines match, but also how your first 6 lines match.
http:#a39761680 assumed that you were satisfied with how your original expression was matching the first 6 lines, and that you only wanted to change it to also match the last 3 lines.
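The difference between the two suggestions can be seen on small fragments of a line that does have a status. This is a rough Python sketch, with the named groups respelled as `(?P<name>...)` and with illustrative variable names that are not from the thread:

```python
import re

status_fragment = "State: Working Status: Active Action: None~"
retry_fragment = "Finish: 12:00 Retry: 17:46~ SWD PkgID: X0300053"

# Status captured with \S* (as in the original pattern) versus with .*
narrow = re.search(r"Status:\s*(?P<status>\S*)\s*Action:", status_fragment)
broad = re.search(r"Status:\s*(?P<status>.*)Action:", status_fragment)

# Retry with the "~" outside the group versus inside the alternation
without_tilde = re.search(r"Retry:\s*(?P<retry>\d\d:\d\d)?~", retry_fragment)
with_tilde = re.search(r"Retry:\s*(?P<retry>(?:\d\d:\d\d~)|(?:~))", retry_fragment)

print(repr(narrow.group("status")))        # 'Active'
print(repr(broad.group("status")))         # 'Active ' (trailing space kept)
print(repr(without_tilde.group("retry")))  # '17:46'
print(repr(with_tilde.group("retry")))     # '17:46~' (delimiter kept)
```

So both variants match lines with and without data, but the `.*` and alternation forms return slightly different text for the lines that already matched.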
Wow, well...you've definitely got a head-scratcher there.
Let me go ahead and post what I was going to say before I tested out my regex and found it to be much, much more difficult than I first anticipated (I could've sworn I've done this a dozen times before!), and I will close with my proposed solution:
Apologies, I'm not a C# guru, but I do know regex, and if what I'm seeing is correct, http:#a39763594's post, changing
Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)
to:
Status:\s*(?<status>.*)Action:\s*(?<action>\S*)
still doesn't solve the problem.
The capture of \s* after "Status:" is still a greedy match, and so will still match up to but not including the first letter of "Action:", causing the entire regex to fail to match lines where status is not present.
However, I believe one of the following two changes should work:
Concerning the portion of relevant regex after "Status" to before "Action:",
\s*(?<status>\S*)\s*
should become:
\s*(?<status>.*)?\s*
This tries to make the entire named capture group <status> optional, meaning a group named status may or may not exist at all for a given line, or it may break your regex entirely (preliminary research suggests it won't).
Alternatively:
(?<status>.*?)
This simply turns the capture regex for populating <status> into a non-greedy "match everything" search (.*?). This of course means it's going to capture all the spaces before/after the "Active" or whatever word might be there, so you'll have to strip those out separately.
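As a small illustration of the non-greedy variant, here is a Python sketch (named groups respelled as `(?P<name>...)`; the function names are made up for this example). It compares the bare `.*?` capture, which needs stripping afterwards, with the same capture wrapped in `\s*` on both sides, which is one possible way to avoid the separate strip:

```python
import re

# Bare lazy capture: takes everything between "Status:" and "Action:",
# including the surrounding spaces.
LAZY = re.compile(r"Status:(?P<status>.*?)Action:")

# Same lazy capture with \s* on both sides, so the spaces stay outside the group.
LAZY_TRIMMED = re.compile(r"Status:\s*(?P<status>.*?)\s*Action:")


def status_raw(fragment):
    """Status text exactly as captured by the bare lazy pattern."""
    return LAZY.search(fragment).group("status")


def status_trimmed(fragment):
    """Status text captured with the whitespace kept outside the group."""
    return LAZY_TRIMMED.search(fragment).group("status")


with_status = "State: Working Status: Active Action: None~"
without_status = "State: Pending Status: Action: None~"

print(repr(status_raw(with_status)))          # ' Active '
print(repr(status_raw(with_status).strip()))  # 'Active'
print(repr(status_trimmed(with_status)))      # 'Active'
print(repr(status_trimmed(without_status)))   # ''
```

Since `.*` and `.*?` can both match the empty string, the capture is already optional in effect; an extra `?` after the group should not be needed for the blank-status lines.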
...Okay, I think I got it! :-D
So forget all the above regexes I said to try, and try this one out instead:
string pattern = @"Send Request\s*(?<send_request>\S+)~\s*Job:\s*(?<job>\S+)\s*Destination:\s*(?<destination>\S+)~\s*State:\s*(?<state>\S+)\s*Status:\s*(?!Action:)(?|(?<status>\S+)\s*?|(?<status>(\s|\S)*?))\s*Action:\s*(?<action>\S+)~\s*Total size:\s*(?<total_size>\d+)(\s|k)*Remaining:\s*(?<remaining>\d+)(\s|k)*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?!SWD)(?|(?<retry>\d\d:\d\d)~|((\s|\S)*?)~)\s*SWD PkgID:\s*(?<package_id>\S+)\s*SWD Pkg Version:\s*(?<package_version>\d+).*";
I also found you were having the same problem with Retry as you were with Status, as your .* after <total_size> was eating up the rest of the line...so I fixed the remaining regex also. :-)
I just checked that in EditPad and it works great!
Ref - http://www.regular-expressions.info/wordboundaries.html
Q-28332143-results.txt