Regular expression with optional match

Hi,

I am trying to figure out a regular expression to parse the following lines of text line by line.

Send Request 208JRB03~    Job:        JQZ1881H  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.144-660><thread=7796 (0x1E74)>
Send Request 208JTB03~    Job:        JYKKQX84  Destination:  BB.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3246    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.157-660><thread=7796 (0x1E74)>
Send Request 208JUB03~    Job:        JBBYPU4Y  Destination:  CC.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 20      k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.178-660><thread=7796 (0x1E74)>
Send Request 208JWB03~    Job:        JXQT2T11  Destination:  DD.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2419040 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.192-660><thread=7796 (0x1E74)>
Send Request 208JXB03~    Job:        J2ODKBP0  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2375789 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.204-660><thread=7796 (0x1E74)>
Send Request 208K1B03~    Job:        JB1BVVWP  Destination:  GG.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3142207 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300055  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.217-660><thread=7796 (0x1E74)>
Send Request 208JYB03~    Job:        JXL6Q1VY  Destination:  TT.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.223-660><thread=7796 (0x1E74)>
Send Request 208JZB03~    Job:        J49K1PHU  Destination:  YY.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.229-660><thread=7796 (0x1E74)>
Send Request 208K0B03~    Job:        JUVML621  Destination:  ZZ.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.236-660><thread=7796 (0x1E74)>


Here is my regular expression:

Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)


The problem I am having is with the status field: as you can see above, status is an optional field. I still want to parse the last 3 lines, but with status either not picked up or returned as blank.
So how can I change my regular expression to deal with this?

PS. I have attached the question as a text file to make it easier for people to read.

Thanks,

Ward.
question.txt
whorsfall Asked:
 
skullnobrains Commented:
if my understanding is correct, the problem you have is the 3 last lines do not match at all

if I'm correct, this is how to break it up :

Status:                     matches "Status:"
\s*                           matches all the whitespace
(?<status>\S*)          matches "Action:"
\s*                           matches nothing
Action                      not found so the ereg does not match

i guess switching to ungreedy mode should be enough to make your existing expression work. i'd also change the \s* to \s+ for more safety

of course if you know what might be a valid status, you can always try something like

Status:\s*(?<status>(?:Active|Inactive|))\s*Action

allowing for "Active" "Inactive" or "" statuses
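
A minimal sketch of that alternation idea, written in Python (whose `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`). The two sample strings are fragments of the log lines from the question, and "Inactive" is only the hypothetical second value suggested above; the real set of status values is not known from the thread.

```python
import re

# Fragments of the log lines around the Status field.
with_status = "State:      Working   Status:    Active     Action:    None~"
without_status = "State:      Pending   Status:               Action:    None~"

# The empty last alternative lets the group match nothing when the field is blank.
pattern = re.compile(r"Status:\s*(?P<status>Active|Inactive|)\s*Action:")

for line in (with_status, without_status):
    m = pattern.search(line)
    # Show the captured status, or a marker if the fragment did not match at all.
    print(repr(m.group("status")) if m else "no match")
```

The trade-off is that any status value not listed in the alternation makes the whole match fail, so this only fits if the list of values is known and stable.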
 
Kent Dyer (IT Security Analyst Senior) Commented:
Can't you simplify this down to the following?

\b.test.internal~    State:      Working   Status:    Active\b


I just checked that in EditPad and it works great!

Ref - http://www.regular-expressions.info/wordboundaries.html
Q-28332143-results.txt
 
whorsfall (Author) Commented:
Kent,

Thanks for responding - I realized reading your response and my original question I did not state it correctly - my fault :)

What I wanted was for the regular expression to handle all the lines in the file and *optionally* match the word after "Status:" and before "Action:" if there is one.  So for the first six lines it would capture "Active" into the named capture group "status".

For the last three lines, nothing (a blank) would be captured in the capture group "status", but all the other capture groups would still match. That is why I am calling it an optional capture: Status might or might not have data there.

Hope this makes sense :)

Thanks,

Ward
 
ozo Commented:
\S* seems to work fine for the optional <status>
but then \d\d:\d\d fails for the missing <retry>
So try
Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)?~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)
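
As a rough check of the pattern above, here is a sketch in Python. It assumes the pattern is translated to Python's `(?P<name>...)` named-group syntax and split across several string literals for readability; the two sample lines are taken from the question with the trailing `$$<...>` part dropped. The helper name `parse` is made up for this example.

```python
import re

# The pattern above in Python syntax; the "?" after the retry group makes it optional.
PATTERN = re.compile(
    r"Send Request\s(?P<send_request>\S*)\s*Job:\s*(?P<job>\S*)"
    r"\s*Destination:\s*(?P<destination>\S*)~"
    r"\s*State:\s*(?P<state>\S*)\s*Status:\s*(?P<status>\S*)"
    r"\s*Action:\s*(?P<action>\S*)~"
    r"\s*Total size:\s(?P<total_size>\d*).*Remaining:\s(?P<remaining>\d*)"
    r".*Heartbeat:\s*(?P<heartbeat>\d\d:\d\d)~"
    r"\s*Start:\s*(?P<start>\d\d:\d\d)\s*Finish:\s*(?P<finish>\d\d:\d\d)"
    r"\s*Retry:\s*(?P<retry>\d\d:\d\d)?~"
    r"\s*SWD PkgID:\s*(?P<package_id>\S*)"
    r"\s*SWD Pkg Version:\s*(?P<package_version>\d*)"
)

WORKING = (
    "Send Request 208JRB03~    Job:        JQZ1881H  Destination:  AA.test.internal~"
    "    State:      Working   Status:    Active     Action:    None~"
    "    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~"
    "    Start:      12:00     Finish:    12:00      Retry:     17:46~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
)
PENDING = (
    "Send Request 208JYB03~    Job:        JXL6Q1VY  Destination:  TT.test.internal~"
    "    State:      Pending   Status:               Action:    None~"
    "    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~"
    "    Start:      12:00     Finish:    12:00      Retry:          ~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
)


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = PATTERN.search(line)
    # default="" turns a group that did not participate (blank retry) into "".
    return m.groupdict(default="") if m else None


for line in (WORKING, PENDING):
    fields = parse(line)
    print(fields["state"], repr(fields["status"]), repr(fields["retry"]))
```

In Python an optional group that did not take part in the match comes back as `None`, which is why the sketch passes `default=""` to `groupdict`; other regex engines may report it as an empty string instead.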
 
Fernando Soto (Retired) Commented:
Hi whorsfall;

I think you will find this pattern works for both cases. There is a second issue with the Retry field: some lines have digits there and others do not, so I took care of that as well.

string pattern = @"Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>.*)Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>(?:\d\d:\d\d~)|(?:~))\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)";

 
ozo Commented:
http:#a39763594 changes not only how your last 3 lines match, but also how your first 6 lines match.  
http:#a39761680 assumed that you were satisfied with how your original expression was matching the first 6  lines, and that you only wanted to change it to also match the last 3 lines.
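
One way to see the kind of difference being described, sketched in Python on two fragments of the first log line. This assumes the comparison is between the `\S*` status group with an optional retry group on one side, and the `.*` status group with the `~` inside the retry group on the other; Python's `(?P<name>...)` syntax stands in for `(?<name>...)`.

```python
import re

status_text = "Status:    Active     Action:    None~"
retry_text = "Retry:     17:46~    SWD PkgID:"

# Status captured with \S* versus with a greedy .* that runs up to "Action:".
original_status = re.search(r"Status:\s*(?P<status>\S*)\s*Action:", status_text)
greedy_status = re.search(r"Status:\s*(?P<status>.*)Action:", status_text)

# Retry with the "~" outside the group versus inside the group.
outside_retry = re.search(r"Retry:\s*(?P<retry>\d\d:\d\d)?~", retry_text)
inside_retry = re.search(r"Retry:\s*(?P<retry>(?:\d\d:\d\d~)|(?:~))", retry_text)

print(repr(original_status.group("status")))  # just the word
print(repr(greedy_status.group("status")))    # the word plus trailing spaces
print(repr(outside_retry.group("retry")))     # time only
print(repr(inside_retry.group("retry")))      # time followed by "~"
```

So both variants match the line, but the captured values for status and retry on the first six lines are not identical.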
 
Derek Jensen Commented:
Wow, well...you've definitely got a head-scratcher there.
Let me go ahead and post what I was going to say before I tested my regex and found it to be much, much more difficult than I first anticipated (I could've sworn I've done this a dozen times before!), and I will close with my proposed solution:


Apologies, I'm not a C# guru, but I do know regex, and if what I'm seeing is correct, http:#a39763594 's post, changing
Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)
to:
Status:\s*(?<status>.*)Action:\s*(?<action>\S*)
still doesn't solve the problem.
The capture of \s* after "Status:" is still a greedy match, and so will still match up to but not including the first letter of "Action:", causing the entire regex to fail to match lines where status is not present.

However, I believe one of the following two changes should work:
Concerning the portion of relevant regex after "Status" to before "Action:",
\s*(?<status>\S*)\s*
should become:
\s*(?<status>.*)?\s*
This tries to make the entire named capture group <status> optional, meaning a var named status may or may not exist at all after each line, or it may break your regex entirely (preliminary research suggests it won't).

Alternatively:
(?<status>.*?)
This simply turns the capture regex for populating <status> into a non-greedy "match everything" search (.*?). This of course means it's going to capture all the spaces before/after the "Active" or whatever word might be there, so you'll have to strip those out separately.
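
A small sketch of this non-greedy variant with the separate stripping step, in Python (named groups written as `(?P<name>...)`). Only the Status/Action portion of the pattern is shown, and the helper name `status_of` is made up for the example.

```python
import re

# Lazy "match everything" between "Status:" and "Action:".
STATUS = re.compile(r"Status:(?P<status>.*?)Action:")


def status_of(text):
    """Return the status with surrounding whitespace stripped, or None if no match."""
    m = STATUS.search(text)
    return m.group("status").strip() if m else None


print(repr(status_of("Status:    Active     Action:    None~")))
print(repr(status_of("Status:               Action:    None~")))
```

The capture itself still contains the padding spaces; it is the `strip()` call afterwards that reduces a blank field to an empty string.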



...Okay, I think I got it! :-D

So forget all the above regexes I said to try, and try this one out instead:
string pattern = @"Send Request\s*(?<send_request>\S+)~\s*Job:\s*(?<job>\S+)\s*Destination:\s*(?<destination>\S+)~\s*State:\s*(?<state>\S+)\s*Status:\s*(?!Action:)(?:(?<status>\S+)\s*?|(?<status>(\s|\S)*?))\s*Action:\s*(?<action>\S+)~\s*Total size:\s*(?<total_size>\d+)(\s|k)*Remaining:\s*(?<remaining>\d+)(\s|k)*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?!SWD)(?:(?<retry>\d\d:\d\d)~|((\s|\S)*?)~)\s*SWD PkgID:\s*(?<package_id>\S+)\s*SWD Pkg Version:\s*(?<package_version>\d+).*";

I also found you were having the same problem with Retry as you were with Status, as your .* after <total_size> was eating up the rest of the line...so I fixed the remaining regex also. :-)
Question has a verified solution.
