Solved

Regular expression with optional match

Posted on 2014-01-07
Last Modified: 2014-03-17
Hi,

I am trying to figure out a regular expression to parse the following lines of text line by line.

Send Request 208JRB03~    Job:        JQZ1881H  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.144-660><thread=7796 (0x1E74)>
Send Request 208JTB03~    Job:        JYKKQX84  Destination:  BB.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3246    k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.157-660><thread=7796 (0x1E74)>
Send Request 208JUB03~    Job:        JBBYPU4Y  Destination:  CC.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 20      k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300052  SWD Pkg Version: 2~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.178-660><thread=7796 (0x1E74)>
Send Request 208JWB03~    Job:        JXQT2T11  Destination:  DD.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2419040 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.192-660><thread=7796 (0x1E74)>
Send Request 208JXB03~    Job:        J2ODKBP0  Destination:  AA.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 2375789 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.204-660><thread=7796 (0x1E74)>
Send Request 208K1B03~    Job:        JB1BVVWP  Destination:  GG.test.internal~    State:      Working   Status:    Active     Action:    None~    Total size: 0       k Remaining: 3142207 k Heartbeat: 17:46~    Start:      12:00     Finish:    12:00      Retry:     17:46~    SWD PkgID:  X0300055  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.217-660><thread=7796 (0x1E74)>
Send Request 208JYB03~    Job:        JXL6Q1VY  Destination:  TT.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.223-660><thread=7796 (0x1E74)>
Send Request 208JZB03~    Job:        J49K1PHU  Destination:  YY.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300053  SWD Pkg Version: 3~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.229-660><thread=7796 (0x1E74)>
Send Request 208K0B03~    Job:        JUVML621  Destination:  ZZ.test.internal~    State:      Pending   Status:               Action:    None~    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~    Start:      12:00     Finish:    12:00      Retry:          ~    SWD PkgID:  X0300054  SWD Pkg Version: 4~  $$<SMS_PACKAGE_TRANSFER_MANAGER><01-07-2014 17:46:26.236-660><thread=7796 (0x1E74)>

Here is my regular expression:

Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)

The problem I am having is with the status field. As you can see above, the status is an optional field, so I still want to parse the last 3 lines but have status either not picked up or returned as blank.
How can I change my regular expression to deal with this?

PS. I have attached the question as a text file to make it easier for people to read.

Thanks,

Ward.
question.txt
Question by:whorsfall
7 Comments
 
LVL 17

Expert Comment

by:Kent Dyer
ID: 39761557
Can't you simplify this down to the following?

\b.test.internal~    State:      Working   Status:    Active\b

I just checked that in EditPad and it works great!

Ref - http://www.regular-expressions.info/wordboundaries.html
Q-28332143-results.txt
 

Author Comment

by:whorsfall
ID: 39761596
Kent,

Thanks for responding - reading your response and my original question, I realized I did not state it correctly - my fault :)

What I want is for the regular expression to handle all the lines in the file and *optionally* match the word after "Status:" and before "Action:" if there is one. So for the first six lines it would capture "Active" into the named capture group "status".

For the last three lines, nothing (or a blank) would be captured in the capture group "status"; however, all the other capture groups would still match. This is why I am calling it an optional capture: Status might or might not have data there.

Hope this makes sense :)

Thanks,

Ward
 
LVL 84

Expert Comment

by:ozo
ID: 39761680
\S* seems to work fine for the optional <status>
but then \d\d:\d\d fails for the missing <retry>
So try
Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>\d\d:\d\d)?~\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)
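
A quick way to check this suggestion is sketched below. The thread does not name the regex engine for certain, so the sketch assumes Python, where named groups are spelled `(?P<name>...)` instead of `(?<name>...)`; everything else in the pattern is unchanged. The two sample lines are the first and seventh lines from the question with the trailing `$$<...>` part cut off.

```python
import re

# Original pattern from the question, with (?<name>...) rewritten as
# (?P<name>...) because that is how Python spells named groups.
ORIGINAL = (
    r"Send Request\s(?P<send_request>\S*)\s*Job:\s*(?P<job>\S*)"
    r"\s*Destination:\s*(?P<destination>\S*)~"
    r"\s*State:\s*(?P<state>\S*)\s*Status:\s*(?P<status>\S*)"
    r"\s*Action:\s*(?P<action>\S*)~"
    r"\s*Total size:\s(?P<total_size>\d*).*Remaining:\s(?P<remaining>\d*)"
    r".*Heartbeat:\s*(?P<heartbeat>\d\d:\d\d)~"
    r"\s*Start:\s*(?P<start>\d\d:\d\d)\s*Finish:\s*(?P<finish>\d\d:\d\d)"
    r"\s*Retry:\s*(?P<retry>\d\d:\d\d)~"
    r"\s*SWD PkgID:\s*(?P<package_id>\S*)"
    r"\s*SWD Pkg Version:\s*(?P<package_version>\d*)"
)

# The only change suggested above: the retry group becomes optional.
FIXED = ORIGINAL.replace(r"(?P<retry>\d\d:\d\d)~", r"(?P<retry>\d\d:\d\d)?~")

WORKING = (
    "Send Request 208JRB03~    Job:        JQZ1881H  Destination:  AA.test.internal~"
    "    State:      Working   Status:    Active     Action:    None~"
    "    Total size: 0       k Remaining: 3668    k Heartbeat: 17:46~"
    "    Start:      12:00     Finish:    12:00      Retry:     17:46~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
)
PENDING = (
    "Send Request 208JYB03~    Job:        JXL6Q1VY  Destination:  TT.test.internal~"
    "    State:      Pending   Status:               Action:    None~"
    "    Total size: 0       k Remaining: 0       k Heartbeat: 15:55~"
    "    Start:      12:00     Finish:    12:00      Retry:          ~"
    "    SWD PkgID:  X0300053  SWD Pkg Version: 3~"
)

for name, line in (("working", WORKING), ("pending", PENDING)):
    old = re.search(ORIGINAL, line)
    new = re.search(FIXED, line)
    print(name, "original matches:", old is not None)
    print(name, "status:", repr(new.group("status")),
          "retry:", repr(new.group("retry")))
```

On the Pending line the original pattern finds no match, while the changed pattern matches with an empty `status` and a `retry` group that did not participate (`None` in Python). Other engines may report a non-participating group differently, for example as an empty string.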
 
LVL 62

Expert Comment

by:Fernando Soto
ID: 39763594
Hi whorsfall;

I think you will find that this pattern works for both cases. There is a second issue with the Retry field: in some lines it has digits and in others it does not, so I took care of that as well.

string pattern = @"Send Request\s(?<send_request>\S*)\s*Job:\s*(?<job>\S*)\s*Destination:\s*(?<destination>\S*)~\s*State:\s*(?<state>\S*)\s*Status:\s*(?<status>.*)Action:\s*(?<action>\S*)~\s*Total size:\s(?<total_size>\d*).*Remaining:\s(?<remaining>\d*).*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?<retry>(?:\d\d:\d\d~)|(?:~))\s*SWD PkgID:\s*(?<package_id>\S*)\s*SWD Pkg Version:\s*(?<package_version>\d*)";

 
LVL 84

Expert Comment

by:ozo
ID: 39763740
The pattern in comment ID: 39763594 changes not only how your last 3 lines match, but also how your first 6 lines match.
The pattern in comment ID: 39761680 assumed that you were satisfied with how your original expression was matching the first 6 lines, and that you only wanted to change it to also match the last 3 lines.
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39772417
Wow, well...you've definitely got a head-scratcher there.
Let me go ahead and post what I was going to say before I tested my regex and found it to be much, much more difficult than I first anticipated (I could've sworn I've done this a dozen times before!), and then I will close with my proposed solution:


Apologies, I'm not a C# guru, but I do know regex, and if what I'm seeing is correct, the post in comment ID: 39763594, changing
Status:\s*(?<status>\S*)\s*Action:\s*(?<action>\S*)
to:
Status:\s*(?<status>.*)Action:\s*(?<action>\S*)
still doesn't solve the problem.
The capture of \s* after "Status:" is still a greedy match, and so will still match up to but not including the first letter of "Action:", causing the entire regex to fail to match lines where status is not present.

However, I believe one of the following two changes should work:
Concerning the portion of relevant regex after "Status" to before "Action:",
\s*(?<status>\S*)\s*
should become:
\s*(?<status>.*)?\s*
This tries to make the entire named capture group <status> optional, meaning there may or may not be a variable named status at all after each line, or it may break your regex entirely (preliminary research suggests it won't).

Alternatively:
(?<status>.*?)
This simply turns the capture regex for populating <status> into a non-greedy "match everything" search (.*?). This of course means it's going to capture all the spaces before/after the "Active" or whatever word might be there, so you'll have to strip those out separately.
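
To make the whitespace point concrete, here is a minimal sketch limited to the Status/Action fragment. It assumes Python (named groups spelled `(?P<name>...)`), and the helper `status_of` is a hypothetical name used only for illustration. The two inputs are cut from the sample lines in the question.

```python
import re

# Hypothetical fragment: only the Status/Action part of the full pattern,
# with the lazy "match everything" group in place of \s*(\S*)\s*.
LAZY = re.compile(r"Status:(?P<status>.*?)Action:")

WORKING = "State:      Working   Status:    Active     Action:    None~"
PENDING = "State:      Pending   Status:               Action:    None~"


def status_of(line):
    """Return (raw capture, stripped capture) for the status field."""
    raw = LAZY.search(line).group("status")
    return raw, raw.strip()


for line in (WORKING, PENDING):
    raw, clean = status_of(line)
    print(repr(raw), "->", repr(clean))
```

The raw capture carries the padding on both sides, so the strip step is what turns it into either "Active" or an empty string.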



...Okay, I think I got it! :-D

So forget all the above regexes I said to try, and try this one out instead:
string pattern = @"Send Request\s*(?<send_request>\S+)~\s*Job:\s*(?<job>\S+)\s*Destination:\s*(?<destination>\S+)~\s*State:\s*(?<state>\S+)\s*Status:\s*(?!Action:)(?|(?<status>\S+)\s*?|(?<status>(\s|\S)*?))\s*Action:\s*(?<action>\S+)~\s*Total size:\s*(?<total_size>\d+)(\s|k)*Remaining:\s*(?<remaining>\d+)(\s|k)*Heartbeat:\s*(?<heartbeat>\d\d:\d\d)~\s*Start:\s*(?<start>\d\d:\d\d)\s*Finish:\s*(?<finish>\d\d:\d\d)\s*Retry:\s*(?!SWD)(?|(?<retry>\d\d:\d\d)~|((\s|\S)*?)~)\s*SWD PkgID:\s*(?<package_id>\S+)\s*SWD Pkg Version:\s*(?<package_version>\d+).*";

I also found you were having the same problem with Retry as you were with Status, as your .* after <total_size> was eating up the rest of the line...so I fixed the remaining regex also. :-)
 
LVL 26

Accepted Solution

by:
skullnobrains earned 500 total points
ID: 39893127
if my understanding is correct, the problem you have is the 3 last lines do not match at all

if I'm correct, this is how to break it up :

Status:                     matches "Status:"
\s*                           matches all the whitespace
(?<status>\S*)          matches "Action:"
\s*                           matches nothing
Action                      not found so the ereg does not match
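
One caveat on the last step of this breakdown: in a backtracking engine a failed attempt is not final. The engine makes `\S*` give back characters one at a time and retries, down to the empty string, which is consistent with the earlier comment that `\S*` works for the optional status. A small check of just this fragment, sketched in Python as an assumption (named groups are spelled `(?P<name>...)` there):

```python
import re

# Only the Status/Action part of the pattern from the question,
# in Python's (?P<name>...) spelling.
FRAGMENT = re.compile(r"Status:\s*(?P<status>\S*)\s*Action:")

# Cut from one of the "Pending" sample lines in the question.
PENDING = "State:      Pending   Status:               Action:    None~"

m = FRAGMENT.search(PENDING)
print("matched:", m is not None)
print("status:", repr(m.group("status")))
```

Here the fragment does match, with `status` captured as an empty string, so whatever stops the full pattern on those lines has to sit elsewhere in the expression.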

i guess switching to ungreedy mode should be enough to make your existing expression work. i'd also change the \s* to \s+ for more safety

of course if you know what might be a valid status, you can always try something like

Status:\s*(?<status>(?:Active|Inactive|))\s*Action

allowing for "Active" "Inactive" or "" statuses
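
A minimal sketch of that alternation, again assuming Python and limited to the Status/Action fragment. "Active" and "Inactive" are the example values from the answer, and "Failed" is a made-up value added only to show what happens with a status that is not in the list.

```python
import re

# "Active" and "Inactive" are the example values from the answer above;
# the real set of statuses is not known from the thread.
STATUS = re.compile(r"Status:\s*(?P<status>(?:Active|Inactive|))\s*Action")

samples = {
    "working": "Status:    Active     Action:    None~",
    "pending": "Status:               Action:    None~",
    # Hypothetical value that is not in the alternation.
    "unlisted": "Status:    Failed     Action:    None~",
}

results = {}
for name, text in samples.items():
    m = STATUS.search(text)
    # None means the whole fragment failed to match.
    results[name] = m.group("status") if m else None
    print(name, "->", repr(results[name]))
```

The trade-off is strictness: a line whose status is not listed fails to match at all, which can be useful for validation but means the list has to be kept complete.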