# Help with creating a Regular Expression

Hey all,

I'm reposting this question, worded more clearly, in the hope of getting an answer closer to what I need. I'm using Visual Studio 2005, C#, and .NET Framework 2.0.

Here is the text I'm parsing (just a made-up sample): "heading 90 degrees east, crossing a junction found on the south side at 14 mm, a total distance of 27 mm to the intersection"

In that example, what I need to pull out is "east 27 mm" or "east a total distance of 27 mm". Either one will work for what I'm doing with CAD instructions. I do NOT want to pull out "east, crossing a junction found on the south side at 14 mm, a total distance of 27 mm"; that really doesn't help me much. You can get that with something like (east).*a total distance of.* which isn't what I need.

Nathan

Commented:
What you really want to do here is use groups. I'm a little rusty on C#, but I think it's something to this effect:

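// Requires: using System.Text.RegularExpressions;
string input = "stuff you have above";
// Group 1 = direction, group 2 = number, group 3 = unit; .*? lazily skips the text in between.
string re = @"(east|west|north|south).*?total distance[^\d]*(\d+)\s+(\w+)";
Match m = Regex.Match(input, re);
string finalString = m.Groups[1].Value + " " + m.Groups[2].Value + " " + m.Groups[3].Value;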

The value of finalString should then be exactly what you are looking for: "east 27 mm". Note that this also accounts for the other directions (north, south, west).
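A slightly more defensive version of the same idea, as a rough sketch rather than a tested solution: it checks whether the match succeeded before reading the groups, uses named groups so the pieces are easier to tell apart, and adds \b so that a word like "northeast" is not picked up as "east". The class name, method name, and group names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class DirectionDistance
{
    // Hypothetical helper: returns e.g. "east 27 mm", or null when nothing matches.
    public static string Extract(string input)
    {
        // \b keeps the direction a whole word; .*? lazily skips the text in between;
        // \D* skips the non-digit text (" of ") before the number.
        const string pattern =
            @"\b(?<dir>east|west|north|south)\b.*?total distance\D*(?<num>\d+)\s+(?<unit>\w+)";

        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return null; // no direction/distance pair in this input
        }

        return m.Groups["dir"].Value + " " + m.Groups["num"].Value + " " + m.Groups["unit"].Value;
    }
}
```

Called on the sample sentence from the question, Extract should give back "east 27 mm"; on text with no direction and distance pair it returns null instead of a string of blanks.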

Author Commented:
Forgot to mention: in that example, "east" and "a total distance of" can be assumed to be fixed. They will never change, so you can use them as search keys.

Author Commented:
I seem to be getting somewhere with something like "\ba total distance of\b\s*\d*\s*\w*", which gives me "a total distance of 27 mm" from that string. The only issue is pairing it up with "east": how can I account for all the junk text between the two parts I'm searching for?

Author Commented:
Thanks, I will check this tonight when I get home and see if it works the way I want.

Author Commented:
Tested the regex some, and with some tweaks it seems to pull some of the right area, but then I'd need post-processing like you are doing in C#. Ideally that shouldn't be necessary: it is a large input file with lots of other matches, so knowing when to employ "special case #1" or whatnot is going to be extremely difficult, and another coding nightmare in and of itself.

Author Commented:
Thanks, it seems the only real way to do it is with the outer C# processing. It's too bad you can't run a regex on the contents of a captured group from within the expression itself (i.e. do what the groups do, but with regex alone).

Commented:
If that is the case, then yes, it would have to be changed, but I wrote it based on what you said you were trying to do. Without using groups, there is no way to extract "east 27 mm" with a regular expression, because a match is always one contiguous piece of the input and "east" and "27 mm" are not next to each other. I obviously don't know what the variations are in the lines you are trying to extract, but regular expressions can be very, very flexible in getting only what you want when you know how to write them.
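On the large-file concern, one possible approach (a sketch only, since I haven't seen the real input) is to run the one pattern over the whole text with Regex.Matches and build the output inside the loop, so there is no separate special case to detect. The class and method names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InstructionScanner
{
    // Hypothetical helper: collects every "direction number unit" triple found in the text.
    public static List<string> ExtractAll(string text)
    {
        // '.' does not match '\n' by default, so the lazy skip between the direction
        // and "total distance" never spills onto the next line.
        Regex re = new Regex(@"\b(east|west|north|south)\b.*?total distance\D*(\d+)\s+(\w+)");

        List<string> results = new List<string>();
        foreach (Match m in re.Matches(text))
        {
            results.Add(m.Groups[1].Value + " " + m.Groups[2].Value + " " + m.Groups[3].Value);
        }
        return results;
    }
}
```

Lines that contain no direction and distance pair simply produce no match, so they drop out without any extra handling.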

Commented:
You can actually do back references and forward references within regular expressions. The only reason group processing is required is that it is the only way to take separate pieces out of a string.
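The groups are still needed, but the stitching does not have to be hand-written concatenation: .NET's replacement-pattern syntax ($1, $2, ...) can assemble the captures through Match.Result. A minimal sketch, assuming the same pattern as in the accepted answer; the class and method names are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class ReplacementSketch
{
    // Hypothetical helper: builds "east 27 mm" from the captures via a replacement pattern.
    public static string Extract(string input)
    {
        Match m = Regex.Match(
            input,
            @"\b(east|west|north|south)\b.*?total distance\D*(\d+)\s+(\w+)");

        // Match.Result expands $1, $2, $3 to the captured text.
        // It throws on a failed match, so check Success first.
        return m.Success ? m.Result("$1 $2 $3") : null;
    }
}
```

This keeps the output format ("$1 $2 $3") next to the pattern as a plain string, which may be easier to maintain if the file ends up needing several patterns.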
Question has a verified solution.
