DuNuNuBatman

asked on

Capturing everything before a \n

I attached the text I need to parse. I need to capture each line individually (there should be 4) using regular expressions. Here is the regex I have now: \n.*
It captures every line except the first.

1:5 1-19-08  flags 803BE180 hg0 10.936 hg2 3.864 hgt 14.800 rfint 75355.400 intt 28.175 rctt 45.453 prbt 0.000 cnvt 0.000 umbt 0.000 vntp -3.076 orfp -0.556 dilp -11.111 bbkp -11.111 edup -11.111 vac 33.933 smplf 0.000 pmtv 647.871 pres 626.082 dilf 20.000 obkg 4.810 tblg 6.304 ocoef 0.793 tcoef 1.343 hg81 0.000 lampt 43.446 oxyt 0.000 fsafe 0.000
1:6 1-19-08  flags 803BE180 hg0 10.936 hg2 3.864 hgt 14.800 rfint 75355.400 intt 28.175 rctt 45.453 prbt 0.000 cnvt 0.000 umbt 0.000 vntp -3.076 orfp -0.556 dilp -11.111 bbkp -11.111 edup -11.111 vac 33.933 smplf 0.000 pmtv 647.871 pres 626.082 dilf 20.000 obkg 4.810 tblg 6.304 ocoef 0.793 tcoef 1.343 hg81 0.000 lampt 43.446 oxyt 0.000 fsafe 0.000
1:7 1-19-08  flags 803BE180 hg0 10.936 hg2 3.864 hgt 14.800 rfint 75355.400 intt 28.175 rctt 45.453 prbt 0.000 cnvt 0.000 umbt 0.000 vntp -3.076 orfp -0.556 dilp -11.111 bbkp -11.111 edup -11.111 vac 33.933 smplf 0.000 pmtv 647.871 pres 626.082 dilf 20.000 obkg 4.810 tblg 6.304 ocoef 0.793 tcoef 1.343 hg81 0.000 lampt 43.446 oxyt 0.000 fsafe 0.000
1:8 1-19-08  flags 803BE180 hg0 10.936 hg2 3.864 hgt 14.800 rfint 75355.400 intt 28.175 rctt 45.453 prbt 0.000 cnvt 0.000 umbt 0.000 vntp -3.076 orfp -0.556 dilp -11.111 bbkp -11.111 edup -11.111 vac 33.933 smplf 0.000 pmtv 647.871 pres 626.082 dilf 20.000 obkg 4.810 tblg 6.304 ocoef 0.793 tcoef 1.343 hg81 0.000 lampt 43.446 oxyt 0.000 fsafe 0.000


ddrudik

.*

(where . matches any character except \n, as long as the RegexOptions.Singleline flag is not set)
DuNuNuBatman

ASKER

That matches everything. I need a MatchCollection in which each row is a separate match.
Note that there is also a flag called RegexOptions.Multiline that makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole string.
Try:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
  {
    static void Main(string[] args)
    {
      String sourcestring = "source string to match with pattern";
      // Without RegexOptions.Singleline, . stops at \n, so each match stays within one line.
      Regex re = new Regex(@".*");
      MatchCollection mc = re.Matches(sourcestring);
      int mIdx = 0;
      foreach (Match m in mc)
      {
        // Print every group of the current match, labeled by match index and group name.
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
          Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
        mIdx++; // advance to the next match index
      }
    }
  }
}

Or:
using System;
using System.Text.RegularExpressions;
namespace myapp
{
  class Class1
  {
    static void Main(string[] args)
    {
      String sourcestring = "source string to match with pattern";
      // RegexOptions.Multiline makes ^ and $ match at the start and end of every line.
      Regex re = new Regex(@"^.*$", RegexOptions.Multiline);
      MatchCollection mc = re.Matches(sourcestring);
      int mIdx = 0;
      foreach (Match m in mc)
      {
        // Print every group of the current match, labeled by match index and group name.
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
          Console.WriteLine("[" + mIdx + "][" + re.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
        mIdx++; // advance to the next match index
      }
    }
  }
}
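
For input that uses \r\n line endings, it may help to see what the Multiline option actually returns. The sketch below is only an illustration: the sample text, the class name LineMatchDemo and the helper NonEmptyMatches are made up, and [^\r\n]+ is an alternative pattern (not one suggested above) that needs no RegexOptions and skips blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineMatchDemo
{
    // Hypothetical helper: collects the non-empty matches of a pattern.
    public static List<string> NonEmptyMatches(string input, string pattern, RegexOptions options)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            if (m.Length > 0)
            {
                result.Add(m.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Shortened stand-in for the attached data, with \r\n between lines.
        string text = "1:5 a\r\n1:6 b\r\n1:7 c";

        // In .NET, . matches \r and Multiline $ matches only before \n,
        // so every match except the last one keeps a trailing \r.
        List<string> anchored = NonEmptyMatches(text, @"^.*$", RegexOptions.Multiline);
        Console.WriteLine(anchored[0].EndsWith("\r", StringComparison.Ordinal)); // True

        // A negated character class excludes both \r and \n from the match.
        List<string> plain = NonEmptyMatches(text, @"[^\r\n]+", RegexOptions.None);
        Console.WriteLine(plain.Count); // 3
        Console.WriteLine(plain[0]);    // 1:5 a
    }
}
```

If the config file only accepts a pattern and no options, the negated character class may be the easier choice, at the cost of never returning empty lines.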
Is there a way to do it using just regular expressions? The regex is part of the config file. I don't have access to the actual source code for the project at the moment.
ASKER CERTIFIED SOLUTION
ddrudik
This solution is only available to members.
That was almost exactly what I was looking for.
Here is the final expression: .*\r\n
It seems to be working right. Do you see any issue with this?
.*\r\n would only catch lines 1-3, assuming that line 4 does not end in \r\n. That's what |$ is meant to catch.
If you appended chr(13) & chr(10) to the string before processing it, then .*\r\n would work fine (since that would add a \r\n to the last line).
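
The difference can be checked with a small test program. This is a sketch under stated assumptions: the sample text is a shortened stand-in for the real data, the class name LastLineDemo and its helper are invented, and .*?(?:\r\n|$) is only a guess at how the |$ alternative might be written, since the accepted pattern is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LastLineDemo
{
    // Hypothetical helper: returns the non-empty matches of a pattern.
    public static List<string> NonEmptyMatches(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Length > 0)
            {
                result.Add(m.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Three lines; the last one has no trailing \r\n.
        string text = "1:5 a\r\n1:6 b\r\n1:7 c";

        // Requires \r\n after every line, so the unterminated last line is skipped.
        Console.WriteLine(NonEmptyMatches(text, @".*\r\n").Count); // 2

        // Appending \r\n first gives the last line a terminator to match.
        Console.WriteLine(NonEmptyMatches(text + "\r\n", @".*\r\n").Count); // 3

        // Assumed form of the |$ alternative: end at \r\n or at the end of the input.
        Console.WriteLine(NonEmptyMatches(text, @".*?(?:\r\n|$)").Count); // 3
    }
}
```

The helper drops zero-length matches because the alternation can also match the empty string at the very end of the input.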
Got it, I'll make the changes to mine. Thanks!
If you can specify something else about the start of the lines, you can avoid concatenating \r\n by using something like:
\d+:.*?(?=\r\n|$)
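
A quick way to try this pattern is sketched below. The sample text is a shortened stand-in for the real data, and the class name LookaheadDemo and its helper are invented for the example. Because the lookahead does not consume the line ending, each match should contain only the line text.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Hypothetical helper: returns one entry per match of the lookahead pattern.
    public static List<string> Lines(string input)
    {
        var result = new List<string>();
        // \d+: anchors each match to the "1:5"-style prefix; the lazy .*? then
        // stops right before \r\n or at the end of the input.
        foreach (Match m in Regex.Matches(input, @"\d+:.*?(?=\r\n|$)"))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // The last line has no trailing \r\n.
        List<string> lines = Lines("1:5 a\r\n1:6 b\r\n1:7 c");
        Console.WriteLine(lines.Count); // 3
        Console.WriteLine(lines[2]);    // 1:7 c
    }
}
```

Since every match has to start with digits and a colon, this version produces no empty matches and needs no filtering.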
Thanks for the question and the points.