Hello,
NOTE: I am a C# programmer - if you give a solution with UNIX or PERL formatting I won't know what to do with it. Please observe the forum I am asking this question in (C#). Thanks!
I am writing a parsing module using regexes and ran across a snag: when my input contains newline characters, my regex stops working. I think the best explanation is with code:
[Test]
public void ApplyStartAndStopAnchors()
{
    string urlContent = "abcdefghijklmnopqrstuvwxyz";
    string regexString = "mno";
    string regexStartString = "def";
    string regexStopString = "stu";
    RegexGenerator generator =
        new RegexGenerator(new List<string>(), regexString, regexStartString, regexStopString, urlContent);
    generator.ApplyStartAndStopAnchors();
    Assert.AreEqual("ghijklmnopqr", generator.UrlContent); // PASSES
}
And this example wouldn't be complete without generator.ApplyStartAndStopAnchors() (this being where I need your help):
public void ApplyStartAndStopAnchors()
{
    string subRegex = @"(?<=" + _regexStartString + ")" + "(.+)" + "(?=" + _regexStopString + ")";
    Regex r = new Regex(subRegex);
    Match m = r.Match(UrlContent);
    if (m.Success)
        UrlContent = m.Groups[1].Value;
}
Now here's the catch. If I use multiline input:
Input text file: (MultiLineRegexTest.txt)
<start>
abcdefghij
klmnop
qrstuvw
xyz
<stop>
Which is just the English alphabet spread across 4 lines.
Now, when I read this input text file, its content includes newline characters. Here is a test showing the result:
[Test]
public void ApplyStartAndStopAnchorsOnTestFileWithMultipleLines()
{
    FileStream fs = FileSupport.OpenFile(@"../../../Testfiles/MultiLineRegexTest.txt");
    string content = FileSupport.GetTextFileAsString(fs);
    string regexString = "mno";
    string regexStartString = "def";
    string regexStopString = "stu";
    RegexGenerator generator =
        new RegexGenerator(new List<string>(), regexString, regexStartString, regexStopString, content);
    generator.ApplyStartAndStopAnchors();
    Assert.AreEqual("ghijklmnopqr", generator.UrlContent); // FAILS
}
And the output of the failing Assert.AreEqual statement is:
<start>
RegexGeneratorTestFixture.ApplyStartAndStopAnchorsOnTestFileWithMultipleLines : Failed
NUnit.Framework.AssertionException:
String lengths differ. Expected length=12, but was length=32.
Strings differ at index 0.
expected:<"ghijklmnopqr">
but was:<"abcdefghij\r\nklmnop\r\nqrstuvw\r\nxyz">
-----------^
<stop>
Notice the \r\n in the "but was" output? Those are the newline characters. They make it so my Regex in ApplyStartAndStopAnchors() doesn't work anymore.
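To make the behavior concrete, here is a minimal standalone sketch of what appears to be going on, assuming the file content is exactly the string NUnit reported. By default `.` in a .NET regex does not match `\n`, so `(.+)` cannot span from "def" on the first line to "stu" on the third. The match fails and UrlContent is left unchanged. The class AnchorSketch and its methods ExtractBetween and StripLineBreaks are hypothetical names used only for illustration. They are not part of RegexGenerator.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only; not part of RegexGenerator.
public static class AnchorSketch
{
    // Builds the same lookbehind/lookahead pattern as ApplyStartAndStopAnchors()
    // and returns the captured text, or null when there is no match.
    public static string ExtractBetween(string content, string startPattern,
                                        string stopPattern, RegexOptions options)
    {
        string pattern = "(?<=" + startPattern + ")(.+)(?=" + stopPattern + ")";
        Match m = Regex.Match(content, pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Removes CR and LF so the result can be compared with single-line text.
    public static string StripLineBreaks(string text)
    {
        return text.Replace("\r", "").Replace("\n", "");
    }

    public static void Main()
    {
        // Assumed to be the file content, taken from the "but was" output.
        string content = "abcdefghij\r\nklmnop\r\nqrstuvw\r\nxyz";

        // Default options: '.' stops at '\n', so nothing is captured.
        string byDefault = ExtractBetween(content, "def", "stu", RegexOptions.None);
        Console.WriteLine(byDefault == null); // True

        // Singleline: '.' also matches '\n', so the capture spans the lines.
        string spanning = ExtractBetween(content, "def", "stu", RegexOptions.Singleline);
        Console.WriteLine(StripLineBreaks(spanning)); // ghijklmnopqr
    }
}
```

Note that RegexOptions.Singleline only changes what `.` matches, while RegexOptions.Multiline changes `^` and `$` and would not help here. Even with Singleline, the captured text still contains the line breaks ("ghij\r\nklmnop\r\nqr"), so the assertion against "ghijklmnopqr" would only pass if the line breaks are also removed, as StripLineBreaks does in this sketch.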
Does anyone know how to fix this?
Much thanks,
sapbucket