Regex Help Needed

Can anyone tell me how to write a regular expression that will find the following text in a text file?  The catch is that the 47 will always vary.  The rest of the line will always be exactly as shown here.

HL*47**20*1~

I'm using this snippet in a C#.NET console application to read a text file and replace the matched string with an empty string, which deletes it.

            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
                filePath,
                File.ReadAllText(filePath)
                .Replace("<regex goes here>", "")
                );
Asked by: fcsIT

wdosanjos commented:
There you go:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\temp\36866f-dmh.txt";

            File.WriteAllText(
                filePath,
                Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
                );
        }
    }
}

wdosanjos commented:
Try:
string filePath = "c:\\36866f-dmh.txt";

File.WriteAllText(
    filePath,
    File.ReadAllText(filePath).Replace(@"HL\*\d+\*\*20\*1~", "")
    );

fcsIT (author) commented:
Thanks for the quick response wdosanjos!  Unfortunately that didn't work.  It left the full HL*47**20*1~ line in my text file.
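
A likely reason the snippet above left the line untouched: `string.Replace` treats its first argument as literal text, so it searches for the characters `HL\*\d+\*\*20\*1~` verbatim, while `Regex.Replace` interprets the same text as a pattern. The sketch below shows the difference. The class name `ReplaceDemo` and the sample string are illustrative and not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Pattern from the answer above: "HL*", one or more digits, then "**20*1~".
    public const string Pattern = @"HL\*\d+\*\*20\*1~";

    // string.Replace looks for the pattern text literally, backslashes included,
    // so it never finds anything in real data.
    public static string LiteralReplace(string input)
    {
        return input.Replace(Pattern, "");
    }

    // Regex.Replace treats the same text as a regular expression.
    public static string RegexReplace(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        // Illustrative sample assembled from segments posted in the thread.
        string sample = "HCP*10*100**01AA~HL*47**20*1~PRV*BI*ZZ*123AB4567C~";
        Console.WriteLine(LiteralReplace(sample));
        Console.WriteLine(RegexReplace(sample));
    }
}
```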

wdosanjos commented:
Can you provide a small sample of the file for me to test?

wdosanjos commented:
BTW, is it to remove the whole line or just the HL*47**20*1~ string?

fcsIT (author) commented:
DTP*123*D8*20120404~
REF*6R*12345C1234567-1234567-1~
HCP*10*100**01AA~
HL*47**20*1~
PRV*BI*ZZ*123AB4567C~

fcsIT (author) commented:
Well, it's actually to remove an entire block of text, however the *20* near the HL is the identifier for the block to be removed.  I have to build a regular expression that can find the
HL*47**20*1~ string, then remove that string, and everything after it until the code finds the next HL*.

It's all one line of text, tens of thousands of characters long.

This is just the first step, figuring out what regular expression will find the HL*47**20*1~ string.

JAruchamy commented:
Try this,

 string s;
s = Regex.Replace("HL*47**20*1~", "HL[*]..[*][*]20[*][1][~]", "Test");

JAruchamy commented:
using System.Text.RegularExpressions;


 string s = Regex.Replace(@"DTP*123*D8*20120404~
REF*6R*12345C1234567-1234567-1~
HCP*10*100**01AA~
HL*47**20*1~
PRV*BI*ZZ*123AB4567C~
", "HL[*]..[*][*]20[*][1][~]", "Test");

wdosanjos commented:
Please try this. It removes the string and the CR LF that follows, thus deleting the line.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~\r?\n?", "", RegexOptions.Multiline)
	);

fcsIT (author) commented:
My VS has pretty much stopped responding.  I have to reboot.  Give me about 10 minutes to reboot and test both of your responses, and I'll post back.

fcsIT (author) commented:
wdosanjos' response worked!

JAruchamy's response failed at the regular expression piece.  It said "unrecognized escape sequence."  You guys will forever be my heroes if you can help me get the two responses combined so that the code finds the block of text beginning with the code wdosanjos posted:

File.WriteAllText(
      filePath,
      Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~\r?\n?", "", RegexOptions.Multiline)
      );

Then replaces from that string all the way through the ending string that I need to be erased.  Here's an example:

HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~

wdosanjos commented:
What's the marker of the ending string in the above example?

fcsIT (author) commented:
Ok, I think I have the escape sequence fixed by adding an @ just before "HL.  Now I just need to know how to include the filepath variable in with the rest of it.

fcsIT (author) commented:
The ~ (tilde).

The block of text will always be exactly as I entered in my example, with the sole exception of the 47 in the HL line.  That number will always change, but you've already resolved that piece.

wdosanjos commented:
OK. What's the expected result of the replacement against sample lines below?
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~

fcsIT (author) commented:
That they are erased.

fcsIT (author) commented:
Is there something we should be ending after the replace runs?  I can only run debug against this code once, then I have to reboot again because VS keeps telling me:

The operation could not be completed.  The process cannot access the file because it is being used by another process.

I've tried closing VS, logging off and back on, and deleting the text file, then replacing it.  I still get the same error.  Only rebooting seems to fix it.

wdosanjos commented:
OK. I think I got it.  Please try the following.  It removes the HL*47**20*1~ line and the next 6 lines.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
	);

wdosanjos commented:
Regarding the "The process cannot access the file because it is being used by another process" issue, make sure you don't have the file open in another editor window before running your code.

fcsIT (author) commented:
Well, either that didn't remove any of the desired lines, or my VS just isn't cooperating.

I'm able to get past the "operation could not be completed" error by using the start without debugging command, and the file modified datetime stamp changes to the time that I run the program, but the lines still exist in the text file.

Any ideas?

wdosanjos commented:
Hmm... I tested on my system a few times with the sample data you provided and it works just fine.  Can you provide a larger sample of your file?

fcsIT (author) commented:
Ok, let me do another reboot to see if it's my VS causing the issue.  I find it odd that it does say the file has been modified, but it doesn't remove the lines.

Maybe a reboot will fix it.

fcsIT (author) commented:
Well crap.  I figured out what's going on.  It is changing the file and doing exactly what your code says.  I forgot there is one other instance of a HLxxxx20 line, which has to stay.

Is there a way to get it to ignore the first HL 20 line, but then catch all the rest?  It's rare, but there are times where there could be two instances of that line in a file that are bad.

How do we tell it to ignore the first one, which is always:

HL*1**20*1~

then replace any others it finds in the file, as your code already does?

wdosanjos commented:
Try this.  It ignores any HL*1**20*1~ group of records.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[02-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
	);

fcsIT (author) commented:
I am so sorry this isn't easier!  I just tested the new code.  It did leave the HL*1... alone, but it didn't remove the HL further down the file with the 20 in it.

fcsIT (author) commented:
Also, by specifying the HL*1xxx20 has to stay, will it still catch any other HL strings with a 20 that start with a 1, such as HL*17 or HL*120?

fcsIT (author) commented:
Here's my full code, just for reference:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace _837P_4010_to_5010_DMH_PreMappingProcessor
{
    class Program
    {
        static void Main()
        {
            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}

wdosanjos commented:
Yes, it takes care of the HL*17 and HL*120 scenarios.  Please post the HL that was supposed to be replaced.  I need to check a few things. Thanks.
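
Another way to express "any number except exactly 1" is a negative lookahead placed right after `HL*`. This is an alternative sketch, not the pattern used in the thread; `LookaheadDemo` and `IsRemovable` are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // (?!1\*) fails the match only when the number after "HL*" is exactly 1,
    // i.e. when "1" is immediately followed by "*". 17, 120, 47 all pass.
    public const string Pattern = @"HL\*(?!1\*)\d+\*\*20\*1~";

    public static bool IsRemovable(string segment)
    {
        return Regex.IsMatch(segment, Pattern);
    }

    public static void Main()
    {
        string[] samples = { "HL*1**20*1~", "HL*17**20*1~", "HL*120**20*1~", "HL*47**20*1~" };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " -> " + IsRemovable(s));
        }
    }
}
```

The alternation `(1\d+|[2-9]\d*)` from the thread gives the same result for these four inputs; the lookahead just states the exclusion in one place.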

fcsIT (author) commented:
The following is a good block to test with.  It includes the HL*1 that has to stay, followed by two instances of the HL*xx**20*1~ block that have to be removed.

HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*48**22*1~
SBR*~
HL*49**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*50**22*1~
SBR*~

wdosanjos commented:
OK.  That helps.  Please try the following:
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
	);

Input:
HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*48**22*1~
SBR*~
HL*49**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*50**22*1~
SBR*~

Output:
HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*48**22*1~
SBR*~
HL*50**22*1~
SBR*~

fcsIT (author) commented:
Well, I don't get it.  Obviously the same code I'm running is working for you, but it's not for me, so there must be something wrong in my setup.  (Sorry it took so long for me to reply, I tried everything I could think of to get the same result you did, but I had no luck.)

I created a C#.net console application.  Is that the same thing you're using?

Here's the complete code I'm running:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace _837P_4010_to_5010_DMH_PreMappingProcessor
{
    class Program
    {
        static void Main()
        {
            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
            filePath,
            Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
            );
        }
    }
}

fcsIT (author) commented:
Something is also causing it to remain in memory after I run it the first time.  I can completely close out of VS, but I still have a process in the Task Manager running for the name of the program, which is what's causing the file already in use error.  I have to close VS, kill the process, restart VS each time to run the program.

I'm using VS 2010.

fcsIT (author) commented:
Can you please post your entire source so I can see if that will run for me?

fcsIT (author) commented:
I don't know.  I copied your code, created a new console application, called ConsoleApplication6, pasted your code, and it still won't modify the file.  (It says it did as far as the timestamp goes, but nothing in the file changes.)

I really appreciate all your help and sticking with this.  I'll accept your solution as the answer since it works for you.  I just have no clue why it won't work for me.

Steve

fcsIT (author) commented:
Wait a second!  I just thought to save my text file with carriage returns in it, and it did modify the file!

(The actual file is all one line, just one huge line.)

It did modify it, but it just removed the HL lines that had a 20 on them, not the rest of the lines.

Maybe we can knock this out!  :)

fcsIT (author) commented:
Strange, I just tested that further.  The code doesn't work when everything is on one line, but it has to.  I changed the RegexOptions to be Singleline, but it still didn't modify it.

Any clues on that one?
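
One plausible explanation for the one-line failure: the pattern demands one or two whitespace characters (`\s{1,2}`) before each of the six segments, and a file with no line breaks has none there. `RegexOptions.Singleline` only changes what `.` matches (it lets `.` match `\n`), so it cannot supply the missing whitespace. The sketch below checks this against the sample block from the thread; `WhitespaceDemo` and `Matches` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Pattern from the thread: every segment must be preceded by 1-2 whitespace characters.
    public const string Pattern =
        @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?";

    // Sample block from the thread, one segment per line.
    public const string MultiLine =
        "HL*47**20*1~\n" +
        "PRV*BI*ZZ*123QM1234X~\n" +
        "NM1*85*2*TEXT*****XX*1234567890~\n" +
        "N3*1234 S. MYROAD*SUITE 123~\n" +
        "N4*CITY*ST*123456789~\n" +
        "REF*EI*1234567890~\n" +
        "PER*IC*JOHN DOE*TE*5551234567~\n";

    // The same block with the line breaks removed, as in the real file.
    public static readonly string OneLine = MultiLine.Replace("\n", "");

    public static bool Matches(string input, RegexOptions options)
    {
        return Regex.IsMatch(input, Pattern, options);
    }

    public static void Main()
    {
        Console.WriteLine(Matches(MultiLine, RegexOptions.Multiline));
        Console.WriteLine(Matches(OneLine, RegexOptions.Multiline));
        Console.WriteLine(Matches(OneLine, RegexOptions.Singleline));
    }
}
```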

fcsIT (author) commented:
Here's more good news!  I reverted back to a previous version of your expression and got it to work with carriage returns in the file too, so maybe it's easier to add a carriage return after each tilde (~), then run the regular expression, than to make the regular expression work without carriage returns??

Working code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}

fcsIT (author) commented:
GOT IT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"~", "~\n", RegexOptions.Multiline)
              );

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}
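
The two passes above each read and rewrite the file. As a sketch, the same two steps can run in memory with a single read and a single write. Because `~` is literal text, plain `string.Replace` is enough for the first step, and `RegexOptions.Multiline` is dropped because the pattern has no `^` or `$` anchors for it to affect. `TwoStepCleaner` and `Clean` are hypothetical names, and taking the file path from the command line is an assumption rather than something from the thread.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class TwoStepCleaner
{
    // Pattern from the working code above.
    public const string Pattern = @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?";

    public static string Clean(string text)
    {
        // Step 1: put each segment on its own line by adding "\n" after every "~".
        string split = text.Replace("~", "~\n");

        // Step 2: remove each matching HL line together with the six lines after it.
        return Regex.Replace(split, Pattern, "");
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: TwoStepCleaner <file>");
            return;
        }

        // One read, one write.
        File.WriteAllText(args[0], Clean(File.ReadAllText(args[0])));
    }
}
```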





wdosanjos commented:
Ah. That explains it.  Let's try one more thing:
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~((PRV|NM1|N3|N4|REF|PER)[^~]+~){6}", "")
	);

fcsIT (author) commented:
No luck there.  That one erased each HLxxx20 string, including the first one I need.

I can't tell you how much I appreciate you!!  I'd take you to lunch if we were coworkers! :)

You ROCK!!!!
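
One possible way to combine the two requirements from this thread (work on a file with no line breaks, and leave the `HL*1**20*1~` block alone) is to make the whitespace between segments optional and exclude the number 1 with a negative lookahead. This is a sketch that has not been run against the real file. It assumes, as stated earlier in the thread, that every block to remove consists of the HL segment followed by exactly six segments named PRV, NM1, N3, N4, REF, or PER. `BlockRemover`, `RemoveBlocks`, and `Six` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlockRemover
{
    // HL*<n>**20*1~ where n is not exactly 1, then six named segments.
    // \s* tolerates zero or more line breaks between segments; [^~]* stops at
    // each segment terminator, so no RegexOptions are needed.
    public const string Pattern =
        @"HL\*(?!1\*)\d+\*\*20\*1~(?:\s*(?:PRV|NM1|N3|N4|REF|PER)\*[^~]*~){6}\s*";

    // The six segments from the thread's sample block, on one line.
    public const string Six =
        "PRV*BI*ZZ*123QM1234X~NM1*85*2*TEXT*****XX*1234567890~" +
        "N3*1234 S. MYROAD*SUITE 123~N4*CITY*ST*123456789~" +
        "REF*EI*1234567890~PER*IC*JOHN DOE*TE*5551234567~";

    public static string RemoveBlocks(string text)
    {
        return Regex.Replace(text, Pattern, "");
    }

    public static void Main()
    {
        // Illustrative input: an HL*1 block that must stay, then an HL*47 block to remove.
        string input = "HL*1**20*1~" + Six + "HL*47**20*1~" + Six + "HL*48**22*1~SBR*~";
        Console.WriteLine(RemoveBlocks(input));
    }
}
```

Because the whitespace is optional, the same call should also handle a file that already has a line break after each tilde.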
Question has a verified solution.
