Solved

Regex Help Needed

Posted on 2012-04-04
Last Modified: 2012-04-04
Can anyone tell me how to write a regular expression that will find the following text in a text file?  The catch is that the 47 will always vary.  The rest of the line will always be exactly as shown here.

HL*47**20*1~

I'm using this snippet in a C#.NET console application to read text files, and replace the string with no characters so it deletes it.

            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
                filePath,
                File.ReadAllText(filePath)
                .Replace("<regex goes here>", "")
                );
Question by:fcsIT
41 Comments
 
LVL 23

Expert Comment

by:wdosanjos
Try:
string filePath = "c:\\36866f-dmh.txt";

File.WriteAllText(
    filePath,
    File.ReadAllText(filePath).Replace(@"HL\*\d+\*\*20\*1~", "")
    );

Author Comment

by:fcsIT
Thanks for the quick response wdosanjos!  Unfortunately that didn't work.  It left the full HL*47**20*1~ line in my text file.
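
A likely explanation for the result above, offered as a sketch rather than a diagnosis of the actual file: `String.Replace` treats its first argument as literal text, so a pattern such as `HL\*\d+\*\*20\*1~` is searched for character by character, backslashes included, and never found. `Regex.Replace` interprets the same string as a pattern. The class and method names below are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

static class LiteralVersusRegex
{
    // Pattern from the reply above: "HL*", one or more digits, "**20*1~".
    const string Pattern = @"HL\*\d+\*\*20\*1~";

    // String.Replace looks for the pattern text itself, backslashes included.
    public static string WithStringReplace(string input)
    {
        return input.Replace(Pattern, "");
    }

    // Regex.Replace treats the same text as a regular expression.
    public static string WithRegexReplace(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    static void Main()
    {
        string sample = "HCP*10*100**01AA~HL*47**20*1~PRV*BI*ZZ*123AB4567C~";
        Console.WriteLine(WithStringReplace(sample)); // literal search: nothing to replace
        Console.WriteLine(WithRegexReplace(sample));  // pattern search: HL segment removed
    }
}
```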

LVL 23

Expert Comment

by:wdosanjos
Can you provide a small sample of the file for me to test?

LVL 23

Expert Comment

by:wdosanjos
BTW, is it to remove the whole line or just the HL*47**20*1~ string?

Author Comment

by:fcsIT
DTP*123*D8*20120404~
REF*6R*12345C1234567-1234567-1~
HCP*10*100**01AA~
HL*47**20*1~
PRV*BI*ZZ*123AB4567C~

Author Comment

by:fcsIT
Well, it's actually to remove an entire block of text; however, the *20* near the HL is the identifier for the block to be removed.  I have to build a regular expression that can find the HL*47**20*1~ string, then remove that string and everything after it until the code finds the next HL*.

It's all one line of text, tens of thousands of characters long.

This is just the first step, figuring out what regular expression will find the HL*47**20*1~ string.
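
The goal described above (delete the marked HL segment and everything after it, up to the next HL segment) could be sketched with a lazy match and a lookahead. This is only an illustration, under the assumptions that every segment ends in `~` and that the text `HL*` appears only at the start of a segment. The names `BlockRemover` and `RemoveBlocks` are made up for the example, and the pattern does not exempt any HL segment that should stay.

```csharp
using System;
using System.Text.RegularExpressions;

static class BlockRemover
{
    // Marker segment, then as little as possible (.*?) up to, but not
    // including, the next "HL*" or the end of the text.
    static readonly Regex Block = new Regex(
        @"HL\*\d+\*\*20\*1~.*?(?=HL\*|\z)",
        RegexOptions.Singleline); // lets '.' cross line breaks, if any exist

    public static string RemoveBlocks(string text)
    {
        return Block.Replace(text, "");
    }

    static void Main()
    {
        string sample =
            "HCP*10*100**01AA~HL*47**20*1~PRV*BI*ZZ*123AB4567C~" +
            "NM1*85*2*TEXT~HL*48**22*1~SBR*~";
        Console.WriteLine(RemoveBlocks(sample));
    }
}
```

Because the lookahead does not consume the next `HL*`, that segment stays in the text and can itself be tested as the start of another block.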

LVL 2

Expert Comment

by:JAruchamy
Try this,

 string s;
s = Regex.Replace("HL*47**20*1~", "HL[*]..[*][*]20[*][1][~]", "Test");

LVL 2

Expert Comment

by:JAruchamy
using System.Text.RegularExpressions;


 string s = Regex.Replace(@"DTP*123*D8*20120404~
REF*6R*12345C1234567-1234567-1~
HCP*10*100**01AA~
HL*47**20*1~
PRV*BI*ZZ*123AB4567C~
", "HL[*]..[*][*]20[*][1][~]", "Test");

LVL 23

Expert Comment

by:wdosanjos
Please try this.  It removes the string and the CR LF that follows, thus deleting the line.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~\r?\n?", "", RegexOptions.Multiline)
	);

Author Comment

by:fcsIT
My VS has pretty much stopped responding.  I have to reboot.  Give me about 10 minutes to reboot and test both of your responses, and I'll post back.

Author Comment

by:fcsIT
wdosanjos' response worked!

JAruchamy's response failed at the regular expression piece.  It said "unrecognized escape sequence."  You guys will forever be my heroes if you can help me get the two responses combined so that the code finds the block of text beginning with the code wdosanjos posted:

File.WriteAllText(
      filePath,
      Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~\r?\n?", "", RegexOptions.Multiline)
      );

Then replaces from that string all the way through the ending string that I need to be erased.  Here's an example:

HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~

LVL 23

Expert Comment

by:wdosanjos
What's the marker of the ending string in the above example?

Author Comment

by:fcsIT
Ok, I think I have the escape sequence fixed by adding an @ just before "HL.  Now I just need to know how to include the filepath variable in with the rest of it.

Author Comment

by:fcsIT
The ~ (tilde).

The block of text will always be exactly as I entered in my example, with the sole exception of the 47 in the HL line.  That number will always change, but you've already resolved that piece.

LVL 23

Expert Comment

by:wdosanjos
OK. What's the expected result of the replacement against sample lines below?
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~

Author Comment

by:fcsIT
That they are erased.

Author Comment

by:fcsIT
Is there something we should be ending after the replace runs?  I can only run debug against this code once, then I have to reboot again because VS keeps telling me:

The operation could not be completed.  The process cannot access the file because it is being used by another process.

I've tried closing VS, logging off and back on, and deleting the text file, then replacing it.  I still get the same error.  Only rebooting seems to fix it.

LVL 23

Expert Comment

by:wdosanjos
OK. I think I got it.  Please try the following.  It removes the HL*47**20*1~ line and the next 6 lines.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
	);

LVL 23

Expert Comment

by:wdosanjos
Regarding the "The process cannot access the file because it is being used by another process" issue, make sure you don't have the file open in another editor window before running your code.

Author Comment

by:fcsIT
Well, either that didn't remove any of the desired lines, or my VS just isn't cooperating.

I'm able to get past the "operation could not be completed" error by using the start without debugging command, and the file modified datetime stamp changes to the time that I run the program, but the lines still exist in the text file.

Any ideas?

LVL 23

Expert Comment

by:wdosanjos
Humm.... I tested on my system a few times with the sample data you provided and it works just fine.   Can you provide a larger sample of your file?

Author Comment

by:fcsIT
Ok, let me do another reboot to see if it's my VS causing the issue.  I find it odd that it does say the file has been modified, but it doesn't remove the lines.

Maybe a reboot will fix it.

Author Comment

by:fcsIT
Well crap.  I figured out what's going on.  It is changing the file and doing exactly what your code says.  I forgot there is one other instance of a HLxxxx20 line, which has to stay.

Is there a way to get it to ignore the first HL 20 line, but then catch all the rest?  It's rare, but there are times where there could be two instances of that line in a file that are bad.

How do we tell it to ignore the first one, which is always:

HL*1**20*1~

then replace any others it finds in the file as your code already does?

LVL 23

Expert Comment

by:wdosanjos
Try this.  It ignores any HL*1**20*1~ group of records.
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[02-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
	);

Author Comment

by:fcsIT
I am so sorry this isn't easier!  I just tested the new code.  It did leave the HL*1... alone, but it didn't remove the HL further down the file with the 20 in it.

Author Comment

by:fcsIT
Also, by specifying the HL*1xxx20 has to stay, will it still catch any other HL strings with a 20 that start with a 1, such as HL*17 or HL*120?

Author Comment

by:fcsIT
Here's my full code, just for reference:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace _837P_4010_to_5010_DMH_PreMappingProcessor
{
    class Program
    {
        static void Main()
        {
            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}

LVL 23

Expert Comment

by:wdosanjos
Yes, it takes care of the HL*17 and HL*120 scenarios.  Please post the HL that was supposed to be replaced.  I need to check a few things. Thanks.
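
The claim above can be checked with a few calls to `Regex.IsMatch`. The sketch below tests the alternation from the earlier reply and, as an alternative that was not proposed in this thread, a negative lookahead `(?!1\*)` that rejects only the number 1. The class and member names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

static class HlNumberCheck
{
    // Alternation from the reply above: "1" plus at least one more digit,
    // or any number that starts with a digit other than 1.
    public const string Alternation = @"HL\*(1\d+|[02-9]\d*)\*\*20\*1~";

    // Alternative (not from the thread): reject only "1" followed by "*".
    public const string Lookahead = @"HL\*(?!1\*)\d+\*\*20\*1~";

    public static bool Matches(string pattern, string segment)
    {
        return Regex.IsMatch(segment, pattern);
    }

    static void Main()
    {
        string[] segments =
        {
            "HL*1**20*1~", "HL*17**20*1~", "HL*120**20*1~", "HL*47**20*1~"
        };
        foreach (string segment in segments)
        {
            Console.WriteLine("{0}  alternation={1}  lookahead={2}",
                segment,
                Matches(Alternation, segment),
                Matches(Lookahead, segment));
        }
    }
}
```

Both patterns skip `HL*1**20*1~` and match the other three segments, so the choice between them is mostly a matter of readability.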

Author Comment

by:fcsIT
The following is a good block to test with.  It includes the HL*1 that has to stay, then the HL*xx**20*1~ that has to be replaced with two instances.

HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*48**22*1~
SBR*~
HL*49**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*50**22*1~
SBR*~

LVL 23

Expert Comment

by:wdosanjos
OK.  That helps.  Please try the following:
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
	);

Input:
HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*47**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*48**22*1~
SBR*~
HL*49**20*1~
PRV*BI*ZZ*123QM1234X~
NM1*85*2*TEXT*****XX*1234567890~
N3*1234 S. MYROAD*SUITE 123~
N4*CITY*ST*123456789~
REF*EI*1234567890~
PER*IC*JOHN DOE*TE*5551234567~
HL*50**22*1~
SBR*~

Output:
HL*1**20*1~
PRV*BI*ZZ*456QM7890X~
HL*48**22*1~
SBR*~
HL*50**22*1~
SBR*~

Author Comment

by:fcsIT
Well, I don't get it.  Obviously the same code I'm running is working for you, but it's not for me, so there must be something wrong in my setup.  (Sorry it took so long for me to reply, I tried everything I could think of to get the same result you did, but I had no luck.)

I created a C#.net console application.  Is that the same thing you're using?

Here's the complete code I'm running:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace _837P_4010_to_5010_DMH_PreMappingProcessor
{
    class Program
    {
        static void Main()
        {
            string filePath = "c:\\36866f-dmh.txt";

            File.WriteAllText(
            filePath,
            Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
            );
        }
    }
}

Author Comment

by:fcsIT
Something is also causing it to remain in memory after I run it the first time.  I can completely close out of VS, but I still have a process in the Task Manager running for the name of the program, which is what's causing the file already in use error.  I have to close VS, kill the process, restart VS each time to run the program.

I'm using VS 2010.

Author Comment

by:fcsIT
Can you please post your entire source so I can see if that will run for me?

LVL 23

Accepted Solution

by:wdosanjos
There you go:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\temp\36866f-dmh.txt";

            File.WriteAllText(
                filePath,
                Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~(\s{1,2}(PRV|NM1|N3|N4|REF|PER).+){6}~\r?\n?", "", RegexOptions.Multiline)
                );
        }
    }
}

Author Comment

by:fcsIT
I don't know.  I copied your code, created a new console application, called ConsoleApplication6, pasted your code, and it still won't modify the file.  (It says it did as far as the timestamp goes, but nothing in the file changes.)

I really appreciate all your help and sticking with this.  I'll accept your solution as the answer since it works for you.  I just have no clue why it won't work for me.

Steve

Author Comment

by:fcsIT
Wait a second!  I just thought to save my text file with carriage returns in it, and it did modify the file!

(The actual file is all one line, just one huge line.)

It did modify it, but it just removed the HL lines that had a 20 on them, not the rest of the lines.

Maybe we can knock this out!  :)

Author Comment

by:fcsIT
Strange, I just tested that further.  The code doesn't work when everything is on one line, but it has to.  I changed the RegexOptions to be Singleline, but it still didn't modify it.

Any clues on that one?
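
One likely reason, given as a sketch rather than a confirmed diagnosis: the pattern requires whitespace (`\s+` or `\s{1,2}`) between segments, and a one-line file has none. `RegexOptions.Singleline` only changes what `.` matches, so it cannot supply the missing whitespace. A pattern that uses `~` as the segment delimiter handles both layouts. The version below is an illustration that assumes the block is always the HL segment plus six more segments, as stated earlier in the thread, and that `HL*1` must stay. The names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class TildeDelimited
{
    // HL segment whose number is not exactly 1, then six segments, each
    // "anything but ~" followed by "~". [^~] also matches CR/LF, so line
    // breaks between segments are optional. The trailing \s* removes a
    // line break left behind after the last segment.
    static readonly Regex Block = new Regex(
        @"HL\*(?!1\*)\d+\*\*20\*1~(?:[^~]*~){6}\s*");

    public static string RemoveBlocks(string text)
    {
        return Block.Replace(text, "");
    }

    static void Main()
    {
        string oneLine =
            "HL*1**20*1~PRV*BI*ZZ*456QM7890X~" +
            "HL*47**20*1~PRV*BI*ZZ*123QM1234X~NM1*85*2*TEXT*****XX*1234567890~" +
            "N3*1234 S. MYROAD*SUITE 123~N4*CITY*ST*123456789~" +
            "REF*EI*1234567890~PER*IC*JOHN DOE*TE*5551234567~" +
            "HL*48**22*1~SBR*~";
        Console.WriteLine(RemoveBlocks(oneLine));
    }
}
```

The fixed count of six is the fragile part: if a block ever had fewer segments, the match would run into whatever follows it.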

Author Comment

by:fcsIT
Here's more good news!  I reverted back to a previous version of your expression and got it to work with carriage returns in the file too, so maybe it's easier to add a carriage return after each tilde (~), then run the regular expression, than to make the regular expression work without carriage returns??

Working code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}

Author Comment

by:fcsIT
GOT IT!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


Code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            var filePath = @"C:\36866f-dmh.txt";

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"~", "~\n", RegexOptions.Multiline)
              );

            File.WriteAllText(
              filePath,
              Regex.Replace(File.ReadAllText(filePath), @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?", "", RegexOptions.Multiline)
              );
        }
    }
}
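
The two passes above each read and rewrite the file. A variation, sketched here under the assumptions that the input contains no line breaks of its own and that the downstream consumer expects the original one-line layout, performs both replacements in memory, removes the temporary line breaks again, and writes once. `OnePass` and `Transform` are hypothetical names; the block pattern is the one from the working code above.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class OnePass
{
    public static string Transform(string text)
    {
        // Step 1: put every segment on its own line, as in the working code.
        string lines = text.Replace("~", "~\n");

        // Step 2: remove each marked HL segment plus the six lines after it.
        string cleaned = Regex.Replace(
            lines,
            @"HL\*(1\d+|[2-9]\d*)\*\*20\*1~(\s+.+~){6}\r?\n?",
            "");

        // Step 3 (assumption): restore the one-line layout.
        return cleaned.Replace("\n", "");
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: OnePass <file>");
            return;
        }
        // Hypothetical: the path comes from the command line.
        File.WriteAllText(args[0], Transform(File.ReadAllText(args[0])));
    }
}
```

Keeping the transformation in a pure string-to-string method also makes it possible to try the pattern on small samples without touching the real file.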



THANK YOU THANK YOU THANK YOU!

LVL 23

Expert Comment

by:wdosanjos
Ah. That explains it.  Let's try one more thing:
File.WriteAllText(
	filePath,
	Regex.Replace(File.ReadAllText(filePath), @"HL\*\d+\*\*20\*1~((PRV|NM1|N3|N4|REF|PER)[^~]+~){6}", "")
	);

Author Comment

by:fcsIT
No luck there.  That one erased each HLxxx20 string, including the first one I need.

I can't tell you how much I appreciate you!!  I'd take you to lunch if we were coworkers! :)

You ROCK!!!!