Regular Expression help needed to find and replace specific numbers in a text file.

I have a very large text file, hundreds of thousands of rows, that repeats the same 10 types of rows for each new record, where each row contains different information about a given record.  The file is a combination of nearly 200 other text files that I combined so our employees don't have to deal with hundreds of files.  Combining the files means I have to renumber specific lines so they are sequential.  Each row I need to find begins with HL.

Here's an example of a few rows I have to find and renumber:
HL*65*2*22*0~
HL*104*2*22*0~
HL*8*22*2*0~
HL*1052*2*0~

I have to replace the numeric value found after the first *, so 65, 104, 8 and 1052 in the examples shown above.  The rest of each string has to be left alone.
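Since the elements on each row are separated by `*`, the counter to be replaced is simply the second element of the row. A minimal C# sketch of that observation, assuming `*` is always the element separator as in the rows above (`HlCounter` and `GetCounter` are hypothetical names, not part of any library):

```csharp
using System;

// Hypothetical helper, for illustration only.
static class HlCounter
{
    // Returns the counter of an HL row: the text between the first and second '*'.
    // Assumes '*' is the element separator, as in the example rows.
    public static int GetCounter(string row)
    {
        string[] elements = row.Split('*');
        if (elements.Length < 2 || elements[0] != "HL")
            throw new ArgumentException("Not an HL row: " + row);
        return int.Parse(elements[1]);
    }
}
```

For example, `HlCounter.GetCounter("HL*1052*2*0~")` returns 1052.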

Does anyone know how to do this?

Note - I'll be doing this find and replace in a C#.NET console application.

Asked by fcsIT

it_saige (Developer) commented:
These are EDI markers used to signify the Hierarchical Level, which are normally file-specific and not unique (in other words, multiple files will use the same HLs).  The EDI specification uses them to mark where in the EDI hierarchy the following information can be found.  As such, I would personally first check whether the EDI specification you are using has a file/loop separation marker.  If it does, I would use that marker when joining the files.

-saige-

Fernando Soto commented:
How many values will be replaced in the file at any given time? Are the lines/rows in the file in any particular order, or are they unordered?

fcsIT (Author) commented:
You are correct: these are Hierarchical Level markers for EDI files, specifically HIPAA 270 files.  However, there's no separation marker available in the 270 spec that I can find.

fcsIT (Author) commented:
The rows are in a very specific order that can't be changed, or the file will be rejected by the state EDI system.  The number of rows that will be changed varies slightly each time this is run, but it's roughly 10% of the total lines in the file.

The file I'm working with right now has almost 425,000 lines, so roughly 42,500 will be modified by the needed regular expression.

MotKohn commented:
^HL\*(\d+).*$

In multiline mode, this captures the number in group 1, which you can then replace with a different number.
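To see what this pattern captures, here is a minimal sketch that collects group 1 from every HL line of a multi-line string (`HlPatternDemo` and `CaptureCounters` are hypothetical names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class HlPatternDemo
{
    // Collects the counter captured by group 1 of ^HL\*(\d+).*$ from every HL line.
    // RegexOptions.Multiline makes ^ and $ match at each line boundary instead of
    // only at the start and end of the whole text.
    public static List<string> CaptureCounters(string text)
    {
        var counters = new List<string>();
        foreach (Match match in Regex.Matches(text, @"^HL\*(\d+).*$", RegexOptions.Multiline))
            counters.Add(match.Groups[1].Value);
        return counters;
    }
}
```

For the text `"HL*65*2*22*0~\nTRN*1*\nHL*1052*2*0~"` it returns "65" and "1052"; the TRN line does not match.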

fcsIT (Author) commented:
MotKohn, thank you for helping!  Unfortunately, this regex didn't do what I needed it to.

It found all of the HL lines, but instead of removing their line counter portion and writing a new sequential value, it deleted all of the HL lines from the file.

Here's my code.  It's using Multiline mode.

File.WriteAllText(
file,
Regex.Replace(File.ReadAllText(file), @"^HL\*(\d+).*$", "", RegexOptions.Multiline)
);
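This behaviour follows from how `Regex.Replace` works: the pattern `^HL\*(\d+).*$` matches the entire HL line, and the replacement `""` substitutes nothing for the whole match, so the text of each HL line is removed (the line break is not part of the match, so an empty line is left behind). A minimal sketch reproducing this, plus a variant that keeps the rest of the line through a `${2}` group reference; `ReplacementDemo` and its methods are hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class ReplacementDemo
{
    // Reproduces the problem: the pattern matches the whole HL line, and the
    // replacement "" substitutes nothing for it, so each HL line is left empty.
    public static string BlankHlLines(string text) =>
        Regex.Replace(text, @"^HL\*(\d+).*$", "", RegexOptions.Multiline);

    // Capturing the rest of the line and referring to it as ${2} keeps it, but a
    // replacement string is fixed, so every HL line gets the same counter.
    public static string SameCounterEverywhere(string text, int counter) =>
        Regex.Replace(text, @"^HL\*(\d+)(.*)$", "HL*" + counter + "${2}", RegexOptions.Multiline);
}
```

The second method shows the remaining limitation: with a plain replacement string, `SameCounterEverywhere("HL*65*2*22*0~\nHL*104*2*22*0~", 7)` gives both lines the counter 7.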

Fernando Soto commented:
Hi fcsIT;

I do not believe a plain Regular Expression replacement will do what you want it to do. Regular Expressions will find parts of strings that match a particular pattern, and they can also replace that pattern with another value, but a fixed replacement string gives every match the same value; it will not replace each one of the matches with a different value.
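For reference, .NET's `Regex.Replace` also has overloads that take a `MatchEvaluator` delegate, which is called once per match, in order, so it can return a different value each time. A minimal sketch that numbers every HL line 1, 2, 3, ... (it does not skip any special lines; `EvaluatorDemo` and `Renumber` are hypothetical names):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class EvaluatorDemo
{
    // Replaces the counter of every HL line with 1, 2, 3, ... in file order.
    // Regex.Replace calls the lambda (a MatchEvaluator) once per match, in order,
    // so the captured variable 'next' gives each match its own value.
    public static string Renumber(string text)
    {
        int next = 1;
        return Regex.Replace(text, @"^HL\*\d+", match => "HL*" + next++, RegexOptions.Multiline);
    }
}
```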

Do you have, beforehand, a list of the rows that need to be updated and the values they need to be updated to?

fcsIT (Author) commented:
I don't have a specific list of just the HL lines, no.

These are EDI files that are dumped out of a system we have into a very specific file format used by the government.  The HL lines are just one type of line in the file, along with a couple dozen other line types.  As for the values they need to be updated to, all that matters is that they're sequential from the beginning of the file to the end, so HL*1, HL*2, HL*3, HL*4, etc.  The value an HL line had before doesn't matter; the lines just have to be numbered in order.

Fernando Soto commented:
In your file, are all the HL*?? lines in sequential order, or are they stored in random order?

fcsIT (Author) commented:
Well, when they're dumped out of the system, they're sequential, but since I've combined 190 files dumped from the system, they're no longer sequential (hence my problem).

The numbering now resets 190 times in my file, and ends at different numbers each time, based on how big each of the original 190 files was.

Fernando Soto commented:
Well, based on your last post, the only way to add a new row with the next sequence number is to first sort the HL*?? rows in the file, then take the last row in the sorted list, increment its HL*?? value, assign that value to the new line, and store the new line at the end of the file.

fcsIT (Author) commented:
Sorry, I don't think I'm explaining the need very well.

Here's a sample of the layout of the file:

ISA*00*
GS*HS*
ST*270*
BHT*0022*
HL*1**20*1~
NM1*PR2*
HL*2*1*21*1~
MN1*1P*2*
HL*3*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*4*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*5*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*1**20*1~
NM1*PR2*
HL*2*1*21*1~
MN1*1P*2*
HL*3*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*4*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*5*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*6*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~

And on and on.  (I abbreviated all of the lines except the HL ones.)

I don't need something that looks at the current HL*1, HL*2, etc. values.  I need something that overwrites them, regardless of what they are, beginning with HL*3.  (The HL*1 and HL*2 lines are special and cannot be changed.)  I also don't need new HL lines.  What's needed is to overwrite the counter in the existing HL lines with a sequential number beginning with 3.

These are HIPAA EDI file formats.  There's nothing simple about them.

Fernando Soto commented:
Please explain, as an algorithm, the steps you need to take to achieve your goal. For example:

1. Find all HL*4XXXXX lines.
2. Replace the digit 4 in every HL*4XXXXX with 5, so that all HL*4XXXXX lines now look like HL*5XXXXX.
3. Save the file.

fcsIT (Author) commented:
1. Find all HL* lines from HL*3 onward (that is, every HL line except the special HL*1 and HL*2 lines).
2. Replace the value found between the first and second asterisks (*) in each HL line with a new auto-incrementing value beginning with 3.
3. Save the file.

Note: These values have no leading zeros, so they range from one-digit up to four-digit numbers.
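The three steps above can be sketched in C# without a regular expression, by overwriting the text between the first and second asterisks. This is a minimal sketch, assuming the whole file fits in memory as an array of lines and that the special lines always start with `HL*1*` and `HL*2*`; `HlResequencer`, `Resequence`, and `ResequenceFile` are hypothetical names:

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
static class HlResequencer
{
    // Steps 1 and 2: overwrite the counter (the text between the first and second '*')
    // of every HL line except HL*1* and HL*2*, numbering from 3 in file order.
    public static string[] Resequence(string[] lines)
    {
        var result = (string[])lines.Clone();
        int next = 3;
        for (int i = 0; i < result.Length; i++)
        {
            string line = result[i];
            bool isHl = line.StartsWith("HL*", StringComparison.Ordinal);
            bool isSpecial = line.StartsWith("HL*1*", StringComparison.Ordinal)
                          || line.StartsWith("HL*2*", StringComparison.Ordinal);
            if (!isHl || isSpecial)
                continue;
            int secondAsterisk = line.IndexOf('*', 3);
            if (secondAsterisk < 0)
                continue; // malformed HL line with no second '*': left unchanged
            result[i] = "HL*" + next++ + line.Substring(secondAsterisk);
        }
        return result;
    }

    // Step 3: read the file, renumber it, and save the result to a separate path.
    public static void ResequenceFile(string inputPath, string outputPath) =>
        File.WriteAllLines(outputPath, Resequence(File.ReadAllLines(inputPath)));
}
```

Writing to a separate output path keeps the original file intact. `File.WriteAllLines` ends every line, including the last, with `Environment.NewLine`, which may be worth checking against what the receiving system expects.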

Fernando Soto commented:
So in step 2, the auto-incrementing number starts at 3 and increments by 1 each time until there are no more rows to renumber, correct? From your example, the file would then end up looking as follows:
ISA*00*
GS*HS*
ST*270*
BHT*0022*
HL*1**20*1~
NM1*PR2*
HL*2*1*21*1~
MN1*1P*2*
HL*3*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*4*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*5*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*1**20*1~
NM1*PR2*
HL*2*1*21*1~
MN1*1P*2*
HL*6*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*7*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*8*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
HL*9*2*22*0~
TRN*1*
MN1*IL*
REF*SY*
N3*
N4*
DMG*D8*
DTP*291
EQ*30~
III*ZZ*53~
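Before submitting a renumbered file, a quick check that it follows this expected layout can be useful. A minimal sketch, assuming the same rule as above (HL*1 and HL*2 lines are left alone and all other HL counters must read 3, 4, 5, ... in file order); `HlValidator` and `IsSequentialFrom3` are hypothetical names:

```csharp
using System;

// Hypothetical helper, for illustration only.
static class HlValidator
{
    // Returns true if the non-special HL lines (all except HL*1* and HL*2*)
    // carry the counters 3, 4, 5, ... in file order.
    public static bool IsSequentialFrom3(string[] lines)
    {
        int expected = 3;
        foreach (string line in lines)
        {
            if (!line.StartsWith("HL*", StringComparison.Ordinal) ||
                line.StartsWith("HL*1*", StringComparison.Ordinal) ||
                line.StartsWith("HL*2*", StringComparison.Ordinal))
                continue;
            // The line starts with "HL*", so Split always yields at least two elements.
            string[] elements = line.Split('*');
            if (!int.TryParse(elements[1], out int counter) || counter != expected)
                return false;
            expected++;
        }
        return true;
    }
}
```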

fcsIT (Author) commented:
You are correct.

MotKohn commented:
Warning: I have not debugged this, but it should give you an idea:
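using System.IO;
using System.Text.RegularExpressions;

class HlReplacer
{
    // Next sequence number; HL*1 and HL*2 are skipped, so numbering starts at 3.
    int x = 3;
    public void doReplace(string file)
    {
        string txt = File.ReadAllText(file);
        // (?![12]\*) skips the special HL*1 and HL*2 lines.
        txt = Regex.Replace(txt, @"^HL\*(?![12]\*)(\d+).*$", new MatchEvaluator(this.m), RegexOptions.Multiline);
        File.WriteAllText(file, txt);
    }
    private string m(Match match)
    {
        string s = match.Value;
        // Group indexes are relative to the whole text, so convert to an offset within match.Value.
        int offset = match.Groups[1].Index - match.Index;
        s = s.Remove(offset, match.Groups[1].Length);
        s = s.Insert(offset, x++.ToString());
        return s;
    }
}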
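When a `MatchEvaluator` edits `match.Value`, positions must be relative to the match. `Group.Index` is measured from the start of the whole input string, so `match.Index` has to be subtracted first. A minimal illustration (`GroupIndexDemo` and `OffsetWithinMatch` are hypothetical names):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class GroupIndexDemo
{
    // Group.Index is an offset into the whole input string, not into match.Value.
    // Subtracting match.Index converts it to a position inside the matched text.
    public static int OffsetWithinMatch(Match match, int groupNumber) =>
        match.Groups[groupNumber].Index - match.Index;
}
```

For example, matching `^HL\*(\d+)` in multiline mode against `"TRN*1*\nHL*65*2*22*0~"` gives a match starting at index 7 and group 1 at index 10, so the counter sits at offset 3 within `match.Value`.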

Fernando Soto commented:
Hi fcsIT;

The following code should do what you need.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

class HipaaResequencer
{
    private List<StringBuilder> HIPAA = new List<StringBuilder>();
    private List<StringBuilder> HL_rows;

    public void Resequence()
    {
        // Load the lines from your file into the List<StringBuilder> HIPAA, each line as a StringBuilder object.
        File.ReadLines( "C:/Working Directory/HIPAA-EDI.txt" ).ToList().ForEach(r => HIPAA.Add(new StringBuilder().Append(r)));
        // Find all the lines that need to be modified (every HL line except HL*1 and HL*2) and load them into HL_rows.
        // HL_rows holds references to the same StringBuilder objects, so changing them also changes HIPAA.
        HL_rows = HIPAA.Where( r =>
                    r.ToString().StartsWith( "HL*" ) &&
                    ( r.ToString( ).Substring( 2, 3 ) != "*1*" && r.ToString( ).Substring( 2, 3 ) != "*2*" )
                ).ToList( );

        // keeps track of the next sequence number to use.
        var seqNo = 3;
        // Resequence all the found lines by replacing the text between the first and second asterisks.
        for ( int row = 0; row < HL_rows.Count; row++ ) {
            int secondAsterisk = HL_rows[row].ToString().IndexOf( "*", 3 );
            HL_rows[row].Remove( 3, secondAsterisk - 3 ).Insert( 3, seqNo.ToString( ) );
            seqNo++;
        }

        // Write all the lines to a new file; the using block flushes and closes the StreamWriter.
        using ( StreamWriter sw = new StreamWriter("C:/Working Directory/HIPAA-EDI-New.txt") ) {
            HIPAA.ForEach( r => sw.WriteLine( r.ToString( ) ) );
        }
    }
}

fcsIT (Author) commented:
You NAILED it!  I'd buy you lunch if you were here.  Thank you so much!

Fernando Soto commented:
Not a problem fcsIT, glad I was able to help. Have a great day.

MotKohn commented:
This is a simpler version if anybody is interested:
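// x is the counter from the snippet above; start it at 3 (int x = 3;).
// (?![12]\*) skips the special HL*1 and HL*2 lines, which must keep their numbers.
txt = Regex.Replace(txt, @"^(HL\*)(?![12]\*)(\d+)(.*)$", new MatchEvaluator(match => match.Groups[1].Value + x++ + match.Groups[3].Value), RegexOptions.Multiline);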
