• Status: Solved
• Priority: Medium
• Security: Public

Trying to replace characters in a text file by position on the line (not by pattern or delimiter) in Unix shell scripting

I am trying to develop a shell script on AIX 5.3.5 / ksh or ksh93, which will take a text file with fixed-length rows ('records'), and on each row replace certain characters, based on position, with spaces.
For example, I may want to replace characters 450 through 470 with 20 blank spaces.

The problem I've encountered is that most Unix utilities work on delimited fields, not position-defined fields. I've tried to kludge it with regexes:
CTC_PS_BLANKCMD="sed s/^\(.\{$CTC_PS_BEG\}\)\(.\{$CTC_PS_LEN\}\)/\1$CTC_PS_SPACER/ $CTC_PS_SOURCE"

(where CTC_PS_BEG is the first character to be replaced, CTC_PS_LEN is the length, CTC_PS_SPACER is the string with spaces, and CTC_PS_SOURCE is the input file).
However, there are two problems. First, no matter how I play with the quotes, the shell collapses the run of spaces into a single space. Second, it only works for counts up to 255, because of the AIX RE_DUP_MAX limit, which I cannot change: an interval expression such as \{n\} cannot repeat more than 255 times.
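A minimal sketch of the quoting fix, assuming a POSIX-style sed. The spaces most likely collapse because the command is stored in CTC_PS_BLANKCMD and later expanded unquoted, so the shell word-splits it. Passing the whole sed script to sed as one double-quoted argument keeps CTC_PS_SPACER intact. The function name blank_span and the sample values are illustrative only, and the RE_DUP_MAX ceiling still applies, so CTC_PS_BEG and CTC_PS_LEN must each stay at 255 or below.

```shell
#!/bin/sh
# Illustrative values. As in the original command, CTC_PS_BEG is the number of
# characters kept, so the replacement starts at column CTC_PS_BEG + 1.
CTC_PS_BEG=4
CTC_PS_LEN=3
# Exactly CTC_PS_LEN spaces; command substitution only strips trailing newlines.
CTC_PS_SPACER=$(printf "%${CTC_PS_LEN}s" "")

# Hypothetical helper: blank the span on every line of $1 (or stdin).
blank_span() {
    # The sed script is a single double-quoted word, so the shell never splits
    # it and every space in CTC_PS_SPACER reaches sed unchanged.
    sed "s/^\(.\{$CTC_PS_BEG\}\).\{$CTC_PS_LEN\}/\1$CTC_PS_SPACER/" ${1+"$1"}
}
```

Usage: blank_span input.txt > output.txt. With the values above, 1234567890 becomes "1234   890".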

What is the best way I can create a command that will replace characters based on position?
Asked by: Nikola_Novak
 
Richard Quadling (Senior Software Developer) commented:
I don't know the shell (I'm on Windows), but in regex terms, split the 450-character prefix into smaller chunks and capture them.

I'm using ! in place of a space in the replacement so it's visible.

Search : ^(.{200})(.{200})(.{49}).{20}
Replace : \1\2!!!!!!!!!!!!!!!!!!!!

 
Nikola_Novak (Author) commented:
Thanks; that's a valid idea. However, it gets tricky because I don't know ahead of time how many characters I'll be replacing, or where. It may be chars 200-220 (in which case no modification would be necessary) or 1200-1600 (in which case I'd need to split both the starting position and the replacement length into chunks).
I could write some sort of loop that adds another chunk for every 100 characters, but... it lacks a certain elegance, I may run out of allowable capture groups (up to 9, I think), and it is prone to breakage and outlier whoopsies. Since this will run on a big production finance server, I'd really prefer a one-liner that uses UNIX's tried-and-tested utilities (there must be _something_ out there that works on positions rather than matches or delimiters... I hope ;) ) over kludging together a lot of shell code that can break ;)
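One way around both the 255 limit and the nine-group worry, sketched under the assumption that the local sed follows POSIX BRE syntax: several interval expressions can sit inside a single \( \) group, so the kept prefix is always \1 no matter how many 255-character chunks it takes, and the replaced span needs no group at all. The helper names dots and blank_columns are hypothetical; the column argument is 1-based, and lines too short to contain the whole span are left unchanged because the pattern does not match.

```shell
#!/bin/sh
# Hypothetical helper: print a BRE that matches exactly $1 arbitrary characters,
# using interval expressions no larger than 255 (the POSIX minimum RE_DUP_MAX).
dots() {
    dots_n=$1
    dots_re=
    while [ "$dots_n" -gt 255 ]; do
        dots_re="$dots_re.\\{255\\}"
        dots_n=$((dots_n - 255))
    done
    if [ "$dots_n" -gt 0 ]; then
        dots_re="$dots_re.\\{$dots_n\\}"
    fi
    printf '%s' "$dots_re"
}

# Hypothetical helper: blank $2 characters starting at 1-based column $1 on
# every line of $3 (or stdin); assumes $1 >= 1 and $2 >= 1. Every prefix chunk
# sits inside the same \( \) group, so the kept text is always \1.
blank_columns() {
    bc_span=$(dots "$2")
    bc_spaces=$(printf "%$2s" "")
    if [ "$1" -gt 1 ]; then
        bc_keep=$(dots "$(($1 - 1))")
        sed "s/^\\($bc_keep\\)$bc_span/\\1$bc_spaces/" ${3+"$3"}
    else
        # Nothing to keep: skip the group, since not every sed accepts an empty \(\).
        sed "s/^$bc_span/$bc_spaces/" ${3+"$3"}
    fi
}
```

For column 450 and length 20, blank_columns 450 20 file builds the script s/^\(.\{255\}.\{194\}\).\{20\}/\1 followed by 20 spaces and /, which stays within the 255 limit.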
 
Richard Quadling (Senior Software Developer) commented:
Oops. I missed \3 in the replacement.



Q. Does sed return a status if nothing was changed?

If so, you will need to run the search/replace repeatedly until no changes are made ...
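On the question of sed's status: a POSIX sed exits with 0 whether or not any substitution took place (a non-zero status means an error), so its exit code cannot drive a repeat-until-no-change loop. A rough sketch, assuming POSIX sed and cmp, that compares the output with the input instead. Repeating is not actually needed, though: sed applies the s command to every input line in a single pass, and re-blanking a span that is already blank changes nothing. Note also that (?!...) lookahead is Perl/PCRE syntax; sed's BRE and ERE syntax does not support it. The helper name sed_changed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run sed script $3 over file $1, write the result to $2,
# and use the exit status to say whether anything changed:
#   0 = output differs from input, 1 = identical, 2 = sed failed.
# sed's own exit status cannot tell you this: it is 0 whether or not
# the s command matched.
sed_changed() {
    sed "$3" "$1" > "$2" || return 2
    if cmp -s "$1" "$2"; then
        return 1
    fi
    return 0
}
```

For example: sed_changed sp.txt sp.new 's/^\(.\{8\}\).\{20\}/\1!!!!!!!!!!!!!!!!!!!!/' && echo changed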

Search : ^(.{200})(.{200})(.{49})(?! {20}).{20}
Replace : \1\2\3!!!!!!!!!!!!!!!!!!!!

This says replace the 20 characters only if they are NOT 20 spaces.

So, run this repeatedly and those that have already been replaced should be ignored.


Using smaller offsets on a short sample file ...

Search : ^(.{2})(.{2})(.{4})(?! {20}).{20}
Replace : \1\2\3!!!!!!!!!!!!!!!!!!!!

Source :

1122333312345678901234567890abc
11223333                    zyx
11223333 2345678901234567890abc
112233331  45678901234567890abc
1122333312   678901234567890abc
11223333123    8901234567890abc
112233331234                abc
11223333                   a bc

Dest :

11223333!!!!!!!!!!!!!!!!!!!!abc
11223333                    zyx
11223333!!!!!!!!!!!!!!!!!!!!abc
11223333!!!!!!!!!!!!!!!!!!!!abc
11223333!!!!!!!!!!!!!!!!!!!!abc
11223333!!!!!!!!!!!!!!!!!!!!abc
11223333!!!!!!!!!!!!!!!!!!!!abc
11223333!!!!!!!!!!!!!!!!!!!! bc


You'll need to use a monospaced font to see that properly.

This replaced the 20 characters at positions 9-28 with ! on every line where they were not already all spaces.




 
Richard Quadling (Senior Software Developer) commented:
Use a language? PHP/Perl/etc?
 
Richard Quadling (Senior Software Developer) commented:
<?php
// Output the results back to the file we were supplied with in param 1.
file_put_contents (
      $argv[1],
      // Search and replace.
      preg_replace (
            // Build the regex based upon the offset and length supplied as params 2 and 3.
            sprintf (
                  '`^(.{%d})(?! {%d}).{%d}`im',
                  intval (
                        $argv[2]
                  ),
                  intval (
                        $argv[3]
                  )
            ),
            // Replace the above match with part 1 and x spaces - arg3 again.
            '\1' . str_repeat (
                  ' ',
                  $argv[3]
            )
      )
);
?>
 
Richard Quadling (Senior Software Developer) commented:
php ./replaceAt.php <file> <offset> <length>

And I missed a parameter ...

<?php
// Output the results back to the file we were supplied with in param 1.
file_put_contents (
      $argv[1],
      // Search and replace.
      preg_replace (
            // Build the regex based upon the offset and length supplied as params 2 and 3.
            sprintf (
                  '`^(.{%d})(?! {%d}).{%d}`im',
                  intval (
                        $argv[2]
                  ),
                  intval (
                        $argv[3]
                  ),
                  intval (
                        $argv[3]
                  )
            ),
            // Replace the above match with part 1 and x spaces - arg3 again.
            '\1' . str_repeat (
                  ' ',
                  $argv[3]
            )
      )
);
?>


 
Richard Quadling (Senior Software Developer) commented:
So, the params xxx.txt 450 20 produce the regex ...

`^(.{450})(?! {20}).{20}`im

Which is what is needed in this instance.

And I missed out the entire bloody file read!!!! Doh!

<?php
// Output the results back to the file we were supplied with in param 1.
file_put_contents (
      $argv[1],
      // Search and replace.
      preg_replace (
            // Build the regex based upon the offset and length supplied as params 2 and 3.
            sprintf (
                  '`^(.{%d})(?! {%d}).{%d}`im',
                  intval (
                        $argv[2]
                  ),
                  intval (
                        $argv[3]
                  ),
                  intval (
                        $argv[3]
                  )
            ),
            // Replace the above match with part 1 and x spaces - arg3 again.
            '\1' . str_repeat (
                  ' ',
                  $argv[3]
            ),
            // Use the file supplied in param 1
            file_get_contents (
                  $argv[1]
            )
      )
);
?>

Using the source file ./sp.txt with the command php ./replaceAt.php ./sp.txt 8 20

I converted:

1122333312345678901234567890abc
11223333                    zyx
11223333 2345678901234567890abc
112233331  45678901234567890abc
1122333312   678901234567890abc
11223333123    8901234567890abc
112233331234                abc
11223333                   a bc


into

11223333                    abc
11223333                    zyx
11223333                    abc
11223333                    abc
11223333                    abc
11223333                    abc
11223333                    abc
11223333                     bc

Then, using php ./replaceAt.php ./sp.txt 2 1,

I turned it into ...

11 23333                    abc
11 23333                    zyx
11 23333                    abc
11 23333                    abc
11 23333                    abc
11 23333                    abc
11 23333                    abc
11 23333                     bc


Finally, php ./replaceAt.php ./sp.txt 0 1 gives:

 1 23333                    abc
 1 23333                    zyx
 1 23333                    abc
 1 23333                    abc
 1 23333                    abc
 1 23333                    abc
 1 23333                    abc
 1 23333                     bc

So, tested and working. Now all you need is PHP! But that SHOULD be there.
 
Nikola_Novak (Author) commented:
(Perl is acceptable, though a Unix shell script solution is MUCH preferable, since the shell script is the driver and will be retrieving info from the database etc.; I do not believe we can install the DBI modules for Perl to communicate with the DB directly. Also, using Perl will need justification to the management folks, sigh... ;)
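Since a shell-script solution is preferred, here is a sketch that avoids regular expressions entirely: awk's substr() works on character positions, so neither the 255 interval limit nor backreferences come into play. This assumes a POSIX awk; the local awk may still have its own line-length limit, which is worth checking on AIX for very long records. The function name blank_cols_awk is hypothetical; columns are 1-based, and lines too short to hold the whole span are passed through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: blank $2 characters starting at 1-based column $1
# on every line of $3 (or stdin), using awk's position-based substr().
blank_cols_awk() {
    awk -v start="$1" -v len="$2" '
        # Only touch lines long enough to contain the whole span.
        length($0) >= start + len - 1 {
            pad = sprintf("%" len "s", "")       # exactly len spaces
            $0 = substr($0, 1, start - 1) pad substr($0, start + len)
        }
        { print }
    ' ${3+"$3"}
}
```

For the 5-to-7 example, blank_cols_awk 5 3 input.txt turns 1234567890 into "1234   890".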
 
Nikola_Novak (Author) commented:
Whopsie, just in case I didn't properly explain this (it was way too early and no coffee when I wrote the initial question) - I'm replacing with spaces; the characters that I'm replacing could be anything.
For example:
replace characters 5 to 7 with spaces:

input file
1234567890
0987654321

Output file
1234   890
0987   321

* PHP is not available. This is a production AIX box for a finance system for a major company... quite frankly, I'm pleasantly *amazed* Perl is available - and allowed by the powers that be ;)
 
Richard Quadling (Senior Software Developer) commented:
As I said, I don't know the shell. Converting the PHP code to Perl shouldn't be too difficult, but that's not my bag, baby.

My script does allow you to put any number of spaces at any offset.

Ha.

Being slightly off the wall, if you have looping capability within the shell ...

(pseudo).

get param 1 into an integer.
prepare an empty string 1.
while (param 1 is greater than zero) add '.' to string 1 and subtract 1 from param 1

get param 2 into an integer.
prepare an empty string 2.
prepare an empty string 3.
while (param 2 is greater than zero) add ' ' to string 2, add '.' to string 3 and subtract 1 from param 2

build regex as follows

^(string1)(?!string2)string3


maybe.
 
Richard Quadling (Senior Software Developer) commented:
Use string2 as the replacement string.
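The pseudocode above can be sketched in plain POSIX shell. This version drops the (?!string2) lookahead, which sed's BRE syntax does not support (re-blanking a span that is already blank is harmless anyway). Because string1 and string3 are runs of literal dots, no \{n\} interval is used, so RE_DUP_MAX never comes into play, although a very long pattern may still hit other limits in a given sed. The helper names repeat_char and blank_with_dots are hypothetical; the first argument is the number of characters kept, like the offset in the PHP script.

```shell
#!/bin/sh
# Hypothetical helper: print the character $2 exactly $1 times
# (the loop from the pseudocode).
repeat_char() {
    rc_n=$1
    rc_out=
    while [ "$rc_n" -gt 0 ]; do
        rc_out="$rc_out$2"
        rc_n=$((rc_n - 1))
    done
    printf '%s' "$rc_out"
}

# Hypothetical helper: keep the first $1 characters of each line and replace
# the next $2 with spaces; reads $3 (or stdin). string1 and string3 are runs of
# literal dots; string2 (the spaces) is the replacement.
blank_with_dots() {
    string2=$(repeat_char "$2" ' ')
    string3=$(repeat_char "$2" .)
    if [ "$1" -gt 0 ]; then
        string1=$(repeat_char "$1" .)
        sed "s/^\\($string1\\)$string3/\\1$string2/" ${3+"$3"}
    else
        # Nothing to keep: no group needed.
        sed "s/^$string3/$string2/" ${3+"$3"}
    fi
}
```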
 
ahoffmann commented:
perl -nle 's#(.){1,449}(?:.){20}(.*)#$1                    $2#;print' yourfile
 
Nikola_Novak (Author) commented:
ooooh! I'm liking that! A one liner is just what the doctor ordered :)
So if I call it from the shell script, I can substitute my $CTC_PS_SPACER between $1 and $2, and other variables for 449/20; it looks like basically the same regex, converted to Perl.

Now, the only trick is it's not working:
$ cat sorts
12345678901234567890
12345678901234567890
abcdefghijklmnopqrst
 
$ perl -nle 's#(.){1,4}(?:.){5}(.*)#$1   $2#;print' sorts
4   01234567890
4   01234567890
d   jklmnopqrst

Instead of preserving "1234", it's only preserving the last character "4". I'm guessing that $1 is backreferencing just one character instead of all matches, but I'm not sure as I don't really remember much Perl...


 
ozo commented:
perl -nle 's#(.{1,4})(?:.){5}(.*)#$1   $2#;print' sorts
 
Nikola_Novak (Author) commented:
I see - thank you very much - this is exactly what I needed :)
