Unix script to break up file

I need help with a Unix script.

1) I need to split a file on a fixed marker and use part of the data to name each output file.

Source file A

$$$$|DEF|ADDD
HDR|12345678|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
HDR|234567890|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

2) $$$$ starts a new file
3) The 2nd field in HDR is part of the file name

Here is the expected output:

File B, named TESTTfile_12345678_20180130.txt

HDR|12345678|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

File C, named TESTTfile_234567890_20180130.txt

HDR|234567890|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

Then the original file is saved to /archive.

Thanks!
wiestassoc Asked:

Bill Prew Commented:
Here is a small AWK script that should do the job.  Save it as a file (I used ee29081347.awk) and run it as follows:

gawk -f ee29081347.awk infile.txt

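BEGIN {
    FS = "|"
    # strftime() is a gawk extension, hence the gawk invocation above
    currentDate = strftime("%Y%m%d")
}

{
    # Skip the $$$$ marker line
    if ($1 == "$$$$") {
        next
    }

    # Each HDR line starts a new output file named after its second field
    if ($1 == "HDR") {
        fileOut = "TESTTfile_" $2 "_" currentDate ".txt"
    }

    print $0 >> fileOut
}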
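
The script above leaves the last step of the question (saving the original file to /archive) to the caller. A minimal sketch of a wrapper that does both steps is shown here. It is not part of the original answer: `split_and_archive` is a hypothetical function name, the archive directory is passed as a parameter so the sketch can be tried without write access to /archive, and `date` is used instead of `strftime()` so that any POSIX awk should do.

```shell
#!/bin/sh
# Sketch only: split a file on HDR records, then move the original away.
# split_and_archive is a hypothetical helper name, not from the thread.
split_and_archive() {
    infile=$1
    archive_dir=$2
    stamp=$(date +%Y%m%d)

    awk -F'|' -v stamp="$stamp" '
        # Drop the $$$$ marker record
        $1 == "$$$$" { next }

        # Each HDR record starts a new output file named after field 2
        $1 == "HDR" {
            if (out != "") close(out)
            out = "TESTTfile_" $2 "_" stamp ".txt"
        }

        # Append, as in the script above; lines before the first HDR are ignored
        out != "" { print >> out }
    ' "$infile" || return 1

    # Archive the original only after awk finished without error
    mkdir -p "$archive_dir" && mv "$infile" "$archive_dir"/
}
```

It could be called as `split_and_archive infile.txt /archive`. Because the output is appended, running it twice on the same day against the same data would duplicate the lines in the output files.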

»bp

Abhimanyu Suri, Sr Database Engineer, Commented:
/home>cat A.txt
$$$$|DEF|ADDD
HDR|12345678|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
HDR|234567890|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

/home>cat parse_a.sh
#!/bin/sh
file_name=$1
dt=$(date '+%Y%m%d')
awk -v dtd="$dt" -F'|' '/^HDR/ {file="TESTFILE_"$2"_"dtd".out"} /^HDR/,0 { print > file }' "$file_name"

/home>./parse_a.sh A.txt
refus12c:/orahome>ls -ltr TESTFILE*
-rw-r--r-- 1 oracle oinstall 104 Jan 31 10:38 TESTFILE_234567890_20180131.out
-rw-r--r-- 1 oracle oinstall 103 Jan 31 10:38 TESTFILE_12345678_20180131.out

/home>cat TESTFILE_234567890_20180131.out
HDR|234567890|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

/home>cat TESTFILE_12345678_20180131.out
HDR|12345678|444|rhrhrh|hghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr

Please modify as per your requirement.

Thanks,
Suri
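
A note on the `/^HDR/,0` pattern used in parse_a.sh: it is an awk range pattern whose end condition, `0`, is never true, so the range opens at the first HDR line and stays open to the end of the input. That is what keeps the leading $$$$ line out of the output files. A small demonstration follows, using made-up sample lines; `show_range` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: show which input lines fall inside the range /^HDR/,0.
# show_range is a hypothetical helper name, not from the thread.
show_range() {
    # Print the line number and text of every line inside the range
    awk '/^HDR/,0 { print NR ": " $0 }'
}

printf '%s\n' '$$$$|DEF|ADDD' 'HDR|1|x' 'LINE|a' 'HDR|2|y' 'LINE|b' | show_range
```

This prints lines 2 through 5 and omits line 1, the $$$$ line.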
wiestassoc (Author) Commented:
Gents, thanks so much.  I will give them a try.
wiestassoc (Author) Commented:
There has been a change to the requirements.

Can anyone adjust a solution?

1) Replace any "&" in the file with "H"
2) Replace any "#" in the file with "I" (the letter I)
3) Break the file at each $$$$$$$$ line
4) Use the second field of the $$$$$$$$ line as part of the file name
5) Remove the $$$$$$$$ line from the new file
6) Place each new file into a new directory

Parameters

$INDIR = /data/out/
$OUTDIR = /data/newout


Input file:

Name sad.123.sad

$$$$$$$$|I231_0081788682|
HEADER|INV|20180224|20180224||0004165036|0004165036|0081788682|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788682|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788682|20|180182C02|20180118
&EADER|DELV|20180224|20180224||0004165036|0004165036|0081788682|||||||||||
#TEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180211C01|20180121
#TEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180503C02|20180219
#TEM|900001|20|87QS5930150|1381|9999|200|CS|0081788682|20|180181C01|20180118
#TEM|900002|20|87QS5930150|1381|9999|100|CS|0081788682|20|180182C02|20180118
$$$$$$$$|I231_0081788684|
HEADER|INV|20180224|20180224||0004165036|0004165036|0081788684|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788684|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788684|20|180182C02|20180118
&EADER|DELV|20180224|20180224||0004165036|0004165036|0081788684|||||||||||
#TEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180211C01|20180121
#TEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180503C02|20180219
#TEM|900001|20|87QS5930150|1381|9999|200|CS|0081788684|20|180181C01|20180118
#TEM|900002|20|87QS5930150|1381|9999|100|CS|0081788684|20|180182C02|20180118
$$$$$$$$|I266_0081788699|
HEADER|INV|20180224|20180224||0004165036|0004165036|0081788699|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788699|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788699|20|180182C02|20180118
&EADER|DELV|20180224|20180224||0004165036|0004165036|0081788699|||||||||||
#TEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180211C01|20180121
#TEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180503C02|20180219
#TEM|900001|20|87QS5930150|1381|9999|200|CS|0081788699|20|180181C01|20180118
#TEM|900002|20|87QS5930150|1381|9999|100|CS|0081788699|20|180182C02|20180118

Break into files (the sample above yields 3):

1) File should be named:  I231_0081788682_02252018_051500.txt

HEADER|INV|20180224|20180224||0004165036|0004165036|0081788682|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788682|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788682|20|180182C02|20180118
HEADER|DELV|20180224|20180224||0004165036|0004165036|0081788682|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788682|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788682|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788682|20|180182C02|20180118

2) File should be named:  I231_0081788684_02252018_051500.txt
HEADER|INV|20180224|20180224||0004165036|0004165036|0081788684|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788684|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788684|20|180182C02|20180118
HEADER|DELV|20180224|20180224||0004165036|0004165036|0081788684|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788684|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788684|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788684|20|180182C02|20180118

3) File should be named:  I266_0081788699_02252018_051500.txt
HEADER|INV|20180224|20180224||0004165036|0004165036|0081788699|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788699|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788699|20|180182C02|20180118
HEADER|DELV|20180224|20180224||0004165036|0004165036|0081788699|||||||||||
ITEM|900001|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180211C01|20180121
ITEM|900002|10|4L7S6101367|1381|9999|100|CS|0081788699|10|180503C02|20180219
ITEM|900001|20|87QS5930150|1381|9999|200|CS|0081788699|20|180181C01|20180118
ITEM|900002|20|87QS5930150|1381|9999|100|CS|0081788699|20|180182C02|20180118
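
For reference, the six revised requirements above could be sketched roughly as follows. This is an untested-in-production sketch under stated assumptions, not an answer given in the thread: `split_invoices` is a hypothetical function name, the input file and output directory are passed as arguments rather than read from $INDIR and $OUTDIR, and the MMDDYYYY_HHMMSS time stamp format is inferred from the sample file names.

```shell
#!/bin/sh
# Sketch only: split on $$$$$$$$ records and write the pieces to a directory.
# split_invoices is a hypothetical helper name, not from the thread.
split_invoices() {
    infile=$1
    outdir=$2
    # One time stamp per run, so all pieces of one input share it
    stamp=$(date +%m%d%Y_%H%M%S)
    mkdir -p "$outdir" || return 1

    awk -F'|' -v outdir="$outdir" -v stamp="$stamp" '
        # A $$$$$$$$ record names the next output file and is not copied
        $1 == "$$$$$$$$" {
            if (out != "") close(out)
            out = outdir "/" $2 "_" stamp ".txt"
            next
        }

        # Every other record: replace & with H and # with I, then write it
        out != "" {
            gsub("&", "H")
            gsub("#", "I")
            print >> out
        }
    ' "$infile"
}
```

With the parameters from the post it might be called as `split_invoices /data/out/sad.123.sad /data/newout`. Lines that appear before the first $$$$$$$$ record are dropped, which may or may not be what is wanted.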
tel2 Commented:
Weeks later...

Hi wiestassoc,
I see you've now opened a new question for this new requirement.  That was the right thing to do, because it's quite different to the original requirement, which has already been answered.

Going back to your original request, here's a shell script which uses Perl to do the main processing, and is very similar to Abhimanyu's beautifully concise shell/awk answer:

#!/bin/bash
export DATE=$(date +%Y%m%d)
# "$@" keeps file names containing spaces intact
perl -pe 'open STDOUT, ">>TESTFILE_$1_$ENV{DATE}.txt" if /^HDR\|(.+?)\|/' "$@"
mv "$@" /archive

If you put that in a script named script1.sh, and give it execute permission, you could run it like this:
./script1.sh input_file(s)
As you can see, it can process more than one file per execution.  For example, if you want to process all files starting with "A", you could do this:
./script1.sh A*