wiestassoc
asked on
Unix script to break up file
I need help with a Unix script.
1) I need to split a file on a fixed text marker and use part of the data to name each output file.
Source file A
$$$$|DEF|ADDD
HDR|12345678|444|rhrhrh|hg hghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
HDR|234567890|444|rhrhrh|h ghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
2) $$$$ starts new file
3) 2nd field in HDR is part of file name
Here is the expected output:
File B name TESTTfile_12345678_20180130.txt
HDR|12345678|444|rhrhrh|hg hghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
File C name TESTTfile_234567890_20180130.txt
HDR|234567890|444|rhrhrh|h ghghghg
LINE|33333|444444|ghththg
LINE|THHE|rrr|5555
LINE|TEHEHE|5555|urjurjr
Then the original file should be saved to /archive.
Thanks!
ASKER
There has been a change to the requirements.
Can anyone adjust a solution to do the following?
1) Replace any "&" in the file with "H"
2) Replace any "#" in the file with "I" (letter I)
3) Break the file at each $$$$$$$$ line
4) Use the second entry on the $$$$$$$$ line as part of the file name
5) Remove the $$$$$$$$ line from the new file
6) Place each new file into a new directory
Parameters
$INDIR = /data/out/
$OUTDIR = /data/newout
Input file:
Name sad.123.sad
$$$$$$$$|I231_0081788682|
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788682|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788682|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788682|2 0|180182C0 2|20180118
&EADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788682||| ||||||||
#TEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180211C0 1|20180121
#TEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180503C0 2|20180219
#TEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788682|2 0|180181C0 1|20180118
#TEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788682|2 0|180182C0 2|20180118
$$$$$$$$|I231_0081788684|
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788684|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788684|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788684|2 0|180182C0 2|20180118
&EADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788684||| ||||||||
#TEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180211C0 1|20180121
#TEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180503C0 2|20180219
#TEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788684|2 0|180181C0 1|20180118
#TEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788684|2 0|180182C0 2|20180118
$$$$$$$$|I266_0081788699|
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788699|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788699|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788699|2 0|180182C0 2|20180118
&EADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788699||| ||||||||
#TEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180211C0 1|20180121
#TEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180503C0 2|20180219
#TEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788699|2 0|180181C0 1|20180118
#TEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788699|2 0|180182C0 2|20180118
Break into files (the sample above yields 3):
1) File should be named: I231_0081788682_02252018_051500.txt
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788682|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788682|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788682|2 0|180182C0 2|20180118
HEADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788682||| ||||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788682|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788682|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788682|2 0|180182C0 2|20180118
2) File should be named: I231_0081788684_02252018_051500.txt
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788684|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788684|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788684|2 0|180182C0 2|20180118
HEADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788684||| ||||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788684|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788684|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788684|2 0|180182C0 2|20180118
3) File should be named: I266_0081788699_02252018_051500.txt
HEADER|INV|20180224|201802 24||000416 5036|00041 65036|0081 788699|||| |||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788699|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 81788699|2 0|180182C0 2|20180118
HEADER|DELV|20180224|20180 224||00041 65036|0004 165036|008 1788699||| ||||||||
ITEM|900001|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180211C0 1|20180121
ITEM|900002|10|4L7S6101367 |1381|9999 |100|CS|00 81788699|1 0|180503C0 2|20180219
ITEM|900001|20|87QS5930150 |1381|9999 |200|CS|00 81788699|2 0|180181C0 1|20180118
ITEM|900002|20|87QS5930150 |1381|9999 |100|CS|00 8178869|20 |180182C02 |20180118
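For reference, the six steps listed above could be sketched roughly like this in bash and awk. This is only a sketch under assumptions: split_marked is a made-up function name, the two arguments stand in for the input file and the $OUTDIR parameter, and the MMDDYYYY_HHMMSS stamp is inferred from the sample file names rather than stated anywhere.

```shell
#!/bin/bash
# Sketch only. split_marked is a hypothetical name; its arguments stand in
# for the input file and the OUTDIR parameter listed above.
split_marked() {
    local infile=$1 outdir=$2 stamp
    stamp=$(date +%m%d%Y_%H%M%S)            # assumed stamp format, e.g. 02252018_051500
    mkdir -p "$outdir" || return 1
    awk -F'|' -v dir="$outdir" -v stamp="$stamp" '
        index($0, "$$$$$$$$") == 1 {        # marker line: start a new file, do not copy the marker
            if (out != "") close(out)
            out = dir "/" $2 "_" stamp ".txt"
            next
        }
        out != "" {
            gsub(/&/, "H")                  # requirement 1
            gsub(/#/, "I")                  # requirement 2
            print >> out
        }
    ' "$infile"
}
```

It could be called as split_marked /data/out/sad.123.sad /data/newout. Note that two sections with the same second entry, processed within the same second, would be appended to the same output file.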
Weeks later...
Hi wiestassoc,
I see you've now opened a new question for this new requirement. That was the right thing to do, because it's quite different to the original requirement, which has already been answered.
Going back to your original request, here's a shell script which uses Perl to do the main processing, and is very similar to Abhimanyu's beautifully concise shell/awk answer:
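#!/bin/bash
# Date stamp for the output file names; exported so Perl can read it from %ENV
export DATE=$(date +%Y%m%d)
# On each HDR line, redirect output to a file named after the second field
perl -pe 'open STDOUT, ">>TESTFILE_$1_$ENV{DATE}.txt" if /^HDR\|(.+?)\|/' "$@"
# Move the processed input file(s) to the archive directory
mv "$@" /archive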
If you put that in a script named script1.sh and give it execute permission, you could run it like this:
./script1.sh input_file(s)
As you can see, it can process more than one file per execution. For example, if you want to process all files starting with "A", you could do this:
./script1.sh A*
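For comparison, here is a rough awk version of the same split. It is a sketch only, not Abhimanyu's actual answer (which isn't shown in this thread), and split_by_hdr is a made-up function name. It keeps the TESTFILE_ prefix and date stamp from the Perl one-liner above, and it assumes the $$$$ line should simply be dropped; the move to /archive is left to the caller.

```shell
#!/bin/bash
# Sketch only: an awk take on the same split as the Perl one-liner.
# split_by_hdr is a hypothetical helper name.
split_by_hdr() {
    local stamp
    stamp=$(date +%Y%m%d)
    awk -F'|' -v stamp="$stamp" '
        $1 == "HDR" {                       # each HDR line starts a new output file
            if (out != "") close(out)
            out = "TESTFILE_" $2 "_" stamp ".txt"
        }
        out != "" { print >> out }          # lines before the first HDR are skipped
    ' "$@"
}
```

You would call it as split_by_hdr input_file(s), the same way as the script. Because it appends, running it twice on the same day adds the sections to the existing files again.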
ASKER