G Ram
asked on
Separate out IP and text spanning different lines into a specified format using bash scripting
How can I convert a text file in the following format into another text file?
10.10.10.06 | skjahdkjhhadjhahdahkahdhaj kdhajkhjdk hakjhdjkah jdhajkhdjk ahjkdddddd dddddddddd ddhakkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkdds hajhd
10.10.10.06 |dsjhdjhjjjjjjjjjjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjj
*ashadjahddddddddddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddda
10.10.10.06 | xcnbxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxzc zc
I would like to have
10.10.10.06
-----------------
1) skjahdkjhhadjhahdahkahdhaj kdhajkhjdk hakjhdjkah jdhajkhdjk ahjkdddddd dddddddddd ddhakkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkdds hajhd
2) dsjhdjhjjjjjjjjjjjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjj
*ashadjahddddddddddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddddddddd dddda
3) xcnbxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxzc zc
Thanks,
GR
Don't forget to sort the file first if it is not already sorted.
ASKER
Hello @arnold,
Yes, this is OK. But how can I avoid the underlining when the second column spans multiple lines?
[Current output]
10.10.10.06
---------------
skjahdkjhhadjhahdahkahdhaj kdhajkhjdk hakjhdjkah jdhajkhdjk ahjkdddddd dddddddddd ddhakkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkk
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------
kkkkkkkkkkddshajhd
-------------------------- -----
10.10.10.06
-----------------------
dsjhdjhjjjjjjjjjjjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjj
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -
jjjjjjjjjjjjjjjjjjjjjjjjjj jjjj
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----
[Desired output]
IP=10.10.10.06
----------------
skjahdkjhhadjhahdahkahdhaj kdhajkhjdk hakjhdjkah jdhajkhdjk ahjkdddddd dddddddddd ddhakkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkk
kkkkkkkkkkddshajhd
IP=10.10.10.06
-----------------------
dsjhdjhjjjjjjjjjjjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjjjjjjj jjjjj
jjjjjjjjjjjjjjjjjjjjjjjjjj jjjj
Thanks,
GR
The difficulty is whether the data you gave includes patterns
|contin...
Use the awk portion:
echo '1|2' | awk -F\| '{printf "%s\n--------\n%s\n",$1,$2}'
Do you get
1
--------
2
ASKER
Yes, I get that. The issue is that the 2nd column values span multiple lines. As you can see, the 1st and 2nd column data are separated by |
"The issue is 2nd column values spans multiple lines."
Do you mean "it does and should not" or "it does not and should"?
More precisely, does it span multiple lines in the initial data? If not, and you don't want that in the output either, then simply concatenate. Depending on where you view the result, a long line will either be cut off at the edge of the screen or wrap onto additional lines.
Can you post a sample of the text file surrounding the portions that have this issue as an example?
If the second part spans multiple lines,
try the following, which adds a debugging marker to show whether the data is coming from awk or from external sources.
awk -F\| '{printf "%s\n----------\n%s\n++++++\n",$1,$2}' Data_source_file
The effect is
1
----------
2
++++++
See what your output looks like.
Adding a condition outside the { such as (length($1)>5) checks whether the IP is present in the first field:
awk -F\| '(length($1)>5) {printf "%s\n----------\n%s\n",$1,$2}' Data_source_file
in which case a line whose first field is too short to hold an IP is not printed (five is used to allow for any errant spaces, tab characters, ...).
See if that changes the display, though note that your data file might have
IP | comment
this is a new line continuing the comment.
The awk as posted only checks one line at a time.
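Because awk sees one line at a time, the program has to tell record lines from continuation lines itself. A minimal sketch of that idea, assuming every record starts with a dotted IPv4 address followed by | and that any other line continues the record above it. The function name format_records is made up for this example, and the IP= header simply follows the asker's desired output:

```shell
#!/bin/sh
# format_records: hypothetical helper; reads "IP | text" records on stdin.
# Assumption: a line that starts with an IPv4 address followed by "|" opens a
# new record; every other line is a continuation and is printed unchanged.
format_records() {
    awk '
    /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*\|/ {
        bar = index($0, "|")                 # split on the first "|" only
        ip = substr($0, 1, bar - 1)
        text = substr($0, bar + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", ip)      # trim blanks around the IP
        sub(/^[ \t]+/, "", text)             # trim blanks before the text
        printf "IP=%s\n----------------\n%s\n", ip, text
        next
    }
    { print }                                # continuation line: no underline
    '
}
```

It could be run as format_records < Data_source_file > report.txt; only the IP line gets an underline, and the wrapped text is copied through as-is.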
ASKER
Hello @Bernard,
I mean the data file which I am parsing has this issue. The current solution by @arnold does separate out the columns, but since the 2nd column data spans multiple rows, the output is
10.10.10.06
---------------
skjahdkjhhadjhahdahkahdhaj kdhajkhjdk hakjhdjkah jdhajkhdjk ahjkdddddd dddddddddd ddhakkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkkkkk kkkkkkk
-------------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------
kkkkkkkkkkddshajhd
-------------------------- -----
What I need is just the IP in bold and, under that, each 2nd column value. It does not matter if the IP repeats, because the description would be different. I have already given my desired output in an earlier reply.
Thanks
Try the following script:
cat data | perl script.pl
script.pl is below:
#!/usr/bin/perl
use strict;
use warnings;

my $last = <STDIN>;                  # the first record starts on the first line
exit 0 unless defined $last;         # nothing to do on empty input
chomp $last;
while (my $line = <STDIN>) {
    chomp $line;                     # remove the trailing newline
    if ($line =~ /^\s*\d+\.\d+\.\d+\.\d+\s*\|/) {
        # the line starts with "IP |", meaning we reached a new record
        print_record($last);         # output the prior record since a new one begins
        $last = $line;
    }
    else {
        $last .= "\n$line";          # continuation: append it, keeping the line break
    }
}
# end of while loop; the call below flushes the last record or it would be omitted
print_record($last);

sub print_record {
    my ($ip, $text) = split /\|/, $_[0], 2;   # break the record into two fields only
    $text = '' unless defined $text;
    printf "%s\n-------\n%s\n", $ip, $text;
}
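For comparison, the layout asked for in the original question (the IP printed once, followed by numbered entries) can also be sketched in awk. This is only an illustration under stated assumptions: records for the same IP are adjacent in the input, a record line starts with an IPv4 address followed by |, and group_by_ip is a made-up name rather than anything posted in this thread.

```shell
#!/bin/sh
# group_by_ip: hypothetical helper; prints each IP once, then numbers its records.
# Assumptions: records for the same IP are adjacent in the input, and any line
# not starting with "IP |" is a continuation of the record above it.
group_by_ip() {
    awk '
    /^[ \t]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[ \t]*\|/ {
        bar = index($0, "|")                 # split on the first "|" only
        ip = substr($0, 1, bar - 1)
        text = substr($0, bar + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", ip)      # trim blanks around the IP
        sub(/^[ \t]+/, "", text)             # trim blanks before the text
        if (ip != current) {                 # new IP: print the header once
            printf "%s\n-----------------\n", ip
            current = ip
            n = 0
        }
        printf "%d) %s\n", ++n, text         # numbered first line of the record
        next
    }
    { print }                                # continuation lines are copied as-is
    '
}
```

It would be used as group_by_ip < data. If the same IP can appear in separate places in the file, the records would have to be grouped first, and a plain line-based sort is not safe here because it would detach continuation lines from their records.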
ASKER
Hello @arnold,
I checked the Perl version; it's 5. I tried it out, and it gives better control over what we do. Sample input data is below:
10.10.10.06|An value exists.
It is sometimes opened by this/these Programs:
notepad.exe
notepad++.exe
Unless you know for sure what is behind it, you'd better
check your system
**** have been dynamically allocated to system
10.10.10.06|Certificate of this service will expire shortly
I do see a 3rd column of values. Is that why it is still outputting some lines separated by -----?
Thanks,
Are you on a Windows system?
perl script.pl <filename
I am unsure which file opens with notepad.
If you want to run script.pl directly, you would need to change its file association to open using perl.
Not sure how to answer; the awk example parses the line divided on |,
numbering each resulting element starting from 1.
I.e. 1|2|3|4|5|6|7
If passed to awk, only 1 and 2 will be output.
The issue begins if the line spans multiple lines:
IP | some text
Some additional text | some other info
With awk,
"IP" and "Some additional text" will each be in column 1, while the .... in column 2
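The field numbering described above is easy to check from the shell. A small sketch; the helper name show_fields and the sample strings are made up for illustration:

```shell
#!/bin/sh
# show_fields: hypothetical helper that prints how awk splits each line on "|".
show_fields() {
    awk -F'|' '{ printf "NF=%d 1=[%s] 2=[%s]\n", NF, $1, $2 }'
}

# One line with seven fields: only fields 1 and 2 are used by the printf.
printf '1|2|3|4|5|6|7\n' | show_fields

# A record that continues on a second line: awk treats each line separately,
# so the continuation text lands in field 1 of its own line.
printf 'IP | some text\nSome additional text | some other info\n' | show_fields
```

The brackets in the output make stray leading or trailing blanks around each field visible.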
What is your environment made up of?
On Linux/UNIX, do you have an editor: vi, vim, emacs, nano, pico, etc.?
ASKER
CentOS 7, vi editor. A bash script runs the SQL against a SQLite3 DB and puts the result in a text file, which I want to parse; that's where I hit the text-wrap issues. perl -v gave me version 5. I am not familiar with Perl. Can I modify the script to handle more than 2 columns? The only issue with the Perl script given is that when column 2 has multiple lines with blank lines in between, it underlines some lines. I guess I could get around that by making column 1 (the IP) bold. Because this file will be mailed as an attachment, readability is important.
Yes.
split(delimiter, "string", number of elements (optional))
In the Perl script I was only interested in two fields: the IP and the second column.
Changing the 2 to 3 will split the string into three columns if there are two | ...
Just add %s\n to the format, the first argument of printf. It functions the same way as in C. And add ,$array[2] ....
ASKER
Hello @Arnold,
That works. But how do I output the parsed file so that sendmail in bash can send it as an attachment?
When you say attachment, presumably you mean not inline.
One way is to write the output to a file:
file="filename.$$"
The $$ is the PID of the process.
You can output the results into $file
Sendmail will include it.
If you use Perl and a Mail module, you can encode the file.
In Bash, you need to use an email client such as mail, mutt, etc.
With those you can include the file as an attachment.
At no point did the question mention emailing the results ....
mail -s "subject" somerecipient@somedomain.com <"$file"
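Putting those steps together, here is a hedged sketch: the formatter's output is redirected into a temporary file, and that file is then handed to a mail client. The names build_report and send_report are placeholders invented for this example. The mutt call assumes mutt is installed and able to deliver mail from the host; attachment flags for mail/mailx differ between implementations, so check the local man page before relying on them.

```shell
#!/bin/sh
# build_report: hypothetical helper; runs a formatter command with the data file
# on stdin, stores its stdout in a fresh temporary file, and prints the path.
# Usage: build_report DATAFILE COMMAND [ARGS...]
build_report() {
    data=$1
    shift
    report=$(mktemp "${TMPDIR:-/tmp}/report.XXXXXX") || return 1
    "$@" < "$data" > "$report" || return 1
    printf '%s\n' "$report"
}

# send_report: hypothetical helper; mails FILE as an attachment using mutt.
# Usage: send_report FILE SUBJECT RECIPIENT
# Assumption: mutt is installed and can deliver mail from this host.
send_report() {
    echo "Report attached." | mutt -s "$2" -a "$1" -- "$3"
}

# Example (not run here):
#   file=$(build_report data perl script.pl)
#   send_report "$file" "subject" somerecipient@somedomain.com
```

The calling bash script gets the report's path back through $(), so the Perl script itself only needs to print to standard output.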
ASKER
Thanks @Arnold. I know that I have to use sendmail in bash. I just wanted to know, after calling the Perl script in bash, how to return the result file back to bash so that I can send it as an attachment using sendmail.
ASKER
Using the $() operator?
If you use perl, you can send email from within without the need to return it back to bash.
Example of sending email using Perl:
https://learn.perl.org/examples/email.html
The example of sending with attachment ....
Can be seen...
The first field will be the IP; the second is the data string
awk -F\| ' {printf "%s\n----------\n%s\n",$1,