Solved

Requesting assistance with simple bash script to convert text list to csv

Posted on 2014-02-20
9
436 Views
Last Modified: 2014-02-25
Hi team,

   just need help with a script that can convert a regular TXT file into a  CSV file.  I have an  "expect" tcl program that  logs in to multiple network devices and extract SSL certificate information from each. I then collect the output to a file. To make the output useful, I would have to convert the generated "certoutput.txt" file to a "certoutput.csv" so I can import it to excel and do further filtering and manipulation.

Here is how the raw output looks like.  It can possibly be hundreds of lines long, and I just show here how the basic *repeating* format would typically look like.   In the case below, there are 3 successful SSL records and 1 error record.

[lnetadmin@load-balancer-A] ssl.crt # 
openssl x509  -issuer -subject -startdate -enddate -in cert1.intranet.mycompany.com.crt -noout
issuer= /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer
subject= /C=US/ST=California/L=mycity/O=myorg/OU=enterprise/CN=test1.bpm.intranet.mycompany.com
notBefore=Feb 20 03:58:08 2013 GMT
notAfter=Apr 16 03:58:08 2014 GMT
[lnetadmin@load-balancer-A] ssl.crt # 
openssl x509  -issuer -subject -startdate -enddate -in CERT2.intranet.mycompany.com.crt -noout
issuer= /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer
subject= /C=US/ST=California/L=mycity/O=myorg/OU=enterprise/CN=test2.bpm.intranet.mycompany.com
notBefore=Feb 20 03:58:08 2013 GMT
notAfter=Apr 16 03:58:08 2014 GMT
[lnetadmin@load-balancer-A] ssl.crt # 
openssl x509  -issuer -subject -startdate -enddate -in allcerts.txt -noout
unable to load certificate
29648:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:647:Expecting: TRUSTED CERTIFICATE
[lnetadmin@load-balancer-A] ssl.crt # 
openssl x509  -issuer -subject -startdate -enddate -in CERT3.crt -noout
issuer= /DC=US/DC=COM/DC=Mycompany/DC=USDOM/CN=someissuer
subject= /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=test3.intranet.mycompany.com
notBefore=Oct 29 08:31:41 2009 GMT
notAfter=Oct 29 08:41:41 2014 GMT
[lnetadmin@load-balancer-A] ssl.crt # 
[netadmin@myserver]$ 

Open in new window


I just need the output to look like this:
CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION
cert1.intranet.mycompany.com.crt , DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test1.bpm.intranet.mycompany.com,Feb 20 03:58:08 2013,Apr 16 03:58:08 2014
CERT2.intranet.mycompany.com.crt,DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test2.bpm.intranet.mycompany.com, Feb 20 03:58:08 2013, Apr 16 03:58:08 2014
CERT3.crt -noout,DC=US/DC=COM/DC=Mycompany/DC=USDOM/CN=someissuer, test3.intranet.mycompany.com, Oct 29 08:31:41 2009,Oct 29 08:41:41 2014

Open in new window


*The idea is to ignore all lines where it will say "error" , or "unable to load certificate" or the actual lines with the prompt "[lnetadmin@load-balancer-A] ssl.crt # "

Thanks very much.
0
Comment
Question by:rleyba828
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 3
9 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39873259
perl -l -0133ne '(1..1)&&print "CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION";print "$1 , $2,$3, $4" if /openssl.*?-in\s+(\S+).*?issuer=\s*(\S+).*?subject=.*?CN=(\S+).*?notAfter=([^\n]*)/s'
0
 
LVL 68

Assisted Solution

by:woolmilkporc
woolmilkporc earned 100 total points
ID: 39873264
OK, let's try this:

awk ' BEGIN {print "CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION"}
     {if($1~/^openssl/) {FS="-in | -noout"; S=$2}
      if($1~/^issuer=/) {FS="issuer= "; sub("^/","",$2); printf "%s , %s,", S, $2}
      if($1~/^subject=/) {FS="CN="; printf "%s,", $2}
      if($1~/^notBefore/) {FS="="; printf "%s,", $2}
      if($1~/^notAfter/) {FS="="; print $2}}' certoutput.txt > certoutput.csv
0
 
LVL 84

Accepted Solution

by:
ozo earned 400 total points
ID: 39873275
perl -l -0133ne '(1..1)&&print "CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION";print "$1 , $2,$3, $4, $5" if /openssl.*?-in\s+(\S+).*?issuer=\s*(\S+).*?subject=.*?CN=(\S+).*?notBefore=([^\n]*).*?notAfter=([^\n]*)/s' certoutput.txt > certoutput.csv
0
U.S. Department of Agriculture and Acronis Access

With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39873300
Just saw you also want to get rid of "GMT":

awk ' BEGIN {print "CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION"}
     {if($1~/^openssl/) {FS="-in | -noout"; S=$2}
      if($1~/^issuer=/) {FS="issuer= "; sub("^/","",$2); printf "%s,%s,", S, $2}
      if($1~/^subject=/) {FS="CN="; printf "%s,", $2}
      if($1~/^notBefore/) {FS="="; sub(" GMT","",$2); printf "%s,", $2}
      if($1~/^notAfter/) {FS="="; sub(" GMT","",$2); print $2}}' certoutput.txt > certoutput.csv
0
 

Author Comment

by:rleyba828
ID: 39879050
Hi Ozo,

   Thanks for this, the output of your file using the example above, is now printing ok  as:

[lnetadmin@myserver]$ cat certoutput.csv 
CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION
cert1.intranet.mycompany.com.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test1.bpm.intranet.mycompany.com, Feb 20 03:58:08 2013 GMT, Apr 16 03:58:08 2014 GMT
CERT2.intranet.mycompany.com.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test2.bpm.intranet.mycompany.com, Feb 20 03:58:08 2013 GMT, Apr 16 03:58:08 2014 GMT
CERT3.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/CN=someissuer,test3.intranet.mycompany.com, Oct 29 08:31:41 2009 GMT, Oct 29 08:41:41 2014 GMT

Open in new window


However, on my real data set which contains 70 certificates, when I simply do a cat certoutput.csv, then the line endings look weird and the "notafter" field appears to overlap the first few characters of that line.  However, when I do gedit certoutput.csv, everything seems to line up properly, except that after the "notbefore" field, the next comma and the "notafter" field is forced on the next line.  This happened on every single record (70 certs in my case) of the certoutput.csv file. I thought this was just due to the screen wrap of my monitor, but when I opened on a very wide screen, the "notafter" field REALLY was being forced on the next line.

Unfortunately, I can't post the output here as they are all our customer's certificates and it is too difficult for me to sanitize the output for publishing on this forum.


Finally,  could you kindly assist a non Perl expert like me by letting me know what "perl -l -0133ne '(1..1)&&" means?  The next lines of your code seem to make sense as they are all regex constructs, but am not sure about the first part.

Also,  I have to run the same script on about 92 other devices and I was going to do it this way.

extract-all-certs.sh   

Open in new window


and the content of this file would be

perl -e yourscript device-1-certs.txt   > device-1-certs.csv 
perl -e yourscript device-2-certs.txt   > device-2-certs.csv 
perl -e yourscript device-3-certs.txt   > device-3-certs.csv 
perl -e yourscript device-4-certs.txt   > device-4-certs.csv 
perl -e yourscript device-5-certs.txt   > device-5-certs.csv 
........
perl -e yourscript device-92-certs.txt   > device-92-certs.csv 

Open in new window


and the final output would contain the "device name" as the first column, like this....
[netadmin@myserver]$ cat device-1-certs.csv 
DEVICE-NAME,CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION
device-1,cert1.intranet.mycompany.com.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test1.bpm.intranet.mycompany.com, Feb 20 03:58:08 2013 GMT, Apr 16 03:58:08 2014 GMT
device-1,CERT2.intranet.mycompany.com.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/DC=infra/CN=someissuer,test2.bpm.intranet.mycompany.com, Feb 20 03:58:08 2013 GMT, Apr 16 03:58:08 2014 GMT
device-1,CERT3.crt , /DC=US/DC=COM/DC=Mycompany/DC=USDOM/CN=someissuer,test3.intranet.mycompany.com, Oct 29 08:31:41 2009 GMT, Oct 29 08:41:41 2014 GMT

Open in new window


would very much appreciate if you could just add the extra line or two needed in your code that would print out the device name as the first field in the output file.   thanks .


for woolmilkporc, using the same example above, yours is coming out like this for some reason:
[lnetadmin@myserver]$  cat certoutput.csv 
CERTID, ISSUER, CNNAME, ISSUEDATE, EXPIRATION
x509,,,,Apr 16 03:58:08 2014
,,,,Apr 16 03:58:08 2014
CERT3.crt,,,,Oct 29 08:41:41 2014
[lnetadmin@myserver]$ 

Open in new window


thanks very much.
0
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39879056
Are you on Solaris?

If so, use /usr/xpg4/bin/awk or "nawk".

Besides that, when run against the data you posted in the Q my version runs just fine under Linux and AIX.
0
 

Author Comment

by:rleyba828
ID: 39879057
Hi Ozo,

  I did more analysis, and in my real data set, after the issuedate field in all the records, there is a [CR] character, that's why it was forcing the next field to start on the next line (in case when I open the file in gedit) or on the SAME line, as is the case when I do a simple cat certoutput.csv.  How do we remove this extraneous [CR] character?

Thanks.
0
 
LVL 84

Expert Comment

by:ozo
ID: 39884478
s/\r//g
0
 

Author Comment

by:rleyba828
ID: 39885279
Hi woolmilkporc, not sure what was going on , I am using on plain old Linux OpenClient 2.6.32-358.11

For Ozo,   thanks for the search and replace string...that's what I need.

Thanks.
0

Featured Post

What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Linux users are sometimes dumbfounded by the severe lack of documentation on a topic. Sometimes, the documentation is copious, but other times, you end up with some obscure "it varies depending on your distribution" over and over when searching for …
Google Drive is extremely cheap offsite storage, and it's even possible to get extra storage for free for two years.  You can use the free account 15GB, and if you have an Android device..when you install Google Drive for the first time it will give…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question