?
Solved

replace content of certain NOT empty columns in a *.csv file by a defined string

Posted on 2016-10-17
5
Medium Priority
?
27 Views
Last Modified: 2016-10-24
Dear experts,
I have to replace the content of certain columns, which is NOT empty, leaving attributes of reccords ,where the content was emtpy, empty  in a *.csv file by a defined string using a unix shell script on AIX.
e.g. CSV file before processing, replacing NON empty columns 2 and 4 by ***

11111,2222,3333,4444,5555
aaaaa,,ccc,dd,ee

now after replacing column 2 and 4 by ***:
11111,***,3333,***,5555
aaaaa,cccc,***,ee

I've posted a simliar question earlier, the solution for just replacing column 2 and 4 with *** was

perl -F"\35" -ane '$F[1]=$F[3]="***";print join("\35",@F)' input_file.csv

With a column separator being 'Group Separator':
http://www.theasciicode.com.ar/ascii-control-characters/group-separator-ascii-code-29.html

Now the question is how to update the above script, that it only replaces NON empty records columns by *** leaving already empty columns empty and not replacing by ***

Many thanks
0
Comment
Question by:mruff
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 
LVL 79

Assisted Solution

by:arnold
arnold earned 1500 total points
ID: 41847283
what is the relationship between the first row and the second row.
Or is it ok to replace 2 and 4 when the first column is numbers, but if it is alpha or alphanumeric, replace the third?
non empty, means that you have to split, check and then act accordingly.
In the current example, you are blindly replacing contents.

@array=split(/,/); #feeding line by line
test $array[1] and $array[3] for length, if 0 means there is nothing there, so you skip.
......
Must it be on a command line?
0
 

Author Comment

by:mruff
ID: 41847633
Hi LVL 76
I did not fully understand your question, the column to be checked for empty or content must always be the same regardless of potential content, may it be numeric or alpha or alphanumeric.
Yes, it must be unfortunately command line, with e.g. java that would be very easy.
I know that  the version
perl -F"\35" -ane '$F[1]=$F[3]="***";print join("\35",@F)' input_file.csv
replaces blindly, isn't there a solution with this version using regex?
e.g. having to possibility to evaluate $F[1] first, with a regex instead of directly assigning 'blindly' '***'
0
 
LVL 79

Expert Comment

by:arnold
ID: 41848679
you have in the example when the second and fourth columns are replaced, the first column is 111111.
IN the second row when the third column is replaced, it starts with aaaaa

Your current perl script is replacing whatever it finds in the positions outline with the new one.

perl has a match regex, but for that you would need to know the maximum number of columns in a line, or it is possible though I am not suffciently familiar to have multiple matches on the same line assigned to a match array that you will need to handle
the straight forward for me is to use @array=split(/,/) the array will have as many elements as needed based on the comma separated value. IT will not differentiate if there are commas enclosed by quotes......

So understanding the type of data you are processing is important to make sure not to ....
With the @array you can test the length($array[1]) column 2, and $array[3] column 4 in the one example, or $array[2] as column three under a different scenario.

Where is this CSV comming from, might it not be simpler to perform the task prior to the CSV export?
0
 

Accepted Solution

by:
mruff earned 0 total points
ID: 41850362
the solution was
awk -F";" 'BEGIN { OFS=";";}{ if ( $24 != "" ) $24="*PURGED*" ; print $0}' file1.dat > file2_.dat
to e.g. purge column 24 if it had content
0
 

Author Closing Comment

by:mruff
ID: 41856722
my comment was the final solution
0

Featured Post

Optimize your web performance

What's in the eBook?
- Full list of reasons for poor performance
- Ultimate measures to speed things up
- Primary web monitoring types
- KPIs you should be monitoring in order to increase your ROI

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This tech tip describes how to install the Solaris Operating System from a tape backup that was created using the Solaris flash archive utility. I have used this procedure on the Solaris 8 and 9 OS, and it shoudl also work well on the Solaris 10 rel…
Java performance on Solaris - Managing CPUs There are various resource controls in operating system which directly/indirectly influence the performance of application. one of the most important resource controls is "CPU".   In a multithreaded…
Learn how to get help with Linux/Unix bash shell commands. Use help to read help documents for built in bash shell commands.: Use man to interface with the online reference manuals for shell commands.: Use man to search man pages for unknown command…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
Suggested Courses
Course of the Month10 days, 18 hours left to enroll

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question