Solved

Unix Sorting unique values

Posted on 2011-10-27
Last Modified: 2012-08-14
Hi,
I want a Unix script that gets the unique values from a pipe-delimited file. It has to pick the unique values from each column and sort them.

file1|

1|2|4|7|2|1
4|5|2|8|7|1
4|4|2|9|

Result file should be

1|2|2|2|2|1
4|4|7|8|7|
  |5|  |9|

Thanks
Question by:uco
10 Comments
 
LVL 85

Expert Comment

by:ozo
ID: 37037187
What happened to the 4, 2, 2 column and the 7, 8, 9 column?
Where did the 2, 8, 9 result column come from?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37037658
Rather inelegant, but ...

 
#!/bin/sh
input="file1"
temp=$input.$$
output="file1.out"
maxf=$(awk -F"|" '{if(NF>M) M=NF} END {print M}' $input)
i=1; j=1
while [ $i -le $maxf ]; do
   echo $(cut -d"|" -f$i-$i $input|sort -u) >> $temp
   i=$((i+1))
done 
while [ $j -le $maxf ]; do
  echo $(awk '{print $'$j'}' $temp | egrep -v "^$" | tr "\n" "|") |sed 's/^|/ |/;s/||/| |/g' | sed 's/||/| |/g'
  j=$((j+1))
 done |egrep -v "^$" > $output
rm $temp 
exit



wmp
 

Author Comment

by:uco
ID: 37038090
OK, let me put it a better way.
I want each column in this file to be sorted, and there should not be any duplicates within a column. Each column is independent, and the columns are separated by pipes. The file is attached.

file1|

A|   |       |26|  |  |  |  
A|   |361    |26|  |  |  |  
A|   |361    |26|  |  |  |009
A|   |361    |26|  |  |  |057
A|   |361    |TC|  |  |  |  
A|   |361    |TC|  |  |  |009
A|   |361    |TC|  |  |  |057
A|   |362    |26|  |  |  |  
A|   |362    |26|  |  |  |009
A|   |362    |26|  |  |  |057
A|   |362    |TC|  |  |  |  
file1.txt

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37038347
A little modification according to the more detailed info about the input format.
We could make the output a bit nicer by column alignment, if needed.

#!/bin/sh
input="file1.txt"
temp=$input.$$
output="file1.out"
maxf=$(awk -F"|" '{if(NF>M) M=NF} END {print M}' $input)
i=1; j=1
while [ $i -le $maxf ]; do
   echo $(cut -d"|" -f$i-$i $input|sort -u) >> $temp
   i=$((i+1))
done
while [ $j -le $maxf ]; do
  echo $(awk '{print "|" $'$j'}' $temp | egrep -v "^$" | tr "\n" "")
  j=$((j+1))
 done |egrep -v "^$" > $output
rm $temp
exit

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37038464
As for the "nicer" thing:

See line 12 ("tr -d") and line 14 ("awk").

Adjust "%3s" to your desired column width, e.g. "%6s".

 
#!/bin/sh
input="file1.txt"
temp=$input.$$
output="file1.out"
maxf=$(awk -F"|" '{if(NF>M) M=NF} END {print M}' $input)
i=1; j=1
while [ $i -le $maxf ]; do
   echo $(cut -d"|" -f$i-$i $input|sort -u) >> $temp
   i=$((i+1))
done 
while [ $j -le $maxf ]; do
  echo $(awk '{print "|" $'$j'}' $temp | egrep -v "^$" | tr -d "\n")
  j=$((j+1))
 done | awk -F"|" '{printf FS; for (i=2;i<NF;i++) printf "%3s"FS, $i; printf "%3s\n", $NF}' > $output
rm $temp 
exit

 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37038620
Sorry, there's one very important line missing (copy-and-paste error!)

See line 11 below.
 
#!/bin/sh
input="file1.txt"
temp=$input.$$
output="file1.out"
maxf=$(awk -F"|" '{if(NF>M) M=NF} END {print M}' $input)
i=1; j=1
while [ $i -le $maxf ]; do
   echo $(awk -F"|" '{print $'$i'}' $input|sort -u) >> $temp
   i=$((i+1))
done 
maxf=$(awk '{if(NF>M) M=NF} END {print M}' $temp)
while [ $j -le $maxf ]; do
  echo $(awk '{print "|" $'$j'}' $temp | tr -d "\n")
  j=$((j+1))
 done | awk -F"|" '{printf FS; for (i=2;i<NF;i++) printf "%3s"FS, $i; printf "%3s\n", $NF}' > $output
rm $temp 
exit
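
To see why that extra line matters, here is a small sketch. The sample data and the variable names `cols` and `rows` are made up for illustration and are not part of the script above. After the first loop the temporary file holds one line per input column, so the number of output rows is the largest count of unique values in any column, not the number of input columns.

```sh
#!/bin/sh
# Sketch with made-up sample data: build the intermediate file the way the
# first loop does, then count the rows the second loop has to produce.
input=$(mktemp); temp=$(mktemp)
printf '%s\n' 'A|361|26' 'A|362|TC' 'A|363|26' 'A|364|26' > "$input"

# Number of pipe-separated columns in the input.
cols=$(awk -F'|' '{if (NF>M) M=NF} END {print M}' "$input")

i=1
while [ "$i" -le "$cols" ]; do
    # One line per input column: its sorted unique values, space-separated.
    echo $(awk -F'|' '{print $'"$i"'}' "$input" | sort -u) >> "$temp"
    i=$((i+1))
done

# Number of output rows: the longest line of the intermediate file.
rows=$(awk '{if (NF>M) M=NF} END {print M}' "$temp")

cat "$temp"
echo "input columns: $cols, output rows: $rows"
rm -f "$input" "$temp"
```

With this sample the input has 3 columns, but the second column has 4 unique values, so 4 output rows are needed.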

 

Author Comment

by:uco
ID: 37041597
Hi,
Works great, but I don't want the output file to have a pipe at the beginning. The file is attached.
Please make the change and also let me know where you made it.
Also, does it work if any column size increases in the input file?
Thanks, I appreciate your work.
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 37041705
Here you go:
#!/bin/sh
input="file1.txt"
temp=$input.$$
output="file1.out"
maxf=$(awk -F"|" '{if(NF>M) M=NF} END {print M}' $input)
i=1; j=1
while [ $i -le $maxf ]; do
   echo $(awk -F"|" '{print $'$i'}' $input|sort -u) >> $temp
   i=$((i+1))
done 
maxf=$(awk '{if(NF>M) M=NF} END {print M}' $temp)
while [ $j -le $maxf ]; do
  echo $(awk '{print "|" $'$j'}' $temp | tr -d "\n")
  j=$((j+1))
 done | awk -F"|" '{for (i=2;i<NF;i++) printf "%3s"FS, $i; printf "%3s\n", $NF}' > $output
rm $temp 
exit


The change is in line 15 - I removed "printf FS" which prints the field separator ("|") in the first position.
Can't remember why I assumed that this was desired.

The column width in the output is determined by the number "3" inside the two "%3s" format strings in line 15.
Either make it big enough for future growth or use just "%s" which will give unaligned (but always sufficiently wide) columns.
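
A quick way to compare the two format strings, as a sketch only: the sample line and the variable names `fixed` and `plain` are made up for illustration.

```sh
#!/bin/sh
# Sketch: the same made-up line printed with "%3s" and with plain "%s".
line='A|361|26|009'

# Width 3: every field is right-aligned in at least 3 characters.
fixed=$(echo "$line" | awk -F'|' '{for (i=1;i<NF;i++) printf "%3s" FS, $i; printf "%3s\n", $NF}')

# Plain %s: fields are printed as they are, without padding.
plain=$(echo "$line" | awk -F'|' '{for (i=1;i<NF;i++) printf "%s" FS, $i; printf "%s\n", $NF}')

echo "$fixed"   # prints "  A|361| 26|009"
echo "$plain"   # prints "A|361|26|009"
```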

Glad you like it.

wmp
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 2000 total points
ID: 37041759
You can also leave the format string "%3s" as is.

Values in single columns exceeding this width will make these columns unaligned, but the values will print correctly.
If all values in a column change to the same width, however, alignment will be kept.
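
A small sketch of that behaviour, with made-up sample rows and a hypothetical variable name `rows`: the value 12345 is wider than the "%3s" width, so it prints in full and only that row loses its alignment.

```sh
#!/bin/sh
# Sketch: a value wider than the "%3s" width is printed in full,
# but its column no longer lines up with the row above.
rows=$(printf '%s\n' 'A|361|26' 'A|12345|TC' |
    awk -F'|' '{for (i=1;i<NF;i++) printf "%3s" FS, $i; printf "%3s\n", $NF}')

echo "$rows"
# prints:
#   A|361| 26
#   A|12345| TC
```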
 

Author Closing Comment

by:uco
ID: 37045371
Great work, very quick and responsive. Very much appreciated.