Unix Shell script to sort & keep only one occurrence of repeated field ( sed awk Perl grep )

File1 has the following lines (columns delimited by space(s)):

997818 found in 3498
1004060 found in 3499
1214451 found in 3498
879730 found in 3499
8029032 found in 3515
8054065 found in 3515
8056462 found in 3515
8138803 found in 3517
8135802 found in 3516
8135803 found in 3516
. . .

I need a script that sorts by the 4th column and, where the same 4th-column value
appears on several lines, lists that value only once. So the output, file2, will be:
3498
3499
3515
3516
3517
...

Then another script will read file2 and compare it against file3. For example, file3 has the lines below:
3515
3517

The final output will be those values in file2 that are not found in file3, so the final output is:
3498
3499
3516

I'm OK if you combine the 2 scripts into 1 script, or even a one-liner.
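
One possible way to do both steps is sketched below. It is not any of the posted solutions; it is a minimal example that assumes every data line in File1 has at least four whitespace-separated fields and that file3 holds one value per line with no trailing spaces. The function names unique_field4 and not_in are made up for illustration; File1, file2 and file3 are the names used in the question.

```shell
#!/bin/sh
# Hypothetical helper names; only File1, file2 and file3 come from the question.

# unique_field4 FILE
# Print each distinct 4th-column value of FILE once, in ascending numeric order.
unique_field4() {
    # NF >= 4 skips short lines such as a blank line or a ". . ." placeholder.
    awk 'NF >= 4 { print $4 }' "$1" | sort -n -u
}

# not_in LIST EXCLUDE
# Print the lines of LIST that do not appear as a whole line in EXCLUDE.
#   -v  invert the match        -x  match whole lines only
#   -F  treat patterns as fixed strings, not regular expressions
#   -f  read the patterns from the file EXCLUDE
not_in() {
    grep -v -x -F -f "$2" "$1"
}

# Two-step usage, as described in the question:
#   unique_field4 File1 > file2
#   not_in file2 file3
#
# Combined into a single pipeline, without the intermediate file2:
#   unique_field4 File1 | grep -v -x -F -f file3
```

Note that grep exits with a non-zero status when it prints nothing, so a calling script should not treat that as a failure if every value in file2 also appears in file3.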

ASKER CERTIFIED SOLUTION

medvedd

arnold

Instead of asking for scripts that each perform a specific task on the same set of data, could you detail what your overall goal is with this data?

Otherwise you end up with 10 scripts processing the same set of data and extracting different things, where a single multipurpose script would do,
i.e. get the first column and do something, use the fourth column and do something else, etc.,
read in files and ... etc.

SOLUTION

omarfarid

SOLUTION

talatmasood
