Asked by sunhux
Unix shell script to sort and keep only one occurrence of a repeated field (sed, awk, Perl, grep)


file1 has the following lines (columns are delimited by one or more spaces):

997818 found in 3498
1004060 found in 3499
1214451 found in 3498
879730 found in 3499
8029032 found in 3515
8054065 found in 3515
8056462 found in 3515
8138803 found in 3517
8135802 found in 3516
8135803 found in 3516
. . .

I need a script that sorts by the 4th column and, where a 4th-column value repeats
across several lines, lists that value only once. So the output file2 will be:
3498
3499
3515
3516
3517
...
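A minimal sketch of the first step using standard tools, assuming the 4th-column values are plain integers and the columns are separated only by spaces. awk's default field splitting treats a run of spaces as one separator, so `$4` is the 4th column; `sort -n -u` then sorts numerically and keeps one copy of each value. The scratch directory and the sample data (the lines from the question, without the ". . .") are only there so the block runs on its own.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the question (columns separated by spaces).
cat > file1 <<'EOF'
997818 found in 3498
1004060 found in 3499
1214451 found in 3498
879730 found in 3499
8029032 found in 3515
8054065 found in 3515
8056462 found in 3515
8138803 found in 3517
8135802 found in 3516
8135803 found in 3516
EOF

# Print only the 4th field, then sort numerically and drop duplicate values.
awk '{ print $4 }' file1 | sort -n -u > file2

cat file2
```

On the real data, only the `awk ... | sort -n -u > file2` line is needed.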


Then another script will read file2 and compare it against file3; for example, file3 has the lines below:
3515
3517

The final output will be the values in file2 that are not found in file3, so the final output is:
3498
3499
3516
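One way to do the comparison step is with grep, assuming file2 and file3 hold one value per line with no trailing spaces or carriage returns. `-F` treats each line of file3 as a fixed string, `-x` requires it to match a whole line (so 351 would not knock out 3515), `-f file3` reads the patterns from file3, and `-v` prints the lines of file2 that match none of them. The sample files are written here only to make the block self-contained.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file2 as produced by the first step; file3 holds the values to exclude.
printf '%s\n' 3498 3499 3515 3516 3517 > file2
printf '%s\n' 3515 3517 > file3

# Keep the lines of file2 that do not appear, as whole lines, in file3.
grep -v -x -F -f file3 file2 > final

cat final
```

With GNU grep, an empty file3 contains no patterns and matches nothing, so every line of file2 is printed. `comm -23 file2 file3` is an alternative, but it expects both files to be sorted in the same (lexical) order.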

I'm OK with combining the two scripts into one script, or even a one-liner.
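Both steps can be folded into a single awk command, sketched below under the same assumptions as above (space-separated columns, one value per line in file3). awk reads file3 first and records each value in an array, then reads file1 and prints the 4th field only if it is not excluded and has not been printed before; `sort -n` orders the result. This is an illustrative approach, not the accepted solution from the thread.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the question.
cat > file1 <<'EOF'
997818 found in 3498
1004060 found in 3499
1214451 found in 3498
879730 found in 3499
8029032 found in 3515
8054065 found in 3515
8056462 found in 3515
8138803 found in 3517
8135802 found in 3516
8135803 found in 3516
EOF
printf '%s\n' 3515 3517 > file3

# Pass 1 (file3): remember each value to exclude.
# Pass 2 (file1): print $4 if it is not excluded and not yet seen.
awk 'FILENAME == ARGV[1] { skip[$1]; next }
     !($4 in skip) && !seen[$4]++ { print $4 }' file3 file1 | sort -n > final

cat final
```

Testing `FILENAME == ARGV[1]` rather than the common `NR == FNR` idiom keeps the command correct when file3 is empty: with `NR == FNR`, an empty file3 would make awk treat every line of file1 as an exclusion value and print nothing.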
ASKER CERTIFIED SOLUTION
medvedd

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
arnold
Instead of asking for separate scripts to perform specific tasks on the same set of data, could you describe your overall goal with this data?

It makes little sense to have 10 scripts each processing the same data to extract different things when a single multipurpose script would do,
e.g. take the first column and do one thing with it, take the fourth column and do something else,
read in other files, and so on.
SOLUTION
SOLUTION