Solved

UNIX BASH

Posted on 2010-11-11
31
477 Views
Last Modified: 2013-12-18
$ cat file1

field1                                               field2
ACT_MSTR                                      294705808
ACT_MSTR                                      294705808
CC_PROVIDERS                              2697541582
CC_PROVIDERS                              2697541581
CLOSED_WIN_JOB_CENTERS         3353753290
CLOSED_WIN_JOB_CENTERS         3353753290

If records of field1 are the same, compare the values of the 2 records based on field2. If the values are not the same, print ONLY those records.

I would like my new file to be as follows:
CC_PROVIDERS                              2697541582
CC_PROVIDERS                              2697541581

THANKS!
0
Comment
Question by:Severcorr
31 Comments
 
LVL 76

Assisted Solution

by:arnold
arnold earned 500 total points
ID: 34116320
egrep -iv "`awk ' { print $2 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1

awk ' { print $2 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1  ) { print $2 } '
The above will extract the common (duplicated) items in field2.
egrep then treats each line of the back-quoted output as an individual pattern against the file referenced at the end, in this case file1; -v negates the match.
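To see the two stages on a tiny sample, here is a minimal sketch (the sample file contents are made up to mirror the question; the real file1 may differ):

```shell
# Build a small sample file mirroring the question's data.
cat > file1 <<'EOF'
ACT_MSTR 294705808
ACT_MSTR 294705808
CC_PROVIDERS 2697541582
CC_PROVIDERS 2697541581
EOF

# Stage 1: list every field2 value that occurs more than once.
dups=$(awk '{ print $2 }' file1 | sort | uniq -c | awk '$1 > 1 { print $2 }')
echo "$dups"
# 294705808

# Stage 2: discard every line containing a duplicated value.
egrep -v "$dups" file1
# CC_PROVIDERS 2697541582
# CC_PROVIDERS 2697541581
```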
0
 

Author Comment

by:Severcorr
ID: 34116438
For some reason, none of the commands were working. Thanks anyway.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34116566
Which Linux are you using?
Which shell do you have?

Check whether you have all the commands available: egrep, awk, sort, uniq.
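One portable way to run that check, sketched with the shell built-in command -v (your system's paths may differ):

```shell
# Print the full path of each required tool, or flag it as missing.
for cmd in egrep awk sort uniq; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: $(command -v "$cmd")"
  else
    echo "$cmd: NOT FOUND"
  fi
done
```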
0
 

Author Comment

by:Severcorr
ID: 34116589
Here is my Unix version; those commands are there.

Linux WIAMigrationDB 2.6.18-164.11.1.el5.centos.plus #1 SMP Wed Jan 20 18:49:35 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
0
 
LVL 76

Expert Comment

by:arnold
ID: 34117904
You should have all those tools.
Look in /var/log/rpmpkgs.

yum install gawk grep coreutils
They should be available in the /bin/ directory.
Try this then:
/bin/egrep -iv "`/bin/awk ' { print $2 } ' file1| /bin/grep -v field|/bin/sort | /bin/uniq -c | /bin/sort -rn | /bin/awk ' ($1 >1 ) { print $2 } '`" file1
0
 

Author Comment

by:Severcorr
ID: 34120671
I installed gawk, grep, and coreutils and it still doesn't work. Did you create a test file and test the scripts on your side? Thanks.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34121641
Yes, I used a copy of the data you posted and named it file1, and when executing the commands, file1 was in the current working directory, i.e. ls file1 returned information about the file.

Could you post what errors you are getting?
0
 

Author Comment

by:Severcorr
ID: 34121763
I didn't get any error, but it came back with an empty value.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34121997
what are the contents of file1?
post the following

awk ' { print  NF } ' file1
Do all lines reflect/report 2?
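As a quick sketch of what that diagnostic shows (with made-up lines):

```shell
# NF is awk's per-line field count; any line that does not print 2
# means the whitespace splitting is not what the pipeline expects.
printf '%s\n' 'ACT_MSTR 294705808' 'CC_PROVIDERS 2697541582' | awk '{ print NF }'
# 2
# 2
```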
0
 

Author Comment

by:Severcorr
ID: 34122057
I ran awk ' { print NF } ' file1. It just printed out one character/number throughout. By the way, what does awk ' ($1 >1 )' mean? Thanks.
0
 

Author Comment

by:Severcorr
ID: 34122130
I am sorry. When I tested the above file (e.g. file) it worked, but when I use it on my real file it doesn't work. I don't know why. Sorry to have you go through all this. Thanks.

Real file:

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $3 } '`" file1
ACT_MSTR                                          130.18.156.193 294705808
ACT_MSTR                                          130.18.156.191 294705808
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS             130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS             130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.191 2817260696
0
 
LVL 76

Expert Comment

by:arnold
ID: 34122182
Which character/number did the awk ' { print NF } ' file1 output?  Was it the number 2?

awk ' ($1 >1 ) { print $2 } ' deals with the output of uniq -c | sort -rn.
uniq -c returns a table:
number_of_occurrences   Element

sort -rn does a reverse numeric sort on the first column, so that the items which occur frequently are at the top.

The awk ' ($1>1) ' part reads that first column; if the value is greater than 1, the entry/element is not unique and, based on your question, it is output as a pattern for egrep -iv, which negates the match of each line.

Here is the logic, from the inside out:
1) Determine all the elements that are not unique.  This is done by splitting the data columns of file1 and only processing field2.
The items in field2 are then sorted (sort), grouped (uniq -c), and reverse sorted (sort -rn). The list of non-unique elements is output.  At the conclusion of this step, you have a list of:
294705808
3353753290

2) Use the above list as a negation, i.e. any line that matches one of these patterns is discarded, which is what egrep/grep -v does.  Because the back-quoted output is a multi-line pattern list, egrep treats each line of it as a separate pattern when checking each line of file1.
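The intermediate table from step 1 can be sketched on the sample values (made-up input piped through the same stages):

```shell
# sort groups identical values together; uniq -c prefixes each with its
# count; the final awk keeps only the values whose count exceeds 1.
printf '%s\n' 294705808 294705808 2697541582 2697541581 3353753290 3353753290 \
  | sort | uniq -c | awk '$1 > 1 { print $2 }'
# 294705808
# 3353753290
```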

0
 
LVL 76

Expert Comment

by:arnold
ID: 34122213
The command did not work because your real file now has three elements, and the element on which you want to trigger is in the third position, not the second as your first post suggested.
The fix, as you can see, is to use the third ($3) element instead of the second ($2) when using awk.
0
 

Author Comment

by:Severcorr
ID: 34122254
I did that as you can see from the below. It's weird that it didn't work.

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $3 } '`" file1
0
 
LVL 76

Expert Comment

by:arnold
ID: 34122418
You changed too many; you only need to change the first one, the one that builds the non-exclusive/unique pattern list. The final awk still prints $2 because it operates on the output of uniq -c:

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1
0
 

Author Comment

by:Severcorr
ID: 34122480
It didn't work, but I accept your solution though.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34122570
It works for me:
Running
egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1

where file1 is:
ACT_MSTR                                          130.18.156.193 294705808
ACT_MSTR                                          130.18.156.191 294705808
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS             130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS             130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.191 2817260696

the output is:
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581


What separates the columns? Are they a mixture of tabs and spaces?

0
 

Author Comment

by:Severcorr
ID: 34122615
Yes. They are a mixture of spaces.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34122965
Are they spaces and tabs, or just spaces?

awk ' { print NF,$3} ' file1
should return the same number of fields on every line,
3 based on your latest example.
If the count varies, this will cause problems,
and the second item should be the element on which you want to filter.
0
 

Author Comment

by:Severcorr
ID: 34123303
I ran the awk ' { print NF,$3} ' file1. It returned the below:


3 650947209
3 1667895989
3 1667895989
3 4232622253
3 4232622253
3 3436253646
0
 
LVL 76

Expert Comment

by:arnold
ID: 34123402
Running the command on that data should return the two lines

that have 650947209 and 3436253646.

Try it one item at a time.
What is the output from running:
awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '
Does it return:
1667895989
4232622253
?

Note that the pattern 650947209 will match *650947209 as well as 650947209*, where * is any combination of letters/numbers/characters.
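That substring behaviour is easy to demonstrate (made-up values; -w, which restricts grep to whole-word matches, is one possible way around it and is not part of the original command):

```shell
# grep matches the pattern anywhere in the line, so 650947209 also
# hits lines where it is embedded inside a longer number.
printf '%s\n' 650947209 1650947209 | grep 650947209
# 650947209
# 1650947209

# With -w, the pattern only matches when it stands alone as a word.
printf '%s\n' 650947209 1650947209 | grep -w 650947209
# 650947209
```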
0
 

Author Comment

by:Severcorr
ID: 34123514
$ cat file1
ACT_MSTR                                      130.18.156.193 294705808
ACT_MSTR                                      130.18.156.191 294705808
CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS         130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS         130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.191 2817260696

$ awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '

Returning:
3353753290
294705808
2817260696


Do you see any problem with the returned results? Thanks.



0
 

Author Comment

by:Severcorr
ID: 34123627
I just found out that if field3 contains zero, it doesn't work. An empty string is returned when executing the command.


$ cat file1

ACT_MSTR                                      130.18.156.193 294705808
ACT_MSTR                                      130.18.156.191 294705808
CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS         130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS         130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTER_NEW  130.18.156.193 0
CLOSED_WIN_JOB_CENTER_NEW  130.18.156.191 0
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.191 2817260696


$ awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '





0
 
LVL 76

Expert Comment

by:arnold
ID: 34124073
The problem is that the pattern 0 matches the 130 in the second field of every line.
-v tells egrep to discard every matching line, which is why there is no output.
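A sketch of the failure mode, using made-up lines; adding -w to egrep (match whole words only) is one possible workaround, though it is an assumption rather than part of the accepted answer:

```shell
# The pattern 0 occurs inside "130.18.156.193", so egrep -v discards
# every line and prints nothing (|| true just ignores the exit status).
printf '%s\n' 'A 130.18.156.193 0' 'B 130.18.156.193 5' | egrep -v 0 || true
# (no output)

# With -w, the pattern 0 only matches a standalone field, so only the
# line whose third field is literally 0 is discarded.
printf '%s\n' 'A 130.18.156.193 0' 'B 130.18.156.193 5' | egrep -vw 0
# B 130.18.156.193 5
```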
0
 

Author Comment

by:Severcorr
ID: 34124113
In my case, is there an alternative way to get it to work, besides deleting lines that have zero in field3? Thanks.
0
 
LVL 76

Assisted Solution

by:arnold
arnold earned 500 total points
ID: 34124298
Is perl an option?

awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } ' | the_below_named_perl_script_name file1

To preempt your question: yes, I tested it with the data from your last posting and it does work:

CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581

#!/usr/bin/perl

# Read the list of non-unique values from STDIN and count them.
while (<STDIN>) {
    chomp();
    $hash{$_} += 1;
}
if ($#ARGV < 0) {
    print "Usage: $0 filename.\n";
    exit;
}
else {
    open FILE, "$ARGV[0]" or die "Error opening file $ARGV[0].\n";
    while (<FILE>) {
        chomp();
        @array = split(/\s+/, $_);
        if (exists $hash{$array[$#array]}) {
            # the last column holds a non-unique value; do nothing
        }
        else {
            print "$_\n";
        }
    }
    close(FILE);
}


0
 

Accepted Solution

by:
Severcorr earned 0 total points
ID: 34124459
Thank you so much! you are great!
0
 

Author Comment

by:Severcorr
ID: 34124863
Can I combine those commands into one bash script? How? Thanks.


$ awk '{printf "%-45s %-14s %s\n", $2,$4,$7}' test > file1

$ awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } ' | /opt/mysql/5.1.46/test6.sh file1


0
 
LVL 76

Expert Comment

by:arnold
ID: 34125734
You cannot.  The problem is that the output of the first formatting awk is the input to the rest of the pipeline.  It is, however, possible to replace the awk non-unique data extraction with a perl script.

I am puzzled: if this is data from a database, why not run the query on the database itself, i.e. use a group by on the field as a negative entry within a where clause?


What is in the test6.sh script or is it the name you gave to the perl script in my prior comment?

I'll look into a single script such that you will have the output by executing the following:
/opt/mysql/5.1.46/test6.sh test 2 4 7


But you can actually use test as the input instead of file1, as long as you adjust the fields, i.e. instead of $3 you would use $7.

The perl script will not need to be modified, as it actually looks at the last element in the array, which will be the last column of test if there are only 7 columns.
0
 
LVL 76

Expert Comment

by:arnold
ID: 34128709
The below should take the test file as an argument and output the unique report.

If you want to post a few rows from test, I could modify the script further.
Syntax-wise the script should work, and the processing should be fine as well.

#!/usr/bin/perl

if ($#ARGV < 0) {    # make sure an argument is provided
    print "Usage: $0 filename\n";
    exit;
}
elsif (-f $ARGV[0]) {    # make sure the argument provided is a regular file
    open FILE, "$ARGV[0]" or die "Failed to open the file $ARGV[0] for reading\n";
    while (<FILE>) {
        chomp();
        @array = split(/\s+/, $_);
        # your example deals with columns 2,4,7 in awk, which starts at 1;
        # Perl/split starts at 0, so the indexes become 1,3,6
        $hash{$array[6]} += 1;
        push(@filearray, $_);    # keep every line from the file referenced in $ARGV[0]
    }    # finished building the hash and the line array
    close(FILE);
    foreach $row (@filearray) {    # second pass over the lines pushed onto @filearray
        @array = split(/\s+/, $row);
        if (exists $hash{$array[6]} && $hash{$array[6]} > 1) {
            # there are two or more entries; skip
        }
        else {
            printf "%-45s %-14s %s\n", $array[1], $array[3], $array[6];
        }    # close if
    }    # close foreach
}    # close elsif


0
 

Author Closing Comment

by:Severcorr
ID: 34153364
Arnold is very patient and professional. His Unix skills are awesome.
0
