Solved

UNIX BASH

Posted on 2010-11-11
31
481 Views
Last Modified: 2013-12-18
$ cat file1

field1                                               field2
ACT_MSTR                                      294705808
ACT_MSTR                                      294705808
CC_PROVIDERS                              2697541582
CC_PROVIDERS                              2697541581
CLOSED_WIN_JOB_CENTERS         3353753290
CLOSED_WIN_JOB_CENTERS         3353753290

If two records have the same field1 value, compare their field2 values; if the field2 values differ, print ONLY those records.

I would like my new file to be as follows:
CC_PROVIDERS                              2697541582
CC_PROVIDERS                              2697541581

THANKS!
0
Comment
Question by:Severcorr
31 Comments
 
LVL 78

Assisted Solution

by:arnold
arnold earned 500 total points
ID: 34116320
egrep -iv "`awk ' { print $2 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1

awk ' { print $2 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1  ) { print $2 } '
The pipeline above extracts the field2 values that occur more than once (the common items).
egrep -v then tests each line of file1 against that list of values and discards the lines that match, leaving only the records whose field2 differs.
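For example, with the two-column sample from the question saved as file1, each stage can be checked directly (a sketch; the header row is omitted here, so the grep -v field step is dropped):

```shell
# Recreate the two-column sample from the question (no header row).
printf '%s\n' \
  'ACT_MSTR                294705808' \
  'ACT_MSTR                294705808' \
  'CC_PROVIDERS            2697541582' \
  'CC_PROVIDERS            2697541581' \
  'CLOSED_WIN_JOB_CENTERS  3353753290' \
  'CLOSED_WIN_JOB_CENTERS  3353753290' > file1

# Inner step: list the field2 values occurring more than once.
awk '{ print $2 }' file1 | sort | uniq -c | sort -rn | awk '$1 > 1 { print $2 }'
# prints 3353753290 and 294705808 (in some order)

# Outer step: egrep -v discards every line matching one of those values,
# leaving only the records whose field2 differs between the duplicates.
egrep -v "`awk '{ print $2 }' file1 | sort | uniq -c | sort -rn | awk '$1 > 1 { print $2 }'`" file1
```

The backquoted substitution works because grep treats each newline-separated line of its pattern argument as a separate pattern.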
0
 

Author Comment

by:Severcorr
ID: 34116438
For some reason, none of the commands were working. Thanks anyway.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34116566
Which Linux are you using?
Which shell do you have?

Check whether you have all the commands available: egrep, awk, sort, uniq.
0

 

Author Comment

by:Severcorr
ID: 34116589
Here is my Unix version, and those commands are there.

Linux WIAMigrationDB 2.6.18-164.11.1.el5.centos.plus #1 SMP Wed Jan 20 18:49:35 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
0
 
LVL 78

Expert Comment

by:arnold
ID: 34117904
You should have all those tools.
Look in /var/log/rpmpkgs.

yum install gawk grep coreutils
They should be available in the /bin/ directory.
Try this then:
/bin/egrep -iv "`/bin/awk ' { print $2 } ' file1| /bin/grep -v field|/bin/sort | /bin/uniq -c | /bin/sort -rn | /bin/awk ' ($1 >1 ) { print $2 } '`" file1
0
 

Author Comment

by:Severcorr
ID: 34120671
I installed gawk, grep and coreutils and it still doesn't work. Did you create a test file and test the scripts on your side? Thanks.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34121641
Yes, I used a copy of the data you posted and named it file1, and when executing the commands, file1 was in the current working directory, i.e. ls file1 returned information about the file.

Could you post what errors you are getting?
0
 

Author Comment

by:Severcorr
ID: 34121763
I didn't get any error, but it came back with an empty value.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34121997
What are the contents of file1?
Post the output of the following:

awk ' { print  NF } ' file1
Do all lines reflect/report 2?
0
 

Author Comment

by:Severcorr
ID: 34122057
I ran awk ' { print  NF } ' file1. It just printed out one character/number throughout. By the way, what does awk ' ($1 >1 ) ' mean? Thanks.
0
 

Author Comment

by:Severcorr
ID: 34122130
I am sorry. When I tested the example file above, it worked, but when I use it on my real file it doesn't work, and I don't know why. Sorry to have you go through all this. Thanks.

Real file:

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $3 } '`" file1
ACT_MSTR                                          130.18.156.193 294705808
ACT_MSTR                                          130.18.156.191 294705808
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS             130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS             130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.191 2817260696
0
 
LVL 78

Expert Comment

by:arnold
ID: 34122182
Which character/number did the awk ' { print NF } ' file1 output?  Was it the number 2?

awk ' ($1 >1 ) { print $2 } ' deals with the output of uniq -c | sort -rn.
uniq -c returns a table:
number_of_occurrences   Element

sort -rn does a reverse numeric sort on the first column, so the items that occur most frequently are at the top.

The awk ' ($1>1) ' part reads that first column; if the value is greater than 1, the entry/element is not unique, and based on your question it is output for grep -v, which negates (inverts) the match on each line.

Here is the logic, from the inside out:

1) Determine all the elements that are not unique. This is done by splitting the data columns of file1 and only processing field2. The items in field2 are then sorted (sort), grouped (uniq -c), and reverse sorted (sort -rn), and the list of non-unique elements is output. At the conclusion of this step, you have a list of:
294705808
3353753290

2) Use the above list as a negation, i.e. any line that matches one of those patterns is discarded, which is what egrep/grep -v does. The command substitution supplies the patterns one per line, and grep checks every line of file1 against each of them.
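As a side note, the count-then-filter step above can be collapsed with uniq -d, which prints one copy of each repeated line in sorted input (a sketch, equivalent for this purpose):

```shell
# Same list of non-unique field2 values, without counting:
# uniq -d emits each line that occurs more than once, exactly once.
awk '{ print $2 }' file1 | grep -v field | sort | uniq -d
```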

0
 
LVL 78

Expert Comment

by:arnold
ID: 34122213
The command did not work because your real file has three columns, and the element on which you want to trigger is in the third position, not in the second position as your first post suggested.
The fix, as you can see, is to use the third element ($3) instead of the second ($2) in awk.
0
 

Author Comment

by:Severcorr
ID: 34122254
I did that as you can see from the below. It's weird that it didn't work.

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $3 } '`" file1
0
 
LVL 78

Expert Comment

by:arnold
ID: 34122418
You changed too many; you only need to change the first one,
the one that builds the non-unique pattern list. The second awk reads the output of uniq -c, where the value is in column 2.

egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1
0
 

Author Comment

by:Severcorr
ID: 34122480
It didn't work, but I accept your solution though.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34122570
It works for me.
Running:
egrep -iv "`awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '`" file1

where file1 is:
ACT_MSTR                                          130.18.156.193 294705808
ACT_MSTR                                          130.18.156.191 294705808
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS             130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS             130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD     130.18.156.191 2817260696

Returns:
CC_PROVIDERS                                  130.18.156.193 2697541582
CC_PROVIDERS                                  130.18.156.191 2697541581


What separates the columns? Are they a mixture of tabs and spaces?

0
 

Author Comment

by:Severcorr
ID: 34122615
Yes, they are a mixture of spaces.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34122965
Are they spaces and tabs, or just spaces?

awk ' { print NF,$3} ' file1
should return the same number of fields for every line (3, based on your latest example).
If the count varies, this will cause problems,
and the second item should be the element on which you want to filter.
0
 

Author Comment

by:Severcorr
ID: 34123303
I ran the awk ' { print NF,$3} ' file1. It returned the below:


3 650947209
3 1667895989
3 1667895989
3 4232622253
3 4232622253
3 3436253646
0
 
LVL 78

Expert Comment

by:arnold
ID: 34123402
Running the command should return the two lines that have 650947209 and 3436253646.

Try it one item at a time.
what is the output from running:
awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '
Does it return:
1667895989
1667895989
?

Note that the pattern match for 650947209 will also match *650947209 and 650947209*, where * is any combination of letters/numbers/characters.
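That substring behaviour can be avoided by making grep match the list as fixed strings on whole words (a sketch; -F takes the patterns literally, -w requires whole-word matches, and -f - reads the pattern list from stdin):

```shell
# Exact, whole-word matching of the non-unique field3 values:
awk '{ print $3 }' file1 | grep -v field | sort | uniq -d \
  | grep -vwF -f - file1
```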
0
 

Author Comment

by:Severcorr
ID: 34123514
$ cat file1
ACT_MSTR                                      130.18.156.193 294705808
ACT_MSTR                                      130.18.156.191 294705808
CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS         130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS         130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.191 2817260696

$ awk ' { print $3 } ' file2| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '

Returning:
3353753290
294705808
2817260696


Do you see any problem with the returned results? Thanks.
 

Author Comment

by:Severcorr
ID: 34123627
I just found out that if field3 contains zero, it doesn't work: an empty result is returned when executing the command.


$ cat file1

ACT_MSTR                                      130.18.156.193 294705808
ACT_MSTR                                      130.18.156.191 294705808
CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581
CLOSED_WIN_JOB_CENTERS         130.18.156.193 3353753290
CLOSED_WIN_JOB_CENTERS         130.18.156.191 3353753290
CLOSED_WIN_JOB_CENTER_NEW  130.18.156.193 0
CLOSED_WIN_JOB_CENTER_NEW  130.18.156.191 0
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.193 2817260696
CLOSED_WIN_JOB_CENTERS_OLD 130.18.156.191 2817260696


$ awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } '
(no output)
0
 
LVL 78

Expert Comment

by:arnold
ID: 34124073
The problem is that the pattern 0 matches the 0 inside 130 in the second field, so every line matches,
and -v tells grep to discard matching lines, which is why there is no output.
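One way around the substring problem entirely is to let awk do both the counting and the filtering, comparing field 3 as an exact string (a sketch; it reads file1 twice, so the input must be a regular file, not a pipe):

```shell
# Pass 1 (NR==FNR is true only while reading the first file): count field3.
# Pass 2: print a line only when its field3 value occurred exactly once.
awk 'NR==FNR { count[$3]++; next } count[$3] == 1' file1 file1
```

Because the comparison is on the whole field, a value of 0 no longer collides with the 0 inside 130.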
0
 

Author Comment

by:Severcorr
ID: 34124113
In my case, is there an alternative way to get it to work, besides deleting lines that have zero in field3? Thanks.
0
 
LVL 78

Assisted Solution

by:arnold
arnold earned 500 total points
ID: 34124298
Is perl an option?

awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } ' | the_below_named_perl_script_name file1

To pre-empt your question: yes, I tested it with the data from your last post, and it does work:

CC_PROVIDERS                              130.18.156.193 2697541582
CC_PROVIDERS                              130.18.156.191 2697541581

#!/usr/bin/perl
# Read the list of non-unique values from STDIN into a hash.
while (<STDIN>) {
    chomp();
    $hash{$_} += 1;
}
if ($#ARGV < 0) {
    print "Usage: $0 filename.\n";
    exit;
}
else {
    open(FILE, $ARGV[0]) or die "Error opening file $ARGV[0].\n";
    while (<FILE>) {
        chomp();
        @array = split(/\s+/, $_);
        # Print only the lines whose last field is NOT in the non-unique list.
        unless (exists $hash{$array[$#array]}) {
            print "$_\n";
        }
    }
    close(FILE);
}


0
 

Accepted Solution

by:
Severcorr earned 0 total points
ID: 34124459
Thank you so much! you are great!
0
 

Author Comment

by:Severcorr
ID: 34124863
Can I combine those commands in one bash script? How? Thanks.


$ awk '{printf "%-45s %-14s %s\n", $2,$4,$7}' test > file1

$ awk ' { print $3 } ' file1| grep -v field|sort | uniq -c | sort -rn | awk ' ($1 >1 ) { print $2 } ' | /opt/mysql/5.1.46/test6.sh file1


0
 
LVL 78

Expert Comment

by:arnold
ID: 34125734
You cannot simply pipe them: the output of the first formatting awk is the input to everything that follows, but the filter also needs to read the same data a second time. It is possible to move the non-unique data extraction into the Perl script itself.

I am puzzled: if this is data from a database, why not run the query on the database itself, i.e. a GROUP BY on the field as a negative entry within a WHERE clause?

What is in the test6.sh script, or is it the name you gave to the Perl script in my prior comment?

I'll look into a single script such that you will have the output by executing the following:
/opt/mysql/5.1.46/test6.sh test 2 4 7

But you can actually use test as the input instead of file1, as long as you adjust the fields, i.e. instead of $3 you would use $7.

The Perl script will not need to be modified, as it actually looks at the last element in the array, which will be the last column of test if there are only 7 columns.
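If a temporary file is acceptable, the two steps can also be chained in one bash script (a sketch; the script name combine.sh is hypothetical, and the column choices 2, 4 and 7 are taken from the example above):

```shell
#!/bin/bash
# Usage: ./combine.sh test
# Step 1: reformat columns 2, 4 and 7 of the input into a scratch file.
# Step 2: keep only the rows whose third (last) column value is unique.
tmp=$(mktemp) || exit 1
trap 'rm -f "$tmp"' EXIT

awk '{ printf "%-45s %-14s %s\n", $2, $4, $7 }' "$1" > "$tmp"
awk 'NR==FNR { count[$3]++; next } count[$3] == 1' "$tmp" "$tmp"
```

The temp file lets the reformatted data be read twice, which a single pipe cannot do.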
0
 
LVL 78

Expert Comment

by:arnold
ID: 34128709
The script below should take the test file as an argument and output the unique report.

If you want to post a few rows from test, I could modify the script further.
Syntax-wise the script should work, and the processing should be fine as well.

#!/usr/bin/perl

if ($#ARGV < 0) {    # make sure an argument is provided
    print "Usage: $0 filename\n";
    exit;
}
elsif (-f $ARGV[0]) {    # make sure the argument is a regular file
    open(FILE, $ARGV[0]) or die "Failed to open the file $ARGV[0] for reading\n";
    while (<FILE>) {
        chomp();
        @array = split(/\s+/, $_);
        # Your example deals with columns 2, 4, 7 in awk, which counts from 1.
        # Perl/split counts from 0, so those become indices 1, 3, 6.
        $hash{$array[6]} += 1;
        push(@filearray, $_);    # keep every line for the second pass
    }    # finished building the hash and the line array
    close(FILE);
    foreach $row (@filearray) {    # second pass over the saved lines
        @array = split(/\s+/, $row);
        # Print the row only if its column-7 value occurred exactly once.
        unless ($hash{$array[6]} > 1) {
            printf "%-45s %-14s %s\n", $array[1], $array[3], $array[6];
        }
    }    # close foreach
}    # close elsif


0
 

Author Closing Comment

by:Severcorr
ID: 34153364
Arnold is very patient and professional. His Unix skills are awesome.
0
