
Compare script in Perl

I have a text file (file_111.txt) with entries like this:

'string1 string2: string3.string4'
'string267394: string2.string3'
'string1 string2: string3.string22'
'string267394: string2.string39'

Another text file (file_222.txt) has entries like this:

'string4'
'string3'
'string22'
'string66'
'string99'
'string39'

I want to compare the last string in each line of file_111.txt (the string after the dot) with the strings in file_222.txt.

If the string from file_111.txt is found in file_222.txt, then echo the matching line from file_111.txt.

For example, if string4 from the first line of file_111.txt has a match in file_222.txt, then

echo 'string1 string2: string3.string4' has matched

otherwise

echo "no match found"


The script needs to take the paths of both text files as parameters:

test.pl path/to/file_111.txt path/to/file_222.txt
Asked by: gaurav sharma
 
gaurav sharma (Author) commented:
Correction:

file_222.txt does not have the single quotes:
string4
string3
string22
string66
string39
 
FishMonger commented:
What have you tried?

How does the output of your script differ from what you want?

What warnings/errors are you receiving?

What part of the process is giving you trouble?

Please post your script and answer those questions so we can assist you with your homework assignment.
 
gaurav sharma (Author) commented:
So far I have:



#!/usr/bin/perl

$filename = 'file_111.txt';

# use the perl open function to open the file
open(FILE, $filename) or die "Could not read from $filename, program halting.";

# loop through each line in the file
# with the typical "perl file while" loop
while(<FILE>)
{
  # get rid of the pesky newline character
  chomp;

  # read the fields in the current line into an array
  @fields = split(':', $_);

  # print the second field (index 1): the text after the colon
  print "$fields[1]\n";
}
close FILE;


output:

string3.string4'
string2.string3'
string3.string22'
 
gaurav sharma (Author) commented:
I tried using '.' as the delimiter, but then it prints nothing :-(
 
FishMonger commented:
Let's start at the beginning of the script. All Perl scripts should load the strict and warnings pragmas. They will point out many common mistakes, some of which could be difficult to track down without their help. The strict pragma will require you to declare your variables.
#!/usr/bin/perl

use strict;
use warnings;

my $filename = 'file_111.txt';

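A quick way to see what strict buys you: the sketch below (the variable name $undeclared_var is made up for the demo) compiles a snippet inside a string eval. Because the variable was never declared with my, strict refuses to compile it and the error ends up in $@, instead of Perl silently creating a global variable.

```perl
use strict;
use warnings;

# Compile a snippet at run time with a string eval. The snippet assigns to
# a variable that was never declared with 'my', so 'use strict' makes the
# compilation fail and the error message is left in $@.
my $ok = eval q{
    use strict;
    $undeclared_var = 'file_111.txt';
    1;
};

if ($ok) {
    print "compiled fine\n";
}
else {
    # Something like: Global symbol "$undeclared_var" requires explicit package name ...
    print "strict caught it: $@";
}
```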


Next, deal with the requirement that the script take the paths of both text files as parameters, instead of hard-coding the filenames in the script.

The script's arguments/parameters are found in the @ARGV array.  The first one is accessed via $ARGV[0] and the second is in $ARGV[1].
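Reading the two paths from @ARGV pairs well with a simple argument-count check, so a missing parameter fails right away with a usage message instead of an undefined-value warning further down. A minimal sketch; get_paths is a hypothetical helper name and the usage text is an assumption:

```perl
use strict;
use warnings;

# Hypothetical helper: validate the command-line arguments and return the
# two paths. It takes a plain list so it can be exercised without a real @ARGV.
sub get_paths {
    my @args = @_;
    die "Usage: $0 path/to/file_111.txt path/to/file_222.txt\n"
        unless @args == 2;
    my ($file1, $file2) = @args;    # same values as $ARGV[0] and $ARGV[1]
    return ($file1, $file2);
}

# In the real script:
#   my ($file1, $file2) = get_paths(@ARGV);
```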

When you open a filehandle, use the 3-arg form of open and a lexical variable for the filehandle instead of a bareword. Also, the die message should include the reason the open failed; that information is in the $! variable.

my $file1 = $ARGV[0];
open my $fh1, '<', $file1 or die "Could not read from '$file1' because <$!>\n...program halting.";

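Put together, the 3-arg open, the lexical filehandle and $! can look like this. read_lines is a hypothetical helper, not part of any module; it returns the file's lines without their newlines and dies with the path and the OS error if the open fails.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a list of chomped lines using the
# 3-arg open with a lexical filehandle. On failure the message includes
# the path and the operating-system error from $!.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "Could not read from '$path' because <$!>\n";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}
```

Because open also accepts a reference to a scalar as its "file", read_lines(\$text) reads from an in-memory string, which is handy for trying the logic out without real files.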


Before you read/process file_111.txt, open file_222.txt and put its contents into a hash for easy lookup when you loop over file_111.txt.
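A sketch of that lookup hash, assuming file_222.txt holds one key per line (as in the corrected sample). load_lookup is a hypothetical name, and the demo uses an in-memory filehandle in place of the real file.

```perl
use strict;
use warnings;

# Load every line of the second file into a hash, so that checking a key
# while looping over the first file is a single exists() test instead of
# a rescan of the whole file.
sub load_lookup {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # ignore blank lines
        $seen{$line} = 1;
    }
    return \%seen;
}

# In-memory stand-in for the corrected file_222.txt
my $file_222 = "string4\nstring3\nstring22\nstring66\nstring39\n";
open my $fh2, '<', \$file_222 or die "in-memory open failed: $!";
my $lookup = load_lookup($fh2);
close $fh2;

for my $key (qw(string4 string99)) {
    print "$key: ", (exists $lookup->{$key} ? "found" : "not found"), "\n";
}
```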

The first arg to the split function should be a regex pattern, not a string. In your case, I'd use a character class that matches both the colon and the dot. That way, in one step, you separate the two fields on either side of the colon and also split the second field on the dot, which gives you access to the part of the string that needs to be compared against the other file.
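A sketch of the character-class split; last_field is a hypothetical helper. It also shows why passing '.' as the delimiter printed nothing: split treats its first argument as a regex, so an unescaped dot matches every character.

```perl
use strict;
use warnings;

# Take the last field of a line by splitting on a character class that
# contains both the colon and the dot, then indexing the list with [-1].
sub last_field {
    my ($line) = @_;
    return (split /[:.]/, $line)[-1];
}

my $line = 'string1 string2: string3.string4';
print last_field($line), "\n";    # string4

# split '.' is really split /./: every character is a separator, all the
# fields are empty strings, and split drops trailing empty fields, so the
# resulting list is empty.
my @fields = split '.', $line;
print scalar(@fields), "\n";      # 0
```

If the lines keep their surrounding single quotes, the last field will carry the trailing quote (string4'), so it will not match the unquoted entries in the other file.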

Please work on those points and post back with your results and a more specific question about the part(s) that give you trouble.

At this point I don't wish to provide my own complete solution because this appears to be a learning exercise, i.e., a class homework assignment.
 
gaurav sharma (Author) commented:
I appreciate your comments, but this is not an assignment of any sort. I am fixing somebody else's code here and don't have much time.
 
FishMonger commented:
#!/usr/bin/perl

use strict;
use warnings;

my ($file1, $file2) = @ARGV;

open my $fh1, '<', $file1 or die "failed to open '$file1' <$!>";
open my $fh2, '<', $file2 or die "failed to open '$file2' <$!>";

my %f2;
while (<$fh2>) {
    chomp;
    $f2{$_}++;
}
close $fh2;

while (my $line = <$fh1>) {
    chomp $line;
    my $last_fld = (split /[:.]/, $line)[-1];
# or    my $last_fld = (split /\./, $line)[-1];

    if ($f2{$last_fld}) {
        print "$line has matched\n"
    }
    else {
        print "no match found\n";
    }
}
close $fh1;

 
gaurav sharma (Author) commented:
I ran the script with the attached files as test.pl file_111.txt file_222.txt. It does not find any matches.
Attachments: file-222.txt, file-111.txt
 
FishMonger commented:
I see 2 problems.

1) The contents of the files are reversed from what you stated in your opening post.

2) I assumed the single quotes in the file whose lines get split were there by mistake, just as you said they were in the other file, so I did not account for them in the parsing.

Here's an updated version and the output I receive.
#!/usr/bin/perl

use strict;
use warnings;

my ($file1, $file2) = @ARGV;

open my $fh1, '<', $file1 or die "failed to open '$file1' <$!>";
open my $fh2, '<', $file2 or die "failed to open '$file2' <$!>";

my %f2;
while (<$fh2>) {
    chomp;
    $f2{$_}++;
}
close $fh2;

while (my $line = <$fh1>) {
    chomp $line;
    $line =~ s/'//g;
    my $last_fld = (split /\./, $line)[-1];

    if ($f2{$last_fld}) {
        print "$line has matched\n"
    }
    else {
        print "no match found\n";
    }
}
close $fh1;



c:\test>test.pl file_222.txt file_111.txt
PACKAGE BODY: ENCF.EM_K_ENTDAL_COMM_PRT_MMA has matched
no match found
no match found
PACKAGE BODY: TRON2000.DC_K_CWICDM_INT_MMA has matched
no match found
TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found
no match found

 
gaurav sharma (Author) commented:
Sorry about the confusion here.

1. The contents of file_111.txt are what matter here. If a string in file_111.txt matches a string in file_222.txt, then print the complete corresponding line from file_222.txt. Example:

 If NCF_ADMIN_OBJ in file_111.txt is matched in file_222.txt, then echo

TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched

otherwise echo

NCF_ADMIN_OBJ is valid



2. The attached files are in the correct format. There are no quotes in file_111.txt.
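For the requirement as restated here, the roles of the files flip: file_222.txt supplies the full lines and file_111.txt the bare names. A minimal sketch, assuming file_222.txt lines look like TYPE BODY: ENCF.NCF_ADMIN_OBJ and file_111.txt holds one name per line; the helper names and the in-memory sample data are illustrative only.

```perl
use strict;
use warnings;

# Index the full lines of file_222.txt by the text after their last dot.
sub index_by_last_field {
    my ($fh) = @_;
    my %full_line_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\.([^.\s]+)\s*$/;    # key = text after the last dot
        $full_line_for{$1} = $line;
    }
    return \%full_line_for;
}

# Report on each bare name from file_111.txt.
sub report {
    my ($names_fh, $index) = @_;
    my @out;
    while (my $name = <$names_fh>) {
        chomp $name;
        next if $name eq '';
        push @out, exists $index->{$name}
            ? "$index->{$name} has matched"
            : "$name is valid";
    }
    return @out;
}

# Demo with in-memory data; a real run would open the two paths from @ARGV.
my $file_111 = "NCF_ADMIN_OBJ\nDC_K_CWICDM_INT_MMA\n";
my $file_222 = "TYPE BODY: ENCF.NCF_ADMIN_OBJ\n";
open my $fh1, '<', \$file_111 or die $!;
open my $fh2, '<', \$file_222 or die $!;
my @lines = report($fh1, index_by_last_field($fh2));
close $fh1;
close $fh2;
print "$_\n" for @lines;
```

With this sample data it prints "TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched" followed by "DC_K_CWICDM_INT_MMA is valid", one line per name in file_111.txt.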
 
gaurav sharma (Author) commented:
If only one string from file_111.txt matches one in file_222.txt, the output should be:

TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched
DC_K_CWICDM_INT_MMA is valid
EM_K_ENTDAL_COMM_PRT_MMA is valid
 
FishMonger commented:
I'm not sure I understand the meaning of your last comment.

Is it matching using your specified requirements and producing the output you want?
 
gaurav sharma (Author) commented:
I am running the latest script as:

perl_test.pl file_222.txt file_111.txt      

::::::::::::::Desired output ::::::::::

For example:

Say the only string in file_111.txt with a match in file_222.txt is NCF_ADMIN_OBJ; then the output should be:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched
EM_K_ENTDAL_COMM_PRT_MMA has no match
DC_K_CWICDM_INT_MMA has no match

::::::::::::::Current output ::::::::::::::::::::::::::::::::::::::

When I run the latest script you gave me, I get this result:

TYPE BODY: ENCF.NCF_ADMIN_OBJ has matched  
'PACKAGE BODY: ENCF.EM_K_ENTDAL_COMM_PRT_MMA'   has no match
'PACKAGE BODY: ENCF.EM_K_ENTDAL_NCF_MMA'  has not matched
'PACKAGE BODY: TRON2000.DC_K_CONTROL_TECNICO_TRN' has no match
'PACKAGE BODY: TRON2000.DC_K_CWICDM_INT_MMA' has no match
'PACKAGE BODY: TRON2000.EM_K_PLAN_PAGO_TRN' has no match
'TYPE BODY: ENCF.NCF_ADMIN_OBJ' has no match
'TYPE BODY: ENCF.NCF_ALERTS_SCORING_OBJ' has no match
'TYPE BODY: ENCF.NCF_BANKRUPTCY_OBJ'  has no match
'TYPE BODY: ENCF.NCF_CHECK_SAVING_ACCOUNTS_OBJ'  has no match
'TYPE BODY: ENCF.NCF_COLLECTION_OBJ' has no match
'TYPE BODY: ENCF.NCF_CONSUMER_STMT_OBJ'  has no match
'TYPE BODY: ENCF.NCF_CREDIT_TRADES_OBJ' has no match
'TYPE BODY: ENCF.NCF_ERRORS_OBJ' has no match
'TYPE BODY: ENCF.NCF_FINANCIAL_COUNSELOR_OBJ'  has no match
27 rows


Let me know if you need further information

Thanks
 
FishMonger commented:
The script I gave strips out the single quotes, so if you're getting them in the output, you must have altered the script.

Please attach the script you ran and portions of both input files.
 
gaurav sharma (Author) commented:
Call to the script:

c:/scripts/perl_compare_wrksp_invalids.pl c:/test-1/workspace/invalids.txt c:/test-1/workspace/workspace.txt

output

PACKAGE BODY: TRON2000.tr_er_ty_yu_mma in the workspace is invalid ******* ERROR
PACKAGE BODY: TRON2000.ts_k_jrp_300901sfsfs0_mma in the workspace is valid
PACKAGE BODY: TRON2000.ts_k_jrp_3dfdssd011423230_mma in the workspace is valid
2 rows in the workspace is valid



Desired output:

PACKAGE BODY: TRON2000.tr_er_ty_yu_mma in the workspace is invalid ******* ERROR
we_fg_hj_k_mma in the workspace is valid
fd_hf_ks_kd_mma in the workspace is valid
fg_cb_vn_kd_kl_mma in the workspace is valid
qw_kj_lk_oi_lm_mma in the workspace is valid
qw_df_vc_bh_lk_mma in the workspace is valid
ts_k_jrp_3dfdssd01140_mma in the workspace is valid
rtewtwert_dfsg_dafasd_mma in the workspace is valid
adf_afaf_afa_afA_mma in the workspace is valid


****Also, the last line in workspace.txt, which contains the number of rows, has to be ignored.
 In this case the line is:

2 rows

Attachments: invalids.txt, perl-compare-wrksp-invalids.txt, workspace.txt
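One way to ignore that trailing row-count line, assuming it always looks like a number followed by "row" or "rows": filter by pattern rather than blindly dropping the last line, so a file that happens to lack the footer doesn't lose a real entry. is_row_count_line is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: recognise the row-count footer appended to
# workspace.txt (e.g. "2 rows" or "27 rows"). Assumes the line contains
# only a number followed by "row" or "rows".
sub is_row_count_line {
    my ($line) = @_;
    return $line =~ /^\s*\d+\s+rows?\s*$/i ? 1 : 0;
}

my @workspace = ('we_fg_hj_k_mma', 'fd_hf_ks_kd_mma', '2 rows');
my @names = grep { !is_row_count_line($_) } @workspace;
print "$_\n" for @names;
```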
 
FishMonger commented:
Where is the "PACKAGE BODY:" verbiage coming from? Neither it nor the single quotes come from the Perl script.

Is your WordPress app adding to or reformatting the output of the script?
 
gaurav sharma (Author) commented:
They are not from the Perl script. They are generated by a database query.
 
gaurav sharma (Author) commented:
It's the same file_222.txt that I had initially.
 
FishMonger commented:
Since this is coming from DB queries outside the Perl script, you'll need to look at the code that runs those queries and outputs the data.

You should start a new thread/question and provide that code.
 
gaurav sharma (Author) commented:
I thought a string comparison was possible with Perl. Both text files (invalids.txt and workspace.txt) are produced by other scripts.

All I want to do is compare the two files so that:

1. For each string in workspace.txt, check whether it is present in invalids.txt. If a match is found, print the corresponding line from invalids.txt.

2. The check above should look only at the part of each invalids.txt line between the dot (.) and the closing single quote (') at the end.

      For instance, in the line 'PACKAGE BODY: TRON2000.ts_k_jrp_300901sfsfs0_mma' only consider

      ts_k_jrp_300901sfsfs0_mma


I am still not sure why we need to consider database specifics, since we are only interested in the above format in invalids.txt.



3. If a match is found, print the complete corresponding line from invalids.txt; otherwise, just print


   the string from workspace.txt and report that it is valid.
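A sketch of these three rules, assuming invalids.txt lines are wrapped in ASCII single quotes and workspace.txt ends with a row-count line; the output wording follows the desired output posted earlier in the thread. The helper names and sample data are illustrative, and matching is exact and case-sensitive.

```perl
use strict;
use warnings;

# Assumed formats:
#   invalids.txt  - lines like 'PACKAGE BODY: TRON2000.ts_k_jrp_300901sfsfs0_mma'
#   workspace.txt - one bare name per line, ending with a footer like "2 rows"
sub index_invalids {
    my ($fh) = @_;
    my %invalid;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/'//g;                            # drop the single quotes
        next unless $line =~ /\.([^.\s]+)\s*$/;     # key = text after the last dot
        $invalid{$1} = $line;
    }
    return \%invalid;
}

sub check_workspace {
    my ($fh, $invalid) = @_;
    my @out;
    while (my $name = <$fh>) {
        chomp $name;
        $name =~ s/^\s+|\s+$//g;                    # trim stray whitespace
        next if $name eq '' || $name =~ /^\d+\s+rows?$/i;    # skip the footer
        push @out, exists $invalid->{$name}
            ? "$invalid->{$name} in the workspace is invalid ******* ERROR"
            : "$name in the workspace is valid";
    }
    return @out;
}

# Demo with in-memory stand-ins for the two files
my $invalids  = "'PACKAGE BODY: TRON2000.tr_er_ty_yu_mma'\n";
my $workspace = "tr_er_ty_yu_mma\nwe_fg_hj_k_mma\n2 rows\n";
open my $inv_fh, '<', \$invalids  or die $!;
open my $ws_fh,  '<', \$workspace or die $!;
my @report = check_workspace($ws_fh, index_invalids($inv_fh));
close $inv_fh;
close $ws_fh;
print "$_\n" for @report;
```

In a real run the two handles would come from the paths on the command line (invalids.txt first, workspace.txt second, as in the call shown above).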
 
FishMonger commented:
Based on the sample input files you posted, the script I gave cannot produce the output you say you're getting.

So, either you modified the script or the content and format of the input files don't match what you've posted.

Also, you've been inconsistent about the format of the lines, i.e., you flip back and forth about whether they have single quotes. Maybe those single quotes aren't really single quotes but are actually curly "smart quotes" (‘ ’).
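If curly quotes are the culprit, normalizing each line before the comparison sidesteps the issue. A sketch, assuming the file is decoded as UTF-8 when opened; normalize_line is a hypothetical helper, and the set of characters it strips is a guess at what the export might contain.

```perl
use strict;
use warnings;

# Normalise a line before comparing it: strip the ASCII apostrophe ('),
# the backtick (`), U+2018 and U+2019 (left/right single quotation marks),
# then leading/trailing whitespace, which also removes a stray \r.
sub normalize_line {
    my ($line) = @_;
    $line =~ s/['`\x{2018}\x{2019}]//g;
    $line =~ s/^\s+|\s+$//g;
    return $line;
}

# For curly quotes to arrive as U+2018/U+2019 the file must be decoded on
# input. This assumes UTF-8; a file saved as Windows-1252 would need
# '<:encoding(cp1252)' instead:
#   open my $fh, '<:encoding(UTF-8)', $path or die "Can't open '$path': $!";

my $raw = "\x{2018}TYPE BODY: ENCF.NCF_ADMIN_OBJ\x{2019}\r";
print normalize_line($raw), "\n";    # TYPE BODY: ENCF.NCF_ADMIN_OBJ
```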