Europa MacDonald

asked on

Finding repeat instances of data with Perl

Hello

I have a list of data

345 654 345 432 236
345 654 345 432 236
345 654 345 432 236
654 345 345 432 236
654 345 345 432 236
654 345 345 432 236
710 654 345 345 436
710 654 345 345 432
710 654 345 345 433
710 654 345 345 433
710 654 345 345 433

through which I need to find repeat entries row on row, over about one million rows.

Manually I have to take the first row and check for a repeat of any of the values of that row with the next row. If there are any repeat values, then I have to mark down how many. Then I move onto the next row and repeat the process.

I then have to tally how many rows had whatever number of repeated values.

So the output would look like

4
4
4
0
2
2
3
3
4
4
(for each row)

total

0 values were repeated on 1 rows
2 values were repeated on 2 rows
3 values were repeated on 3 rows
4 values were repeated on 5 rows

Could someone help me for code with this, output to a VIM file please ?
arnold

See my response to your other question, ref array. Using entries in the array you can count whether a single line has a repeating pattern.

I am not sure what your skill set is in terms of programming languages. An implementation of your request starts with logically laying out what your procedure would be if done manually, then choosing a mechanism to achieve it.

The input, intermediate output, and final output do not seem to show the basis on which each is derived from the other.

ASKER

** I know very little about Perl or programming. I am really just looking for a bit of code which I can put in a .pl file, alongside a data file, to get the output that I require.

Thanks
I do not understand the basis of the output you want in this question.

Given your 11-row data example, are you counting how often individual numbers or characters appear in a line?
I have to take any given row, for example the first row

345 654 345 432 236

and compare every value in it with every value in the row after it (in my example they just happen to be the same)

345 654 345 432 236

So in this case the return would be 5

Then to take that second row and compare every value within it with the third row

345 654 345 432 236

Again, in this case the return would be 5

Then to take that third row and compare it with every value in the fourth row and so on
substr is a substring function that you can use in a loop over the row.
There are different ways to do this.
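
The comparison described above, every value in a row against every value in the following row, can be read in more than one way. Here is a minimal sketch of one reading, assuming that position does not matter and that each value in the earlier row may pair with at most one occurrence of the same value in the later row. The sub name shared_count is made up for illustration and does not come from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count values shared by two rows regardless of position.
# Each value in the earlier row can pair with at most one
# occurrence of the same value in the later row (assumed reading).
sub shared_count {
    my ($prev, $curr) = @_;
    my %available;
    $available{$_}++ for @$curr;      # how many of each value the later row holds
    my $count = 0;
    for my $value (@$prev) {
        if ($available{$value}) {
            $available{$value}--;     # use up one occurrence
            $count++;
        }
    }
    return $count;
}

# Two rows from the sample data, split on whitespace.
my @row_a = split ' ', '654 345 345 432 236';
my @row_b = split ' ', '710 654 345 345 436';
print shared_count(\@row_a, \@row_b), "\n";
```

For these two rows the sketch prints 3, while a position-by-position comparison of the same rows gives 1, so the choice of reading changes the result.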
Hi Michael,

Put this in a script called cmp.pl, and make it executable:
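#!/usr/bin/perl
use strict;
use warnings;

my @last_line;       # fields of the previous row
my $match_tot = 0;   # running total of all matches

while (<>)
{
        # split ' ' splits on any run of whitespace and drops the trailing newline
        my @line = split ' ';
        if (@last_line)
        {
                # count positions where this row equals the previous row
                my $match = 0;
                for my $i (0..$#line)
                { $match++ if defined $last_line[$i] && $line[$i] eq $last_line[$i] }
                print "$match\n";
                $match_tot += $match;
        }
        @last_line = @line;
}
print "$match_tot\n";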


Then run it like this:
    ./cmp.pl <cmp.in >cmp.out

And cmp.out should then contain this:
5
5
3
5
5
1
4
4
5
5
42
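
The script above prints one count per row and a grand total, whereas the original question also asks for a tally of how many rows produced each count. A minimal sketch of that tally follows, assuming the same position-by-position comparison. The sub name tally_matches is hypothetical, and the rows are passed in as strings only to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare each row with the previous one position by position and
# return the per-row counts plus a tally of how many rows gave each count.
sub tally_matches {
    my @rows = @_;
    my (@counts, %tally, @last);
    for my $row (@rows) {
        my @line = split ' ', $row;
        if (@last) {
            # grep in scalar context returns the number of matching positions
            my $match = grep { defined $last[$_] && $line[$_] eq $last[$_] } 0 .. $#line;
            push @counts, $match;
            $tally{$match}++;
        }
        @last = @line;
    }
    return (\@counts, \%tally);
}

# First four rows of the sample data.
my ($counts, $tally) = tally_matches(
    '345 654 345 432 236',
    '345 654 345 432 236',
    '345 654 345 432 236',
    '654 345 345 432 236',
);
print "$_\n" for @$counts;
print "$_ values were repeated on $tally->{$_} rows\n"
    for sort { $a <=> $b } keys %$tally;
```

With these four rows the per-row counts are 5, 5 and 3, so the tally reports one row with 3 repeated values and two rows with 5.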

Questions:
Q1. > "Could someone help me for code with this, output to a VIM file please ?"
       What do you mean by a VIM file?  Are you talking about something you can open with vim (vi improved)?
Q2. Does my answer above do what you want?  If not, what output do you expect from that input data?
Q3. Why does the output you supplied in your original question (which starts with "4") not relate to what you said in your post #42169102 (which starts with "5")?
Q4. What operating system are you using?  So far I have assumed Unix/Linux?
A1 yes, I am referring to the VIM editor. I am using Windows 10
A2 the code you have provided does work, but it returns a list of "0"
A3 I have just selected different sample data

Thank you
Add to tel2's script the sort function to first sort the data.  Operate on the sorted data.
Hi Europa (sorry I called you Michael before - not sure where I got that from).

More questions:
Q5. When you say "Could someone help me for code with this, output to a VIM file please ?", what do you mean by this, exactly?  What do you mean by VIM file?
Q6. Do you want the source code of the script in a VIM file, or you want the output of the program to go into a VIM file?
Q7. What is the extension of a VIM file?  .vim?  (I don't usually use Perl or VIM in the Windows environment - I use them in Linux, so maybe that's why this confuses me.)
Q8. When you say 'the code you have provided does work, but it returns a list of "0"', in what respect does it work?  Do you mean you're getting no error messages or what?
Q9. Did you run this from the command (cmd) prompt, or PowerShell, or what?
Q10. Did you put the input data in the cmp.in file before running the script?  (Sorry, I forgot to spell that requirement out before, if it wasn't obvious.)
Q11. Please attach (don't just copy/paste) the exact input data you used when you got the list of "0" output.
Q12. Please give me the exact command you used when you got the result you asked for.
Q13. Picking up on Jan's suggestion, in what way is the input data meant to be sorted before the comparison?
Q14. Can I assume the input data is already sorted into the required order, or do I need to sort it?

Thanks.
tel2
5. I am using a text editor called VIM. The extension is .vim. I am using Windows 10.
6. I could take the source code from the screen, or in a txt file.
7. Answered above.
8. No error messages; there is output to a file, which I have attached. I ran it again on a sample set of data and the output has changed slightly, but it is still wrong.
*** I had to change the extension to .txt because .vim files cannot be uploaded here. I use .vim files because the amount of data I have is too much for Notepad on Windows ***
9. I ran it from the command line, but if you could alter it so that I can just double-click it and it runs, that would be fantastic. I'm not great at navigating the command line yet.
10. Yes. That file is now called 003inputa.txt.
11
the original sample with rows to be counted:
 003inputa.txt
The code:
003.txt
the output from the code:
003outputa.txt
The desired output from the code:
003outputb.txt
12. I used 003.pl<003inputa.txt>003outputa.txt at the command line

13+14. The input data is already sorted into rows within either .txt or .vim files and doesn't need to be sorted any further. If I do need to sort, it needs to be sorted by different criteria, which I can do with another script to keep things simple (for me).
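
On the request in point 9 for a script that runs without typing a command, one possible approach is to fix the file names inside the script so that no redirection is needed. The following is a minimal sketch, assuming that .pl files are associated with the Perl interpreter on the machine and that the script sits in the same folder as the data. The input name 003inputa.txt comes from this thread; the output name and the sub name compare_file are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read rows from $in_name, compare each row with the previous one
# position by position, and write one count per line to $out_name.
sub compare_file {
    my ($in_name, $out_name) = @_;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!";
    my @last;
    while (my $row = <$in>) {
        my @line = split ' ', $row;   # any run of spaces or tabs separates values
        if (@last) {
            my $match = grep { defined $last[$_] && $line[$_] eq $last[$_] } 0 .. $#line;
            print {$out} "$match\n";
        }
        @last = @line;
    }
    close $in;
    close $out or die "Cannot close $out_name: $!";
}

# File names are fixed here so no command-line arguments are needed.
# 003inputa.txt is the name used in this thread; the output name is an assumption.
my $input  = '003inputa.txt';
my $output = '003output_sketch.txt';
if (-e $input) {
    compare_file($input, $output);
    print "Wrote $output\n";
} else {
    print "Input file $input not found in the current folder\n";
}
```

Because the file is read one row at a time, only two rows are held in memory at once, which should suit an input of about one million rows.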

Thank you very much


Europa ;)
ASKER CERTIFIED SOLUTION
tel2

This solution is only available to members.
tel2, thanks for your help. It's not so much that it's a moving target; rather, I am working on this at my end too, and the questions have made me realise that I did not ask the original question properly.
OK, thanks for the points.