Simon336697 (Australia) asked:

Making a text file into an array, and substr

Hi guys!
Hope you masters can help with your wisdom.
What I have is the following:

=========================== A text file called sorty.txt:
SS0594Nxxxx
SS0594Nxxxx
SS0594Nxxxx
SS2834Nxxxx
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx

etc etc
------------------------------ The above is just a snippet.

The entries above are computer names.

What I want to do with this text file is the following:
- Find UNIQUE entries and write them to a new file (e.g. unique.txt).
- The script would read sorty.txt and find unique entries based on the first 6 characters of each line (starting at position 0, length 6).
- For example, if there were 4 lines starting with "SS0594" in sorty.txt, the script would write the line "SS0594" to the new file.
- Only 1 entry in the new file for SS0594, not four. Then, if there were 2 lines starting with "SS2887" in sorty.txt, the script would write the line "SS2887" to the new file, and so on.

sorty.txt        unique.txt
SS0594Nxxxx      SS0594
SS0594Nxxxx      SS2834
SS0594Nxxxx      ST7523
SS2834Nxxxx
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx

============================================= So far I have the following:

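open (FINDUNIQUE, "< sorty.txt") or die "can't open sorty.txt: $!";

$a = 0;
while (<FINDUNIQUE>) {
    $Sys = $_;
    chomp($Sys);
    @System[$a] = $Sys;   # a one-element slice; $System[$a] is the usual form
    #print "a = $a - system = $System[$a]\n";
    $a++;
}
foreach $line (@System) {
    $six = substr($line, 0, 6);
    print "$line\n";
    foreach $line1 (@System) {
        $newsix = substr($line1, 0, 6);
        # == compares numbers: with names like SS0594 both sides become 0,
        # so this test is always true (eq is the string comparison)
        if ($six == $newsix) {
            # removes elements from the very array both foreach loops are walking
            shift(@System);
        }
    }
}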
   
======================================================

Any help greatly appreciated.
Somehow I have to (I think):
a) Turn the sorty.txt file into an array (slurp or something?)
b) Go through each line and find the first 6 characters of the line
c) If there are multiple lines with the same first 6 characters, only use one of them.
I think I might have to use shift or something to remove these entries from the array, along with a next statement.

Thanks guys.
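A minimal sketch of steps a) to c) in Perl, assuming one computer name per line. The inner foreach loop is replaced by a hash lookup, the usual Perl way to spot repeats, so nothing has to be shifted out of the array. The names `unique_prefixes` and `unique6.pl` are hypothetical. Prefixes come out in the order they first appear in sorty.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Steps a) to c) from the question, with a hash in place of the inner loop.
# Returns the distinct 6-character prefixes in order of first appearance.
sub unique_prefixes {
    my @system = @_;                  # a) the file's lines, as an array (a copy)
    chomp @system;
    my %seen;                         # prefix => how many times it has been seen
    my @unique;
    for my $line (@system) {
        my $six = substr($line, 0, 6);            # b) first 6 characters
        push @unique, $six unless $seen{$six}++;  # c) keep only the first one
    }
    return @unique;
}

# Usage (script name hypothetical): perl unique6.pl sorty.txt > unique.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
    my @lines = <$in>;                # slurp every line into an array
    close $in;
    print "$_\n" for unique_prefixes(@lines);
}
```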
ozo (United States):

perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
SOLUTION
avizit:

[Solution available to Experts Exchange members only.]
There's also a Unix-ish way besides the Perl-ish one:
  cut -b1-6 sorty.txt|sort|uniq>unique.txt
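For comparison, a rough Perl counterpart of that pipeline, assuming the same one-name-per-line input (`sorted_unique_prefixes` is a hypothetical name). Like `sort|uniq` it prints the prefixes in sorted order, whereas ozo's one-liner keeps them in order of first appearance. The shell's `sort` may collate differently from Perl's default string sort unless the locale is C.

```perl
use strict;
use warnings;

# Perl counterpart of `cut -b1-6 sorty.txt | sort | uniq`: the prefixes
# become hash keys (so duplicates collapse), then the keys are sorted.
sub sorted_unique_prefixes {
    my %count;                        # prefix => number of lines with it
    for my $line (@_) {
        chomp(my $copy = $line);      # work on a copy; @_ aliases the caller's data
        $count{ substr($copy, 0, 6) }++;
    }
    return sort keys %count;
}

# Usage (script name hypothetical): perl sorted6.pl sorty.txt > unique.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
    print "$_\n" for sorted_unique_prefixes(<$in>);
    close $in;
}
```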
Simon336697 (asker):

Hi ozo!

Mate, are you running this at a command line? I can't get this working:

perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
I would like to be able to learn a solution involving shift and substr if you guys are able to.
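One way to get a shift and substr solution, sketched under the assumption that sorted output is acceptable: sort a copy of the lines so that equal prefixes sit next to each other, then `shift` them off the front in a `while` loop and keep a prefix only when it differs from the previous one. This avoids shifting inside a `foreach` over the same array, which perlsyn warns against. `unique_prefixes_shift` is a hypothetical name.

```perl
use strict;
use warnings;

# shift/substr version: lines are sorted first, so every line sharing a
# 6-character prefix is adjacent, and only the first of each run is kept.
sub unique_prefixes_shift {
    my @system = sort @_;             # sorted copy; the caller's array is untouched
    my @unique;
    my $previous;                     # undef until the first prefix is kept
    while (@system) {
        my $line = shift @system;     # take the first remaining line off the array
        chomp $line;
        my $six = substr($line, 0, 6);
        next if defined $previous && $six eq $previous;   # same run: skip it
        push @unique, $six;
        $previous = $six;
    }
    return @unique;
}

# Usage (script name hypothetical): perl shift6.pl sorty.txt > unique.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
    print "$_\n" for unique_prefixes_shift(<$in>);
    close $in;
}
```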
What goes wrong when you try it?
Or, with substr:
perl -lne 'substr($_,6)=""; print unless $h{$_}++'  < sorty.txt  > unique.txt
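Assigning to `substr` is what shortens the line in that one-liner. A small sketch of the same idea as a function (`truncate_to_six` is a hypothetical name), with a length check added: assigning to a `substr` that starts past the end of the string dies with "substr outside of string", so a line shorter than 6 characters, such as a blank last line, would stop the unguarded one-liner.

```perl
use strict;
use warnings;

# Assigning "" to substr($s, 6) replaces everything from offset 6 to the
# end of $s, leaving only the first 6 characters. Without the length check,
# a string shorter than 6 characters would trigger a fatal error.
sub truncate_to_six {
    my ($s) = @_;
    substr($s, 6) = "" if length($s) > 6;
    return $s;
}
```

In a Unix shell, a guarded variant of the one-liner would then be (a sketch; on Windows the quoting has to change, as discussed in this thread):

    perl -lne 'substr($_,6)="" if length($_) > 6; print unless $h{$_}++' < sorty.txt > unique.txt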
Hi Ozo!

Works great... my fault. I didn't use double quotes:

I had:
C:\>perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.

The following worked:
perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt

Ozo!

Can I just ask you, what does this part do?

-ne "s/(.{6}).*/$1/;



I know this works, but I don't know how... dumb me... I would love to know how he came up with that, and how it works.
SOLUTION
[Solution available to Experts Exchange members only.]
Sorry, the DOS shell handles command-line quotes differently.
Thanks guys.
I'm nearly there in my understanding of it:

perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt

So does the above mean:

1) Use the -e switch to enter multiple lines
2) Use the -n switch to treat the command-line script like a while (<>) loop
3) Is the s/ part a substitution?
Not sure what the /$1 means,
and
the $h unless part.

Sorry to bug you geniuses, but I really like to try to understand what you give me, if only out of respect for your help.
That's OK ozo... you're a champion.
perldoc perlrun
...
       -e commandline
            may be used to enter one line of program.  If -e is given, Perl
            will not look for a filename in the argument list.  Multiple -e
            commands may be given to build up a multi-line script.  Make sure
            to use semicolons where you would in a normal program.
...
       -n   causes Perl to assume the following loop around your program,
            which makes it iterate over filename arguments somewhat like sed
            -n or awk:

              LINE:
                while (<>) {
                    ...             # your program goes here
                }

            Note that the lines are not printed by default.  See -p to have
            lines printed.  If a file named by an argument cannot be opened
            for some reason, Perl warns you about it and moves on to the next
            file.

            Here is an efficient way to delete all files that haven't been
            modified for at least a week:

                find . -mtime +7 -print | perl -nle unlink

            This is faster than using the -exec switch of find because you
            don't have to start a process on every filename found.  It does
            suffer from the bug of mishandling newlines in pathnames, which
            you can fix if you follow the example under -0.

            "BEGIN" and "END" blocks may be used to capture control before or
            after the implicit program loop, just as in awk.
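Put together, this means ozo's one-liner behaves roughly like the script below. It is a sketch with hypothetical names (`dedup_stream`, `unique_long.pl`), and it reads from an explicit filehandle instead of `<>`. One detail the one-liner relies on: `.` does not match a newline, so `s/(.{6}).*/$1/` keeps the line's newline and `print` needs no extra one. The hash keys therefore include that newline too, so a final line without one is counted separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ozo's one-liner written out long-hand: -n supplies the while loop,
# -e supplies the two statements inside it.
sub dedup_stream {
    my ($in, $out) = @_;
    my %h;                                    # shortened line => times seen
    while (my $line = <$in>) {                # the loop that -n wraps around the code
        $line =~ s/(.{6}).*/$1/;              # keep 6 chars; "\n" is not matched by . so it stays
        print {$out} $line unless $h{$line}++;
    }
}

# Usage (script name hypothetical): perl unique_long.pl sorty.txt > unique.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't open $ARGV[0]: $!";
    dedup_stream($in, \*STDOUT);
    close $in;
}
```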


perldoc perlop
...
       s/PATTERN/REPLACEMENT/egimosx
               Searches a string for a pattern, and if found, replaces that
               pattern with the replacement text and returns the number of
               substitutions made.  Otherwise it returns false (specifically,
               the empty string).
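A sketch of that return value applied to ozo's substitution (`shorten` is a hypothetical name). It also shows what happens to a line shorter than 6 characters: `(.{6})` cannot match, nothing is replaced, and the whole line is kept, so the one-liner still prints it (once).

```perl
use strict;
use warnings;

# s/// returns the number of substitutions made, or a false empty string
# when the pattern does not match. (.{6}) needs six non-newline characters,
# so a shorter line is left exactly as it was.
sub shorten {
    my ($line) = @_;
    my $made = ($line =~ s/(.{6}).*/$1/);
    return ($line, $made);
}
```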

perldoc perlvar
...
       $<digits>
               Contains the subpattern from the corresponding set of capturing
               parentheses from the last pattern match, not counting patterns
               matched in nested blocks that have been exited already.
               (Mnemonic: like \digits.)  These variables are all read-only
               and dynamically scoped to the current BLOCK.

see also
perldoc -q duplicate
Thanks ozo!

So can I ask: how does $1 replace (.{6}).*?
What is $1, ozo?
$1 contains what was matched by the 1st set of capturing parentheses, the (.{6})
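To see `$1` on its own, here is a small sketch with a second capture group added purely for illustration (`split_at_six` is a hypothetical name). In ozo's substitution only the first group exists, and the entire match is replaced by `$1`.

```perl
use strict;
use warnings;

# $1 is whatever the first pair of capturing parentheses matched;
# $2 is the second pair, which exists here only for illustration.
sub split_at_six {
    my ($line) = @_;
    if ($line =~ /^(.{6})(.*)$/) {
        return ($1, $2);
    }
    return;              # fewer than 6 characters: no match, nothing captured
}
```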

Thank you so much ozo, I hate to ask... very last question:

unless $h{$_}++

I'm not sure how you got this, and what this is actually doing.

I promise no more questions after this.
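A sketch of what `unless $h{$_}++` does, written as a standalone function (`first_occurrences` is a hypothetical name). `$h{$_}++` is a post-increment: it returns the old value and then adds 1. The first time a key is seen the old value is undef, which is false, so the line is printed; every later time it is 1, 2, ..., which is true, so the line is skipped. As a side effect, the hash ends up holding a count for each key.

```perl
use strict;
use warnings;

# Keep each key the first time it appears, exactly as
# `print unless $h{$_}++` does, and also return the counts.
sub first_occurrences {
    my %h;                             # key => times seen so far
    my @kept;
    for my $key (@_) {
        # old value is undef (false) on first sight, so the key is kept
        push @kept, $key unless $h{$key}++;
    }
    return (\@kept, \%h);
}
```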
ASKER CERTIFIED SOLUTION
[Solution available to Experts Exchange members only.]
You are brilliant.
Thank you OZO and AHOFF!!