Solved

Making a text file into an array, and substr

Posted on 2007-07-23
Last Modified: 2010-04-16
Hi guys!
Hope you masters can help with your wisdom.
What I have is the following..

===========================A text file called sorty.txt
SS0594Nxxxx
SS0594Nxxxx
SS0594Nxxxx
SS2834Nxxxx
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx

etc etc
------------------------------ The above is just a snippet.

In the above are entries for computer names.

What I want to do with this text file is the following....
- Find UNIQUE entries and pump to a new file (eg.unique.txt).
- The script would read the sorty.txt file, and find unique entries starting from the 0 position, and length 6 characters.
- For example, if there were 4 lines starting with "SS0594" in sorty.txt, then the script would find and write  to a new file the line "SS0594".  
- Only 1 entry in the new file for SS0594, not four. Then, if there were 2 lines starting with "SS2887" in sorty.txt, the script would find and write to a new file the line "SS2887", and so on and so on.

sorty.txt                                                 unique.txt
SS0594Nxxxx                                       SS0594
SS0594Nxxxx                                       SS2834
SS0594Nxxxx                                       ST7523
SS2834Nxxxx                                  
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx

============================================= So far I have the following:

open (FINDUNIQUE, "< sorty.txt") or die "can't open sorty.txt: $!";

$a = 0;
while( <FINDUNIQUE> )      {
   $Sys = $_;
      chomp ($Sys);
   @System[$a] = $Sys;
#print "a = $a - system = $System[$a]\n";
   $a++
}
      foreach $line (@System) {
      $six = substr($line,0,6);
                      print "$line\n";
         foreach $line1 (@System) {
           $newsix = substr ($line1, 0,6);      
            if ($six == $newsix) {
               shift (@System);
                
         }
      }
   }
   
======================================================

Any help greatly appreciated.
Somehow I have to (I think):
a) Turn the sorty.txt file into an array (slurp or something?)
b) Go through each line and find the first 6 characters of a line
c) If there are multiple lines with the same first 6 characters, then only use one unique.
I think I might have to use shift or something to move these entries from the array, along with a next function.

Thanks guys.
Question by:Simon336697
20 Comments
 
LVL 85

Expert Comment

by:ozo
ID: 19553964
perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
 
LVL 11

Assisted Solution

by:avizit
avizit earned 80 total points
ID: 19553967
sort filename | uniq -w 6  

does the above do what you want ?
 
LVL 51

Expert Comment

by:ahoffmann
ID: 19554056
there's a unixish way beside the perlish:
  cut -b1-6 sorty.txt|sort|uniq>unique.txt
 
LVL 1

Author Comment

by:Simon336697
ID: 19554093
Hi ozo!

Mate, are you running this at a command line? I can't get this to work:

perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
 
LVL 1

Author Comment

by:Simon336697
ID: 19554098
I would like to be able to learn a solution involving shift and substr if you guys are able to.
 
LVL 85

Expert Comment

by:ozo
ID: 19554111
What goes wrong when you try it?
 
LVL 85

Expert Comment

by:ozo
ID: 19554122
with substr
perl -lne 'substr($_,6)=""; print unless $h{$_}++'  < sorty.txt  > unique.txt
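
For readers who, like the asker, want a longer version built on shift and substr, here is a minimal sketch. It is not ozo's code. It also sidesteps two problems in the script from the question: that script compares strings with the numeric == operator (eq is the string comparison), and it shifts @System while a foreach is still iterating over it. The sub name unique_prefixes and the hard-coded sample list are assumptions made for illustration; a real script would read the lines from sorty.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the unique 6-character prefixes of the given lines,
# in the order they are first seen. (Hypothetical helper name.)
sub unique_prefixes {
    my @lines = @_;                       # copy, so the caller's array is untouched
    my (%seen, @unique);
    while (@lines) {                      # loop until the array is empty
        my $line = shift @lines;          # take the next line off the front
        chomp $line;
        my $six = substr($line, 0, 6);    # first 6 characters, starting at offset 0
        push @unique, $six unless $seen{$six}++;
    }
    return @unique;
}

# Sample data copied from the question, standing in for sorty.txt.
my @system = qw(SS0594Nxxxx SS0594Nxxxx SS0594Nxxxx SS2834Nxxxx SS2834Nxxxx
                ST7523Nxxxx ST7523Nxxxx ST7523Nxxxx ST7523Nxxxx);

print "$_\n" for unique_prefixes(@system);
```

Because the hash remembers every prefix already emitted, the input does not need to be sorted first.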
 
LVL 1

Author Comment

by:Simon336697
ID: 19554135
Hi Ozo!

Works great... my fault. I didn't use double quotes:

I had:
C:\>perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.

The following worked:
perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt

Ozo!

Can I just ask you, what does this part do?

ne "s/(.{6}).*/$1/;



 
LVL 1

Author Comment

by:Simon336697
ID: 19554137
I know this works, but dont know how....dumb me....would love to know how he came up with that, and how it works.
 
LVL 51

Assisted Solution

by:ahoffmann
ahoffmann earned 200 total points
ID: 19554166
> s/(.{6}).*/$1/;
extract just the first 6 characters of the string (same as: cut -b1-6)
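
A small sketch may make the equivalence concrete. It applies the substitution and a plain substr call to one sample line from the question; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "SS0594Nxxxx";

# Substitution: (.{6}) captures the first six characters into $1,
# .* matches the rest of the line, and the whole match is replaced by $1.
(my $by_regex = $line) =~ s/(.{6}).*/$1/;

# substr: ask for 6 characters starting at offset 0.
my $by_substr = substr($line, 0, 6);

print "$by_regex\n";    # SS0594
print "$by_substr\n";   # SS0594
```

One detail worth knowing: "." does not match a newline by default, so when the one-liner runs without -l, the trailing newline of each input line is left in place and print needs no extra "\n".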
 
LVL 85

Expert Comment

by:ozo
ID: 19554170
Sorry,  DOS shell handles command line quotes differently
 
LVL 1

Author Comment

by:Simon336697
ID: 19554276
Thanks guys.
Im nearly there in my understanding of it:

perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt

So does the above mean:

1) Use the -e switch to enter multilines
2) Use the -n switch to treat the command line script like a while (<>) loop
3) Is the s/ part a substitution?
Not sure what the /$1 means
and
the unless $h{$_}++ part

Sorry to bug you geniuses, but I really like to try and understand what you give me, even out of respect for your help.
 
LVL 1

Author Comment

by:Simon336697
ID: 19554279
Thats ok ozo...youre a champion
 
LVL 85

Expert Comment

by:ozo
ID: 19554359
perldoc perlrun
...
       -e commandline
            may be used to enter one line of program.  If -e is given, Perl
            will not look for a filename in the argument list.  Multiple -e
            commands may be given to build up a multi-line script.  Make sure
            to use semicolons where you would in a normal program.
...
       -n   causes Perl to assume the following loop around your program,
            which makes it iterate over filename arguments somewhat like sed
            -n or awk:

              LINE:
                while (<>) {
                    ...             # your program goes here
                }

            Note that the lines are not printed by default.  See -p to have
            lines printed.  If a file named by an argument cannot be opened
            for some reason, Perl warns you about it and moves on to the next
            file.

            Here is an efficient way to delete all files that haven't been
            modified for at least a week:

                find . -mtime +7 -print | perl -nle unlink

            This is faster than using the -exec switch of find because you
            don't have to start a process on every filename found.  It does
            suffer from the bug of mishandling newlines in pathnames, which
            you can fix if you follow the example under -0.

            "BEGIN" and "END" blocks may be used to capture control before or
            after the implicit program loop, just as in awk.


perldoc perlop
...
       s/PATTERN/REPLACEMENT/egimosx
               Searches a string for a pattern, and if found, replaces that
               pattern with the replacement text and returns the number of
               substitutions made.  Otherwise it returns false (specifically,
               the empty string).

perldoc perlvar
...
       $<digits>
               Contains the subpattern from the corresponding set of capturing
               parentheses from the last pattern match, not counting patterns
               matched in nested blocks that have been exited already.
               (Mnemonic: like \digits.)  These variables are all read-only
               and dynamically scoped to the current BLOCK.
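
Putting the quoted documentation together, the one-liner can be written out as an ordinary script. This is only a sketch of what -n and -e amount to: it reads from an in-memory string instead of sorty.txt so that it runs on its own, and it collects the lines in @kept before printing. Both of those choices, and the variable names, are assumptions for illustration.

```perl
use strict;
use warnings;

# Stand-in for sorty.txt so the sketch is self-contained (in-memory filehandle).
my $data = "SS0594Nxxxx\nSS0594Nxxxx\nSS2834Nxxxx\nST7523Nxxxx\nST7523Nxxxx\n";
open my $in, '<', \$data or die "can't open in-memory data: $!";

my %h;       # how many times each shortened line has been seen
my @kept;    # the lines the one-liner would print

# With -n, Perl wraps the -e code in: while (<>) { ... }
while (<$in>) {
    s/(.{6}).*/$1/;                    # keep only the first six characters of $_
    push @kept, $_ unless $h{$_}++;    # the one-liner calls print at this point
}
close $in;

print @kept;
```

Saving the code in a file like this also avoids the shell quoting differences that came up earlier in the thread.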

 
LVL 85

Expert Comment

by:ozo
ID: 19554363
see also
perldoc -q duplicate
 
LVL 1

Author Comment

by:Simon336697
ID: 19554383
Thanks ozo!

So can I ask how does $1 replace (.{6}).*
What is $1 ozo?
 
LVL 85

Expert Comment

by:ozo
ID: 19554417
$1 contains what was matched by the 1st set of capturing parentheses, the (.{6})

$<digits>
               Contains the subpattern from the corresponding set of capturing
               parentheses from the last pattern match, not counting patterns
               matched in nested blocks that have been exited already.
               (Mnemonic: like \digits.)  These variables are all read-only
               and dynamically scoped to the current BLOCK.
 
LVL 1

Author Comment

by:Simon336697
ID: 19554439
Thank you so much ozo, i hate to ask..very last question:

unless $h{$_}++"

Im not sure how you got this, and what this is actually doing.

I promise no more questions after this.
 
LVL 85

Accepted Solution

by:
ozo earned 1720 total points
ID: 19554507
the 1st time $_ is seen $h{$_}++ will return false, and set $h{$_} to 1
the 2nd time $_ is seen $h{$_}++ will return 1, and set $h{$_} to 2
the 3rd time $_ is seen $h{$_}++ will return 2, and set $h{$_} to 3
...
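
The sequence above can be checked with a few lines of Perl. This is just an illustrative sketch with made-up variable names. Note that on the first pass the post-increment returns 0 (which is false), because Perl treats the missing hash value as 0 before incrementing.

```perl
use strict;
use warnings;

my %h;
my @returned;
for my $name (qw(SS0594 SS0594 SS0594)) {
    # $h{$name}++ hands back the OLD value, then adds one to the stored count.
    my $old = $h{$name}++;
    push @returned, $old;
}

print "@returned\n";     # 0 1 2
print "$h{SS0594}\n";    # 3
```

So "print unless $h{$_}++" prints only when the old count is 0, that is, the first time a given line is seen.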


perldoc -q duplicate
       How can I remove duplicate elements from a list or array?

       There are several possible ways, depending on whether the array is
       ordered and whether you wish to preserve the ordering.

       a)  If @in is sorted, and you want @out to be sorted: (this assumes all
           true values in the array)

               $prev = "not equal to $in[0]";
               @out = grep($_ ne $prev && ($prev = $_, 1), @in);

           This is nice in that it doesn't use much extra memory, simulating
           uniq(1)'s behavior of removing only adjacent duplicates.  The ", 1"
           guarantees that the expression is true (so that grep picks it up)
           even if the $_ is 0, "", or undef.

       b)  If you don't know whether @in is sorted:

               undef %saw;
               @out = grep(!$saw{$_}++, @in);

       c)  Like (b), but @in contains only small integers:

               @out = grep(!$saw[$_]++, @in);

       d)  A way to do (b) without any loops or greps:

               undef %saw;
               @saw{@in} = ();
               @out = sort keys %saw;  # remove sort if undesired

       e)  Like (d), but @in contains only small positive integers:

               undef @ary;
               @ary[@in] = @in;
               @out = grep {defined} @ary;

       But perhaps you should have been using a hash all along, eh?
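
As a sketch of how FAQ answer (b) could be applied to this question, the grep idiom can be run over the 6-character prefixes instead of the whole lines. The sample list and variable names are assumptions for illustration, and lexical "my" variables replace the FAQ's "undef %saw" so the code runs under strict.

```perl
use strict;
use warnings;

# A few sample lines from the question, standing in for the contents of sorty.txt.
my @in = qw(SS0594Nxxxx SS0594Nxxxx SS2834Nxxxx ST7523Nxxxx ST7523Nxxxx);

my %saw;
# map shortens each line to its first six characters;
# grep keeps a prefix only the first time it is seen, as in answer (b).
my @out = grep { !$saw{$_}++ } map { substr($_, 0, 6) } @in;

print "$_\n" for @out;
```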




 
LVL 1

Author Comment

by:Simon336697
ID: 19554566
You are brilliant.
Thank you OZO and AHOFF!!


Question has a verified solution.