Simon336697
asked on
Making a text file into an array, and substr
Hi guys!
Hope you masters can help with your wisdom.
What I have is the following:
==========================
A text file called sorty.txt:
SS0594Nxxxx
SS0594Nxxxx
SS0594Nxxxx
SS2834Nxxxx
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
etc etc
--------------------------
The above is just a snippet.
The entries above are computer names.
What I want to do with this text file is the following:
- Find UNIQUE entries and write them to a new file (e.g. unique.txt).
- The script would read sorty.txt and find unique entries based on the first 6 characters of each line (starting at position 0, length 6).
- For example, if there were 4 lines starting with "SS0594" in sorty.txt, the script would write the line "SS0594" to the new file.
- Only 1 entry in the new file for SS0594, not four. Then, if there were 2 lines starting with "SS2834" in sorty.txt, the script would write the line "SS2834" to the new file, and so on.
sorty.txt      unique.txt
SS0594Nxxxx    SS0594
SS0594Nxxxx    SS2834
SS0594Nxxxx    ST7523
SS2834Nxxxx
SS2834Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
ST7523Nxxxx
==========================
So far I have the following:
open (FINDUNIQUE, "< sorty.txt") or die "can't open sorty.txt: $!";
$a = 0;
while( <FINDUNIQUE> ) {
    $Sys = $_;
    chomp ($Sys);
    @System[$a] = $Sys;
    #print "a = $a - system = $System[$a]\n";
    $a++
}
foreach $line (@System) {
    $six = substr($line,0,6);
    print "$line\n";
    foreach $line1 (@System) {
        $newsix = substr ($line1, 0,6);
        if ($six == $newsix) {
            shift (@System);
        }
    }
}
==========================
Any help greatly appreciated.
Somehow I have to (I think):
a) Turn the sorty.txt file into an array (slurp or something?)
b) Go through each line and find the first 6 characters of the line
c) If there are multiple lines with the same first 6 characters, then only keep one of them.
I think I might have to use shift or something to remove these entries from the array, along with next.
Thanks guys.
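A minimal sketch of steps a) to c), assuming plain Perl 5 with strict and warnings. It streams the file line by line instead of slurping it into an array, and uses a %seen hash in place of shift to remember which 6-character prefixes have already been written, so the prefixes come out in the order they first appear. The helper name write_unique_prefixes is hypothetical, and in-memory filehandles stand in for sorty.txt and unique.txt so the sketch runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy the distinct 6-character
# prefixes read from $in to $out, one per line, in first-seen order.
sub write_unique_prefixes {
    my ($in, $out) = @_;
    my %seen;                              # prefix => times seen so far
    while (my $line = <$in>) {
        chomp $line;
        my $six = substr($line, 0, 6);     # offset 0, length 6
        print {$out} "$six\n" unless $seen{$six}++;
    }
}

# With the real files this would be, for example:
#   open(my $in,  '<', 'sorty.txt')  or die "can't open sorty.txt: $!";
#   open(my $out, '>', 'unique.txt') or die "can't open unique.txt: $!";
# Here in-memory filehandles stand in for both files.
my $sorty  = "SS0594Nxxxx\nSS0594Nxxxx\nSS2834Nxxxx\nST7523Nxxxx\n";
my $result = '';
open(my $in,  '<', \$sorty)  or die "can't open input: $!";
open(my $out, '>', \$result) or die "can't open output: $!";
write_unique_prefixes($in, $out);
close $in;
close $out;
print $result;
```

Three-argument open with lexical filehandles avoids the global FINDUNIQUE handle and the "< sorty.txt" mode-in-filename form.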
perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
There's also a Unix-ish way besides the Perl-ish one:
cut -b1-6 sorty.txt | sort | uniq > unique.txt
ASKER
Hi ozo!
Mate, are you running this at a command line? I can't get this working:
perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
ASKER
I would like to be able to learn a solution involving shift and substr if you guys are able to.
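One way to use shift and substr, sketched here as an assumption rather than anything posted in the thread, is to emulate `sort | uniq`: take the 6-character prefix of every line, sort the prefixes so duplicates sit next to each other, then shift them off the front of the array one at a time and keep each one that differs from the last one kept. It also avoids two problems in the original attempt: `==` compares numerically, so "SS0594" and "ST7523" both become 0 and every pair looks equal (string comparison needs eq/ne), and shifting @System while a foreach is iterating over it confuses the loop. Unlike a %seen hash, this outputs the prefixes in sorted order. The sub name unique_prefixes_sorted is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): distinct 6-character prefixes
# in sorted order, consuming a sorted work array with shift.
sub unique_prefixes_sorted {
    my @lines = @_;
    # Build the prefixes, then sort so duplicates are adjacent.
    my @work = sort map { substr($_, 0, 6) } @lines;
    my @unique;
    while (@work) {
        my $six = shift @work;    # take the next prefix off the front
        # String comparison with ne; != would compare the strings as numbers.
        push @unique, $six if !@unique || $unique[-1] ne $six;
    }
    return @unique;
}

my @lines = qw(ST7523Nxxxx SS0594Nxxxx SS2834Nxxxx SS0594Nxxxx ST7523Nxxxx);
print "$_\n" for unique_prefixes_sorted(@lines);
```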
What goes wrong when you try it?
With substr:
perl -lne 'substr($_,6)=""; print unless $h{$_}++' < sorty.txt > unique.txt
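Assigning to substr($_, 6) replaces everything from offset 6 to the end of the string with the empty string, so it truncates the line to its first 6 characters; -l chomps each line first and adds the newline back on print. This assumes every line has at least 6 characters (Perl complains about "substr outside of string" otherwise). A small sketch of the truncation step on its own, alongside the equivalent 4-argument form; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "SS0594Nxxxx";

# Lvalue substr: replace everything from offset 6 onward with "".
my $copy = $line;
substr($copy, 6) = "";

# The 4-argument form edits in place too, and returns the removed part.
my $other   = $line;
my $removed = substr($other, 6, length($other) - 6, "");

print "$copy $other $removed\n";    # prints "SS0594 SS0594 Nxxxx"
```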
ASKER
Hi Ozo!
Works great... my fault. I didn't use double quotes:
I had:
C:\>perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++' < sorty.txt > unique.txt
Can't find string terminator "'" anywhere before EOF at -e line 1.
The following worked:
perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt
Ozo!
Can I just ask you, what does this part do?
-ne "s/(.{6}).*/$1/;
ASKER
I know this works, but I don't know how... dumb me. I would love to know how he came up with that, and how it works.
Sorry, the DOS shell handles command-line quotes differently.
ASKER
Thanks guys.
I'm nearly there in my understanding of it:
perl -ne "s/(.{6}).*/$1/; print unless $h{$_}++" < sorty.txt > unique.txt
So does the above mean:
1) Use the -e switch to enter multiple lines
2) Use the -n switch to treat the command-line script like a while (<>) loop
3) Is the s/ part a substitution?
I'm not sure what the $1 means,
and
the unless $h{$_}++ part.
Sorry to bug you geniuses, but I really like to understand what you give me, not least out of respect for your help.
ASKER
That's OK, ozo... you're a champion.
perldoc perlrun
...
-e commandline
may be used to enter one line of program. If -e is given, Perl
will not look for a filename in the argument list. Multiple -e
commands may be given to build up a multi-line script. Make sure
to use semicolons where you would in a normal program.
...
-n causes Perl to assume the following loop around your program,
which makes it iterate over filename arguments somewhat like sed
-n or awk:
LINE:
while (<>) {
... # your program goes here
}
Note that the lines are not printed by default. See -p to have
lines printed. If a file named by an argument cannot be opened
for some reason, Perl warns you about it and moves on to the next
file.
Here is an efficient way to delete all files that haven't been
modified for at least a week:
find . -mtime +7 -print | perl -nle unlink
This is faster than using the -exec switch of find because you
don't have to start a process on every filename found. It does
suffer from the bug of mishandling newlines in pathnames, which
you can fix if you follow the example under -0.
"BEGIN" and "END" blocks may be used to capture control before or
after the implicit program loop, just as in awk.
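Putting -n and -e together, the one-liner `perl -ne 's/(.{6}).*/$1/; print unless $h{$_}++'` behaves roughly like the long-form script below. This is a sketch under one assumption: the input is an in-memory string so it can run on its own, whereas the real -n loop reads <>, i.e. the files named on the command line or standard input redirected with <. Output is collected in $output instead of printed line by line.

```perl
use strict;
use warnings;

# Stand-in for sorty.txt; the real one-liner reads <> instead.
my $input = "SS0594Nxxxx\nSS0594Nxxxx\nSS2834Nxxxx\nST7523Nxxxx\nST7523Nxxxx\n";
open(my $fh, '<', \$input) or die "can't open input: $!";

my %h;
my $output = '';
LINE:
while (<$fh>) {                      # the loop that -n wraps around the code
    s/(.{6}).*/$1/;                  # -e statement 1: keep the first 6 chars
                                     # ("." does not match "\n", so it survives)
    $output .= $_ unless $h{$_}++;   # -e statement 2: emit first sightings only
}
close $fh;
print $output;
```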
perldoc perlop
...
s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that
pattern with the replacement text and returns the number of
substitutions made. Otherwise it returns false (specifically,
the empty string).
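In the one-liner, s/(.{6}).*/$1/ matches the first 6 characters (captured in $1) followed by the rest of the line, and replaces that whole match with just $1. A small sketch showing both the effect on the string and the return value described above; the sample strings are illustrative.

```perl
use strict;
use warnings;

my $line  = "SS0594Nxxxx";
my $count = ($line =~ s/(.{6}).*/$1/);    # number of substitutions made

my $short = "SS05";                       # fewer than 6 chars: no match
my $none  = ($short =~ s/(.{6}).*/$1/);   # false (the empty string)

print "$line $count '$short' '$none'\n";  # prints "SS0594 1 'SS05' ''"
```

Lines shorter than 6 characters simply fail to match and pass through unchanged.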
perldoc perlvar
...
$<digits>
Contains the subpattern from the corresponding set of capturing
parentheses from the last pattern match, not counting patterns
matched in nested blocks that have been exited already.
(Mnemonic: like \digits.) These variables are all read-only
and dynamically scoped to the current BLOCK.
see also
perldoc -q duplicate
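perldoc -q duplicate points to the perlfaq4 entry on removing duplicate elements from a list or array; one idiom there combines a %seen hash with grep. Applied to the 6-character prefixes, a sketch might look like this (the sample list is illustrative). grep keeps the original order, so the result matches the one-liner's output.

```perl
use strict;
use warnings;

my @lines = qw(SS0594Nxxxx SS0594Nxxxx SS2834Nxxxx ST7523Nxxxx ST7523Nxxxx);

# map takes the prefix of each line; grep keeps only the first occurrence.
my %seen;
my @unique = grep { !$seen{$_}++ } map { substr($_, 0, 6) } @lines;

print "$_\n" for @unique;
```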
ASKER
Thanks ozo!
So can I ask: how does $1 replace (.{6}).*?
What is $1, ozo?
$1 contains what was matched by the 1st set of capturing parentheses, the (.{6}).
$<digits>
Contains the subpattern from the corresponding set of capturing
parentheses from the last pattern match, not counting patterns
matched in nested blocks that have been exited already.
(Mnemonic: like \digits.) These variables are all read-only
and dynamically scoped to the current BLOCK.
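To see the numbering of capture groups, here is a small sketch (the second group is an addition for illustration, not part of the one-liner): $1 holds whatever the first set of parentheses matched and $2 the second.

```perl
use strict;
use warnings;

my $line = "SS0594Nxxxx";
my ($first, $rest) = ('', '');
if ($line =~ /(.{6})(.*)/) {
    $first = $1;    # text matched by the 1st set of capturing parentheses
    $rest  = $2;    # text matched by the 2nd set
}
print "$first | $rest\n";    # prints "SS0594 | Nxxxx"
```

In s/(.{6}).*/$1/ there is only one group, so the replacement is just those 6 characters and the rest of the match is dropped.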
ASKER
Thank you so much, ozo. I hate to ask... very last question:
unless $h{$_}++
I'm not sure how you got this, and what it is actually doing.
I promise no more questions after this.
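A short sketch of how `print unless $h{$_}++` behaves: %h counts how many times each prefix has been seen. Post-increment returns the value from before the increment, and per perlop a post-increment of an undefined value returns 0, so the first time a prefix appears the test sees 0 (false) and the line is printed; every later time it sees 1 or more (true) and the line is skipped. The prefix list is illustrative.

```perl
use strict;
use warnings;

my %h;
my @printed;
for my $prefix (qw(SS0594 SS0594 SS2834 SS0594)) {
    my $before = $h{$prefix}++;    # old value: 0 on first sighting, then 1, 2, ...
    push @printed, $prefix unless $before;    # "unless" lets only the 0 through
}
print "@printed\n";                   # prints "SS0594 SS2834"
print "$h{SS0594} $h{SS2834}\n";      # prints "3 1"
```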
ASKER
You are brilliant.
Thank you OZO and AHOFF!!