
Perl Help

Hello,

I have a file with approximately 700,000 records. Each record is a line containing a phrase of roughly 3 to 10 words, sometimes more.

I'm looking for a way to remove words that occur fewer than 1,000 times across the entire file.

I use the following command to generate a list of words and the number of times each one occurs:

tr ' ' '\n' |sort |uniq -c

I tried using the following command but it did not produce any results:

perl -ne 's/(\S+)/$s{$1}/g,print,next if !@ARGV; ++$s{$_} for split; if( eof ){ $s{$_}=$s{$_}>=1000&&$_ for keys %s}' file.txt > newfile.txt

Thanks in advance.
Asked by faithless1
1 Solution
 
tel2 commented:
Hi FL1,

If you're happy with the word frequency list that your "tr..." line is producing, then there's probably not much need for Perl here.

Please replace your entire "tr... " line with the following, where 'wordfile' is the input file:
    tr ' ' '\n' <wordfile | sort | uniq -c | grep -v '^      '
and if you're happy with that, but you don't want the frequencies, use this instead:
    tr ' ' '\n' <wordfile | sort | uniq -c | grep -v '^      ' | cut -c9-
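Note: There are 6 spaces after the "^". The right number depends on how wide your 'uniq -c' pads the count column; with GNU uniq's 7-character column, a count below 1,000 begins with at least 4 spaces, so adjust the pattern if rare words still get through.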

If that doesn't work, please post the exact code you're running, and tell us what went wrong.
 
tel2 commented:
...and in case your version of 'uniq' spaces things differently from mine, here are alternatives to the above 2 commands:
    tr ' ' '\n' <in1 | sort | uniq -c | grep "[0-9]\{4,\}"
and to remove the frequencies:
    tr ' ' '\n' <in1 | sort | uniq -c | grep "[0-9]\{4,\}" | awk '{print $2}'
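A possible variation on the commands above, offered only as a sketch: comparing the count numerically with awk avoids depending on how 'uniq -c' pads its output, and it does not pick up words that themselves contain four or more digits (such as "2015"), which the grep pattern would match. The function name frequent_words and its two arguments (input file, minimum count) are made up for this example.

```shell
# Hypothetical helper: print each word that occurs at least $2 times in file $1.
# Assumes words are separated by single or repeated spaces, as in the thread.
frequent_words() {
    # tr -s turns spaces into newlines and squeezes runs of newlines;
    # uniq -c prefixes each distinct word with its count;
    # awk keeps lines that have a word (NF == 2) and a large enough count.
    tr -s ' ' '\n' < "$1" | sort | uniq -c |
        awk -v min="$2" 'NF == 2 && $1 >= min { print $2 }'
}

# Example call (not run here): frequent_words wordfile 1000
```

Like the commands above, this prints the list of frequent words; it does not rewrite the original records.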
 
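namethis commented:
Your Perl script is okay; you just need to specify file.txt twice. It makes two passes: while the second file name is still waiting in @ARGV it only counts words, and once @ARGV is empty it rewrites each line using those counts. With a single file name, @ARGV is already empty on the first line, so nothing is counted and every word is replaced with an empty value: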
perl -ne 's/(\S+)/$s{$1}/g,print,next if !@ARGV; ++$s{$_} for split; if( eof ){ $s{$_}=$s{$_}>=1000&&$_ for keys %s}' file.txt file.txt > newfile.txt
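For readers who find the one-liner hard to follow, here is a rough sketch of the same two-pass idea written with awk instead of Perl. It is not the script above, just an illustration: the first read of the file counts every word, and the second read prints each line with the rare words left out. The function name drop_rare_words and its arguments (input file, minimum count) are invented for this example, and the input must be a regular file because it is read twice.

```shell
# Hypothetical helper: print file $1 with every word that occurs
# fewer than $2 times removed. The file is passed to awk twice.
drop_rare_words() {
    awk -v min="$2" '
        # First pass (NR == FNR only while reading the first copy): count words.
        NR == FNR { for (i = 1; i <= NF; i++) count[$i]++; next }
        # Second pass: rebuild each line from the words that are frequent enough.
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (count[$i] >= min) { out = out sep $i; sep = " " }
            print out
        }
    ' "$1" "$1"
}

# Example call (not run here): drop_rare_words file.txt 1000 > newfile.txt
```

Unlike the Perl substitution, which leaves the surrounding spaces in place where a word was removed, this version rejoins the remaining words with single spaces. A line whose words are all rare comes out as an empty line.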
