Solved

Need a random line sampler on a Unix or Windows 2000 platform

Posted on 2002-04-02
Last Modified: 2013-12-13
I have a 2000+ line text file and need to randomly select 250 lines from the file.

Any STANDARD or FREEWARE/DOWNLOADABLE tools to do this either in the Unix or Windows environments?

I have Unix (Compaq Tru64) or Windows (Win2K) available to do it on.  I've also got OpenVMS, but I don't expect
that there's anybody here in Experts Exchange who can answer questions on that venerable OS... if there is,
then fine, on there too -- since that's where I NEED the resultant text file of 250 lines.

Time dependent; have to have this file before Monday.
Question by:jlw011597
 
LVL 22

Expert Comment

by:cookre
ID: 6914135
Well, Perl does have a randomizer, so:

1. Make a sorted array X of 250 random, unique integers between 1 and the number of lines in the file

2. Make one pass through the file, selecting the lines specified by X

I'll post some code in a day or two, if nothing better shows up or if nobody else has the time to do it.
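For later readers, here is a minimal sketch of the two steps above. It is written in Python rather than Perl, and the function names pick_line_numbers and select_lines are made up for illustration; nobody in this thread posted this code.

```python
import random


def pick_line_numbers(total_lines, sample_size, rng=random):
    """Step 1: return a sorted list of unique, 1-based line numbers."""
    # random.sample never returns the same element twice,
    # so duplicates are ruled out by construction.
    return sorted(rng.sample(range(1, total_lines + 1), sample_size))


def select_lines(lines, wanted_numbers):
    """Step 2: make one pass over the lines, keeping the wanted ones."""
    wanted = set(wanted_numbers)
    return [
        line
        for number, line in enumerate(lines, start=1)
        if number in wanted
    ]
```

To use it, read the file into a list of lines, call pick_line_numbers(len(lines), 250), and pass the result to select_lines. The selected lines come out in their original file order.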
 
LVL 3

Expert Comment

by:FlamingSword
ID: 6914271
Perl runs on both Unices and Windows (ActivePerl is the best choice there).
 
LVL 24

Expert Comment

by:SunBow
ID: 6914307
Note that Perl can be useful for CGI or as a means of understanding how to use underlying OS kernel features via scripts.

I often find O'Reilly a useful resource. For example:
http://www.perl.com/pub/q/faqs

Sample ReadMe:
http://www.perl.com/CPAN-local/modules/by-module/Crypt/Crypt-Random-1.11.readme
(filename: Crypt-Random-1.11.tar.gz)

http://www.activestate.com/
Has numerous code samples for
 Perl  
 Python  
 PHP  
 Tcl  
 XSLT  
-- you can get Windoze versions of needed code from them (yes, even freeware, open source, all them there neat fun words)
 
LVL 11

Expert Comment

by:griessh
ID: 6914311
250 consecutive lines or random lines?
Are there duplicates allowed?

======
Werner
 

Author Comment

by:jlw011597
ID: 6914412
From: griessh
                     250 consecutive lines or random lines?
                     Are there duplicates allowed?

Random lines, duplicates not allowed.  They're email addresses from a sample
population for a research study.  By choosing a random subset from a large population
the researchers hope to defuse complaints of SPAM by the selected individuals.

From: Others....    all suggested PERL.

          Ah, well... I don't subscribe to the Unix build-it-yourself school so was hoping
          for some actual application that did this.  But the researchers, when told it was
          a stumbling block, found a website (www.randomizer.org, I think) and had it
          supply a set of 250 random, no duplicates numbers in the entire set of 2000+
          records, and a 3rd party inserted the 2000+ records into an EXCEL spreadsheet,
          then did a query selecting the rows that matched the 250 random numbers.
   
So, done.  And delivered back to me via Email to my OpenVMS system where the
resultant file becomes the restricted access mailing list for sending the study request out to those 250 randomly selected members of the 2000+ member population.

 
LVL 11

Expert Comment

by:griessh
ID: 6914522
jlw

Great! I suggest you go to Community Support at http://www.experts-exchange.com/jsp/qList.jsp?ta=commspt , post a request (with this URL included) and ask them to PAQ this question and refund your points, since you have your own solution.

======
Werner
 
LVL 24

Expert Comment

by:SunBow
ID: 6914750
qBasic can do this rather quickly. It was included with NT4 'Server'!

Interesting, the no-duplicate. Either pre or post process one presumes.

I disagree with cookre on the array. IMO the random numbers should come one at a time, with no new number drawn until the prior record has been processed. This means, of course, that I MUST disagree with randomizer, which prebuilds a list of numbers prior to runtime. What has the 'appearance' of random actually is not.
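One way to read the one-number-per-record idea is the technique usually called selection sampling (Knuth's Algorithm S). The Python sketch below is an illustration under that reading, not anything posted in this thread; the name selection_sample is hypothetical, and it assumes the total number of lines is known before the pass starts.

```python
import random


def selection_sample(lines, sample_size, rng=random):
    """Yield exactly sample_size lines, deciding each line as it is read.

    One random number is drawn per record; no list of line numbers
    is built in advance.
    """
    remaining = len(lines)  # records not yet examined
    needed = sample_size    # records still to be selected
    for line in lines:
        # Keep this record with probability needed / remaining.
        if rng.random() * remaining < needed:
            yield line
            needed -= 1
        remaining -= 1
```

Because each record is kept or skipped exactly once, duplicates cannot occur, and the output keeps the original file order.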

But since you are happy, I assume for your purpose it'll reduce your working set satisfactorily.

> the researchers hope

IMO, One unsolicited memo can be research, all subsequent having no opt-in are eSpam. Label it anything they like, if it looks like a duck, walks like a duck.....

> Time dependent; have to have this file before Monday.

hmm, probably a good editor would do as well, just delete lines, at random, until only 250 left... doesn't take all that long. I can delete faster than type.
 
LVL 22

Expert Comment

by:cookre
ID: 6915638
SunBow, the array IS built at runtime with a seed based on time so one doesn't always get the same sequence.  Moreover, the main purpose of the array is to keep track of those numbers already selected so they are not selected again.
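A rough Python sketch of that idea follows: the generator is seeded from the clock at runtime, and a set keeps track of the numbers already selected so that a repeat draw is simply ignored. The function name draw_unique_numbers and the optional seed parameter are illustrative assumptions, not code from this thread.

```python
import random
import time


def draw_unique_numbers(total_lines, sample_size, seed=None):
    """Draw sample_size distinct line numbers between 1 and total_lines."""
    if sample_size > total_lines:
        raise ValueError("cannot draw more unique numbers than there are lines")
    # Seed from the current time unless the caller supplies a seed,
    # so separate runs do not repeat the same sequence.
    rng = random.Random(time.time() if seed is None else seed)
    chosen = set()
    while len(chosen) < sample_size:
        # Adding a number that is already in the set changes nothing,
        # so the loop just draws again until enough distinct numbers exist.
        chosen.add(rng.randint(1, total_lines))
    return sorted(chosen)
```

Passing a fixed seed makes a run repeatable, which can help if the selection ever has to be audited; leaving it out gives a different sample on each run.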

jlw, griessh's suggestion is ideal, since the link to randomizer.org will likely be of value to others in the future.

(whew, now I don't have to actually write anything...)
 
LVL 1

Accepted Solution

by:Moondancer (earned 0 total points)
ID: 6915800
Per your request in Community Support today, I have refunded your 100 points for this question and am closing it by moving it to our PAQ at zero points.  The PAQ is our Previously Asked Question database.

http://www.experts-exchange.com/commspt/Q.20284482.html  In your request you stated that you found your own solution and will cut/paste it below.
___________________________________________________________

Please make this so.  If the points can't be refunded, then assign them to griessh -- while he didn't solve the problem, he's the one who suggested putting this into PAQ status so others can find it.  Another comment, from cookre, says that his suggestion would help future users because the link supplied to www.randomizer.org will likely be of value to others.  Makes me think one of them, probably griessh, deserves the points.

Obviously if I select my OWN comment with the www.randomizer.org linkage in it as the accepted answer your system's going to balk, and if I select griessh's comment quoted above as the answer, to assign the points to him myself, then the comments containing www.randomizer.org won't be so noticeable to subsequent searchers. So we need to have MY comment citing the randomizer site and the way we used it as the answer, but assign the points elsewhere.  Does this make sense? --
___________________________________________________________
ABSOLUTELY...

You can now post a question in this same Topic Area to award points and entitle it Points for __expertname__ and in comments please paste the URL (Link) to this question. The expert will then add comments or propose an answer which you, in turn, accept to grade and close.

More information for you in the Community Support link.

:)
Moondancer - EE Moderator
