?
Solved

Shuffling strings?

Posted on 2003-03-01
18
Medium Priority
?
661 Views
Last Modified: 2008-03-10
    I am a biologist attempting to write simple programs to manipulate DNAsequences.  I am attempting to self-teach myself C/C++ programming, so please excuse me if my questions seem incredibly simple or basic.  The books I have purchased at the bookstore are good foundations, but offer limited practical examples.  
     Is there a way to shuffle character arrays in the same way as the typical card shuffling codes?  I am seeking assistance in how I could go about "shuffling" an array of characters.  I have an array, let's just call it aString, and I want to shuffle the characters of that array into another array called shufString.  I want to be able to shuffle that string t quantity of times (t is fixed, based on pre-defined confidence levels).  
     I had thought of generating an index z that could be used to pick a character at random from aString as follows:

z=(int)floor((((double)random())/divisor)*strlen(aString);

Am I at least headed into the right direction or barking up the wrong tree?  And how would I go about placing the picked characters into the new array in the order they are chosen?  Any assistance or ideas is greatly apprecited.  Thanks in advance!  :-)
0
Comment
Question by:bgredfern
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 2
  • +4
18 Comments
 
LVL 1

Expert Comment

by:TheBeaver
ID: 8049532

ShuffleString(char *src, char *dst)
{
  int pos=0;

  while( pos < strlen(src) )
  {
    dst[pos] = src[rand() % strlen(src)];
    pos++;
  }
}

This is if you dont care about duplicates
0
 
LVL 1

Accepted Solution

by:
TheBeaver earned 200 total points
ID: 8049555
Now, if you do only want each character coming out once, we then have to mark off each character...

ShuffleString(char *src, char *dst)
{
 int pos=0, x;
 int len = strlen(src);            // Just because we use it so much
 char *used = malloc(strlen(src)); // I'm allocating memory because I dont know the length of the string

 memset(used, len, 0);             // Set everything to unused
 while( pos < len )
 {
   x = rand() % len;
   while( used[x] )              
     x = (x+1) % len;             // Its used so move on to the next. This code will also loop back to 0 so its guarenteed to find an unused character

   dst[pos] = src[x];
   used[x] = 1;
   pos++;
 }
 free( used);
}


0
 
LVL 8

Expert Comment

by:akshayxx
ID: 8050272
have you tried frying ur string .. if not here is a function that u can use on most of UNIX and clones..
this comes with GNUC library.. if not .. u can always get hold of the 'good and tested' code .
http://www.rt.com/man/strfry.3.html
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 8

Expert Comment

by:akshayxx
ID: 8050378
this is how GNUC people have done it .. i would highly recommend this .. u may need to do some changes .. let me know if it works as it is .. else we can help customise/port this to ur working needs/platform..

/* Copyright (C) 1992, 1996, 1999, 2002 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, write to the Free
   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
   02111-1307 USA.  */

#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

char *
strfry (char *string)
{
  static int init;
  static struct random_data rdata;
  size_t len, i;

  if (!init)
    {
      static char state[32];
      rdata.state = NULL;
      __initstate_r (time ((time_t *) NULL) ^ getpid (),
               state, sizeof (state), &rdata);
      init = 1;
    }

  len = strlen (string);
  for (i = 0; i < len; ++i)
    {
      int32_t j;
      char c;

      __random_r (&rdata, &j);
      j %= len;

      c = string[i];
      string[i] = string[j];
      string[j] = c;
    }

  return string;
}
0
 

Expert Comment

by:javabeanx
ID: 8052740
This is a C++ solution.

if your compiler provides stl support then use a function  
eg.
#include<algorithm>
#include<stdio.h>
void main()
{
 char aString[100];
 .
 .

 .
 n=strlen(abc);
 n--;
random_shuffle(&aString[0],&aString[n-1]);
}
0
 

Expert Comment

by:ramyaniv
ID: 8053597
Do you know the bioinformatics package GCG ?
There is an application for this purpose, it's called SHUFFLE and you can find about it in:
http://bioinfo.md.huji.ac.il/GCG10doc/shuffle.html
0
 

Expert Comment

by:michaelwynne
ID: 8064835
Bear in mind that no computer can generate a "true" random number. They only generate pseudo random numbers. Every time the program runs you will get the same sequence of numbers. This may or may not be important to you.

If you want to vary the pseudo random sequence you must call the rseed() function with a different seed value each time.

If you require "true" random numbers you will have to use something other than a computer to generate them.

Regards

Michael
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10058702
Nothing has happened on this question in more than 10 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
split points between akshayxx and TheBeaver.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
0
 
LVL 84

Expert Comment

by:ozo
ID: 10058808
Both akshayxx and TheBeaver's solutions produce a biased shuffle.
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10059113
Thanks for taking a look, Ozo.

While I see problems with TheBeaver's solutions, the second one and the strfry( ) algorithm put forward by akshayxx appeared to me to be acceptable solutions. Would they be better if they used the high-order bits rather than the low order bits (I'm looking skeptically at those % len expressions)? Is one of the other offered answers better (I could not follow the link offered by  ramyaniv)?
0
 
LVL 84

Expert Comment

by:ozo
ID: 10059198
I'm not worried about using the low order bits instead of the high order.
with a good random number generator (maybe a dangerous assumption) it won't matter
but consider when len = 3
we take
  j %= len;
3 times, for 3^3=27 equally likely outcomes
we want to represent all 3!=6 sequences equally
but 27 is not a multiple of 6, so some sequences will appear more often than others.
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10059385
I'm probably no longer qualified to dispute this with you. I'm not sure this counts as "bias" in a bad way, any more than the fact that flipping a coin 10 times will get you a combination of exactly 5 heads and 5 tails less than 25% of the time (.2461).
0
 
LVL 84

Expert Comment

by:ozo
ID: 10059448
fliping a fair coin 10 times should give any of the 1024 possibilites with equal likelyhood
there are 10!/(5!5!) ways to get exactly 5 heads and 5 tails so an unbiased coin should give this result with probability 252/1024
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10059678
Just so. Applying the strfry function 27 times to a string of length 3, you'll get an assortment with unequal numbers of each permutation. If you apply it another 27 times, you'll get another, most likely different, assortment. In the long haul, all of the permutations will tend towards being equally likely to appear. The fact that 27 is not divisible by 6 does not make the algorithm biased.
0
 
LVL 84

Expert Comment

by:ozo
ID: 10059971
In the long run, all 27 possible ways of selecting j%3 three times will tend to appear equally often
but the 6 possible permutations cannot be distributed equally among the 27 different selections
so the 6 permutations will not tend toward being equally likely to appear
0
 
LVL 20

Expert Comment

by:jmcg
ID: 10067553
I see your reasoning, but I think this is like that "Monty Hall" problem. Shall we run an experiment to see if it convinces one way or another?  Perhaps as a result of all this interest, our friend redfern has accepted one of the solutions we considered inferior.
0
 
LVL 84

Expert Comment

by:ozo
ID: 10084313
The answer from TheBeaver is even worse.  It tends to produce runs of consecutive numbers.
For the record, the solution from  akshayxx needs to have its j %= len; replaced by j %= (i+1)
in order to produce a fair shuffle.

for (i = 0; i < len; ++i)
    {
      int32_t j;
      char c;

      __random_r (&rdata, &j);
      j %= (i+1);

      c = string[i];
      string[i] = string[j];
      string[j] = c;
    }

0
 
LVL 20

Expert Comment

by:jmcg
ID: 10085845
OK, I'm persuaded that ozo is absolutely correct about what's wrong with the strfry algorithm. My take on how it should be fixed is a little different:

for (i = 0; i+1 <len; ++i)
    {
      int32_t j;
      char c;

      __random_r (&rdata, &j);
      j = j % (len - i) + i; /* result is in range (i, len-1) */

      c = string[i];
      string[i] = string[j];
      string[j] = c;
    }

This way, element 0 of the result is chosen first as a random choice from among all of the elements in the initial array and exchanged into place. At each following step, the current element is chosen as a random choice amongst those elements that have not already been chosen. Our last choice is made when there are just two unchosen elements to choose from (so we never attempt the modulo operation with a divisor of 1).

Now it could be that ozo's algorithm, if it were to initialize the loop variable to 1 instead of 0, is mathematically equivalent to (and a few instructions shorter than) this. If we can establish that, it might be appropriate to file a bug report against the GNU libc strfry.c source at http://www-gnats.gnu.org:8080/cgi-bin/wwwgnats.pl (or whereever it is that GNU libc bugs are accepted now).

0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
Summary: This tutorial covers some basics of pointer, pointer arithmetic and function pointer. What is a pointer: A pointer is a variable which holds an address. This address might be address of another variable/address of devices/address of fu…
The goal of this video is to provide viewers with basic examples to understand how to create, access, and change arrays in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use switch statements in the C programming language.
Suggested Courses
Course of the Month12 days, 13 hours left to enroll

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question