Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
• Status: Solved
• Priority: Medium
• Security: Public
• Views: 666
• Last Modified:

Shuffling strings?

I am a biologist attempting to write simple programs to manipulate DNAsequences.  I am attempting to self-teach myself C/C++ programming, so please excuse me if my questions seem incredibly simple or basic.  The books I have purchased at the bookstore are good foundations, but offer limited practical examples.
Is there a way to shuffle character arrays in the same way as the typical card shuffling codes?  I am seeking assistance in how I could go about "shuffling" an array of characters.  I have an array, let's just call it aString, and I want to shuffle the characters of that array into another array called shufString.  I want to be able to shuffle that string t quantity of times (t is fixed, based on pre-defined confidence levels).
I had thought of generating an index z that could be used to pick a character at random from aString as follows:

z=(int)floor((((double)random())/divisor)*strlen(aString);

Am I at least headed into the right direction or barking up the wrong tree?  And how would I go about placing the picked characters into the new array in the order they are chosen?  Any assistance or ideas is greatly apprecited.  Thanks in advance!  :-)
0
bgredfern
Asked:
• 6
• 5
• 2
• +4
1 Solution

Commented:

ShuffleString(char *src, char *dst)
{
int pos=0;

while( pos < strlen(src) )
{
dst[pos] = src[rand() % strlen(src)];
pos++;
}
}

This is if you dont care about duplicates
0

Commented:
Now, if you do only want each character coming out once, we then have to mark off each character...

ShuffleString(char *src, char *dst)
{
int pos=0, x;
int len = strlen(src);            // Just because we use it so much
char *used = malloc(strlen(src)); // I'm allocating memory because I dont know the length of the string

memset(used, len, 0);             // Set everything to unused
while( pos < len )
{
x = rand() % len;
while( used[x] )
x = (x+1) % len;             // Its used so move on to the next. This code will also loop back to 0 so its guarenteed to find an unused character

dst[pos] = src[x];
used[x] = 1;
pos++;
}
free( used);
}

0

Commented:
have you tried frying ur string .. if not here is a function that u can use on most of UNIX and clones..
this comes with GNUC library.. if not .. u can always get hold of the 'good and tested' code .
http://www.rt.com/man/strfry.3.html
0

Commented:
this is how GNUC people have done it .. i would highly recommend this .. u may need to do some changes .. let me know if it works as it is .. else we can help customise/port this to ur working needs/platform..

/* Copyright (C) 1992, 1996, 1999, 2002 Free Software Foundation, Inc.
This file is part of the GNU C Library.

The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307 USA.  */

#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

char *
strfry (char *string)
{
static int init;
static struct random_data rdata;
size_t len, i;

if (!init)
{
static char state[32];
rdata.state = NULL;
__initstate_r (time ((time_t *) NULL) ^ getpid (),
state, sizeof (state), &rdata);
init = 1;
}

len = strlen (string);
for (i = 0; i < len; ++i)
{
int32_t j;
char c;

__random_r (&rdata, &j);
j %= len;

c = string[i];
string[i] = string[j];
string[j] = c;
}

return string;
}
0

Commented:
This is a C++ solution.

if your compiler provides stl support then use a function
eg.
#include<algorithm>
#include<stdio.h>
void main()
{
char aString[100];
.
.

.
n=strlen(abc);
n--;
random_shuffle(&aString[0],&aString[n-1]);
}
0

Commented:
Do you know the bioinformatics package GCG ?
There is an application for this purpose, it's called SHUFFLE and you can find about it in:
http://bioinfo.md.huji.ac.il/GCG10doc/shuffle.html
0

Commented:
Bear in mind that no computer can generate a "true" random number. They only generate pseudo random numbers. Every time the program runs you will get the same sequence of numbers. This may or may not be important to you.

If you want to vary the pseudo random sequence you must call the rseed() function with a different seed value each time.

If you require "true" random numbers you will have to use something other than a computer to generate them.

Regards

Michael
0

OwnerCommented:
Nothing has happened on this question in more than 10 months. It's time for cleanup!

My recommendation, which I will post in the Cleanup topic area, is to
split points between akshayxx and TheBeaver.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
0

Commented:
Both akshayxx and TheBeaver's solutions produce a biased shuffle.
0

OwnerCommented:
Thanks for taking a look, Ozo.

While I see problems with TheBeaver's solutions, the second one and the strfry( ) algorithm put forward by akshayxx appeared to me to be acceptable solutions. Would they be better if they used the high-order bits rather than the low order bits (I'm looking skeptically at those % len expressions)? Is one of the other offered answers better (I could not follow the link offered by  ramyaniv)?
0

Commented:
I'm not worried about using the low order bits instead of the high order.
with a good random number generator (maybe a dangerous assumption) it won't matter
but consider when len = 3
we take
j %= len;
3 times, for 3^3=27 equally likely outcomes
we want to represent all 3!=6 sequences equally
but 27 is not a multiple of 6, so some sequences will appear more often than others.
0

OwnerCommented:
I'm probably no longer qualified to dispute this with you. I'm not sure this counts as "bias" in a bad way, any more than the fact that flipping a coin 10 times will get you a combination of exactly 5 heads and 5 tails less than 25% of the time (.2461).
0

Commented:
fliping a fair coin 10 times should give any of the 1024 possibilites with equal likelyhood
there are 10!/(5!5!) ways to get exactly 5 heads and 5 tails so an unbiased coin should give this result with probability 252/1024
0

OwnerCommented:
Just so. Applying the strfry function 27 times to a string of length 3, you'll get an assortment with unequal numbers of each permutation. If you apply it another 27 times, you'll get another, most likely different, assortment. In the long haul, all of the permutations will tend towards being equally likely to appear. The fact that 27 is not divisible by 6 does not make the algorithm biased.
0

Commented:
In the long run, all 27 possible ways of selecting j%3 three times will tend to appear equally often
but the 6 possible permutations cannot be distributed equally among the 27 different selections
so the 6 permutations will not tend toward being equally likely to appear
0

OwnerCommented:
I see your reasoning, but I think this is like that "Monty Hall" problem. Shall we run an experiment to see if it convinces one way or another?  Perhaps as a result of all this interest, our friend redfern has accepted one of the solutions we considered inferior.
0

Commented:
The answer from TheBeaver is even worse.  It tends to produce runs of consecutive numbers.
For the record, the solution from  akshayxx needs to have its j %= len; replaced by j %= (i+1)
in order to produce a fair shuffle.

for (i = 0; i < len; ++i)
{
int32_t j;
char c;

__random_r (&rdata, &j);
j %= (i+1);

c = string[i];
string[i] = string[j];
string[j] = c;
}

0

OwnerCommented:
OK, I'm persuaded that ozo is absolutely correct about what's wrong with the strfry algorithm. My take on how it should be fixed is a little different:

for (i = 0; i+1 <len; ++i)
{
int32_t j;
char c;

__random_r (&rdata, &j);
j = j % (len - i) + i; /* result is in range (i, len-1) */

c = string[i];
string[i] = string[j];
string[j] = c;
}

This way, element 0 of the result is chosen first as a random choice from among all of the elements in the initial array and exchanged into place. At each following step, the current element is chosen as a random choice amongst those elements that have not already been chosen. Our last choice is made when there are just two unchosen elements to choose from (so we never attempt the modulo operation with a divisor of 1).

Now it could be that ozo's algorithm, if it were to initialize the loop variable to 1 instead of 0, is mathematically equivalent to (and a few instructions shorter than) this. If we can establish that, it might be appropriate to file a bug report against the GNU libc strfry.c source at http://www-gnats.gnu.org:8080/cgi-bin/wwwgnats.pl (or whereever it is that GNU libc bugs are accepted now).

0

Featured Post

• 6
• 5
• 2
• +4
Tackle projects and never again get stuck behind a technical roadblock.