Link to home
Start Free TrialLog in
Avatar of jeaniem
jeaniem

asked on

A concatenate formula to create a list of all and any possible combinations of rows in excel (OSX 2008)

Hello,

I have looked into every nook and simply cannot find the exit here, so if anybody can, please provide some insight.. (Mac OSX user - MS Excel 2008 (no VBA)

I have a list of content categories (string) from different sources (index) and my task is to create a list for a mapping document that will concatenate (or not) each row from each source in all and any combination(s) (there are about 1000).

This formula is the closest that I have found (assumes 5 columns (not rows, but I can transpose them):

=$A$1 & IF(MOD(ROW(),2)>=1,"+" & B$1,"")& IF(MOD(ROW(),4)>=2,"+" & C$1,"")& IF(MOD(ROW(),8)>=4,"+" & D$1,"")& IF(MOD(ROW(),16)>=8,"+" & E$1,"")

It looks as though the iterations (n^y) are programmed manually, although I could conceivably see these intervals being read form a list elsewhere.

I will need to keep the association between each category and its source index (maybe some kind of delimiter within another) and also apply a new index to each resulting group (unique to group).

I hope that's enough information?

Many thanks in advance!

AF
Avatar of Runrigger
Runrigger
Flag of United Kingdom of Great Britain and Northern Ireland image

can you attach a brief sample of source data + target list?
Avatar of jeaniem
jeaniem

ASKER

HI Runrigger,

Extremely kind of you to come back to me - I'm no mathematician except that this is quite an interesting problem (conceptually..).

Find attached - I did this by hand so there could be some errors although I think it's cool as I can see N^Y patters emerging here...

Thank you very much indeed for having a look at this.

AF

NB.  Here we're working with 4 rows (producing 31 in target) - I have roughly 1000!
SRC-TGT-data-.xls
Avatar of jeaniem

ASKER

Sorry -

Just to add, the sample is on the 2nd tab.. forgot to delete the first.

AF
Yes 'transposed' your formula works perfectly for the number of categories you have right now.
=$A$7 & IF(MOD(ROW(),2)>=1,"+" & A$8,"")& IF(MOD(ROW(),4)>=2,"+" & A$9,"")& IF(MOD(ROW(),8)>=4,"+" & A$10,"")& IF(MOD(ROW(),16)>=8,"+" & A$11,"")
If you are looking at up to 1000 categories, you could never get sufficient characters in your formula bar to cope.

So we will need to right an UDF (user defined formula) to do the concatenation, as in VBA we can loop.
I'll play around and see what comes out...
Avatar of jeaniem

ASKER

Thank you very much!  I just wish that MS provided VBA in Mac Excel as it's restrictive otherwise...
I notice in your list that you don't have the permutation
electro + ballet   (as an example)
all of them start with rock.
Do you need these other permutations as well?
Avatar of jeaniem

ASKER

Yes, sorry - I did it by hand and it's more than possible that I missed some off.

NB.  Are you sure that you're looking at the second tab, pls disregard the first.
No I was looking at tab1. I don't understand what I am seeing in Tab2 or the logic behind the CPindex and Groupindex. What does this have to do with permutations?
Avatar of jeaniem

ASKER

Apologies - my bad:

CP_Index is the index from the source (CP means Content Provider in my private universe!), and group index a new reference created for the target so that I can identify each different group of combinations / permutations after the event.

In this target data, rather than grouping each in a string and delimiting with a "+", I have organised them into rows where each different "string" (in tab 1 terms) is renamed a "group" and given a new index (Group_Index) to link them together.

Have you heard of Real Studio as an alternative to VBA on Mac?  I'm downloading it now.
I only know that VBA on Mac is being discontinued by Microsoft. I don't use MacOS so no knowledge about any details.

Back to your problem: take line 19 - theatre 3 12
So I assume this is the 'short notation' for ATMs + theatre
and the number 12 is just a sequential number .

If you were to use binary numbers for your index, then you can read out the permutation straight away:
rock  0001
musicals 0010
theatre 0100
ATM   1000

so: 1010 is ATM & musicals

That will make it far easier to create your list, if you know what I mean.

The only thing I don't understand is how you want to use this list on Tab2. You now have 'musicals' on 9 locations, so what are these going to be linked to?
jeaniem, it is still not very clear what it is that you are trying to achieve.

How exactly do you derive the index, see first 7 rows, what does this mean?

rock      1      1
rock      1      2
musicals      2      2
rock      1      4
theatre      3      4
rock      1      5
ATM's      4      5
RunRigger,
attached a kind of commented list:

SRC-TGT-data-.xls
Avatar of jeaniem

ASKER

Yes, that's basically it - effectively I'm transposing it back into a list:

"line1:  musicals+ATM's+theatre"

can also be described as:

Category             Line
musicals             1
ATM's                 1
theatre                1

...where line 1 represents the group index (ie. a number assigned to the target iteration).

I need to carry over more data however as musicals could conceivably come from more than one provider and I'll need to configure at that level of granularity:  I've therefore added a new column in the target data that inherits the source "CP" (content provider) index:

Category            CP_Index             Line ("Group Index")
musicals             1                           1
ATM's                 2                           1
theatre                3                           1

- - - - - - - - - - - -

Does that help at all?  Honestly though, this is mostly just formatting so I can probably shuffle things around a bit if this takes up too much of your time:  it's generating the results that I'm struggling with.

Thank you both..

OK, now I am beginning to understand and I think that with a little more clarity on the requirements, we may be able to help;

If you have say a 1000 categories, how do you want to proceed with determining the various combinations or groups of categories;

Within your example (rock, musicals,theatres,atms), rock can have any combination of all 4, as could musicals, atms and theatres for that matter.

If you had 20 categories, are you then wanting to expand rock to included any and all combinations with all 20 categories?

I hope not, because if you indicate yes, you are talking millions and millions of combinations!

I think that I can safely assume not therefore! However, what we need to know is how we determine what combinations are possible and how logically we can determine that?
For ten items in your list, the number of permutations will be over 1000, and so your target list would be about 3000-4000 items long in the way you've set it up.

for 1000 items you are talking about >10^150 items. More than Excel can deal with, or any program on this earth.

Even if you restrict your permuations to combinations for only 4 items from your source list, than with a 1000 items in the source you will have 41 billion (41 * 10^6) permutations.

Time for a sense check
Avatar of jeaniem

ASKER

Agree -

So we'll cut our cloth... do you need to know exactly how many rows right now or is it a value that we can change retrospectively?

Incredible maths.
might help us to know your objective, and what you have had to perform manually previous to trying to develop this solution.

Like mentioned above, anything over say five items is difficult to capture in a formula, due to the length the formula would become.
Using a 'binary' solution you could make it easy up to nine items in your source list, and possibly a couple more working with strings.

Any more items in the source list and things become unwieldy I think.

Your problem is the way you want to present it. I still don't understand why you would want to present it then way you propose in tab2, as it has so many superfluous lines: lines 19, 20 & 21 all mean the same thing, so why print it out three times?

Your tab1 is easier to achive for any number of source items (with a practical limit at about 10)
Avatar of jeaniem

ASKER

Okay, I'll take your advice:  

Really I think it's fine to provide the information as in tab 1 as ultimately it's the content that counts rather than the aesthetics!

Is there anything that I can provide to help?  You are both so kind for offering to help with this rubic's cube of a problem...

AF
jeaniem, I truly believe that the problem lies in the huge numbers of permutations that you will be needing to generate (look at the UK lottery, 6 numbers from 49 balls, gives odds of 14mn to 1, so you can imagine the numbers 1000 categories is going to generate).

Without knowing your specific objective, its is going to be virtually impossible to proceed.

I have enjoyed the discussion, but I don't believe I can be of any further help, so good luck (I will continue to monitor and if I can say anything useful, I will).
Dave
Avatar of jeaniem

ASKER

I totally understand.

Thank you very much indeed for you kind time and help - perhaps the reality is that this approach is simply not realistic and that there's another way to achieve this.

I am in awe of your mathematical ability, sijpie equally - very grateful to you both.

Bon continuation..
Have a look here. I've amended your sheet with a little 'userform'

In column A you can add up to 8 categories (more if you amend the formulas)
in column B the index for the category gets generated
in the top entry box (D2:E10) you can select the categories for the song (in columnD) and it will give you the index number for the song in E3

in the bottom entry box (D14:E15) you enter a index number and it will give you the categories.

The lists in I,J & K yo don't need for this example, but it shows how the formula in E15 works.
Avatar of jeaniem

ASKER

Hi Sijpie,

Wow - that is great:  I'll learn a lot from looking into this:

I'm not sure that I can see anything attached however?

(!) AF
Something went wron uploading thefile
SRC-TGT-data-.xls
Avatar of jeaniem

ASKER

Incredible -

There are formulas in there that I never knew existed!  I would have been stumped for decades had it not been for your help.

I wil deconstruct what I can to learn how to manipulate it and play around with the data to see what it can produce ongoing.

Wizards! Thank you very, very much again.
ASKER CERTIFIED SOLUTION
Avatar of sijpie
sijpie
Flag of Germany image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of jeaniem

ASKER

Really kind to have spent the time looking into my conundrum - deeply appreciated!
In the sheet the formula =DEC2BIN() is used (column J). That is a formula that becomes available by installing the Analysis Toolpack Add-in (Tools, Add-in...)

But it is not used other than for demo here.