asked on

How do I create a formula to reference differing cell ranges for input into a more complex calculation using Excel?

Hi there,

I need help putting together what might be a complex formula. I think I know the conceptual building blocks, but need that last bit of help to bring it all together - or alternatively be given a different approach. First some background.

I am using pwrSIMILARITY to calculate textual similarity score between any two given input cells, eg.:

=pwrSIMILARITY(M292,N292)

Open in new window

Great! However what I am trying to do is calculate multiple comparisons. The spreadsheet I have has multiple columns with similar text. Some of these columns might be blank. For example, out of a possible 17 columns for any given row, 3 may have non-empty text.

pwrSIMILARITY is symmetrical in that the order of input cells produces the same result. I can leverage this in a basic permutation calculation. For example, for 3 input columns, (say C1 to C3) there are just 3 'paired' calculations to perform:

=pwrSIMILARITY(C1,C2)
=pwrSIMILARITY(C2,C3)
=pwrSIMILARITY(C3,C1)

Open in new window

I can then take the average of those to roughly produce a score which I interpret as the overall similarity score for text across columns C1 to C3.

I can use COUNTA and PERMUT to work out the number of pair calculations as follows:

U1=COUNTA(D292:T292) //equals 3
V1=PERMUT(U1,2)/2 //equals 3

Open in new window

The questions are:

How do I get a reference to the cells that COUNTA includes in its result? Noting that there will be instances where this may be one or more columns with intermediate blank ones.

Assuming I can get those, how do I actually generate the pairings, and get a formula that plugs in the cell references into pwrSIMILARITY as in the above example? Note: for 6 input columns, that's 15 unique pairings...

Basically, how do aI get a robust formula that will readily handle 1 to n columns of text to compare?

I think the answer is a combination of INDEX, MATCH, OFFSET functions and arrays - and maybe a User Defined Function to generate the pairing bit of the formula. Or is there a better way?

Thanks in advance!

Xav.

ASKER CERTIFIED SOLUTION

byundt

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

byundt

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

xavier_da

ASKER

Hi byundt,

Your assumptions were correct and spot on. I wanted to obtain the average so your MeanSimilarity function met that requirement. Key for that was the handling of ranges, in particular the ability to reference them as a contiguous block, or cell-by-cell - that's quite powerful and the bit I was missing from a conceptual understanding viewpoint. Bonus is that makes computing permutations irrelevant for this purpose.

Only change I made was case sensitivity. I've confirmed your solution with the data I have been working with.

Thanks for a quick and elegant solution and also taking time to explain the underlying approach and concepts involved; great work!

Regards,

Xav.