Link to home
Start Free TrialLog in
Avatar of xavier_da
xavier_daFlag for Australia

asked on

How do I create a formula to reference differing cell ranges for input into a more complex calculation using Excel?

Hi there,

I need help putting together what might be a complex formula. I think I know the conceptual building blocks, but need that last bit of help to bring it all together - or alternatively be given a different approach. First some background.

I am using pwrSIMILARITY to calculate textual similarity score between any two given input cells, eg.:

=pwrSIMILARITY(M292,N292)

Open in new window


Great! However what I am trying to do is calculate multiple comparisons. The spreadsheet I have has multiple columns with similar text. Some of these columns might be blank. For example, out of a possible 17 columns for any given row, 3 may have non-empty text.

pwrSIMILARITY is symmetrical in that the order of input cells produces the same result. I can leverage this in a basic permutation calculation. For example, for 3 input columns, (say C1 to C3) there are just 3 'paired' calculations to perform:

=pwrSIMILARITY(C1,C2)
=pwrSIMILARITY(C2,C3)
=pwrSIMILARITY(C3,C1)

Open in new window


I can then take the average of those to roughly produce a score which I interpret as the overall similarity score for text across columns C1 to C3.

I can use COUNTA and PERMUT to work out the number of pair calculations as follows:

U1=COUNTA(D292:T292) //equals 3
V1=PERMUT(U1,2)/2 //equals 3

Open in new window


The questions are:

How do I get a reference to the cells that COUNTA includes in its result? Noting that there will be instances where this may be one or more columns with intermediate blank ones.

Assuming I can get those, how do I actually generate the pairings, and get a formula that plugs in the cell references into pwrSIMILARITY as in the above example? Note: for 6 input columns, that's 15 unique pairings...

Basically, how do aI get a robust formula that will readily handle 1 to n columns of text to compare?

I think the answer is a combination of INDEX, MATCH, OFFSET functions and arrays - and maybe a User Defined Function to generate the pairing bit of the formula. Or is there a better way?

Thanks in advance!

Xav.
ASKER CERTIFIED SOLUTION
Avatar of byundt
byundt
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of xavier_da

ASKER

Hi byundt,

Your assumptions were correct and spot on. I wanted to obtain the average so your MeanSimilarity function met that requirement. Key for that was the handling of ranges, in particular the ability to reference them as a contiguous block, or cell-by-cell - that's quite powerful and the bit I was missing from a conceptual understanding viewpoint. Bonus is that makes computing permutations irrelevant for this purpose.

Only change I made was case sensitivity. I've confirmed your solution with the data I have been working with.

Thanks for a quick and elegant solution and also taking time to explain the underlying approach and concepts involved; great work!

Regards,

Xav.