xavier_da
asked on
How do I create a formula to reference differing cell ranges for input into a more complex calculation using Excel?
Hi there,
I need help putting together what might be a complex formula. I think I know the conceptual building blocks, but need that last bit of help to bring it all together - or alternatively be given a different approach. First some background.
I am using pwrSIMILARITY to calculate textual similarity score between any two given input cells, eg.:
Great! However what I am trying to do is calculate multiple comparisons. The spreadsheet I have has multiple columns with similar text. Some of these columns might be blank. For example, out of a possible 17 columns for any given row, 3 may have non-empty text.
pwrSIMILARITY is symmetrical in that the order of input cells produces the same result. I can leverage this in a basic permutation calculation. For example, for 3 input columns, (say C1 to C3) there are just 3 'paired' calculations to perform:
I can then take the average of those to roughly produce a score which I interpret as the overall similarity score for text across columns C1 to C3.
I can use COUNTA and PERMUT to work out the number of pair calculations as follows:
The questions are:
How do I get a reference to the cells that COUNTA includes in its result? Noting that there will be instances where this may be one or more columns with intermediate blank ones.
Assuming I can get those, how do I actually generate the pairings, and get a formula that plugs in the cell references into pwrSIMILARITY as in the above example? Note: for 6 input columns, that's 15 unique pairings...
Basically, how do aI get a robust formula that will readily handle 1 to n columns of text to compare?
I think the answer is a combination of INDEX, MATCH, OFFSET functions and arrays - and maybe a User Defined Function to generate the pairing bit of the formula. Or is there a better way?
Thanks in advance!
Xav.
I need help putting together what might be a complex formula. I think I know the conceptual building blocks, but need that last bit of help to bring it all together - or alternatively be given a different approach. First some background.
I am using pwrSIMILARITY to calculate textual similarity score between any two given input cells, eg.:
=pwrSIMILARITY(M292,N292)
Great! However what I am trying to do is calculate multiple comparisons. The spreadsheet I have has multiple columns with similar text. Some of these columns might be blank. For example, out of a possible 17 columns for any given row, 3 may have non-empty text.
pwrSIMILARITY is symmetrical in that the order of input cells produces the same result. I can leverage this in a basic permutation calculation. For example, for 3 input columns, (say C1 to C3) there are just 3 'paired' calculations to perform:
=pwrSIMILARITY(C1,C2)
=pwrSIMILARITY(C2,C3)
=pwrSIMILARITY(C3,C1)
I can then take the average of those to roughly produce a score which I interpret as the overall similarity score for text across columns C1 to C3.
I can use COUNTA and PERMUT to work out the number of pair calculations as follows:
U1=COUNTA(D292:T292) //equals 3
V1=PERMUT(U1,2)/2 //equals 3
The questions are:
How do I get a reference to the cells that COUNTA includes in its result? Noting that there will be instances where this may be one or more columns with intermediate blank ones.
Assuming I can get those, how do I actually generate the pairings, and get a formula that plugs in the cell references into pwrSIMILARITY as in the above example? Note: for 6 input columns, that's 15 unique pairings...
Basically, how do aI get a robust formula that will readily handle 1 to n columns of text to compare?
I think the answer is a combination of INDEX, MATCH, OFFSET functions and arrays - and maybe a User Defined Function to generate the pairing bit of the formula. Or is there a better way?
Thanks in advance!
Xav.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Your assumptions were correct and spot on. I wanted to obtain the average so your MeanSimilarity function met that requirement. Key for that was the handling of ranges, in particular the ability to reference them as a contiguous block, or cell-by-cell - that's quite powerful and the bit I was missing from a conceptual understanding viewpoint. Bonus is that makes computing permutations irrelevant for this purpose.
Only change I made was case sensitivity. I've confirmed your solution with the data I have been working with.
Thanks for a quick and elegant solution and also taking time to explain the underlying approach and concepts involved; great work!
Regards,
Xav.