
# Correlation algorithm

Hi,

I have two sets of strings that partly overlap. Say, at time T:

Set A = {"a", "b", "c", "d", "e"}
Set B = {"e", "f", "g"}
Overlap={"e"}

At time S, they are:

Set A = {"ab", "bd", "cga", "da1", "eda", "ka1", "ed2"}
Set B = {"cga", "fw2", "gae", "3e2"}
Overlap={"cga"}

The number of elements in each set varies by date, but over time the elements of each set should be relatively stable.

I want to design a Java program to determine, at time X, which set a string Y most likely belongs to. Let us say string Y appeared y1 times in set A and y2 times in set B in the past.

Any ideas?

Thanks!

wsyy

Commented:
Sorry, I could not understand.

So you are saying that you have statistical data from previous times showing that a certain string appeared y1 times in A and y2 times in B.
Now you want to determine the probability of this string appearing in set A?
Well, that would probably be y1/(y1 + y2).
What is the significance of A and B having common strings?

Please explain in a little more detail.
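To make the y1/(y1 + y2) estimate concrete, here is a minimal Java sketch. The class name `MembershipEstimate`, the method `probabilityOfA`, and the smoothing constant `alpha` are assumptions added for illustration; they are not from the question. With `alpha = 0` the method returns the plain ratio, and a positive `alpha` keeps the estimate away from 0 or 1 when the string has been seen only a few times.

```java
/** Sketch: estimate which set a string likely belongs to from past counts. */
class MembershipEstimate {

    /**
     * Estimated probability that a string belongs to set A, given that it
     * was seen y1 times in A and y2 times in B in the past.
     * alpha is a hypothetical smoothing constant; 0 gives plain y1 / (y1 + y2).
     */
    static double probabilityOfA(long y1, long y2, double alpha) {
        if (y1 < 0 || y2 < 0 || alpha < 0) {
            throw new IllegalArgumentException("counts and alpha must be non-negative");
        }
        double denominator = y1 + y2 + 2 * alpha;
        if (denominator == 0) {
            return 0.5; // never seen and no smoothing: no evidence either way
        }
        return (y1 + alpha) / denominator;
    }

    public static void main(String[] args) {
        System.out.println(probabilityOfA(3, 1, 0.0)); // prints 0.75
        System.out.println(probabilityOfA(0, 0, 1.0)); // prints 0.5
    }
}
```

Note that this uses only the history of Y itself and ignores which other strings are present at time X.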

Author Commented:
How would you define significance?

y1/(y1+y2) seems reasonable at first glance. However, similar words tend to appear together, right?

I would think the probability has something to do with the other overlapping strings, so you are right.

But I don't know how to measure the correlation between one specific string and the other strings that overlapped before and may or may not appear at present.

Commented:

Still, if the lists are formed independently, then the expectation of whether a certain string will appear in A or in B
will not be affected by the fact that it sometimes appears in both.

This requires more understanding of what kind of lists these are
and how they are being formed.

Author Commented:
Sorry, for_yan, your response doesn't solve my issue.

Commented:
I think you need to define the problem more clearly.

>However, similar words tend to appear together right?

Which words are similar, and do they really tend to appear together?
If those are arbitrary combinations of letters and digits, then similar words
would not appear together unless you impose a certain policy on their selection process.

If you know anything about the underlying operations (where these lists come from, etc.),
this may also help.

Otherwise, with this little information, it is very difficult to give you any
sensible recommendation.


Commented:
If there is essentially a set number of words, you can build a correlation matrix. For each pair of words, count how many times they appear together and divide that by the total number of appearances to get a correlation score. That will be a very large matrix though.
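One way to read this suggestion is sketched below in Java. Several things here are assumptions, not part of the thread: the class name `CooccurrenceMatrix`, the idea that a "snapshot" is the content of one set on one date, and the choice to divide the joint count by the number of snapshots in which either word appears (a Jaccard-style score). Storing only pairs that actually co-occur in a map keeps the "matrix" sparse, although it can still grow large.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch of a sparse pair "correlation matrix" built from past snapshots. */
class CooccurrenceMatrix {
    // Number of snapshots in which each string appeared.
    private final Map<String, Integer> appearances = new HashMap<>();
    // Number of snapshots in which each unordered pair appeared together.
    private final Map<String, Integer> together = new HashMap<>();

    // Order-independent key; assumes strings never contain the NUL character.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + '\0' + b : b + '\0' + a;
    }

    /** Records one observed set, e.g. the content of set A on one date. */
    void addSnapshot(Set<String> snapshot) {
        List<String> items = new ArrayList<>(snapshot);
        for (int i = 0; i < items.size(); i++) {
            appearances.merge(items.get(i), 1, Integer::sum);
            for (int j = i + 1; j < items.size(); j++) {
                together.merge(key(items.get(i), items.get(j)), 1, Integer::sum);
            }
        }
    }

    /** Joint count divided by the number of snapshots containing either string. */
    double score(String a, String b) {
        int both = together.getOrDefault(key(a, b), 0);
        int either = appearances.getOrDefault(a, 0)
                + appearances.getOrDefault(b, 0) - both;
        return either == 0 ? 0.0 : (double) both / either;
    }

    public static void main(String[] args) {
        CooccurrenceMatrix m = new CooccurrenceMatrix();
        m.addSnapshot(new HashSet<>(Arrays.asList("a", "b")));
        m.addSnapshot(new HashSet<>(Arrays.asList("a", "c")));
        System.out.println(m.score("a", "b"));
    }
}
```

Each snapshot of n strings costs on the order of n² map updates, which is where the size concern comes from.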


Author Commented:
I don't think the matrix-based solution is feasible.

Are there any metrics that can replace the pair correlation?

Commented:
There are certainly possible solutions that don't require pairwise comparisons. It all depends on how the correlation surfaces. If the strings can be grouped into buckets, and the presence of some number of strings in a bucket increases the chance of other strings in that bucket appearing, then the calculations could be very fast. It's all dependent on your application.
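As a rough illustration of the bucket idea, the Java sketch below scores a string Y by counting how many strings from Y's bucket are currently present in A versus B. Everything specific here is hypothetical: the class `BucketScore`, the `bucketOf` map that assigns each string to a bucket, and the bucket names in `main`. How to obtain such a grouping is exactly the application-dependent part.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: score set membership from bucket-mates present at time X. */
class BucketScore {

    /** Counts strings in the given set, other than y, that share y's bucket. */
    static int bucketMatesIn(String y, Set<String> set, Map<String, String> bucketOf) {
        String bucket = bucketOf.get(y);
        if (bucket == null) {
            return 0; // y has no known bucket
        }
        int count = 0;
        for (String s : set) {
            if (!s.equals(y) && bucket.equals(bucketOf.get(s))) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of y's present bucket-mates that sit in A; 0.5 if none are present. */
    static double probabilityOfA(String y, Set<String> a, Set<String> b,
                                 Map<String, String> bucketOf) {
        int inA = bucketMatesIn(y, a, bucketOf);
        int inB = bucketMatesIn(y, b, bucketOf);
        return (inA + inB == 0) ? 0.5 : (double) inA / (inA + inB);
    }

    public static void main(String[] args) {
        Map<String, String> bucketOf = new HashMap<>(); // hypothetical grouping
        bucketOf.put("cga", "g1");
        bucketOf.put("ab", "g1");
        bucketOf.put("fw2", "g1");
        Set<String> a = new HashSet<>(Arrays.asList("ab", "bd"));
        Set<String> b = new HashSet<>(Arrays.asList("fw2", "gae"));
        System.out.println(probabilityOfA("cga", a, b, bucketOf)); // prints 0.5
    }
}
```

The work per query is linear in the sizes of A and B, so no pair table is needed once the buckets exist.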

Author Commented:
Tommy, could you please provide some examples or point me to the right resources? Thanks.

Commented:
Do the strings group naturally?

Author Commented:
Sorry for the very late response!

No, I haven't found a natural grouping for the strings.

Commented:
If there is no way to group them, then you'll have to do pairwise at some point. If the correlations remain consistent over time, then you will only need to do the big part once.

Author Commented:
What does pairwise mean? How can I do so?

Sorry for the late response.

Commented:
Pairwise means you compare each object to each other object.
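In code, a pairwise comparison is typically a nested loop over all unordered pairs. The Java sketch below just enumerates the pairs; the class and method names are made up for illustration, and whatever comparison you need (for example, counting co-occurrences) would go where the pair is collected. For n strings there are n(n-1)/2 pairs, which is why this gets expensive for large sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: enumerate every unordered pair of strings exactly once. */
class Pairwise {

    static List<String[]> allPairs(List<String> items) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // Start at i + 1 so each pair is visited once and no item pairs with itself.
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(new String[] { items.get(i), items.get(j) });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        System.out.println(allPairs(items).size()); // prints 6
    }
}
```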