
# Correlation algorithms

Posted on 2011-10-16
Hi,

I have two sets of strings, some of which overlap. For example, at time T:

Set A = {"a", "b", "c", "d", "e"}
Set B = {"e", "f", "g"}
Overlap={"e"}

At time S, they are:

Set A = {"ab", "bd", "cga", "da1", "eda", "ka1", "ed2"}
Set B = {"cga", "fw2", "gae", "3e2"}
Overlap={"cga"}
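A minimal Java sketch of how the overlap shown above can be computed. The class and method names are illustrative only, not from the original post. It copies one set and calls `retainAll`, so neither input set is modified.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Overlap {
    /** Returns the strings present in both sets, leaving both inputs unchanged. */
    static Set<String> overlap(Set<String> a, Set<String> b) {
        Set<String> result = new LinkedHashSet<>(a); // copy so 'a' is not modified
        result.retainAll(b);                          // keep only elements also in 'b'
        return result;
    }

    public static void main(String[] args) {
        // Snapshot at time S from the question
        Set<String> setA = new LinkedHashSet<>(List.of("ab", "bd", "cga", "da1", "eda", "ka1", "ed2"));
        Set<String> setB = new LinkedHashSet<>(List.of("cga", "fw2", "gae", "3e2"));
        System.out.println(overlap(setA, setB)); // prints [cga]
    }
}
```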

The number of elements in each set varies by date, but over time the elements of each set should be relatively stable.

I want to design a Java program that determines, at time X, which set a string Y most likely belongs to. Let's say string Y appeared y1 times in set A and y2 times in set B in the past.

Any idea?

Thanks!

Question by:wsyy


Expert Comment

Sorry, I don't quite understand.

So you are saying that you have statistical data from previous times showing that a certain string appeared y1 times in A and y2 times in B.
Now you want to determine the probability of this string appearing in set A?
That would probably be y1/(y1 + y2).
What is the significance of A and B having common strings?

Please explain in a little more detail.
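A minimal sketch of the y1/(y1 + y2) estimate, assuming the history is available as a list of snapshots of both sets. The `Snapshot` record and the method name are hypothetical, and records require Java 16 or later. A string in the overlap counts toward both y1 and y2.

```java
import java.util.List;
import java.util.Set;

public class MembershipEstimate {
    /** Hypothetical container: the contents of sets A and B at one past time. */
    record Snapshot(Set<String> a, Set<String> b) {}

    /**
     * Estimates P(Y in A) as y1 / (y1 + y2), where y1 and y2 count the past
     * snapshots that contained y in A and in B. Returns NaN if y was never seen.
     */
    static double probabilityInA(String y, List<Snapshot> history) {
        int y1 = 0, y2 = 0;
        for (Snapshot s : history) {
            if (s.a().contains(y)) y1++;
            if (s.b().contains(y)) y2++; // an overlapping string counts for both sets
        }
        return (y1 + y2) == 0 ? Double.NaN : (double) y1 / (y1 + y2);
    }

    public static void main(String[] args) {
        List<Snapshot> history = List.of(
            new Snapshot(Set.of("a", "b", "c", "d", "e"), Set.of("e", "f", "g")),
            new Snapshot(Set.of("ab", "bd", "cga", "da1", "eda", "ka1", "ed2"),
                         Set.of("cga", "fw2", "gae", "3e2")));
        System.out.println(probabilityInA("e", history));  // 0.5: once in A, once in B
        System.out.println(probabilityInA("ab", history)); // 1.0: only ever seen in A
    }
}
```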

Author Comment

How would you define significance?

y1/(y1+y2) seems reasonable at first glance. However, similar words tend to appear together, right?

I would think the probability has something to do with the other overlapping strings, so you are right.

But I don't know what to measure, or how to measure the correlation between one specific string and the other strings that overlapped before and may or may not appear now.


Expert Comment

Still, if the lists are formed independently, then the expectation of whether a certain string will appear in A or in B
will not be affected by the fact that it sometimes appears in both.

Answering this requires a better understanding of what kind of lists these are
and how they are formed.

Author Comment

Sorry, for_yan, your response doesn't solve my issue.


Expert Comment

I think you need to define the problem more clearly.

>However, similar words tend to appear together right?

Which words are similar, and do they really tend to appear together?
If those are arbitrary combinations of letters and digits, then similar words
would not appear together unless you impose some policy on how they are selected.

If you know anything about the underlying process (where these lists come from, etc.),
that may also help.

Otherwise, with this little information, it is very difficult to give you any
sensible recommendation.



Expert Comment

If there is essentially a set number of words, you can build a correlation matrix. For each pair of words, count how many times they appear together and divide that by the total number of appearances to get a correlation score. That will be a very large matrix though.
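One possible reading of that suggestion, sketched in Java. "Divide by the total number of appearances" can be normalized in several ways; this sketch assumes the Jaccard index (co-occurrences divided by the number of observations in which either word appears) and treats each set at each point in time as one observation. Only pairs that actually co-occur are stored, in a map, so the full matrix is never allocated, although the number of pairs per observation still grows quadratically with set size. Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoOccurrence {
    private final Map<String, Integer> appearances = new HashMap<>(); // observations containing each word
    private final Map<String, Integer> together = new HashMap<>();    // observations containing both words of a pair

    /** Order-independent pair key; assumes "\0" never occurs inside a word. */
    private static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "\0" + y : y + "\0" + x;
    }

    /** Records one observed group of words, e.g. set A at one point in time. */
    void observe(Set<String> words) {
        List<String> list = List.copyOf(words);
        for (int i = 0; i < list.size(); i++) {
            appearances.merge(list.get(i), 1, Integer::sum);
            for (int j = i + 1; j < list.size(); j++) {
                together.merge(key(list.get(i), list.get(j)), 1, Integer::sum);
            }
        }
    }

    /** Jaccard score: co-occurrences / observations containing either word. */
    double score(String x, String y) {
        int both = together.getOrDefault(key(x, y), 0);
        int either = appearances.getOrDefault(x, 0) + appearances.getOrDefault(y, 0) - both;
        return either == 0 ? 0.0 : (double) both / either;
    }

    public static void main(String[] args) {
        CoOccurrence c = new CoOccurrence();
        c.observe(Set.of("a", "b", "c", "d", "e"));                       // set A at time T
        c.observe(Set.of("e", "f", "g"));                                 // set B at time T
        c.observe(Set.of("ab", "bd", "cga", "da1", "eda", "ka1", "ed2")); // set A at time S
        c.observe(Set.of("cga", "fw2", "gae", "3e2"));                    // set B at time S
        System.out.println(c.score("d", "e")); // 0.5
        System.out.println(c.score("a", "f")); // 0.0
    }
}
```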


Author Comment

I don't think the matrix-based solution is feasible.

Are there any metrics that can replace the pair correlation?


Expert Comment

There are certainly possible solutions that don't require pairwise comparisons. It all depends on how the correlation surfaces. If the strings can be grouped into buckets, and the presence of some number of strings in a bucket increases the chance of other strings in that bucket appearing, then the calculations could be very fast. It all depends on your application.
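A sketch of the bucket idea, assuming a grouping of strings already exists. The bucket in the example is invented for illustration, since the thread does not provide one, and the scoring rule is an assumption: a set's score is the fraction of the other members of Y's bucket that the set currently contains. The cost depends only on the bucket size, not on all pairs of strings.

```java
import java.util.Set;

public class BucketScore {
    /**
     * Fraction of the other members of y's bucket that are present in 'set'.
     * 'bucket' is a hypothetical, precomputed group of strings assumed to appear together.
     */
    static double support(String y, Set<String> bucket, Set<String> set) {
        int others = 0, present = 0;
        for (String s : bucket) {
            if (s.equals(y)) continue; // y is not evidence for itself
            others++;
            if (set.contains(s)) present++;
        }
        return others == 0 ? 0.0 : (double) present / others;
    }

    /** Picks the set whose contents give y's bucket more support; ties go to "A". */
    static String likelySet(String y, Set<String> bucket, Set<String> a, Set<String> b) {
        return support(y, bucket, a) >= support(y, bucket, b) ? "A" : "B";
    }

    public static void main(String[] args) {
        Set<String> bucket = Set.of("fw2", "gae", "3e2", "xyz"); // hypothetical grouping
        Set<String> a = Set.of("ab", "bd", "cga", "da1", "eda", "ka1", "ed2");
        Set<String> b = Set.of("cga", "fw2", "gae", "3e2");
        System.out.println(likelySet("xyz", bucket, a, b)); // B
    }
}
```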

Author Comment

Tommy, could you please provide some examples or point me to the right resources? Thanks.


Expert Comment

Do the strings group naturally?

Author Comment

Sorry for the very late response!

No, I haven't grouped the strings in any natural way.


Expert Comment

If there is no way to group them, then you'll have to do pairwise comparisons at some point. If the correlations remain consistent over time, you will only need to do the expensive part once.

Author Comment

What does pairwise mean? How can I do so?

Sorry for the late response.


Accepted Solution

Pairwise means you compare each object to each other object.
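A minimal illustration of what "pairwise" means in code: two nested loops with `j` starting at `i + 1`, so each unordered pair is visited exactly once. The class and method names are illustrative. For n strings this yields n(n - 1)/2 pairs, which is why the full correlation matrix grows so quickly.

```java
import java.util.ArrayList;
import java.util.List;

public class Pairwise {
    /** Returns every unordered pair (i < j), so each object is compared with each other object once. */
    static List<String[]> pairs(List<String> items) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                result.add(new String[] {items.get(i), items.get(j)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] p : pairs(List.of("e", "f", "g"))) {
            System.out.println(p[0] + " - " + p[1]); // e - f, e - g, f - g
        }
        // n strings give n * (n - 1) / 2 pairs: 3 here, about 50 million for 10,000 strings
    }
}
```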
