
Correlation algorithm

Hi,

I have two sets of strings that partly overlap. Say, at time T:

Set A = {"a", "b", "c", "d", "e"}
Set B = {"e", "f", "g"}
Overlap={"e"}

At time S, they are:

Set A = {"ab", "bd", "cga", "da1", "eda", "ka1", "ed2"}
Set B = {"cga", "fw2", "gae", "3e2"}
Overlap={"cga"}

The number of elements in each set varies by date, but over time the elements of each set should be relatively stable.

I want to design a Java program that determines, at time X, which set a string Y most likely belongs to. Let us say string Y appeared y1 times in set A and y2 times in set B in the past.

Any ideas?

Thanks!

Asked by: wsyy
 
for_yan Commented:
Sorry, I could not understand.

So you are saying that you have statistical data from previous times showing that a certain string appeared y1 times in A and y2 times in B.
Now you want to determine the probability of this string appearing in set A?
Well, that would probably be y1/(y1 + y2).
What is the significance of A and B having common strings?

Please explain in a little more detail.
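A minimal Java sketch of the y1/(y1 + y2) estimate, assuming each past time point is recorded as one snapshot of set A and set B. The class name `MembershipEstimator` and its methods are hypothetical names chosen for illustration, and the estimate deliberately ignores any correlation between strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: tracks how often each string was seen in A and in B.
class MembershipEstimator {
    // counts[0] = y1 (snapshots where the string was in A),
    // counts[1] = y2 (snapshots where the string was in B)
    private final Map<String, int[]> counts = new HashMap<>();

    // Record one historical snapshot of both sets.
    void observe(Set<String> setA, Set<String> setB) {
        for (String s : setA) {
            counts.computeIfAbsent(s, k -> new int[2])[0]++;
        }
        for (String s : setB) {
            counts.computeIfAbsent(s, k -> new int[2])[1]++;
        }
    }

    // Returns y1 / (y1 + y2), or NaN if the string was never observed.
    double probabilityOfA(String y) {
        int[] c = counts.get(y);
        if (c == null) {
            return Double.NaN;
        }
        return (double) c[0] / (c[0] + c[1]);
    }
}
```

Fed only the time T sets from the question, this gives 0.5 for "e" (once in A, once in B) and 1.0 for "a" (seen only in A). A string in the overlap is counted once for each set.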
 
wsyy (Author) Commented:
How to define significance?

y1/(y1+y2) seems reasonable at first glance. However, similar words tend to appear together, right?

I would think the probability has something to do with the other overlapping strings. So you are right.

But I don't know what to measure, or how to measure the correlation between one specific string and the other strings that overlapped before and may or may not appear now.
 
for_yan Commented:

Still, if the lists are formed independently, then the expectation of whether a certain string will appear in A or in B
will not be affected by the fact that it sometimes appears in both.

This requires more understanding of what kind of lists these are
and how they are being formed.
wsyy (Author) Commented:
Sorry, for_yan, your response doesn't solve my issue.
 
for_yan Commented:
I think you need to define the problem more clearly.

>However, similar words tend to appear together right?

Which words are similar, and do they really tend to appear together?
If those are arbitrary combinations of letters and digits, then similar words
would not appear together unless you impose a certain policy on their selection process.

If you know anything about the underlying operations (where these lists come from, etc.),
this may also help.

Otherwise, with this little information, it is very difficult to give you any
sensible recommendation.
 
TommySzalapski Commented:
If there is essentially a set number of words, you can build a correlation matrix. For each pair of words, count how many times they appear together and divide that by the total number of appearances to get a correlation score. That will be a very large matrix though.
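A rough Java sketch of such a co-occurrence table, stored sparsely in hash maps rather than as a dense matrix. The class `CooccurrenceMatrix` and its methods are hypothetical names. "Total number of appearances" is read here as the number of snapshots containing at least one of the two words (a Jaccard-style score); that reading is an assumption, and other normalizations are equally plausible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts how often two strings appear in the same snapshot.
class CooccurrenceMatrix {
    private final Map<String, Integer> single = new HashMap<>(); // appearances per string
    private final Map<String, Integer> pair = new HashMap<>();   // joint appearances per pair

    // Order-independent key; assumes strings never contain the NUL character.
    private static String key(String x, String y) {
        return x.compareTo(y) <= 0 ? x + '\u0000' + y : y + '\u0000' + x;
    }

    // Record one snapshot (for example, set A at one time point).
    void observe(Set<String> snapshot) {
        List<String> items = new ArrayList<>(snapshot);
        for (int i = 0; i < items.size(); i++) {
            single.merge(items.get(i), 1, Integer::sum);
            for (int j = i + 1; j < items.size(); j++) {
                pair.merge(key(items.get(i), items.get(j)), 1, Integer::sum);
            }
        }
    }

    // Joint appearances divided by snapshots containing at least one of the two.
    double score(String x, String y) {
        if (x.equals(y)) {
            return single.containsKey(x) ? 1.0 : 0.0;
        }
        int together = pair.getOrDefault(key(x, y), 0);
        int either = single.getOrDefault(x, 0) + single.getOrDefault(y, 0) - together;
        return either == 0 ? 0.0 : (double) together / either;
    }
}
```

Only pairs that actually co-occur get an entry, so the memory cost depends on how many distinct pairs are observed rather than on the square of the vocabulary size. Each call to `observe` is still quadratic in the size of the snapshot.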

 
wsyy (Author) Commented:
I don't think the matrix-based solution is feasible.

Are there any metrics that can replace the pairwise correlation?
 
TommySzalapski Commented:
There are certainly possible solutions that don't require pairwise comparisons. It all depends on how the correlation surfaces. If the strings can be grouped into buckets, and the presence of some number of strings in a bucket increases the chance of other strings in that bucket appearing, then the calculations could be very fast. It all depends on your application.
 
wsyy (Author) Commented:
Tommy, could you please provide some examples or point me to the right resources? Thanks.
 
TommySzalapski Commented:
Do the strings group naturally?
 
wsyy (Author) Commented:
Sorry for the very late response!

No, I haven't grouped the strings naturally.
 
TommySzalapski Commented:
If there is no way to group them, then you'll have to do pairwise at some point. If the correlations remain consistent over time, then you will only need to do the big part once.
 
wsyy (Author) Commented:
What does pairwise mean? How can I do that?

Sorry for the late response.
 
TommySzalapski Commented:
Pairwise means you compare each object to each other object.
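As a small illustration in Java, assuming the objects are the strings of one set, the nested loop below visits every unordered pair exactly once. The class and method names (`Pairwise`, `allPairs`) are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates every unordered pair of items.
class Pairwise {
    static List<String[]> allPairs(List<String> items) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            // Start at i + 1 so each pair is produced once and no item is paired with itself.
            for (int j = i + 1; j < items.size(); j++) {
                pairs.add(new String[] { items.get(i), items.get(j) });
            }
        }
        return pairs;
    }
}
```

For n items this produces n(n-1)/2 pairs, so the three strings of Set B at time T yield 3 pairs and the five strings of Set A yield 10. That quadratic growth is why the pairwise step becomes expensive for large sets.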