
How to add two normal distributions?

Posted on 2007-03-29
Last Modified: 2012-05-05
Is there an efficient way to add two normal distributions?

Let's say I have two multivariate normal distributions with means m1 and m2, and covariance matrices C1 and C2, and that the number of elements in each distribution is n1 and n2.

The mean of the combined set of points would then be
(m1 * n1 + m2 * n2) / (n1 + n2)

But is there an efficient way to calculate the new covariance matrix, other than iterating over all of the points of the two distributions? I have a feeling there should be, but I can't see it now.
Question by: loveslave

Expert Comment

by:NovaDenizen
ID: 18816811
We know C1 is a matrix of covariance values for pairs of elements in the first distribution, and likewise for C2. So what are the covariance values for pairs combining one element from the first and one from the second?

x1 = n1 x 1 vector of variables in first distribution
x2 = n2 x 1 vector

C1 = n1 x n1 covariance matrix
C2 = n2 x n2 covariance matrix

You need to figure out the cross-covariance values between elements of x1 and elements of x2.  If you can assume that the values are independent, then XC12 is all zeros.
XC12 = n1 x n2 cross-covariance matrix between x1 and x2

x12 = [x1; x2], the (n1+n2) x 1 vector formed by stacking x1 above x2
C12 is the (n1+n2)x(n1+n2) covariance matrix for x12:

$$
C_{12} = \begin{bmatrix} C_1 & XC_{12} \\ XC_{12}^T & C_2 \end{bmatrix}
$$
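For this first reading of the question (stacking the variables of both distributions into one vector), the block matrix can be assembled mechanically. A minimal sketch, assuming matrices are stored as plain Python lists of lists; the function name `joint_covariance` is hypothetical, and leaving `XC12` out encodes the independence assumption (all cross-covariances zero):

```python
def joint_covariance(C1, C2, XC12=None):
    """Assemble the (n1+n2) x (n1+n2) covariance matrix of the stacked
    vector [x1; x2] from its blocks. Matrices are lists of lists."""
    n1, n2 = len(C1), len(C2)
    if XC12 is None:
        # Independence assumption: every cross-covariance is zero.
        XC12 = [[0.0] * n2 for _ in range(n1)]
    # Upper rows: [ C1  XC12 ]
    top = [list(C1[i]) + list(XC12[i]) for i in range(n1)]
    # Lower rows: [ transpose(XC12)  C2 ]
    bottom = [[XC12[i][j] for i in range(n1)] + list(C2[j]) for j in range(n2)]
    return top + bottom
```

For example, `joint_covariance([[2.0]], [[3.0, 1.0], [1.0, 4.0]])` places the 1x1 and 2x2 blocks on the diagonal of a 3x3 matrix with zeros elsewhere.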

Author Comment

by:loveslave
ID: 18821997
Thanks a lot for your answer, but maybe I mis-phrased my question.

Both distributions have elements that are three-dimensional vectors (actually, they represent groups of points in 3D-space). So both C1, C2, and C12 are 3x3 matrices, right? What I'm after is to compute the covariance for the collection of all points in the two distributions. So the dimensionality is still the same, it's only the number of points that is bigger.
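For reference, the quantity being asked for is simply the covariance of the concatenated list of points. The direct computation below (hypothetical helper names; points assumed to be plain tuples) iterates over every point, which is exactly what the question hopes to avoid, so it is only useful as a check for any summary-based shortcut:

```python
def covariance_matrix(points, unbiased=False):
    """Covariance matrix of a list of equal-length points (e.g. 3-D tuples),
    computed directly from deviations about the mean."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    denom = n - 1 if unbiased else n  # b = 1/(n-1) (unbiased) or 1/n (population)
    return [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / denom
             for j in range(d)]
            for i in range(d)]


def pooled_covariance_brute_force(points1, points2, unbiased=False):
    # Concatenate both collections and recompute from scratch:
    # this touches every point again, O((n1 + n2) * d^2) work.
    return covariance_matrix(list(points1) + list(points2), unbiased)
```

With (0,0,0) and (2,0,0) in one collection and (0,2,0) and (2,2,0) in the other, the pooled population covariance is [[1, 0, 0], [0, 1, 0], [0, 0, 0]].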

Accepted Solution

by: NovaDenizen (earned 125 total points)
ID: 18823899
Ok, I think I understand the situation now. You have two collections of 3-D points. For each collection you know the number of points, the mean of each coordinate, and the 3x3 covariance matrix of the coordinates, and you want to combine these into the covariance matrix of all the points taken together.

I think it's doable.

variance(x) = b * ( sum(x^2) - sum(x)^2/n )

There are two ways to calculate variance and covariance, the population method using b=1/n, and the unbiased method using b=1/(n-1).  See http://en.wikipedia.org/wiki/Variance, section "Population variance and sample variance".  I don't know which method you used, but you need to know it to dig the sum(x^2) value back out from your variance calculations.

sum(x) = n*mean(x)

sum(x^2) = variance(x)/b + sum(x)^2/n = variance(x)/b + n*mean(x)^2
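The recovery step can be checked on a small sample. A sketch assuming b is 1/n (population) or 1/(n-1) (unbiased); `sums_from_stats` is a hypothetical name. It uses sum(x) = n*mean(x) and sum(x^2) = variance(x)/b + n*mean(x)^2, which follows from variance(x) = b*(sum(x^2) - n*mean(x)^2):

```python
def sums_from_stats(n, mean, var, b):
    """Recover sum(x) and sum(x^2) from a sample summary.

    b is the normalisation used when the variance was computed:
    1/n for the population method, 1/(n-1) for the unbiased method.
    """
    total = n * mean                    # sum(x) = n*mean(x)
    sum_sq = var / b + n * mean ** 2    # sum(x^2) = variance(x)/b + n*mean(x)^2
    return total, sum_sq
```

For the values 1, 2, 3, 4 (mean 2.5, population variance 1.25, b = 1/4) it returns 10 and 30, matching 1 + 2 + 3 + 4 and 1 + 4 + 9 + 16.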

If x12 is the combined set of x1 and x2, then
sum(x12) = sum(x1) + sum(x2)
sum(x12^2) = sum(x1^2) + sum(x2^2)
variance(x12) = b*(sum(x12^2) - sum(x12)^2/n12), where n12 = n1 + n2

In code:
Given n1 (# of points in first collection), v1 (variance of a variable in the first collection), m1 (mean of that same variable), and similar n2,v2,m2 from the second collection:
sumsq1 = v1/b + n1*m1^2
sumsq2 = v2/b + n2*m2^2
n12 = n1 + n2
sumsq12 = sumsq1 + sumsq2
m12 = (n1 * m1 + n2 * m2) / n12
v12 = b*(sumsq12 - n12 * m12^2)
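A runnable version of the per-dimension variance merge, sketched in Python. It computes b from each sample's own size (1/n, or 1/(n-1) when `unbiased=True`), since b is not the same for the two collections and the combined one; `merge_variance` and its parameter names are hypothetical:

```python
def merge_variance(n1, m1, v1, n2, m2, v2, unbiased=False):
    """Variance of the union of two samples of one variable, computed only
    from each sample's count n, mean m and variance v."""
    def b(n):
        # Normalisation factor: 1/(n-1) for unbiased, 1/n for population.
        return 1.0 / (n - 1) if unbiased else 1.0 / n

    sumsq1 = v1 / b(n1) + n1 * m1 ** 2   # recovered sum(x1^2)
    sumsq2 = v2 / b(n2) + n2 * m2 ** 2   # recovered sum(x2^2)
    n12 = n1 + n2
    m12 = (n1 * m1 + n2 * m2) / n12       # combined mean
    v12 = b(n12) * (sumsq1 + sumsq2 - n12 * m12 ** 2)
    return n12, m12, v12
```

For x1 = [1, 2, 3] (mean 2, population variance 2/3) and x2 = [4, 5] (mean 4.5, population variance 0.25), `merge_variance(3, 2.0, 2/3, 2, 4.5, 0.25)` gives a combined mean of 3 and variance of 2, the same as a direct calculation over [1, 2, 3, 4, 5].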

That's how you do variance for each dimension.  Now for covariance:

cov(x,y) = E((x-mx)*(y-my))
= E(xy - y*mx - x*my + mx*my)
= E(xy) - mx*E(y) - E(x)*my + mx*my
= E(xy) - mx*my - mx*my + mx*my
= E(xy) - mx*my
Substituting E(xy) = sum(x*y)/n, mx = sum(x)/n and my = sum(y)/n (the population form, b = 1/n):
cov(x,y) = sum(x*y)/n - sum(x)*sum(y)/n^2
sum(x*y) = n*cov(x,y) + sum(x)*sum(y)/n
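A quick numerical check of these identities, as a sketch with hypothetical helper names (population form, b = 1/n): the covariance computed from deviations about the means matches the one computed from raw sums, and sum(x*y) can be recovered from the covariance.

```python
def cov_definition(xs, ys):
    # cov(x,y) = E((x-mx)*(y-my)), population form
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def cov_from_sums(n, sum_x, sum_y, sum_xy):
    # cov(x,y) = sum(x*y)/n - sum(x)*sum(y)/n^2
    return sum_xy / n - sum_x * sum_y / n ** 2


def sum_xy_from_cov(n, cov_xy, sum_x, sum_y):
    # sum(x*y) = n*cov(x,y) + sum(x)*sum(y)/n
    return n * cov_xy + sum_x * sum_y / n
```

For xs = [1, 2, 3, 4] and ys = [2, 1, 4, 3], both covariance functions give 0.75, and `sum_xy_from_cov(4, 0.75, 10, 10)` recovers sum(x*y) = 28.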

Like before, if x12 is x1 and x2 combined, and likewise y12 is y1 and y2 combined, then
sum(x12*y12) = sum(x1*y1) + sum(x2*y2)

In code:
Given n1 (# of points in first collection), cxy1 (covariance(x,y) from first collection), mx1 (mean of x from first collection), my1 (mean of y from first collection), and similar n2,cxy2,mx2,my2 for second collection:
sumxy1 = n1 * cxy1 + n1 * mx1 * my1
sumxy2 = n2 * cxy2 + n2 * mx2 * my2
n12 = n1 + n2
mx12 = (n1 * mx1 + n2 * mx2) / n12
my12 = (n1 * my1 + n2 * my2) / n12
sumxy12 = sumxy1 + sumxy2
cxy12 = sumxy12 / n12 - mx12*my12

That's the pattern.
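Applying that pattern to every (i, j) pair gives the whole 3x3 matrix at once; the diagonal entries are the per-dimension variances. A minimal Python sketch, assuming population covariances (b = 1/n) and mean vectors and matrices stored as plain lists; `merge_covariance` is a hypothetical name:

```python
def merge_covariance(n1, mean1, cov1, n2, mean2, cov2):
    """Population (b = 1/n) covariance matrix of the union of two point
    collections, from each collection's count, mean vector and covariance."""
    d = len(mean1)
    n12 = n1 + n2
    mean12 = [(n1 * mean1[k] + n2 * mean2[k]) / n12 for k in range(d)]
    cov12 = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            # sum(x*y) = n*cov(x,y) + n*mx*my for each collection
            sumxy1 = n1 * cov1[i][j] + n1 * mean1[i] * mean1[j]
            sumxy2 = n2 * cov2[i][j] + n2 * mean2[i] * mean2[j]
            cov12[i][j] = (sumxy1 + sumxy2) / n12 - mean12[i] * mean12[j]
    return n12, mean12, cov12
```

With (0,0,0) and (2,0,0) in one collection (mean [1, 0, 0]) and (0,2,0) and (2,2,0) in the other (mean [1, 2, 0]), each with covariance [[1, 0, 0], [0, 0, 0], [0, 0, 0]], it returns the mean [1, 1, 0] and the matrix [[1, 0, 0], [0, 1, 0], [0, 0, 0]], the same as computing directly over all four points. The cost depends only on the dimension, not on the number of points.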

Expert Comment

by:NovaDenizen
ID: 18823924
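The b's above should be replaced with b1, b2, and b12 as appropriate, because b depends on the number of points: b1 = 1/n1 or 1/(n1-1), and likewise b2 from n2 and b12 from n12 = n1 + n2.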
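Collecting the pieces into matrix form gives an equivalent closed form. It is not stated in the thread; it follows from the sum formulas by expanding n1*m1*m1^T + n2*m2*m2^T - n12*m12*m12^T, which reduces to the last term below. Here m1 and m2 are the mean vectors, and b1, b2, b12 are the per-collection normalisation factors (1/n or 1/(n-1), with b12 based on n1 + n2):

$$
C_{12} = b_{12}\left(\frac{C_1}{b_1} + \frac{C_2}{b_2} + \frac{n_1 n_2}{n_1 + n_2}\,(m_1 - m_2)(m_1 - m_2)^T\right)
$$

Because this form never subtracts two large sums of squares, it is less prone to rounding error when the means are large compared with the spread of the points.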