close

How can I use covariance in random number generation
of a normal distribution?

For example, I would use VBA to implement Box-Mueller
using mean and std dev. I have implemented B-M in C.
Can someone provide an algorithm (VBA, C or pseudocode)
that incorporates covariance between two and among
3 or more random variables, each normally distributed?

Is that called multivariate normal random number
generation?

FYI, This is for Monte Carlo simulation, if that matters.
And I would be content with (even prefer) an algorithm
that depends on a uniform RNG.

Also, can I accomplish this without using VBA?

For example, some people have suggested using
NORMDIST(RAND(),mean,sd) as an RNG with normal
distribution based on mean and std dev. I don't know
how good that is and how that compares quality-wise to
Box-Mueller, even assuming that RAND() is a good
uniform RNG. (I believe some people question that
assumption.)

Frankly, I don't know how good Box-Mueller is quality-wise
either, assuming a good uniform RNG to start with. I
believe some people quibble over B-M as well. But I
prefer to keep the discussion simple for now. I am
interested in fundamental algorithms, not quibbles about
perfection.

I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)
The former is a mathematically exact way to generate normal random variates,
as is Box-Muller. Numerically, I would avoid the INV approach unless you
have Excel 2003, sincer earlier inverse functions are not equal to the task.

Both approaches can magnifiy inadequacies of the uniform random number
generator. See
groups.google.com/group/micro...b3b6faa69eb726
for more comments in this vein.

As for a correlation structure, if x,y,z are independent normal random
variables, then x z and y z are correlated normal random variables. You just
need to generate a few extra variables to drive your correlation structure.

Jerry

quot; wrote:

gt; How can I use covariance in random number generation
gt; of a normal distribution?
gt;
gt; For example, I would use VBA to implement Box-Mueller
gt; using mean and std dev. I have implemented B-M in C.
gt; Can someone provide an algorithm (VBA, C or pseudocode)
gt; that incorporates covariance between two and among
gt; 3 or more random variables, each normally distributed?
gt;
gt; Is that called multivariate normal random number
gt; generation?
gt;
gt; FYI, This is for Monte Carlo simulation, if that matters.
gt; And I would be content with (even prefer) an algorithm
gt; that depends on a uniform RNG.
gt;
gt; Also, can I accomplish this without using VBA?
gt;
gt; For example, some people have suggested using
gt; NORMDIST(RAND(),mean,sd) as an RNG with normal
gt; distribution based on mean and std dev. I don't know
gt; how good that is and how that compares quality-wise to
gt; Box-Mueller, even assuming that RAND() is a good
gt; uniform RNG. (I believe some people question that
gt; assumption.)
gt;
gt; Frankly, I don't know how good Box-Mueller is quality-wise
gt; either, assuming a good uniform RNG to start with. I
gt; believe some people quibble over B-M as well. But I
gt; prefer to keep the discussion simple for now. I am
gt; interested in fundamental algorithms, not quibbles about
gt; perfection.



-

Browse to Google Groups and search for quot;excel correlated normal randomquot;
(without the quotes), and you'll get plenty of answers to your questions.

- Mike
www.mikemiddleton.com

quot; gt; wrote
in message ...
gt; How can I use covariance in random number generation
gt; of a normal distribution?
gt;
gt; For example, I would use VBA to implement Box-Mueller
gt; using mean and std dev. I have implemented B-M in C.
gt; Can someone provide an algorithm (VBA, C or pseudocode)
gt; that incorporates covariance between two and among
gt; 3 or more random variables, each normally distributed?
gt;
gt; Is that called multivariate normal random number
gt; generation?
gt;
gt; FYI, This is for Monte Carlo simulation, if that matters.
gt; And I would be content with (even prefer) an algorithm
gt; that depends on a uniform RNG.
gt;
gt; Also, can I accomplish this without using VBA?
gt;
gt; For example, some people have suggested using
gt; NORMDIST(RAND(),mean,sd) as an RNG with normal
gt; distribution based on mean and std dev. I don't know
gt; how good that is and how that compares quality-wise to
gt; Box-Mueller, even assuming that RAND() is a good
gt; uniform RNG. (I believe some people question that
gt; assumption.)
gt;
gt; Frankly, I don't know how good Box-Mueller is quality-wise
gt; either, assuming a good uniform RNG to start with. I
gt; believe some people quibble over B-M as well. But I
gt; prefer to keep the discussion simple for now. I am
gt; interested in fundamental algorithms, not quibbles about
gt; perfection.
quot;Jerry W. Lewisquot; wrote:
gt; I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)

Yes. And thanks for your response. I was hoping to catch
your attention :-).

Also, immediately after I posted, I wondered if I am more
interested in correlation (e.g. RSQ) than covariance. The
non-academic material I am reading uses the terms
interchangably. I'm afraid I fell into the same trap. Since
they seem to be related, I hope it does not matter much;
just a detail, I hope.

gt; As for a correlation structure, if x,y,z are independent
gt; normal random variables, then x z and y z are correlated
gt; normal random variables. You just need to generate
gt; a few extra variables to drive your correlation structure.

I sense that this is very relevant and important to me. But
I confess that you lost me completely. (My fault, not yours.)
I have no idea what to do with the correlation quot;structurequot;
(matrix?) in order to build the RNG.

Can you be more specific?

Perhaps a numerical example would help. The following
is based on real data. I wish I had numbers that
demonstrate closer correlation and negative correlation.
But it is difficult to tweak the data to make that happen.
I hope you can imagine that is the case. (And I hope the
table is columnarized properly when I post this.)

avgsd
X13.1.9%
Y11.0.9%
Z 4.9% 2.1%

rsq:
XYZ
X10.72210.0048
Y0.722110.0030
Z0.00480.00301

covar:
X Y Z
X 0.0228 0.0182-0.0003
Y 0.0182 0.0135-0.0001
Z-0.0003-0.0001 0.0004Ignoring the correlation factor, an RNG might be simply
NORMINV(RAND(),avg,sd). Alternatively, the Box-Mueller
algorithm is (simplified for illustration only):

x1 = 2*rnd() - 1
x2 = (2*rnd() - 1) * sqrt(1 - x1^2)
w = x1^2 x2^2
return x1*sqrt(2*ln(w)/w)

But if I simply do that for each of X, Y and Z and
ignore correlation, I worry that the results of a
simulation might not demonstrate the expected
quot;negativequot; correlation (I wish) between Y and Z and
the quot;closequot; correlation (I wish) between X and Y.

So how do I change (or replace) the simple RNG to
incorporate correlation? What would the Excel formula
be? Or what would the VBA algorithm look like --
perhaps just a tweak of the B-M algorithm?

quot;Mike Middletonquot; wrote:
gt; Browse to Google Groups and search for quot;excel correlated
gt; normal randomquot; (without the quotes)

I am glad I got your attention, too :-).

Thanks. That does indeed provide more useful hits than
the google search I had done earlier. It finds some useful
papers, which I have yet to read -- including one that I
had failed to bookmark some months ago and could not
find again :-gt;). And I am still plowing through the results
to see if there are any other useful hits.

In the meantime, I am particularly interested in the method
you suggested in Jan05. It is simple enough even for me :-).

Can you answer some questions?

To generate random values from two correlated normal
distributions X and Y, you suggested:

z1 = norminv(rand(), 0, 1)
z2 = norminv(rand(), 0, 1)
x = meanX stdevX*z1
y = meanY stdevY*(z1*r z2*sqrt(1-r^2))

1. Did I translate your formulas correctly? I changed the
notation slightly.

2. Does this method have a common name (a la quot;Box-Muellerquot;
or quot;polar methodquot;)? It would be useful to know for my
own documentation as well as if/when I discuss it with
others.

3. I believe quot;rhoquot; is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y).
Right? You had noted that quot;rhoquot; must be between -1
and 1.

4. I believe the above works well for two random variables.
What about for N random variables, N gt;= 3?

For example, for random variables X1, X2, X3 and X4,
would I compute simply:

z1 = norminv(rand(), 0, 1)
z2 = norminv(rand(), 0, 1)
z3 = norminv(rand(), 0, 1)
z4 = norminv(rand(), 0, 1)
x1 = meanX1 stdevX1*z1
x2 = meanX2 stdevX2*(z1*rX12 z2*sqrt(1-rX12^2))
x3 = meanX3 stdevX3*(z1*rX13 z3*sqrt(1-rX13^2))
x4 = meanX4 stdevX4*(z1*rX14 z4*sqrt(1-rX14^2))

where, quot;rX12quot; is quot;rhoquot; for X1 and X2; quot;rX13quot; is quot;rhoquot; for
X1 and X3; etc.

The choice of the correlation with X1 for all cases seems
somewhat arbitrary. I wonder if there are more terms or
more complex terms in the multivariate case, or if the
algorithm should be embellished in some other way.

Thanks again for the pointer (google search).

The usual representation of Box-Muller is
www.taygeta.com/random/gaussian.html

If X,Y, and Z are independent N(0,1), a*X b*Y and c*Y d*Z are distributed
N(0,a^2 b^2) and N(0,c^2 d^2) with covariance b*c. Just choose a,b,c, and d
that will give the desired covariance structure (which will be unchanged by
constants to give the desired mean).

More generally, if y is a column vector of n independent N(0,1) variables,
and M is a matrix of constants with n columns, then
MMULT(M,y)
is multivariate normal with mean vector 0 and variance-covariance matrix
MMULT(M,TRANSPOSE(M)). Again you can add in constants for the desired mean
without violence to the covariance structure.

Jerry

quot; wrote:

gt; quot;Jerry W. Lewisquot; wrote:
gt; gt; I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)
gt;
gt; Yes. And thanks for your response. I was hoping to catch
gt; your attention :-).
gt;
gt; Also, immediately after I posted, I wondered if I am more
gt; interested in correlation (e.g. RSQ) than covariance. The
gt; non-academic material I am reading uses the terms
gt; interchangably. I'm afraid I fell into the same trap. Since
gt; they seem to be related, I hope it does not matter much;
gt; just a detail, I hope.
gt;
gt; gt; As for a correlation structure, if x,y,z are independent
gt; gt; normal random variables, then x z and y z are correlated
gt; gt; normal random variables. You just need to generate
gt; gt; a few extra variables to drive your correlation structure.
gt;
gt; I sense that this is very relevant and important to me. But
gt; I confess that you lost me completely. (My fault, not yours.)
gt; I have no idea what to do with the correlation quot;structurequot;
gt; (matrix?) in order to build the RNG.
gt;
gt; Can you be more specific?
gt;
gt; Perhaps a numerical example would help. The following
gt; is based on real data. I wish I had numbers that
gt; demonstrate closer correlation and negative correlation.
gt; But it is difficult to tweak the data to make that happen.
gt; I hope you can imagine that is the case. (And I hope the
gt; table is columnarized properly when I post this.)
gt;
gt; avgsd
gt; X13.1.9%
gt; Y11.0.9%
gt; Z 4.9% 2.1%
gt;
gt; rsq:
gt; XYZ
gt; X10.72210.0048
gt; Y0.722110.0030
gt; Z0.00480.00301
gt;
gt; covar:
gt; X Y Z
gt; X 0.0228 0.0182-0.0003
gt; Y 0.0182 0.0135-0.0001
gt; Z-0.0003-0.0001 0.0004
gt;
gt;
gt; Ignoring the correlation factor, an RNG might be simply
gt; NORMINV(RAND(),avg,sd). Alternatively, the Box-Mueller
gt; algorithm is (simplified for illustration only):
gt;
gt; x1 = 2*rnd() - 1
gt; x2 = (2*rnd() - 1) * sqrt(1 - x1^2)
gt; w = x1^2 x2^2
gt; return x1*sqrt(2*ln(w)/w)
gt;
gt; But if I simply do that for each of X, Y and Z and
gt; ignore correlation, I worry that the results of a
gt; simulation might not demonstrate the expected
gt; quot;negativequot; correlation (I wish) between Y and Z and
gt; the quot;closequot; correlation (I wish) between X and Y.
gt;
gt; So how do I change (or replace) the simple RNG to
gt; incorporate correlation? What would the Excel formula
gt; be? Or what would the VBA algorithm look like --
gt; perhaps just a tweak of the B-M algorithm?

Errata (quot;oh, for the love of P...!quot;) ....

I wrote in one response:
gt; I wondered if I am more interested in correlation
gt; (e.g. RSQ) than covariance

And in another response:
gt; I believe quot;rhoquot; is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y).

I meant pearson() in both cases, which is neither rsq()
nor sqrt(rsq()). I guess it is called rquot;sqquot; for a reason ;-).
Klunk!

Lo and behold, pearson() does give me the close(er)
and negative correlations factors that I expected from
the descriptions of the original data.

pearson(X,Y) = 0.8497
pearson(X,Z) = -0.0695
pearson(Y,Z) = -0.0544

-

(1) Looks OK for the two-variable situation.

(2) I don't think there is a special name for these expressions, which are
special cases of the general mulitvariate normal expressions. Box-Muller is
a method for obtaining a single normal random number from two uniform random
numbers.

(3) OK. Rho is the correlation coefficient (Excel's CORREL and PEARSON
worksheet functions).

(4) Your four-variable expressions are not correct. The general case needs
matrix notation involving all pairwise correlations. Here's some expressions
(can't remember the source) for the special case of three normal random
numbers:

If x1, x2, and x3 are independent random variates, and c12, c13, and c23 are
the desired correlations among them, then the correlated random variates v1,
v2, and v3, can be calculated most easily as follows:

v1 = x1
v2 = x1*c12 x2*(1 - c12^2)^0.5
v3 = x1*c13 x2*(c23 - c12*c13)/((1-c12^2)^0.5)
x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5

- Mike
www.mikemiddleton.com

quot; gt; wrote
in message ...
gt; quot;Mike Middletonquot; wrote:
gt;gt; Browse to Google Groups and search for quot;excel correlated
gt;gt; normal randomquot; (without the quotes)
gt;
gt; I am glad I got your attention, too :-).
gt;
gt; Thanks. That does indeed provide more useful hits than
gt; the google search I had done earlier. It finds some useful
gt; papers, which I have yet to read -- including one that I
gt; had failed to bookmark some months ago and could not
gt; find again :-gt;). And I am still plowing through the results
gt; to see if there are any other useful hits.
gt;
gt; In the meantime, I am particularly interested in the method
gt; you suggested in Jan05. It is simple enough even for me :-).
gt;
gt; Can you answer some questions?
gt;
gt; To generate random values from two correlated normal
gt; distributions X and Y, you suggested:
gt;
gt; z1 = norminv(rand(), 0, 1)
gt; z2 = norminv(rand(), 0, 1)
gt; x = meanX stdevX*z1
gt; y = meanY stdevY*(z1*r z2*sqrt(1-r^2))
gt;
gt; 1. Did I translate your formulas correctly? I changed the
gt; notation slightly.
gt;
gt; 2. Does this method have a common name (a la quot;Box-Muellerquot;
gt; or quot;polar methodquot;)? It would be useful to know for my
gt; own documentation as well as if/when I discuss it with
gt; others.
gt;
gt; 3. I believe quot;rhoquot; is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y).
gt; Right? You had noted that quot;rhoquot; must be between -1
gt; and 1.
gt;
gt; 4. I believe the above works well for two random variables.
gt; What about for N random variables, N gt;= 3?
gt;
gt; For example, for random variables X1, X2, X3 and X4,
gt; would I compute simply:
gt;
gt; z1 = norminv(rand(), 0, 1)
gt; z2 = norminv(rand(), 0, 1)
gt; z3 = norminv(rand(), 0, 1)
gt; z4 = norminv(rand(), 0, 1)
gt; x1 = meanX1 stdevX1*z1
gt; x2 = meanX2 stdevX2*(z1*rX12 z2*sqrt(1-rX12^2))
gt; x3 = meanX3 stdevX3*(z1*rX13 z3*sqrt(1-rX13^2))
gt; x4 = meanX4 stdevX4*(z1*rX14 z4*sqrt(1-rX14^2))
gt;
gt; where, quot;rX12quot; is quot;rhoquot; for X1 and X2; quot;rX13quot; is quot;rhoquot; for
gt; X1 and X3; etc.
gt;
gt; The choice of the correlation with X1 for all cases seems
gt; somewhat arbitrary. I wonder if there are more terms or
gt; more complex terms in the multivariate case, or if the
gt; algorithm should be embellished in some other way.
gt;
gt; Thanks again for the pointer (google search).
quot;Mike Middletonquot; wrote:
gt; (4) Your four-variable expressions are not correct. The
gt; general case needs matrix notation involving all pairwise
gt; correlations.

As I expected. I just wanted confirmation because the OP
in the Jan05 tried to draw the same simplifying (and hopeful)
conclusion that I did, and no one ever corrected him.

gt; If x1, x2, and x3 are independent random variates, and
gt; c12, c13, and c23 are the desired correlations among
gt; them, then the correlated random variates v1, v2, and
gt; v3, can be calculated most easily as follows:
gt; v1 = x1
gt; v2 = x1*c12 x2*(1 - c12^2)^0.5
gt; v3 = x1*c13 x2*(c23 - c12*c13)/((1-c12^2)^0.5)
gt; x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5

This is actually very helpful, if only for illustrative purposes.
Could you provide the equations for __4__ independent
random variates? It is not yet clear to me where this
progression is headed.



-

You wrote: gt; Could you provide the equations for __4__ independent random
variates? lt;

No, I cannot. Perhaps you can find someone else to translate the
general-case matrix operations into quot;equations.quot;

- Mike
www.mikemiddleton.com

quot; gt; wrote
in message ...
gt; quot;Mike Middletonquot; wrote:
gt;gt; (4) Your four-variable expressions are not correct. The
gt;gt; general case needs matrix notation involving all pairwise
gt;gt; correlations.
gt;
gt; As I expected. I just wanted confirmation because the OP
gt; in the Jan05 tried to draw the same simplifying (and hopeful)
gt; conclusion that I did, and no one ever corrected him.
gt;
gt;gt; If x1, x2, and x3 are independent random variates, and
gt;gt; c12, c13, and c23 are the desired correlations among
gt;gt; them, then the correlated random variates v1, v2, and
gt;gt; v3, can be calculated most easily as follows:
gt;gt; v1 = x1
gt;gt; v2 = x1*c12 x2*(1 - c12^2)^0.5
gt;gt; v3 = x1*c13 x2*(c23 - c12*c13)/((1-c12^2)^0.5)
gt;gt; x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5
gt;
gt; This is actually very helpful, if only for illustrative purposes.
gt; Could you provide the equations for __4__ independent
gt; random variates? It is not yet clear to me where this
gt; progression is headed.

arrow
arrow
    全站熱搜
    創作者介紹
    創作者 software 的頭像
    software

    software

    software 發表在 痞客邦 留言(0) 人氣()