Hmm, after writing up my example in the MatMate language I saw that there is already a Python answer, which might be preferable because Python is so widely used. But since you still had questions, I'll show my approach using the MatMate matrix language anyway, perhaps because it is more self-commenting.
Method 1 (using MatMate):
v=12       // 12 variables
f=3        // subset-correlation based on 3 common factors
vg = v / f // variables per subset
// generate hidden factor-matrix
// randomu(rows,cols,lowbound,ubound) gives a uniform random matrix
// without explicit bounds the default is: randomu(rows,cols,0,100)
L = { randomu(vg,f)     || randomu(vg,f)/100 || randomu(vg,f)/100 , _
      randomu(vg,f)/100 || randomu(vg,f)     || randomu(vg,f)/100 , _
      randomu(vg,f)/100 || randomu(vg,f)/100 || randomu(vg,f)      }
// make sure there is itemspecific variance
// by appending a diagonal matrix with random positive entries
L = L || mkdiag(randomu(v,1,10,20))
// make covariance and correlation matrix
cov = L *'           // L multiplied with its transpose
cor = covtocorr(cov)
set ccdezweite=3 ccfeldweite=8 // display options: 3 decimals, field width 8
list cor
cor =
1.000, 0.321, 0.919, 0.489, 0.025, 0.019, 0.019, 0.030, 0.025, 0.017, 0.014, 0.014
0.321, 1.000, 0.540, 0.923, 0.016, 0.015, 0.012, 0.030, 0.033, 0.016, 0.012, 0.015
0.919, 0.540, 1.000, 0.679, 0.018, 0.014, 0.012, 0.029, 0.028, 0.014, 0.012, 0.012
0.489, 0.923, 0.679, 1.000, 0.025, 0.022, 0.020, 0.040, 0.031, 0.014, 0.011, 0.014
0.025, 0.016, 0.018, 0.025, 1.000, 0.815, 0.909, 0.758, 0.038, 0.012, 0.018, 0.014
0.019, 0.015, 0.014, 0.022, 0.815, 1.000, 0.943, 0.884, 0.035, 0.012, 0.014, 0.012
0.019, 0.012, 0.012, 0.020, 0.909, 0.943, 1.000, 0.831, 0.036, 0.013, 0.015, 0.010
0.030, 0.030, 0.029, 0.040, 0.758, 0.884, 0.831, 1.000, 0.041, 0.017, 0.022, 0.020
0.025, 0.033, 0.028, 0.031, 0.038, 0.035, 0.036, 0.041, 1.000, 0.831, 0.868, 0.780
0.017, 0.016, 0.014, 0.014, 0.012, 0.012, 0.013, 0.017, 0.831, 1.000, 0.876, 0.848
0.014, 0.012, 0.012, 0.011, 0.018, 0.014, 0.015, 0.022, 0.868, 0.876, 1.000, 0.904
0.014, 0.015, 0.012, 0.014, 0.014, 0.012, 0.010, 0.020, 0.780, 0.848, 0.904, 1.000
A problem here may be that we define blocks of submatrices which have high correlations within and little correlation between them, and that this is done not programmatically but by the constant concatenation expressions. Perhaps this approach can be modeled more elegantly in Python; a sketch follows below.
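Taking up that last thought, here is a minimal numpy sketch of the same block construction (a translation of mine, not part of the original MatMate run; the names v, f, vg, L, cov, cor mirror the MatMate code, and the loop replaces the constant concatenation expressions):

import numpy as np

rng = np.random.default_rng(1)

v = 12                 # 12 variables
f = 3                  # 3 subsets, each with its own group of 3 factors
vg = v // f            # variables per subset

# strong loadings (0..100) on the subset's own factor group,
# weak loadings (0..1, the /100-scaled ones) on the other groups
L = rng.uniform(0.0, 1.0, (v, f * f))
for b in range(f):
    rows = slice(b * vg, (b + 1) * vg)
    cols = slice(b * f, (b + 1) * f)
    L[rows, cols] = rng.uniform(0.0, 100.0, (vg, f))

# append a diagonal block for itemspecific variance (entries in 10..20)
L = np.hstack([L, np.diag(rng.uniform(10.0, 20.0, v))])

cov = L @ L.T                    # L times its transpose
d = np.sqrt(np.diag(cov))
cor = cov / np.outer(d, d)       # covariance -> correlation
print(np.round(cor, 3))

The block structure is now expressed by the two slice objects instead of the fixed concatenation, so changing v or f reshapes everything automatically.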
Method 2(a)
After that, a completely different approach, where we fill the factor-loadings matrix with random fractions of the remaining variance, row by row, until the full 100% is used up. This is done in Pari/GP:
{L = matrix(8,8);          \\ generate an empty factor-loadings-matrix
 for(r=1,8,
    rv=1.0;                \\ remaining variance for variable is 1.0
    for(c=1,8,
       pv=if(c<8,random(100)/100.0,1.0); \\ define randomly part of remaining variance
       cv= pv * rv;        \\ compute current partial variance
       rv = rv - cv;       \\ compute the now remaining variance
       sg = (-1)^(random(100) % 2) ;     \\ also introduce randomly +- signs
       L[r,c] = sg*sqrt(cv) ;            \\ compute factor loading as signed sqrt of cv
      )
    );}
cor = L * L~
and the resulting correlation matrix is
1.000 -0.7111 -0.08648 -0.7806 0.8394 -0.7674 0.6812 0.2765
-0.7111 1.000 0.06073 0.7485 -0.7550 0.8052 -0.8273 0.05863
-0.08648 0.06073 1.000 0.5146 -0.1614 0.1459 -0.4760 -0.01800
-0.7806 0.7485 0.5146 1.000 -0.8274 0.7644 -0.9373 -0.06388
0.8394 -0.7550 -0.1614 -0.8274 1.000 -0.5823 0.8065 -0.1929
-0.7674 0.8052 0.1459 0.7644 -0.5823 1.000 -0.7261 -0.4822
0.6812 -0.8273 -0.4760 -0.9373 0.8065 -0.7261 1.000 -0.1526
0.2765 0.05863 -0.01800 -0.06388 -0.1929 -0.4822 -0.1526 1.000
Because of the cumulative generation rule for the factor-loadings matrix, correlation matrices with a dominant principal component may be produced. Also, it might be better to make the last part of the variance a unique factor, to ensure positive definiteness. I left the program as it is, to keep the focus on the general principle.
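For comparison, the same fill-the-remaining-variance rule as a numpy sketch (again my translation; random_loadings is a name I introduce here, and I draw the variance fractions from a continuous uniform instead of Pari's random(100)/100.0):

import numpy as np

rng = np.random.default_rng(2)

def random_loadings(n):
    # split each row's total variance of 1.0 into random parts;
    # the last column absorbs whatever is left, so each row has unit norm
    L = np.zeros((n, n))
    for r in range(n):
        rv = 1.0                              # remaining variance
        for c in range(n):
            pv = rng.uniform() if c < n - 1 else 1.0
            cv = pv * rv                      # current partial variance
            rv -= cv
            L[r, c] = rng.choice([-1.0, 1.0]) * np.sqrt(cv)  # random +- sign
    return L

L = random_loadings(8)
cor = L @ L.T          # rows have norm 1, so the diagonal is exactly 1
print(np.round(cor, 4))

Since the variance parts of each row sum to exactly 1, L @ L.T has a unit diagonal and is positive semidefinite by construction, i.e. a valid (though possibly nearly singular) correlation matrix.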
A 100x100 correlation matrix generated this way has the following frequencies of correlations (rounded to 1 decimal place):
     e        f         (e: entry, rounded;  f: frequency)
-----------------------------------------------------
-1.000, 108.000
-0.900, 460.000
-0.800, 582.000
-0.700, 604.000
-0.600, 548.000
-0.500, 540.000
-0.400, 506.000
-0.300, 482.000
-0.200, 488.000
-0.100, 464.000
0.000, 434.000
0.100, 486.000
0.200, 454.000
0.300, 468.000
0.400, 462.000
0.500, 618.000
0.600, 556.000
0.700, 586.000
0.800, 536.000
0.900, 420.000
1.000, 198.000
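For reference, this frequency count can be reproduced (up to random variation) with a few numpy lines, reusing the random_loadings sketch from Method 2(a) above; note that the frequencies sum to 10000, so the count includes all entries of the full matrix, the diagonal as well:

L = random_loadings(100)
cor = L @ L.T
vals, counts = np.unique(np.round(cor, 1), return_counts=True)
for e, fq in zip(vals, counts):
    print(f"{e:7.3f}, {fq:6d}")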
[Update] Hmm, the 100x100 matrix is badly conditioned; even with 200 digits precision, Pari/GP could not correctly determine the eigenvalues with the polroots(charpoly()) functions. Instead I did a Jacobi-rotation of the loadings matrix L to pca-form and found mostly extremely small eigenvalues; I print them as logarithms to base 10 (which give roughly the position of the decimal point). Read from left to right, then row by row:
log_10(eigenvalues):
1.684, 1.444, 1.029, 0.818, 0.455, 0.241, 0.117, -0.423, -0.664, -1.040
-1.647, -1.799, -1.959, -2.298, -2.729, -3.059, -3.497, -3.833, -4.014, -4.467
-4.992, -5.396, -5.511, -6.366, -6.615, -6.834, -7.535, -8.138, -8.263, -8.766
-9.082, -9.482, -9.940, -10.167, -10.566, -11.110, -11.434, -11.788, -12.079, -12.722
-13.122, -13.322, -13.444, -13.933, -14.390, -14.614, -15.070, -15.334, -15.904, -16.278
-16.396, -16.708, -17.022, -17.746, -18.090, -18.358, -18.617, -18.903, -19.186, -19.476
-19.661, -19.764, -20.342, -20.648, -20.805, -20.922, -21.394, -21.740, -21.991, -22.291
-22.792, -23.184, -23.680, -24.100, -24.222, -24.631, -24.979, -25.161, -25.282, -26.211
-27.181, -27.626, -27.861, -28.054, -28.266, -28.369, -29.074, -29.329, -29.539, -29.689
-30.216, -30.784, -31.269, -31.760, -32.218, -32.446, -32.785, -33.003, -33.448, -34.318
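(A side note of mine: a double-precision symmetric eigensolver such as numpy.linalg.eigvalsh is numerically stable on this matrix, but it can only resolve eigenvalues down to roughly 1e-16 relative to the largest one, so most of the tail above would come out as noise near machine precision, possibly slightly negative; reproducing the full tail needs multiprecision arithmetic, for instance mpmath's eigsy.)

ev = np.linalg.eigvalsh(cor)[::-1]          # descending order, double precision
print(np.round(np.log10(np.abs(ev)), 3))    # only the top ~16 decades are meaningful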
[Update 2]
Method 2(b)
An improvement might be to increase the itemspecific variance to some non-marginal level and to reduce the number of common factors to relatively few (for instance, the integer square root of the itemnumber):
{ dimr = 100;
  dimc = sqrtint(dimr);       \\ 10 common factors
  L = matrix(dimr,dimr+dimc); \\ loadings matrix
                              \\ with dimr itemspecific and
                              \\ dimc common factors
  for(r=1,dimr,
     vr=1.0;                      \\ complete variance per item
     vu=0.05+random(100)/1000.0;  \\ random variance +0.05
                                  \\ for itemspecific variance
     L[r,r]=sqrt(vu);             \\ itemspecific factor loading
     vr=vr-vu;
     for(c=1,dimc,
        cv=if(c<dimc,random(100)/100,1.0)*vr;
        vr=vr-cv;
        L[r,dimr+c]=(-1)^(random(100) % 2)*sqrt(cv)
       )
     );}
cov=L*L~
cp=charpoly(cov) \\ does not work even with 200 digits precision
pr=polroots(cp) \\ spurious negative and complex eigenvalues...
The structure of the result, in terms of the distribution of the correlations, remains similar (including the nasty non-decomposability for Pari/GP), but the eigenvalues, again found by Jacobi-rotation of the loadings matrix, now have a much better structure; for a newly computed example I got the eigenvalues
log_10(eigenvalues):
1.677, 1.326, 1.063, 0.754, 0.415, 0.116, -0.262, -0.516, -0.587, -0.783
-0.835, -0.844, -0.851, -0.854, -0.858, -0.862, -0.862, -0.868, -0.872, -0.873
-0.878, -0.882, -0.884, -0.890, -0.895, -0.896, -0.896, -0.898, -0.902, -0.904
-0.904, -0.909, -0.911, -0.914, -0.920, -0.923, -0.925, -0.927, -0.931, -0.935
-0.939, -0.939, -0.943, -0.948, -0.951, -0.955, -0.956, -0.960, -0.967, -0.969
-0.973, -0.981, -0.986, -0.989, -0.997, -1.003, -1.005, -1.011, -1.014, -1.019
-1.022, -1.024, -1.031, -1.038, -1.040, -1.048, -1.051, -1.061, -1.064, -1.068
-1.070, -1.074, -1.092, -1.092, -1.108, -1.113, -1.120, -1.134, -1.139, -1.147
-1.150, -1.155, -1.158, -1.166, -1.171, -1.175, -1.184, -1.184, -1.192, -1.196
-1.200, -1.220, -1.237, -1.245, -1.252, -1.262, -1.269, -1.282, -1.287, -1.290
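To close the loop, Method 2(b) as a numpy sketch as well (again my own translation, not the original Pari/GP run; I use a continuous uniform where Pari draws discrete hundredths and thousandths). Because every item keeps at least 0.05 itemspecific variance, cov is diag(vu) plus a positive-semidefinite common part, so by Weyl's inequality every eigenvalue is at least min(vu) >= 0.05; that is why double precision now suffices and no charpoly detour is needed:

import math
import numpy as np

rng = np.random.default_rng(3)

dimr = 100
dimc = math.isqrt(dimr)                # 10 common factors, like sqrtint
L = np.zeros((dimr, dimr + dimc))      # dimr itemspecific + dimc common columns

for r in range(dimr):
    vu = 0.05 + rng.uniform(0.0, 0.1)  # itemspecific variance, at least 0.05
    L[r, r] = np.sqrt(vu)              # itemspecific factor loading
    vr = 1.0 - vu                      # variance left for the common factors
    for c in range(dimc):
        cv = (rng.uniform() if c < dimc - 1 else 1.0) * vr
        vr -= cv
        L[r, dimr + c] = rng.choice([-1.0, 1.0]) * np.sqrt(cv)

cov = L @ L.T                          # unit diagonal by construction
ev = np.linalg.eigvalsh(cov)[::-1]     # stable, no charpoly/polroots needed
print(np.round(np.log10(ev[:10]), 3))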