Multiple linear regression simulation


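I'm new to R. I would like to know how to simulate data from a multiple linear regression model that satisfies all four assumptions of regression.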

OK, thanks.

Suppose I want to simulate data based on this dataset:

y<-c(18.73,14.52,17.43,14.54,13.44,24.39,13.34,22.71,12.68,19.32,30.16,27.09,25.40,26.05,33.49,35.62,26.07,36.78,34.95,43.67)
x1<-c(610,950,720,840,980,530,680,540,890,730,670,770,880,1000,760,590,910,650,810,500)
x2<-c(1,1,3,2,1,1,3,3,2,2,1,3,3,2,2,2,3,3,1,2)

fit<-lm(y~x1+x2)
summary(fit)

Then I get the output:

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-13.2805  -7.5169  -0.9231   7.2556  12.8209 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) 42.85352   11.33229   3.782  0.00149 **
x1          -0.02534    0.01293  -1.960  0.06662 . 
x2           0.33188    2.41657   0.137  0.89238   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.679 on 17 degrees of freedom
Multiple R-squared:  0.1869,    Adjusted R-squared:  0.09127 
F-statistic: 1.954 on 2 and 17 DF,  p-value: 0.1722

My question is: how can I simulate new data that mimics the original data above?


— 诺·希瑟姆·哈隆


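1. If you don't already have them, set up some predictors $x_1$, $x_2$, ...
2. Choose the ("true") coefficients $\beta_i$ of the predictors, including the intercept $\beta_0$.
3. Choose the error variance $\sigma^2$ or, equivalently, its square root $\sigma$.
4. Generate the error term $\varepsilon$ as a vector of independent normal random variables with mean 0 and variance $\sigma^2$.
5. Let $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$.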

Then you can regress your $y$ on your $x$'s.
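The five steps can also be written in matrix form as $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$. Here the first column of $X$ is all 1s so that it carries the intercept $\beta_0$. The following is a minimal sketch of that form. The helper `simulate_y` and every numeric value in it are arbitrary illustrative choices and are not part of the answer:

```r
# Hypothetical helper: y = X %*% beta + eps, where X already contains
# a leading column of 1s for the intercept
simulate_y <- function(X, beta, sigma) {
  stopifnot(ncol(X) == length(beta))
  eps <- rnorm(nrow(X), mean = 0, sd = sigma)  # step 4: iid N(0, sigma^2) errors
  drop(X %*% beta) + eps                        # step 5: linear predictor plus noise
}

set.seed(1)
n  <- 200
x1 <- rnorm(n)                # step 1: predictors (arbitrary distributions)
x2 <- runif(n)
X  <- cbind(1, x1, x2)        # column of 1s for beta_0
beta  <- c(2, 0.5, -1)        # step 2: "true" coefficients (arbitrary)
sigma <- 0.3                  # step 3: error standard deviation (arbitrary)

y <- simulate_y(X, beta, sigma)
coef(lm(y ~ x1 + x2))         # estimates should land near c(2, 0.5, -1)
```

The same helper works for any number of predictors, because only the number of columns of `X` has to match `length(beta)`.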

For example, in R you could do something like this:

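x1 <- 11:30                 # step 1: predictors
x2 <- runif(20,5,95)
x3 <- rbinom(20,1,.5)

b0 <- 17                    # step 2: "true" coefficients
b1 <- 0.5
b2 <- 0.037
b3 <- -5.2
sigma <- 1.4                # step 3: error standard deviation

eps <- rnorm(length(x1),0,sigma)             # step 4: iid normal errors
y <- b0 + b1*x1  + b2*x2  + b3*x3 + eps      # step 5: the simulated response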

This generates a single simulated $y$ from the model. Then running

 summary(lm(y~x1+x2+x3))

gives

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6967 -0.4970  0.1152  0.7536  1.6511 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 16.28141    1.32102  12.325 1.40e-09 ***
x1           0.55939    0.04850  11.533 3.65e-09 ***
x2           0.01715    0.01578   1.087    0.293    
x3          -4.91783    0.66547  -7.390 1.53e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.241 on 16 degrees of freedom
Multiple R-squared:  0.9343,    Adjusted R-squared:  0.9219 
F-statistic: 75.79 on 3 and 16 DF,  p-value: 1.131e-09

There are several ways you could streamline this process, but I thought spelling it out step by step would help you get started.

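If you want to simulate a new random $y$ with the same population coefficients, just rerun the last two lines of the code above (generating a new random eps and y). These correspond to steps 4 and 5 of the algorithm.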
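To mimic the question's own data, one option is to treat the fitted model as if it were the true one. In that case `model.matrix(fit)` plays the role of the predictors, `coef(fit)` the coefficients and `sigma(fit)` the error SD, and only steps 4 and 5 are repeated. This is a sketch resting on an assumption: the fitted estimates stand in for the unknown population values. The helper `one_sim` and the choice of 1000 replications are illustrative. For comparison, `stats::simulate()` has a method for `lm` fits that draws new responses around the fitted values in a similar way:

```r
# Data from the question
y  <- c(18.73,14.52,17.43,14.54,13.44,24.39,13.34,22.71,12.68,19.32,
        30.16,27.09,25.40,26.05,33.49,35.62,26.07,36.78,34.95,43.67)
x1 <- c(610,950,720,840,980,530,680,540,890,730,
        670,770,880,1000,760,590,910,650,810,500)
x2 <- c(1,1,3,2,1,1,3,3,2,2,1,3,3,2,2,2,3,3,1,2)
fit <- lm(y ~ x1 + x2)

# Assumption: treat the fitted model as the data-generating process
X      <- model.matrix(fit)  # columns (Intercept), x1, x2
beta   <- coef(fit)          # "true" coefficients (step 2)
sd_err <- sigma(fit)         # "true" error SD (step 3), about 8.679 here

# Hypothetical helper: steps 4 and 5 only (new errors, new y), then refit
one_sim <- function() {
  y_new <- drop(X %*% beta) + rnorm(nrow(X), 0, sd_err)
  coef(lm(y_new ~ x1 + x2))
}

set.seed(1)
est <- replicate(1000, one_sim())   # 3 x 1000: one column per simulated data set
rowMeans(est)                       # should sit close to coef(fit)
apply(est, 1, sd)                   # spread of each coefficient across simulations

# stats::simulate() draws new responses around the fitted values
sims <- simulate(fit, nsim = 5, seed = 1)   # data frame with 20 rows and 5 columns
```

Every simulated data set reuses the same `X`, and the error SD is set to the fitted value. The row-wise standard deviations of `est` should therefore be roughly comparable to the Std. Error column that `summary(fit)` reports in the question.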

— Glen_b -Reinstate Monica

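Is it possible to change the standard errors of the estimates? I used a slightly modified script (rnorm() instead of 11:30), but no matter how much I increase the error (sigma), the estimated standard errors stay roughly the same.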

— Daniel
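Under the model, each OLS standard error equals the residual standard error times $\sqrt{[(X^\top X)^{-1}]_{jj}}$. The standard errors should therefore grow in proportion to $\sigma$. Below is a minimal check. It assumes `rnorm()` predictors, as in the comment, and reuses one set of standard normal draws so that only $\sigma$ changes between the two fits. The helper `se_for` is hypothetical:

```r
set.seed(1)
n  <- 20
x1 <- rnorm(n)                     # rnorm() predictor instead of 11:30
x2 <- runif(n, 5, 95)
x3 <- rbinom(n, 1, .5)
mu <- 17 + 0.5*x1 + 0.037*x2 - 5.2*x3

z <- rnorm(n)                      # one set of standard normal draws, reused below

# Hypothetical helper: standard errors of the coefficients for a given sigma
se_for <- function(sigma) {
  y <- mu + sigma * z              # errors distributed as N(0, sigma^2)
  summary(lm(y ~ x1 + x2 + x3))$coefficients[, "Std. Error"]
}

se_small <- se_for(1.4)
se_large <- se_for(14)
se_large / se_small                # every ratio equals 10 (= 14 / 1.4), up to rounding
```

If the standard errors do not change in a setup like this, check that sigma actually reaches the `sd` argument of `rnorm()`.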


Here is another piece of code that simulates a multiple linear regression with normally distributed errors:

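sim.regression <- function(n.obs = 10, coefficients = runif(10, -5, 5), s.deviation = .1) {

  n.var <- length(coefficients)                 # number of predictors
  M <- matrix(0, ncol = n.var, nrow = n.obs)    # n.obs x n.var design matrix

  beta <- as.matrix(coefficients)               # coefficients as a column vector

  # fill each column with independent standard normal predictor values
  for (i in 1:n.var) {
    M[, i] <- rnorm(n.obs, 0, 1)
  }

  # response = linear predictor (no intercept term) + N(0, s.deviation^2) noise
  y <- M %*% beta + rnorm(n.obs, 0, s.deviation)

  return(list(x = M, y = y, coeff = coefficients))

}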
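Note two things about this function. It generates no intercept column, and with the defaults (10 observations and 10 coefficients) the design matrix is square, so an OLS fit would have zero residual degrees of freedom. The sketch below is a vectorized variant, given the hypothetical name `sim.regression2`, that fills the whole design matrix with a single `rnorm()` call. It is used here with 100 observations and fitted without an intercept (`- 1`):

```r
# Hypothetical vectorized variant of sim.regression(): same model, no loop
sim.regression2 <- function(n.obs = 10, coefficients = runif(10, -5, 5), s.deviation = .1) {
  n.var <- length(coefficients)
  M <- matrix(rnorm(n.obs * n.var), nrow = n.obs, ncol = n.var)  # standard normal predictors
  y <- drop(M %*% coefficients) + rnorm(n.obs, 0, s.deviation)   # no intercept term
  list(x = M, y = y, coeff = coefficients)
}

set.seed(2)
sim <- sim.regression2(n.obs = 100)
x <- sim$x
y <- sim$y
fit <- lm(y ~ x - 1)            # "- 1": no intercept, matching how y was generated
cbind(true = sim$coeff, estimate = coef(fit))
```

Unlike the original, this variant returns `y` as a plain vector rather than a one-column matrix. That choice only keeps the `lm()` call simple.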

— TPArrow