KNN归因R包


14

我正在寻找KNN归因软件包。我一直在查看插补包(http://cran.r-project.org/web/packages/imputation/imputation.pdf),但是由于某种原因,KNN 插补功能(即使遵循描述中的示例)也似乎归零(如下所示)。我一直在环顾四周,但仍找不到任何东西,因此想知道是否有人对好的KNN插补包有其他建议?

w ^

在下面的代码中-NA值替换为零-不替换为Knn平均值

require(imputation)
x = matrix(rnorm(100),10,10)
x.missing = x > 1
x[x.missing] = NA
kNNImpute(x, 3)
x

1
根据源代码github.com/jeffwong/imputation/blob/master/R/kNN.R,任何无法估算的条目都将设置为零。您看到这么多零的原因是因为程序包作者选择的算法无法为这些条目估算值。最好以某种方式放宽算法以获得这些值的合理估计。
Flounderer

(请参阅上述链接中的代码的91-93行)
Flounderer 2013年

我前不久有这个问题,发布在stackoverflow上
Alex W

只是值得注意的是:有没有希望任何推算模型将有丢失的数据,你已经生成的无偏估计(根据你如何把它丢)。当然,我认为您对kNNImpute完全工作(而不是良好工作)更感兴趣,因此您可能不在乎偏见。
Cliff AB

您是否有使用KNN的特定原因?预测均值匹配非常相似,并且具有许多最佳属性。
RayVelcoro 2015年

Answers:


10

您也可以尝试以下软件包: DMwR

它在3 NN的情况下失败,给出“ knnImputation(x,k = 3)中的错误:没有足够完整的情况来计算邻居”。

但是,尝试2给。

> knnImputation(x,k=2)
             [,1]       [,2]       [,3]       [,4]       [,5]        [,6]
 [1,] -0.59091360 -1.2698175  0.5556009 -0.1327224 -0.8325065  0.71664000
 [2,] -1.27255074 -0.7853602  0.7261897  0.2969900  0.2969556 -0.44612831
 [3,]  0.55473981  0.4748735  0.5158498 -0.9493917 -1.5187722 -0.99377854
 [4,] -0.47797654  0.1647818  0.6167311 -0.5149731  0.5240514 -0.46027809
 [5,] -1.08767831 -0.3785608  0.6659499 -0.7223724 -0.9512409 -1.60547053
 [6,] -0.06153279  0.9486815 -0.5464601  0.1544475  0.2835521 -0.82250221
 [7,] -0.82536029 -0.2906253 -3.0284281 -0.8473210  0.7985286 -0.09751927
 [8,] -1.15366189  0.5341000 -1.0109258 -1.5900281  0.2742328  0.29039928
 [9,] -1.49504465 -0.5419533  0.5766574 -1.2412777 -1.4089572 -0.71069839
[10,] -0.35935440 -0.2622265  0.4048126 -2.0869817  0.2682486  0.16904559
             [,7]       [,8]        [,9]      [,10]
 [1,]  0.58027159 -1.0669137  0.48670802  0.5824858
 [2,] -0.48314440 -1.0532693 -0.34030385 -1.1041681
 [3,] -2.81996446  0.3191438 -0.48117020 -0.0352633
 [4,] -0.55080515 -1.0620243 -0.51383557  0.3161907
 [5,] -0.56808769 -0.3696951  0.35549191  0.3202675
 [6,] -0.25043479 -1.0389393  0.07810902  0.5251606
 [7,] -0.41667318  0.8809541 -0.04613332 -1.1586756
 [8,] -0.06898363 -1.0736161  0.62698065 -1.0373835
 [9,]  0.30051583 -0.2936140  0.31417921 -1.4155193
[10,] -0.68180034 -1.0789745  0.58290920 -1.0197956

您可以使用complete.cases(x)测试足够的观测值,其中该值必须至少为k。

解决此问题的一种方法是放宽您的要求(即减少不完整的行),方法是:1)增加NA阈值,或者2)增加您的观测值。

这是第一个:

> x = matrix(rnorm(100),10,10)
> x.missing = x > 2
> x[x.missing] = NA
> complete.cases(x)
 [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> knnImputation(x,k=3)
             [,1]       [,2]       [,3]       [,4]        [,5]       [,6]       [,7]        [,8]        [,9]       [,10]
 [1,]  0.86882569 -0.2409922  0.3859031  0.5818927 -1.50310330  0.8752261 -0.5173105 -2.18244988 -0.28817656 -0.63941237
 [2,]  1.54114079  0.7227511  0.7856277  0.8512048 -1.32442954 -2.1668744  0.7017532 -0.40086348 -0.41251883  0.42924986
 [3,]  0.60062917 -0.5955623  0.6192783 -0.3836310  0.06871570  1.7804657  0.5965411 -1.62625036  1.27706937  0.72860273
 [4,] -0.07328279 -0.1738157  1.4965579 -1.1686115 -0.06954318 -1.0171604 -0.3283916  0.63493884  0.72039689 -0.20889111
 [5,]  0.78747874 -0.8607320  0.4828322  0.6558960 -0.22064430  0.2001473  0.7725701  0.06155196  0.09011719 -1.01902968
 [6,]  0.17988720 -0.8520000 -0.5911523  1.8100573 -0.56108621  0.0151522 -0.2484345 -0.80695513 -0.18532984 -1.75115335
 [7,]  1.03943492  0.4880532 -2.7588922 -0.1336166 -1.28424057  1.2871333  0.7595750 -0.55615677 -1.67765572 -0.05440992
 [8,]  1.12394474  1.4890366 -1.6034648 -1.4315445 -0.23052386 -0.3536677 -0.8694188 -0.53689507 -1.11510406 -1.39108817
 [9,] -0.30393916  0.6216156  0.1559639  1.2297105 -0.29439390  1.8224512 -0.4457441 -0.32814665  0.55487894 -0.22602598
[10,]  1.18424722 -0.1816049 -2.2975095 -0.7537477  0.86647524 -0.8710603  0.3351710 -0.79632184 -0.56254688 -0.77449398
> x
             [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]        [,8]        [,9]       [,10]
 [1,]  0.86882569 -0.2409922  0.3859031  0.5818927 -1.5031033  0.8752261 -0.5173105 -2.18244988 -0.28817656 -0.63941237
 [2,]  1.54114079  0.7227511  0.7856277  0.8512048 -1.3244295 -2.1668744  0.7017532 -0.40086348 -0.41251883  0.42924986
 [3,]  0.60062917 -0.5955623  0.6192783 -0.3836310  0.0687157  1.7804657  0.5965411 -1.62625036  1.27706937  0.72860273
 [4,] -0.07328279 -0.1738157  1.4965579 -1.1686115         NA -1.0171604 -0.3283916  0.63493884  0.72039689 -0.20889111
 [5,]  0.78747874 -0.8607320  0.4828322         NA -0.2206443  0.2001473  0.7725701  0.06155196  0.09011719 -1.01902968
 [6,]  0.17988720 -0.8520000 -0.5911523  1.8100573 -0.5610862  0.0151522 -0.2484345 -0.80695513 -0.18532984 -1.75115335
 [7,]  1.03943492  0.4880532 -2.7588922 -0.1336166 -1.2842406  1.2871333  0.7595750 -0.55615677 -1.67765572 -0.05440992
 [8,]  1.12394474  1.4890366 -1.6034648 -1.4315445 -0.2305239 -0.3536677 -0.8694188 -0.53689507 -1.11510406 -1.39108817
 [9,] -0.30393916  0.6216156  0.1559639  1.2297105 -0.2943939  1.8224512 -0.4457441 -0.32814665  0.55487894 -0.22602598
[10,]  1.18424722 -0.1816049 -2.2975095 -0.7537477  0.8664752 -0.8710603  0.3351710 -0.79632184 -0.56254688 -0.77449398

这是第二个例子

x = matrix(rnorm(1000),100,10)
x.missing = x > 1
x[x.missing] = NA

complete.cases(x)

  [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
 [22] FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [43]  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [64] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
 [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE

至少满足k = 3个完整行,因此可以推算k = 3。

> head(knnImputation(x,k=3))
            [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]       [,8]        [,9]       [,10]
[1,]  0.01817557 -2.8141502  0.3929944  0.1495092 -1.7218396  0.4159133 -0.8438809  0.6599224 -0.02451113 -1.14541016
[2,]  0.51969964 -0.4976021 -0.1495392 -0.6448184 -0.6066386 -1.6210476 -0.3118440  0.2477855 -0.30986749  0.32424673
...

5
require(imputation)
x = matrix(rnorm(100),10,10)
x.missing = x > 1
x[x.missing] = NA
y <- kNNImpute(x, 3)

attributes(y)

$names
[1] "x"              "missing.matrix"

y$x

> x(原始矩阵)

             [,1]        [,2]       [,3]       [,4]        [,5]        [,6]        [,7]
 [1,]  0.38515909  0.52661156  0.6164138  0.3095225  0.55909716 -1.16543168 -0.70714440
 [2,] -0.39222402 -1.29703536  0.4429824 -1.3950116          NA -0.46841443 -0.57563472
 [3,] -2.04467869 -0.52022405         NA  0.7219057 -0.93573417 -1.51490638  0.62356689
 [4,] -1.08684345  0.63083074         NA  0.5603603  0.48583414          NA -0.69447183
 [5,]  0.30116921  0.25127476 -0.2132160         NA -1.63484823 -0.58266488  0.34432576
 [6,]  0.82152305 -0.12900915 -1.8498997  0.8012059          NA -0.14987133 -1.11232289
 [7,]  0.27912763 -0.68923032 -0.2355762 -0.2541675 -0.14181344 -0.08519797  0.13061823
 [8,]  0.06653984 -0.87521539 -0.0980306 -0.4350224  0.05021324 -1.66963624 -0.09204772
 [9,]  0.12687240 -0.62717646 -0.1258722         NA -0.86913445  0.68365036          NA
[10,]  0.56680502  0.03318012  0.1411861  0.6573134 -0.14747073          NA -1.37949278
             [,8]        [,9]       [,10]
 [1,] -2.67066748          NA -0.64370528
 [2,] -1.26864936 -1.95692064  0.28917897
 [3,] -0.27816124 -0.20332695 -1.29456054
 [4,] -1.10917662 -0.59598910 -0.32475962
 [5,] -0.15448822  0.71667444 -1.60827152
 [6,] -0.66691445  0.05396037  0.04074923
 [7,]  0.05644956  0.99416556 -0.77808427
 [8,] -0.32294266          NA -2.50933697
 [9,] -0.67226044          NA          NA
[10,] -0.84866945 -0.54318570          NA

> y $ x(估算矩阵)

            [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]
 [1,]  0.38515909  0.52661156  0.61641378  0.30952251  0.55909716 -1.16543168 -0.70714440
 [2,] -0.39222402 -1.29703536  0.44298237 -1.39501160 -0.22157531 -0.46841443 -0.57563472
 [3,] -2.04467869 -0.52022405  0.08298882  0.72190573 -0.93573417 -1.51490638  0.62356689
 [4,] -1.08684345  0.63083074 -0.66707695  0.56036034  0.48583414 -0.98956026 -0.69447183
 [5,]  0.30116921  0.25127476 -0.21321600 -0.02480909 -1.63484823 -0.58266488  0.34432576
 [6,]  0.82152305 -0.12900915 -1.84989965  0.80120592 -0.76323053 -0.14987133 -1.11232289
 [7,]  0.27912763 -0.68923032 -0.23557619 -0.25416751 -0.14181344 -0.08519797  0.13061823
 [8,]  0.06653984 -0.87521539 -0.09803060 -0.43502238  0.05021324 -1.66963624 -0.09204772
 [9,]  0.12687240 -0.62717646 -0.12587221  0.00000000 -0.86913445  0.68365036  0.00000000
[10,]  0.56680502  0.03318012  0.14118610  0.65731337 -0.14747073  0.00000000 -1.37949278
             [,8]        [,9]       [,10]
 [1,] -2.67066748  0.04286260 -0.64370528
 [2,] -1.26864936 -1.95692064  0.28917897
 [3,] -0.27816124 -0.20332695 -1.29456054
 [4,] -1.10917662 -0.59598910 -0.32475962
 [5,] -0.15448822  0.71667444 -1.60827152
 [6,] -0.66691445  0.05396037  0.04074923
 [7,]  0.05644956  0.99416556 -0.77808427
 [8,] -0.32294266  0.00000000 -2.50933697
 [9,] -0.67226044  0.00000000  0.00000000
[10,] -0.84866945 -0.54318570  0.00000000

它推定了它可以提供的价值。那些无法估算的值设置为零。


似乎imputation软件包不再存在(对于R版本3.1.2)
Ehsan M. Kermani

它在github中,用google搜索。
marbel

5

插补包不再在CRAN上。

VIM是除DMwR之外的一种提供kNN插补功能的软件包。

也易于使用:

library("VIM")
kNN(x, k=3)

1
install.packages("DMwR")*  # for use of knnImputation.

require(DMwR)
x  = matrix(rnorm(100), 10, 10)
x.missing= x >1
x[x.missing] = NA
complete.cases(x)
y <- knnImputation(x, 3)

0

R无法归因的原因是,在许多情况下,行中缺少多个属性,因此无法计算最近的邻居。您可以做的另一种选择是,插入具有正态分布的预测概率的区间变量(或者,如果其偏斜使用具有相似偏斜的Gamma分布)。并在类变量的情况下使用决策树预测缺失值。

By using our site, you acknowledge that you have read and understand our Cookie Policy and Privacy Policy.
Licensed under cc by-sa 3.0 with attribution required.