How to find summary statistics for all unique combinations of factors in a data.frame in R? [closed]


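I want to compute a summary of a variable in a data.frame for each unique combination of factors in that data.frame. Should I use plyr for this? I am fine with using a loop instead of apply(), so just finding each unique combination would be enough.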

r categorical-data aggregation plyr

— Russell Pierce


The question is misleading: it asks about the unique combinations of factors and then about summaries over those unique combinations.

— Wojtek
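To separate the two parts of the question, the combinations that actually occur can be listed on their own, before any summary is computed. The following is a minimal base-R sketch that uses the built-in warpbreaks data (factors wool and tension) as a stand-in for the asker's data frame. Note that table() counts every combination of factor levels, including empty ones, while unique() returns only the combinations present in the data.

```r
# Combinations of the two factors that actually occur in warpbreaks
combos <- unique(warpbreaks[, c("wool", "tension")])
rownames(combos) <- NULL
print(combos)

# Number of rows in every wool/tension combination (empty cells would show 0)
counts <- as.data.frame(table(warpbreaks$wool, warpbreaks$tension))
names(counts) <- c("wool", "tension", "n")
print(counts)
```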


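While I think aggregate may be the solution you are looking for, if you want to create an explicit list of all possible combinations of factors, expand.grid will do that for you. For example: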

> expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50),
             sex = c("Male","Female"))
       height weight    sex
1      60    100   Male
2      65    100   Male
... 
30     80    100 Female
31     60    150 Female

You can then loop over each row of the resulting data frame to extract the matching records from the original data.

— Mark Fredrickson
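The row-by-row loop described in this answer can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the answerer's code: it uses the built-in warpbreaks data, and the output column mean_breaks is a made-up name.

```r
# Every possible wool/tension combination (levels taken from the data)
combos <- expand.grid(wool = levels(warpbreaks$wool),
                      tension = levels(warpbreaks$tension),
                      stringsAsFactors = FALSE)
combos$mean_breaks <- NA_real_

# For each combination, select the matching rows and summarise them
for (i in seq_len(nrow(combos))) {
  rows <- warpbreaks$wool == combos$wool[i] &
          warpbreaks$tension == combos$tension[i]
  combos$mean_breaks[i] <- mean(warpbreaks$breaks[rows])
}
print(combos)
```

Because expand.grid() enumerates every possible combination, a combination with no matching rows would give mean(numeric(0)), which is NaN; in warpbreaks every cell is populated.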


See aggregate and by. For example, from the help file for aggregate:

## Compute the averages according to region and the occurrence of more
## than 130 days of frost.
aggregate(state.x77,
      list(Region = state.region,
           Cold = state.x77[,"Frost"] > 130),
      mean)

— 安妮子


Fastest correct answer.

— John
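The aggregate answer also points to by(), which it does not demonstrate. A minimal sketch on the built-in warpbreaks data, computing several statistics per wool/tension cell (the choice of statistics is illustrative):

```r
# by() splits warpbreaks into one data frame per wool/tension cell
# and applies the function to each piece; the result has class "by"
res <- by(warpbreaks, warpbreaks[, c("wool", "tension")],
          function(d) c(mean = mean(d$breaks), sd = sd(d$breaks), n = nrow(d)))
print(res)
```

Unlike aggregate(), the function passed to by() receives the whole subset data frame, so it can use several columns at once.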


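Here is a plyr solution. It has the advantage of returning multiple summary statistics and of showing a progress bar for long computations: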

library(plyr)  # provides ddply() and .()
library(ez)    # for the ANT example data set
data(ANT)
cell_stats = ddply(
    .data = ANT  # use the ANT data
    , .variables = .(cue, flanker)  # use each combination of cue and flanker
    , .fun = function(x){  # apply this function to each combination of cue & flanker
        to_return = data.frame(
            acc = mean(x$acc)  # mean accuracy
            , mrt = mean(x$rt[x$acc == 1])  # mean RT of correct trials only
        )
        return(to_return)
    }
    , .progress = 'text'
)

— Mike Lawrence

Thanks! This works, although I had to adjust a comma in the call to data.frame:

stats = ddply(.data = ords, .variables = .(Symbol, SysID, Hour), .fun = function(x){ to_return = data.frame(s = sum(x$Profit), m = mean(x$Profit)); return(to_return) }, .progress = 'text')
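Without plyr, roughly the same multi-statistic, per-combination table can be built in base R with split(), lapply() and do.call(rbind, ...). This is a swapped-in base-R technique, not the ddply() call above; it uses the built-in warpbreaks data instead of ANT, and the column names are illustrative.

```r
# One data frame per observed wool/tension combination (drop = TRUE skips empty cells)
pieces <- split(warpbreaks, warpbreaks[, c("wool", "tension")], drop = TRUE)

# Several statistics per piece, keeping the grouping values as columns
cell_stats <- do.call(rbind, lapply(pieces, function(x) {
  data.frame(wool = x$wool[1], tension = x$tension[1],
             mean_breaks = mean(x$breaks), sd_breaks = sd(x$breaks),
             n = nrow(x))
}))
rownames(cell_stats) <- NULL
print(cell_stats)
```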


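In addition to the other suggestions, you may find the describe.by() function in the psych package useful (newer versions of psych name it describeBy()). It displays summary statistics for numeric variables across the levels of a factor variable.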

— Jeromy Anglim
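If installing psych is not an option, a rough base-R stand-in for per-level descriptive statistics is sketched below. It does not reproduce the output format of the psych function; it only computes n, mean, sd, median, min and max of each numeric column of the built-in iris data for each Species level.

```r
# Keep only the numeric columns, then split them by factor level
num_cols <- sapply(iris, is.numeric)
desc <- lapply(split(iris[num_cols], iris$Species), function(d) {
  # One row per variable, one column per statistic
  data.frame(n      = colSums(!is.na(d)),
             mean   = colMeans(d, na.rm = TRUE),
             sd     = sapply(d, sd, na.rm = TRUE),
             median = sapply(d, median, na.rm = TRUE),
             min    = sapply(d, min, na.rm = TRUE),
             max    = sapply(d, max, na.rm = TRUE))
})
print(desc$setosa)
```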


Personally, I like cast() from the reshape package because it is simple:

library(reshape)
cast(melt(tips), sex ~ smoker | variable, c(sd,mean, length))

— Brandon Bertelsen
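For comparison, base R's tapply() with two grouping factors produces a similar rows-by-columns layout, though only one statistic per call. A minimal sketch on the built-in mtcars data, treating cyl as the row factor and am as the column factor (the choice of variables is illustrative):

```r
# Mean mpg for each cyl/am cell, laid out with cyl as rows and am as columns
mean_tab <- with(mtcars, tapply(mpg, list(cyl = cyl, am = am), mean))
# Cell counts in the same layout
n_tab <- with(mtcars, tapply(mpg, list(cyl = cyl, am = am), length))
print(round(mean_tab, 2))
print(n_tab)
```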


The doBy package (library(doBy)) also provides the summaryBy() function, for example:

summaryBy(DV1 + DV2 ~ Height+Weight+Sex,data=my.data)

— Russell Pierce
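In the summaryBy() call above, the formula lists the response variables on the left and the grouping factors on the right. Base R's formula method for aggregate() accepts the same shape when the responses are wrapped in cbind(). A minimal sketch on the built-in mtcars data, with mpg and hp standing in for the hypothetical DV1 and DV2 and cyl and am for the grouping factors:

```r
# Mean of mpg and hp for every observed cyl/am combination
res <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars, FUN = mean)
print(res)

# FUN may return several statistics; each response then becomes a matrix column
res2 <- aggregate(cbind(mpg, hp) ~ cyl + am, data = mtcars,
                  FUN = function(v) c(mean = mean(v), n = length(v)))
print(res2)
```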