Answers:
Perhaps table() is what you are after?
dummyData = rep(c(1,2, 2, 2), 25)
table(dummyData)
# dummyData
# 1 2
# 25 75
## or another presentation of the same data
as.data.frame(table(dummyData))
# dummyData Freq
# 1 1 25
# 2 2 75
table seems to be quite a bit slower than hist. I wonder why. Can anyone confirm?
To order by frequency, just use order() on the results, i.e. x <- as.data.frame(table(dummyData)); x[order(x$Freq, decreasing = TRUE), ]
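As a small sketch of the ordering step, the counts can also be sorted without converting to a data frame first. This assumes base R's sort() applied to the table object and reuses dummyData from the answer above; the names counts, sorted_counts and x_sorted are only illustrative.

```r
dummyData <- rep(c(1, 2, 2, 2), 25)

# Sort the table itself, most frequent value first
counts <- table(dummyData)
sorted_counts <- sort(counts, decreasing = TRUE)

# Same ordering via the data frame route from the comment above
x <- as.data.frame(counts)
x_sorted <- x[order(x$Freq, decreasing = TRUE), ]

print(sorted_counts)
print(x_sorted)
```

Both results list the value 2 (75 occurrences) before the value 1 (25 occurrences).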
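As Chase suggested, the table() function is a good choice. If you are analysing a large dataset, an alternative is to use the .N symbol from the data.table package.
Make sure the data.table package is installed with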
install.packages("data.table")
Code:
# Import the data.table package
library(data.table)
# Generate a data table object, which draws a number 10^7 times
# from 1 to 10 with replacement
DT<-data.table(x=sample(1:10,1E7,TRUE))
# Count Frequency of each factor level
DT[,.N,by=x]
To get an un-dimensioned integer vector that contains the counts of the unique values, use c().
dummyData = rep(c(1, 2, 2, 2), 25) # Chase's reproducible data
c(table(dummyData)) # get un-dimensioned integer vector
#  1  2
# 25 75
str(c(table(dummyData)) ) # confirm structure
# Named int [1:2] 25 75
# - attr(*, "names")= chr [1:2] "1" "2"
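This may be useful if you need to feed the counts of unique values into another function, and it is shorter and more idiomatic than the t(as.data.frame(table(dummyData))[,2]) posted in a comment on Chase's answer. Thanks to Ricardo Saporta, who pointed this out to me here.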
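To illustrate feeding the counts into another computation, here is a minimal sketch that turns them into proportions. It assumes the same dummyData as above; the names counts, proportions and bare_counts are illustrative, not part of the original answer.

```r
dummyData <- rep(c(1, 2, 2, 2), 25)

# c() drops the table class and dimensions but keeps the names
counts <- c(table(dummyData))

# The named vector can be passed to ordinary vector functions,
# for example to turn the counts into proportions
proportions <- counts / sum(counts)
print(proportions)

# as.vector() would drop the names as well
bare_counts <- as.vector(table(dummyData))
print(bare_counts)
```

Whether to keep the names depends on what the receiving function expects, so c() and as.vector() are both worth knowing.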
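If you need the number of unique values as an additional column in the data frame that contains your values (a column that may represent the sample size, for example), plyr provides a neat way: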
data_frame <- data.frame(v = rep(c(1,2, 2, 2), 25))
library("plyr")
data_frame <- ddply(data_frame, .(v), transform, n = length(v))
or ddply(data_frame, .(v), count). It is also worth making explicit that you need a library("plyr") call for ddply to work.
Note that this uses transform rather than mutate when using plyr.
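If loading plyr is not an option, a similar count column can probably be built in base R. The sketch below swaps in ave(), which is not what the answer above uses, and assumes the same data_frame with a single column v.

```r
data_frame <- data.frame(v = rep(c(1, 2, 2, 2), 25))

# ave() applies FUN within each group of v and returns a vector
# as long as the input, so each row gets its group's size
data_frame$n <- ave(data_frame$v, data_frame$v, FUN = length)

print(unique(data_frame))
```

Every row with v equal to 1 gets n = 25 and every row with v equal to 2 gets n = 75, which matches the counts from table().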
If you want to run unique on a data.frame (e.g. train.data) and also get the counts (which can be used as weights in a classifier), you can do the following:
unique.count = function(train.data, all.numeric=FALSE) {
  # first convert each row in the data.frame to a string
  train.data.str = apply(train.data, 1, function(x) paste(x, collapse=','))
  # use table to index and count the strings
  train.data.str.t = table(train.data.str)
  # get the unique data string from the row.names
  train.data.str.uniq = row.names(train.data.str.t)
  weight = as.numeric(train.data.str.t)
  # convert the unique data string to data.frame
  if (all.numeric) {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) as.numeric(unlist(strsplit(x, split=","))))))
  } else {
    train.data.uniq = as.data.frame(t(apply(cbind(train.data.str.uniq), 1,
      function(x) unlist(strsplit(x, split=",")))))
  }
  names(train.data.uniq) = names(train.data)
  list(data=train.data.uniq, weight=weight)
}
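For comparison, here is a hedged sketch of a different route to the same kind of result, using base R's aggregate() instead of pasting rows into strings. The example data and the names with_ones and unique_rows are hypothetical. One possible advantage is that values which themselves contain commas would not be split apart.

```r
# Hypothetical example data with repeated rows
train.data <- data.frame(a = c(1, 1, 2, 1),
                         b = c("x", "x", "y", "x"))

# Add a column of ones and sum it within each unique combination
# of the remaining columns; the sums are the row counts
with_ones <- cbind(train.data, weight = 1)
unique_rows <- aggregate(weight ~ ., data = with_ones, FUN = sum)

print(unique_rows)
```

The result is a single data frame holding the unique rows and their weight, rather than the list(data, weight) returned by unique.count above.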
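count_unique_words <- function(wlist) {
  # wlist must hold character values: they are used as list names below
  ucountlist <- list()
  unamelist <- c()
  for (i in wlist) {
    if (is.element(i, unamelist)) {
      # word seen before: increment its count
      ucountlist[[i]] <- ucountlist[[i]] + 1
    } else {
      # first occurrence: start the count at 1 and remember the word
      ucountlist[[i]] <- 1
      unamelist <- c(unamelist, i)
    }
  }
  ucountlist
}

# population is the vector of words to count (not defined in this snippet)
expt_counts <- count_unique_words(population)
# print each unique word followed by its count
for (i in names(expt_counts))
  cat(i, expt_counts[[i]], "\n")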
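The same per-word counts can likely be obtained with table(), as in the first answer. The sketch below assumes a hypothetical character vector standing in for population, which is not defined in the snippet above.

```r
# Hypothetical word vector standing in for `population`
population <- c("apple", "pear", "apple", "fig", "apple")

# Fixing the factor levels to order of first appearance keeps the
# counts in the same order that the loop would produce
word_counts <- table(factor(population, levels = unique(population)))

for (w in names(word_counts))
  cat(w, word_counts[[w]], "\n")
```

Without the factor() call, table() would list the words in sorted order instead of order of first appearance.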