Select data frame rows based on a partial string match in a column

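I want to select rows from a data frame based on a partial string match in a column, for example rows where column "x" contains the string "hsa". Using sqldf, if it had a like syntax, I would do something like:

select * from <> where x like 'hsa'

Unfortunately, sqldf does not support that syntax.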

Or, similarly:

selectedRows <- df[ , df$x %like% "hsa-"]

which of course doesn't work.

Can anyone help me with this?

string r match

— 阿斯达

Could you post a few rows of your data, preferably using dput(head(conservedData))?

— 2012

Answers:

147

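I notice that you mentioned a %like% function in your current approach. I don't know if that's a reference to the %like% from "data.table", but if it is, you can use it as follows.

Note that the object does not have to be a data.table (but keep in mind that the subsetting methods for data.frames and data.tables are not identical):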

library(data.table)
mtcars[rownames(mtcars) %like% "Merc", ]
iris[iris$Species %like% "osa", ]

If that is what you had, then perhaps you just mixed up the row and column positions when subsetting your data.
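To make the row/column mix-up concrete without depending on data.table, here is a minimal base-R sketch. It uses a hypothetical operator %contains% (defined with grepl(); it is not data.table's implementation of %like%) and made-up toy data standing in for the asker's df. The only thing that matters is which side of the comma the logical vector goes on.

```r
# Hypothetical stand-in for data.table's %like%, built on grepl()
`%contains%` <- function(x, pattern) grepl(pattern, x)

# Made-up toy data: 3 rows, 2 columns
df <- data.frame(
  x = c("hsa-let-7a", "mmu-let-7a", "hsa-miR-21"),
  score = c(1.5, 2.0, 3.2),
  stringsAsFactors = FALSE
)

# Logical vector in the ROW position (before the comma): keeps matching rows
rows_ok <- df[df$x %contains% "hsa-", ]

# Same vector in the COLUMN position (after the comma): it is applied to the
# columns instead. A length-3 logical index on a 2-column data frame selects
# a non-existent third column, so R stops with "undefined columns selected".
cols_wrong <- tryCatch(df[, df$x %contains% "hsa-"], error = function(e) e)
```

The same correction applies to the code in the question: move df$x %like% "hsa-" in front of the comma.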

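If you don't want to load a package, you can use grep() to search for the string you want to match. Here is an example with the mtcars dataset, where we match all rows whose row names contain "Merc":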

mtcars[grep("Merc", rownames(mtcars)), ]
#              mpg cyl  disp  hp drat   wt qsec vs am gear carb
# Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
# Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
# Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
# Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
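A related point, sketched below with the same mtcars data: grep() returns integer positions while grepl() returns a logical vector. Both select the same rows, but they behave differently when you negate the match to drop rows and the pattern matches nothing. "Tesla" is just an arbitrary pattern assumed not to occur in the mtcars row names.

```r
# grep() gives positions, grepl() gives TRUE/FALSE for every row
merc_idx <- grep("Merc", rownames(mtcars))
merc_lgl <- grepl("Merc", rownames(mtcars))

# Both index forms pick out the same Merc rows
same_rows <- identical(mtcars[merc_idx, ], mtcars[merc_lgl, ])

# Dropping matches when nothing matches:
# -grep() of a pattern with no hits is integer(0), which selects ZERO rows
no_hits_grep <- mtcars[-grep("Tesla", rownames(mtcars)), ]
# !grepl() is all TRUE, which keeps every row as intended
no_hits_grepl <- mtcars[!grepl("Tesla", rownames(mtcars)), ]
```

For that reason, grepl() (or negated grepl()) is usually the safer choice when excluding rows.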

And another example, using the iris dataset to search for the string "osa":

irisSubset <- iris[grep("osa", iris$Species), ]
head(irisSubset)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa
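Because Species is a factor, grep() and grepl() first coerce it with as.character(), so the match runs against the level labels. A short sketch of two grepl() arguments that are often useful here: ignore.case, and fixed, which treats the pattern as a literal string rather than a regular expression. The uppercase "OSA" pattern is only for illustration.

```r
# Case-insensitive match: "OSA" still matches "setosa"
setosa_ci <- iris[grepl("OSA", iris$Species, ignore.case = TRUE), ]

# Literal (non-regex) match of the substring "osa"
setosa_fixed <- iris[grepl("osa", iris$Species, fixed = TRUE), ]

nrow(setosa_ci)
nrow(setosa_fixed)
```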

For your problem, try:

selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
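Since the asker's data was never posted, here is a sketch on made-up data: only the names conservedData and miRNA come from this thread; the values and the extra conserved column are hypothetical.

```r
# Hypothetical stand-in for the asker's data
conservedData <- data.frame(
  miRNA = c("hsa-miR-21-5p", "mmu-miR-21a-5p", "hsa-let-7a-5p", "rno-miR-1"),
  conserved = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Row index from grep(), empty column index: keep matching rows, all columns
selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
selectedRows$miRNA
```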

— A5C1D2H2I1M1N2O1R2T1

+1: Also note that grep supports regular expressions, so you may want to grep for ^hsa- instead.

— nico, 2012
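A quick illustration of the anchoring point, on made-up identifiers ("xhsa-demo" is a hypothetical value that contains "hsa-" without starting with it). Base R's startsWith(), available since R 3.3.0, performs the same prefix test without a regular expression.

```r
ids <- c("hsa-miR-21-5p", "mmu-miR-21a-5p", "xhsa-demo")

anywhere <- grepl("hsa-", ids)      # substring anywhere in the string
anchored <- grepl("^hsa-", ids)     # ^ anchors the match at the start
prefix   <- startsWith(ids, "hsa-") # literal prefix test, no regex

anywhere
anchored
prefix
```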

@nico: actually, grep comes from the ed command g/re/p (global / regular expression / print), and it only shows its real power to masters of regular-expression fu ;-): en.wikipedia.org/wiki/Grep

— Stephan Kolassa, 2012

The %like% suggestion is great! I'd suggest putting it at the top of your answer.

— Aren Cambre

@ArenCambre, done. Maybe it will help me get 11 more votes so that, before the end of the year, I can

— 换上

@A5C1D2H2I1M1N2O1R2T1 Great answer! Is there a way to use %like% to search for two strings occurring together (e.g., "pet" and "pip" both occurring in a data row as "peter piper")?

— nigus21
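One way to handle this, sketched in base R on made-up strings: test each substring separately and combine the two logical vectors with &, so both must be present in any order. The same idea should carry over to data.table as x %like% "pet" & x %like% "pip", since %op% operators bind more tightly than & in R.

```r
x <- c("peter piper", "pet shop", "bagpipes", "piper pet")

# Row qualifies only if BOTH substrings appear somewhere in it
both <- grepl("pet", x) & grepl("pip", x)
x[both]
```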

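Try str_detect() from the stringr package, which detects the presence or absence of a pattern in a string.

Here is an approach that also uses the %>% pipe and filter() from the dplyr package: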

library(stringr)
library(dplyr)

CO2 %>%
  filter(str_detect(Treatment, "non"))

   Plant        Type  Treatment conc uptake
1    Qn1      Quebec nonchilled   95   16.0
2    Qn1      Quebec nonchilled  175   30.4
3    Qn1      Quebec nonchilled  250   34.8
4    Qn1      Quebec nonchilled  350   37.2
5    Qn1      Quebec nonchilled  500   35.3
...

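This filters the sample CO2 dataset (which ships with R) for rows where the Treatment variable contains the substring "non". You can control whether str_detect() looks for a fixed match or uses a regular expression; see the documentation for the stringr package.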
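A small sketch of that distinction, with made-up strings: by default str_detect() treats the pattern as a regular expression, so "." matches any character; wrapping the pattern in fixed() makes it literal, and regex(..., ignore_case = TRUE) makes it case-insensitive.

```r
library(stringr)

x <- c("a.b", "axb")

regex_hits <- str_detect(x, "a.b")                          # "." = any character
fixed_hits <- str_detect(x, fixed("a.b"))                   # literal "a.b" only
ci_hits    <- str_detect(x, regex("A.B", ignore_case = TRUE)) # case-insensitive regex
```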

— Sam Firke

You can also use the str_detect function like this: myDataFrame[str_detect(myDataFrame$key, myKeyPattern),]

— Bemipefe

LIKE should work in SQLite:

require(sqldf)
df <- data.frame(name = c('bob','robert','peter'),id=c(1,2,3))
sqldf("select * from df where name LIKE '%er%'")
    name id
1 robert  2
2  peter  3
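This also explains why the query in the question would not have found partial matches: without wildcards, LIKE compares the entire value. A sketch on made-up data (only the column name x comes from the question), assuming sqldf is using its default SQLite backend, where LIKE is case-insensitive for ASCII letters:

```r
library(sqldf)

# Made-up data; only the column name x comes from the question
df <- data.frame(
  x = c("hsa-miR-21", "mmu-miR-21", "HSA-demo"),
  stringsAsFactors = FALSE
)

# No wildcard: 'hsa' must equal the whole value, so nothing matches here
exact <- sqldf("select * from df where x like 'hsa'")

# % matches any run of characters, so 'hsa-%' is a prefix match.
# SQLite's LIKE ignores ASCII case by default, so 'HSA-demo' matches too.
prefix <- sqldf("select * from df where x like 'hsa-%'")
```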

— username

sqldf is best suited for listing rows. However, it cannot delete rows.

— Suat Atan PhD

Why use require() to load the R package here?

— rgalbo

Because it's not a standard R library: you have to install it manually and then load it with the require function.

— bartektartanus
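To expand on the require() question: library() and require() both attach an installed package; the practical difference is what happens when the package is missing. library() stops with an error, while require() returns FALSE with a warning, which makes it usable inside a condition. requireNamespace() tests availability without attaching anything. The package name below is hypothetical and chosen so that loading fails.

```r
# Hypothetical package name, chosen so that loading fails
pkg <- "thisPackageDoesNotExist123"

# require(): returns FALSE (plus a warning, suppressed here) instead of failing
loaded <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

# library(): signals an error when the package is not installed
lib_result <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "error"
)

# requireNamespace(): availability check without attaching the package
available <- requireNamespace(pkg, quietly = TRUE)
```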