Remove part of a string after "."


76

I am working with NCBI Reference Sequence accession numbers, such as the variable a:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")  

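To get information from the biomaRt package, I need to remove the .1, .2, etc. after the accession number. I normally do this with the following code: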

b <- sub("..*", "", a)

# [1] "" "" "" "" "" ""

But as you can see, this is not the right approach for this variable. Can anyone help me with this?

Answers:


109

You just need to escape the period:

a <- c("NM_020506.1","NM_020519.1","NM_001030297.2","NM_010281.2","NM_011419.3", "NM_053155.2")

gsub("\\..*","",a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
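To see why the escape matters: in a regular expression an unescaped . matches any character, so "..*" matches the whole string, while "\\." matches only a literal period. The following is a minimal sketch, not part of the original answer. The vector acc is a shortened copy of a introduced for illustration, and the bracket form [.] and the end-anchored pattern are alternatives the answer does not mention.

```r
acc <- c("NM_020506.1", "NM_001030297.2")

# Unescaped: "." matches any character, so "..*" consumes the entire string.
unescaped <- sub("..*", "", acc)
unescaped
# [1] "" ""

# Escaped: "\\." is a literal period, so only the ".1" / ".2" part is removed.
escaped <- sub("\\..*", "", acc)
escaped
# [1] "NM_020506"    "NM_001030297"

# A character class is another way to match a literal period.
bracketed <- sub("[.].*", "", acc)

# Anchoring to the end of the string removes only a trailing numeric suffix.
anchored <- sub("\\.[0-9]+$", "", acc)
```

For these inputs the last three calls give the same result. Since each string contains a single match, sub and gsub behave identically here.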

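Explanation: using only functions from base R (i.e., without additional packages such as stringr), the options are b1 <- gsub("\\..*", "", a, fixed = FALSE) and b2 <- sub("\\..*", "", a, fixed = FALSE). In some cases you may need to change the fixed argument. Here, however, it must be FALSE (the default); otherwise the pattern will not work. You also need the double backslash \\, otherwise you get an error.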
David C.

You would not set fixed to TRUE here, because we are using a regular expression.
汉西
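The effect of the fixed argument discussed in these comments can be checked directly. This is a small sketch under the same assumptions as the answer's code; acc is a shortened, illustrative copy of a.

```r
acc <- c("NM_020506.1", "NM_001030297.2")

# fixed = TRUE treats the pattern as a literal string. No element contains
# the literal text \..* so nothing is replaced.
literal_pattern <- sub("\\..*", "", acc, fixed = TRUE)
literal_pattern
# [1] "NM_020506.1"    "NM_001030297.2"

# With fixed = TRUE a bare "." is a literal period, but only the period
# itself is removed; the version digits stay behind.
literal_dot <- sub(".", "", acc, fixed = TRUE)
literal_dot
# [1] "NM_0205061"    "NM_0010302972"
```

Neither call produces the desired result, which is why the regular-expression form with fixed = FALSE is used.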

11

We can pretend they are file names and remove the extension:

tools::file_path_sans_ext(a)
# [1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"
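One property of this approach worth knowing: with its default arguments, tools::file_path_sans_ext strips only the last extension. The sketch below illustrates this; the file name archive.tar.gz is a made-up example, not from the question.

```r
# A single version suffix is treated as the extension and removed.
one_suffix <- tools::file_path_sans_ext("NM_020506.1")
one_suffix
# [1] "NM_020506"

# Only the final extension is removed when there are several periods.
two_suffixes <- tools::file_path_sans_ext("archive.tar.gz")
two_suffixes
# [1] "archive.tar"
```

For accession numbers with exactly one period this matches the regular-expression answers.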

8

You can do this:

sub("\\.[0-9]+$", "", a)

or

library(stringr)
str_sub(a, start=1, end=-3)

4
Alternatives: str_replace(a, "\\.[0-9]", "") or str_replace(a, "\\..*", "")
保罗
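The positional and single-digit approaches above rely on the version suffix being exactly one digit. The following sketch shows what happens otherwise. It uses base R only, with substr and nchar standing in for str_sub(a, start=1, end=-3); the vector ids and its two-digit version number are hypothetical.

```r
ids <- c("NM_020506.1", "NM_000000.12")  # second ID is a hypothetical example

# Dropping the last two characters leaves a trailing period on the second ID.
by_position <- substr(ids, 1, nchar(ids) - 2)
by_position
# [1] "NM_020506"  "NM_000000."

# Removing a period plus one digit leaves the second digit behind.
one_digit <- sub("\\.[0-9]", "", ids)
one_digit
# [1] "NM_020506"  "NM_0000002"

# Removing the period and everything after it handles both cases.
whole_suffix <- sub("\\..*", "", ids)
whole_suffix
# [1] "NM_020506" "NM_000000"
```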

5

If the strings are expected to have a fixed length, substr from base R can be used. More generally, we can get the position of the . with regexpr and use that in substr:

substr(a, 1, regexpr("\\.", a)-1)
#[1] "NM_020506"    "NM_020519"    "NM_001030297" "NM_010281"    "NM_011419"    "NM_053155"   
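One caveat with the regexpr approach: when a string has no period, regexpr returns -1 and substr then yields an empty string. The sketch below shows one possible guard using ifelse. The vector ids, with an ID that lacks a version suffix, is an assumption made for illustration.

```r
ids <- c("NM_020506.1", "NM_020519")  # second ID has no version suffix

# Position of the first literal period; -1 means no match.
# as.integer() drops the extra attributes that regexpr attaches.
pos <- as.integer(regexpr(".", ids, fixed = TRUE))
pos
# [1] 10 -1

# Without a guard, the second element becomes an empty string.
unguarded <- substr(ids, 1, pos - 1)

# With a guard, strings without a period are returned unchanged.
guarded <- ifelse(pos > 0, substr(ids, 1, pos - 1), ids)
guarded
# [1] "NM_020506" "NM_020519"
```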
Licensed under cc by-sa 3.0 with attribution required.