Delete everything except what is between square brackets


0

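On lines starting with ">", I want to delete everything except the text between the square brackets (the brackets themselves should go too). Is there a sed way to do this? I would also like to sort the records alphabetically, where a record is a line starting with ">" together with the line that follows it.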

Example input:

>ID:000:FLKLNFIA_00192 |[Ignicoccus_hospitalis_KIN4-I.gbfspecies]|strain|Ignicoccus_hospitalis_KIN4-I.gbf|LSU ribosomal protei..|447|FLKLNFIA_1(1297538):162644-163090:1 ^^ Archaeagenomesparanahui Ignicoccus_hospitalis_KIN4-I.gbfspecies strain strain.|neighbours:ID:000:FLKLNFIA_00191(1),ID:000:FLKLNFIA_00193(1)|neighbour_genes:LSU ribosomal protei..,SSU ribosomal protei..| 
ATGAGTGTGACTA---TTT---GCAATCAGCTAGCTACTACGTACTGATCGTAGCTGACG
>ID:000:MGCDKLCO_01184 |[Archaeoglobus_fulgidus_DSM_4304.gbfspecies]|strain|Archaeoglobus_fulgidus_DSM_4304.gbf|50S ribosomal protei..|471|MGCDKLCO_1(2178400):1005279-1005749:1 ^^ Archaeagenomesparanahui Archaeoglobus_fulgidus_DSM_4304.gbfspecies strain strain.|neighbours:ID:000:MGCDKLCO_01183(1),ID:000:MGCDKLCO_01185(1)|neighbour_genes:LSU ribosomal protei..,SSU ribosomal protei..|
ATGCGCGCGATAGCTAGCTAGCTAGCTTTAGGGGGATTAGCTA----ACTCTGATTCGGA

Expected output:

>Archaeoglobus_fulgidus_DSM_4304.gbfspecies
ATGCGCGCGATAGCTAGCTAGCTAGCTTTAGGGGGATTAGCTA----ACTCTGATTCGGA
>Ignicoccus_hospitalis_KIN4-I.gbfspecies
ATGAGTGTGACTA---TTT---GCAATCAGCTAGCTACTACGTACTGATCGTAGCTGACG

Thanks

Answers:


1

perl

perl -ne 'push @l, ">" . join("", /\[(.*?)\]/g) . "\n" . <>;
          END{print for sort @l}' your-file

sed

<your-file sed 's/^[^[]*\[/>/
                s/\][^[]*\[\{0,1\}//g
                N;s/\n/\[/' |
  sort |
  tr '[' '\n'
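
As a rough walk-through of the sed pipeline above, the sketch below runs it on a hypothetical two-record sample with shortened headers (the `sample` and `result` variables and the header text are made up for illustration, not taken from the real data). The first substitution replaces everything up to the first "[" with ">", the second drops everything from each "]" up to the next "[" (or to the end of the line), and `N` joins the sequence line onto the header with a "[" so that `sort` treats each record as one line. It assumes "[" never occurs in the sequence lines.

```shell
# Hypothetical sample: shortened headers, one sequence line per header.
sample=$(cat <<'EOF'
>ID:1 |[Ignicoccus_hospitalis]|strain|x
ATGAGTGTGACTA
>ID:2 |[Archaeoglobus_fulgidus]|strain|y
ATGCGCGCGATAG
EOF
)

# Same three sed commands as in the answer, then sort the joined
# records and split them back into two lines each.
result=$(
  printf '%s\n' "$sample" |
  sed 's/^[^[]*\[/>/
       s/\][^[]*\[\{0,1\}//g
       N;s/\n/\[/' |
  sort |
  tr '[' '\n'
)

printf '%s\n' "$result"
```

The temporary "[" works as a join character only because the earlier substitutions have already removed every bracket from the header.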

I ended up using the perl solution. Thanks!
Manuel

1

My suggestion (convoluted):

cat file | grep -Po "^[CGTA-]*$|^>.*$" | grep -Po "(?<=\[).*(?=])|^[ACGT-]*$" | awk '{printf (NR%2==0) ? $0 "\n" : ">"$0"#"}' | sort | sed 's/#/\n/'

Grep only the lines made up of the characters CGTA- and the lines starting with >

grep -Po "^[CGTA-]*$|^>.*$"

Grep only the content inside the brackets (excluding the brackets themselves) and the lines matching the pattern ACGT-

| grep -Po "(?<=\[).*(?=])|^[ACGT-]*$"

Join every two lines, adding a separator # and an initial >, then sort

| awk '{printf (NR%2==0) ? $0 "\n" : ">"$0"#"}' | sort

Finally, replace the separator # with a newline

| sed 's/#/\n/'

Output:

>Archaeoglobus_fulgidus_DSM_4304.gbfspecies
ATGCGCGCGATAGCTAGCTAGCTAGCTTTAGGGGGATTAGCTA----ACTCTGATTCGGA
>Ignicoccus_hospitalis_KIN4-I.gbfspecies
ATGAGTGTGACTA---TTT---GCAATCAGCTAGCTACTACGTACTGATCGTAGCTGACG
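
The join, sort, split idea from this answer can also be sketched without `grep -P`, which is not available in every grep build. The variant below is not the answer's own pipeline: it swaps the two grep stages for a single awk program, and the `sample` and `sorted` variables plus the shortened headers are hypothetical. It assumes each header holds one bracket pair, each record has exactly one sequence line, and "#" never appears in the data.

```shell
# Hypothetical sample: shortened headers, one sequence line per header.
sample=$(cat <<'EOF'
>ID:1 |[Ignicoccus_hospitalis]|strain|x
ATGAGTGTGACTA
>ID:2 |[Archaeoglobus_fulgidus]|strain|y
ATGCGCGCGATAG
EOF
)

sorted=$(
  printf '%s\n' "$sample" |
  awk '
    /^>/ {
      # Keep only the text between the first "[" and the next "]".
      sub(/^[^[]*\[/, "")
      sub(/\].*$/, "")
      # Emit the header without a newline so the sequence joins onto it.
      printf ">%s#", $0
      next
    }
    { print }
  ' |
  sort |
  tr '#' '\n'
)

printf '%s\n' "$sorted"
```

Using `tr` for the final split also avoids relying on `\n` in the replacement part of a sed `s` command, which behaves differently across sed implementations.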

Nice separator trick. Thanks!
Manuel
Licensed under cc by-sa 3.0 with attribution required.