I have a file, prova.txt, that looks like this:
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4

extra1
extra2
bla
Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561

extra2
bla
bla
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131
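I need to grab everything from each "Start to grab from here" line up to the first blank line after it. The output should look like this: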
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random2
random3
random4
Start to grab from here: 2
fix1
fix2
fix3
fix4
random1546
random2561
Start to grab from here: 1
fix1
fix2
fix3
fix4
random1
random22131
As you can see, the number of lines after "Start to grab from here" varies, so the grep -A and -B flags do not work:
cat prova.txt | grep "Start to grab from here" -A 15 | grep -B 15 "^$" > output.txt
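Can you help me find a way to capture everything from the first line to grab ("Start to grab from here") until a blank line appears? I cannot predict how many random lines will follow "Start to grab from here".
Any Unix-compatible solution is appreciated (grep, sed, or awk preferred over perl or similar).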
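One possible approach, given only as a sketch: use awk with a flag that is switched on at each "Start to grab from here" line and switched off at the next blank line. The function name grab_blocks is made up for illustration, and the sketch assumes a "blank" line is either empty or contains only whitespace.

```shell
# Print every block that starts at a "Start to grab from here" line
# and ends just before the next blank line.
# grab_blocks is a hypothetical helper name, not an existing command.
grab_blocks() {
    awk '
        /^Start to grab from here/ { grab = 1 }   # header line: start printing
        !NF                        { grab = 0 }   # empty or whitespace-only line: stop
        grab                                      # print the line while the flag is set
    ' "$@"
}
```

It could be called as grab_blocks prova.txt > output.txt. Unlike grep -A/-B, the flag does not depend on how many lines follow the header, and the blank line itself is not printed.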
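EDIT: After @john1024's excellent answer, I would like to know whether it is also possible to:
1° sort the blocks (by the number in "Start to grab from here:", so 1, then 1, then 2)
2° remove the 4 lines fix1, fix2, fix3, fix4 (their content is arbitrary, but there are always exactly 4 of them)
3° finally remove duplicate random lines, as the sort -u command would
The final output should look like this: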
# fix lines removed - match 1 first time
Start to grab from here: 1
random1
random2
random3
random4
#fix lines removed - match 1 second time
Start to grab from here: 1
#random1 removed because it is a duplicate
random22131
#fix lines removed - match 2 that comes after 1
Start to grab from here: 2
random1546
random2561
or
# fix lines removed - match 1 first time and the second too
Start to grab from here: 1
random1
random2
random3
random4
#random1 removed because it is a duplicate
random22131
#fix lines removed - match 2 that comes after 1
Start to grab from here: 2
random1546
random2561
The second output is better than the first. Some more Unix command magic is needed.
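The three follow-up steps could be sketched with awk and sort as below, producing the second (merged) output without the # comment lines. This is only a sketch under several assumptions: the function name grab_sorted_unique is hypothetical, every header ends in ": " followed by a number, the fix lines are always the first 4 lines after a header, and neither headers nor data lines contain tab characters that matter for the header field.

```shell
# Grab each block, drop its 4 fix lines, drop duplicate lines across all
# blocks, and merge blocks with the same header, ordered by header number.
# grab_sorted_unique is a hypothetical helper name.
grab_sorted_unique() {
    awk '
        /^Start to grab from here/ {
            grab = 1; skip = 4                  # next 4 lines are the fix lines
            header = $0
            id = $0; sub(/^[^:]*: */, "", id)   # number after the colon
            next
        }
        !NF        { grab = 0 }                 # blank line closes the block
        !grab      { next }                     # ignore lines outside a block
        skip > 0   { skip--; next }             # drop the 4 fix lines
        seen[$0]++ { next }                     # drop lines already kept earlier
        { print id "\t" NR "\t" header "\t" $0 }
    ' "$@" |
    # Order by header number, then by original line number.
    sort -t "$(printf '\t')" -k1,1n -k2,2n |
    awk -F '\t' '
        $3 != prev { print $3; prev = $3 }      # print each header once
        { print substr($0, length($1) + length($2) + length($3) + 4) }
    '
}
```

It could be called as grab_sorted_unique prova.txt > output.txt. Tagging each kept line with its header number and original line number lets sort group the blocks while preserving the order of lines inside each group. A header whose lines are all removed as duplicates does not appear in the output.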