How can I get the lines in which a specific word is repeated exactly N times?

8

For this given input:

How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this

I want this output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

That is, get the whole lines that contain the word "this" exactly three times (case-insensitive match).

text-processing

— αғsнιη

4

To the "too broad" voters: how could a question be any more specific?

— Jacob Vlijm

@JacobVlijm There are "too many possible answers". Pick $RANDOM_LANGUAGE: somebody will be able to come up with a solution in it.

— muru 2015

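@muru I would say the opposite: limiting it to one language would make it a programming (language) centered question. As it stands, it is a problem-centered question. There may be many possible solutions (languages), but not that many obvious ones.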

— Jacob Vlijm 2015

13

In perl, replace this with itself case-insensitively and count the number of replacements:

$ perl -ne 's/(this)/$1/ig == 3 && print' <<EOF
How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this
EOF
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Using a count of matches instead:

perl -ne 'my $c = () = /this/ig; $c == 3 && print'
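
For comparison, here is a rough Python sketch of the same two ideas, assuming only the standard `re` module. The function names and the `SAMPLE` list are made up for illustration: `re.subn` reports how many substitutions it made, and `re.findall` returns the list of matches.

```python
import re

# Case-insensitive pattern for the word being counted.
PATTERN = re.compile(r"this", re.IGNORECASE)

SAMPLE = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

def count_by_substitution(line):
    # Replace every match with itself; subn returns (new_string, substitution_count).
    _, count = PATTERN.subn(lambda m: m.group(0), line)
    return count

def count_by_matching(line):
    # Count the matches directly, without rewriting the line.
    return len(PATTERN.findall(line))

for line in SAMPLE:
    if count_by_matching(line) == 3:
        print(line)
```

Both functions should give the same number for any line; the matching variant simply skips the rewrite.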

If you have GNU awk, a very simple way:

gawk -F'this' -v IGNORECASE=1 'NF == 4'

The number of fields will be one more than the number of separators.

— muru

Why replace? Can't we count directly without replacing?

— αғsнιη

Indeed, we can count; the code is just slightly longer: stackoverflow.com/questions/9538542/...

— muru

Upvoted for the gawk command.

— Sri

9

Assuming your source file is tmp.txt,

grep -iv '.*this.*this.*this.*this' tmp.txt | grep -i '.*this.*this.*this.*'

The left grep outputs all lines of tmp.txt that do not contain 4 or more case-insensitive occurrences of "this".

The result is piped to the right grep, which outputs all lines of the left grep's result that contain 3 or more occurrences.

Update: thanks to @Muru, here is a better version of this solution,

grep -Eiv '(.*this){4,}' tmp.txt | grep -Ei '(.*this){3}'

For a general N, replace 4 with N+1 and 3 with N.
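
The same two-stage idea (drop lines with N+1 or more occurrences, then keep those with at least N) can be sketched in Python. This is only an illustration under the assumption that the standard `re` module is acceptable; `exactly_n` and `SAMPLE` are hypothetical names, not part of the grep solution.

```python
import re

SAMPLE = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

def exactly_n(lines, word, n):
    # Build "(.*word){n}" and "(.*word){n+1}", mirroring the grep -E patterns.
    unit = "(.*" + re.escape(word) + ")"
    at_least_n = re.compile(unit + "{" + str(n) + "}", re.IGNORECASE)
    more_than_n = re.compile(unit + "{" + str(n + 1) + "}", re.IGNORECASE)
    # Stage 1 (like grep -v): discard lines with n+1 or more occurrences.
    survivors = [line for line in lines if not more_than_n.search(line)]
    # Stage 2: of what is left, keep lines with at least n occurrences.
    return [line for line in survivors if at_least_n.search(line)]

for line in exactly_n(SAMPLE, "this", 3):
    print(line)
```

Because the word and N are parameters, nothing has to be written out by hand for a large N, although a repeated `.*` pattern may backtrack heavily on long lines.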

— Sri

This will fail if N > 4. The first grep needs to end with *.

— ps95 2015

1

I mean you cannot write this out for N = 50. The question asks for exactly three, so you need another grep that discards all output with two or fewer this. grep -iv '.*this.*this.*this.*this.*' tmp.txt | grep -i '.*this.*this.*this.*' | grep -iv '.*this.*this.'

— ps95 2015

@prakharsingh95 It does not fail for n > 4, and the first grep does not need a *.

— Sri

1

@KasiyA What do you think of my answer?

— 2015

5

Simplifying a bit: grep -Eiv '(.*this){4,}' | grep -Ei '(.*this){3}'. This might make it feasible for N = 50.

— muru

9

In python, this would do the job:

#!/usr/bin/env python3

s = """How to get This line that this word repeated 3 times in THIS line?
But not this line which is THIS word repeated 2 times.
And I will get This line with this here and This one
A test line with four this and This another THIS and last this"""

for line in s.splitlines():
    if line.lower().count("this") == 3:
        print(line)

Output:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

Or, to read from a file, with the file as an argument:

#!/usr/bin/env python3
import sys

file = sys.argv[1]

with open(file) as src:
    lines = [line.strip() for line in src.readlines()]

for line in lines:
    if line.lower().count("this") == 3:
        print(line)

Paste the script into an empty file, save it as find_3.py, and run it with the command:
```
python3 /path/to/find_3.py <file_withlines>
```

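Of course, the word "this" can be replaced by any other word (or any other string or part of a line), and the number of occurrences per line can be set to any other value, in this line: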

    if line.lower().count("this") == 3:
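
As a small sketch of that generalization, the word and the count can be turned into parameters. The function name `lines_with_count` and the `SAMPLE` list are illustrative only; note that `str.count` counts non-overlapping substring occurrences, not whole words.

```python
SAMPLE = [
    "How to get This line that this word repeated 3 times in THIS line?",
    "But not this line which is THIS word repeated 2 times.",
    "And I will get This line with this here and This one",
    "A test line with four this and This another THIS and last this",
]

def lines_with_count(lines, word, n):
    # Compare in lowercase so the match is case-insensitive.
    word = word.lower()
    return [line for line in lines if line.lower().count(word) == n]

# Lines containing "line" exactly twice.
for line in lines_with_count(SAMPLE, "line", 2):
    print(line)
```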

EDIT

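If the file is large (hundreds of thousands or millions of lines), the code below will be faster; it reads the file line by line instead of loading the whole file at once: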

#!/usr/bin/env python3
import sys
file = sys.argv[1]

with open(file) as src:
    for line in src:
        if line.lower().count("this") == 3:
            print(line.strip())

— Jacob Vlijm

I'm not a python expert; how do I read from a file? Thanks

— αғsнιη

1

@KasiyA Edited to use a file as an argument.

— Jacob Vlijm

Just curious: why didn't you use a generator in the second snippet?

— muru 2015

6

You can play a bit with awk for this:

awk -F"this" 'BEGIN{IGNORECASE=1} NF==4' file

It returns:

How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

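Explanation

What we do is define the field separator to be this itself. That way, the line has as many fields as there are occurrences of the word this, plus one.
To make the match case-insensitive, we use IGNORECASE = 1, which is specific to GNU awk. See the reference: Case Sensitivity in Matching.
Then it is just a matter of saying NF==4 to get all the lines that have this exactly three times. No more code is needed, since {print $0} (that is, print the current line) is the default behaviour of awk when an expression evaluates to True.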
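
The "fields = separators + 1" reasoning can be checked with a short Python sketch. It assumes `re.split` as a stand-in for awk's field splitting, and `field_count` is a hypothetical helper name.

```python
import re

def field_count(line, separator):
    # Split on the separator, ignoring case; k separators give k + 1 pieces.
    return len(re.split(separator, line, flags=re.IGNORECASE))

line = "And I will get This line with this here and This one"
print(field_count(line, "this"))       # number of pieces
print(field_count(line, "this") == 4)  # the NF == 4 test
```

One difference to keep in mind: for an empty line awk reports NF as 0, while this sketch returns 1, which does not matter when looking for one or more occurrences.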

— fedorqui

Already posted, but nice explanation.

— muru 2015

@muru Oh, I didn't see it! My apologies, and +1 for you.

— fedorqui 2015

5

Assuming the lines are stored in a file named FILE:

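# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line; do
    # grep -o prints every match on its own line, so wc -w counts the matches
    if [ $(grep -oi "this" <<< "$line" | wc -w) -eq 3 ]; then
        echo "$line"
    fi
done < FILE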
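
The loop relies on counting occurrences within a line, which is different from counting matching lines (what grep -c reports). A minimal Python sketch of that distinction, with hypothetical function names:

```python
import re

def matching_lines(text, word):
    # Like grep -ic: how many lines contain the word at least once.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return sum(1 for line in text.splitlines() if pattern.search(line))

def occurrences(text, word):
    # Like grep -oi piped to wc -l: how many times the word occurs in total.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return len(pattern.findall(text))

line = "How to get This line that this word repeated 3 times in THIS line?"
print(matching_lines(line, "this"), occurrences(line, "this"))
```

For a single line, the first function can only return 0 or 1, so it cannot be used to test for exactly three occurrences.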

— PS95

1

Thanks. You can remove the sed ... command and add the -o option: grep -oi ....

— αғsнιη

Simpler: $(grep -ic "this" <<<"$line")

— muru

2

@muru No, the -c option counts the number of lines that match "this", not the number of "this" words in each line.

— αғsнιη

1

@KasiyA Ah, yes. My mistake.

— muru

@KasiyA Wouldn't -l and -w be equivalent in this case?

— ps95 2015

4

If you are in Vim:

g/./if len(split(getline('.'), 'this\c', 1)) == 4 | print | endif

This will just print the matching lines.

— Bohr

A nice example of searching for lines with n occurrences of a word when using Vim.

— Sri

0

Ruby one-liner solution:

$ ruby -ne 'print $_ if $_.chomp.downcase.scan(/this/).count == 3' < input.txt
How to get This line that this word repeated 3 times in THIS line?
And I will get This line with this here and This one

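How it works is pretty simple: we redirect the file into ruby's stdin, ruby takes the lines from stdin and cleans each one up with chomp and downcase, and scan().count gives the number of occurrences of the substring.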
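
The same stdin-filter flow (read a line, strip the newline, lowercase it, count the substring) could be sketched in Python as follows. The generator name `filter_stream` and its default arguments are assumptions made for illustration.

```python
import sys

def filter_stream(stream, word="this", n=3):
    # Yield the original lines whose cleaned-up copy contains word exactly n times.
    for line in stream:
        cleaned = line.rstrip("\n").lower()  # roughly chomp + downcase
        if cleaned.count(word) == n:
            yield line

# To use it as a filter, e.g. with `python3 script.py < input.txt`:
# sys.stdout.writelines(filter_stream(sys.stdin))
```

Writing it as a generator keeps only one line in memory at a time, which suits piped input of any size.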

— Sergiy Kolodyazhnyy