Grep for two words on the same line

46

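I have been trying to find a way to filter for lines that contain both the word "lemon" and the word "rice". I know how to find "lemon" or "rice", but not both together. They don't need to be next to each other, just on the same line of text.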

text-processing grep

— Sebastian
source

1

To find all of the strings in a file, you can run grep in a FOR loop: unix.stackexchange.com/a/462445/43233

— Noam Manos

62

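"On the same line" means "rice", followed by arbitrary characters, followed by "lemon", or the other way round.

As a regular expression, that is rice.*lemon or lemon.*rice. You can combine the two with |: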

grep -E 'rice.*lemon|lemon.*rice' some_file

If you want to use basic regular expressions instead of extended ones (-E), you need to put a backslash before the |:

grep 'rice.*lemon\|lemon.*rice' some_file

With more words this quickly gets lengthy, and it is usually easier to chain several calls to grep, for example:

grep rice some_file | grep lemon | grep chicken
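As a rough check that the alternation and the chained greps select the same lines, here is a minimal sketch. The sample lines and the variable names (tmp, ere, piped) are made up for illustration and are not part of the answer.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
tmp=$(mktemp)
printf '%s\n' \
    'lemon rice soup' \
    'rice with lemon zest' \
    'plain rice' \
    'lemon tart' > "$tmp"

# Extended regex: alternation of the two possible word orders.
ere=$(grep -E 'rice.*lemon|lemon.*rice' "$tmp")

# Chained greps: each stage keeps only the lines that also contain its word.
piped=$(grep rice "$tmp" | grep lemon)

printf '%s\n' "$ere"
[ "$ere" = "$piped" ] && echo "same result"

rm -f "$tmp"
```

Both approaches keep the first two lines here. The chained form needs one more grep per extra word, while the single regex would need every ordering spelled out.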

— Florian Diesch
source

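Your last line is a conjunction, not a disjunction, isn't it? To wit: grep rice finds the lines containing rice. That output is fed into grep lemon, which will only find the lines containing lemon... and so on. Whereas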

— OP-

Script version: askubuntu.com/a/879253/5696

— Jeff

@Florian Diesch - Mind explaining why | needs to be escaped in grep? Thanks!

— fugitive

1

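@fugitive egrep uses extended regular expressions, in which | is understood as logical OR. grep defaults to basic regular expressions, in which \| is the OR.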

— Sergiy Kolodyazhnyy

As stated in grep's man page, egrep is deprecated and grep -E should be used instead. I took the liberty of editing the answer accordingly.

— dessert

26

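You can pipe the output of the first grep command into a second grep command, so that only lines matching both patterns remain. So you can do something like this: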

grep <first_pattern> <file_name> | grep <second_pattern>

or,

cat <file_name> | grep <first_pattern> | grep <second_pattern>

Example:

Let's add some content to a file:

$ echo "This line contains lemon." > test_grep.txt
$ echo "This line contains rice." >> test_grep.txt
$ echo "This line contains both lemon and rice." >> test_grep.txt
$ echo "This line doesn't contain any of them." >> test_grep.txt
$ echo "This line also contains both rice and lemon." >> test_grep.txt

What the file contains:

$ cat test_grep.txt 
This line contains lemon.
This line contains rice.
This line contains both lemon and rice.
This line doesn't contain any of them.
This line also contains both rice and lemon.

Now, let's grep for what we want:

$ grep rice test_grep.txt | grep lemon
This line contains both lemon and rice.
This line also contains both rice and lemon.

We only get the lines where both patterns match. You can extend this by piping the output into yet another grep command for further "AND" matches.
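One caveat with the pipeline above is that grep matches substrings, so "price" counts as "rice". A small sketch of a stricter variant, assuming a grep that supports -w (GNU and BSD grep do) and using made-up sample lines:

```shell
# Hypothetical sample data, for illustration only.
tmp=$(mktemp)
printf '%s\n' \
    'Lemon and Rice' \
    'lemonade with rice' \
    'price of a lemon' > "$tmp"

# Substring matching: 'lemonade' counts as lemon and 'price' counts as rice.
loose=$(grep -i rice "$tmp" | grep -i lemon)

# -w restricts each match to a whole word, -i ignores case.
strict=$(grep -iw rice "$tmp" | grep -iw lemon)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"

rm -f "$tmp"
```

The loose pipeline keeps all three lines, while the strict one keeps only 'Lemon and Rice'.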

— Aditya
source

21

Although the question asks for "grep", I thought it might be helpful to post a simple "awk" solution:

awk '/lemon/ && /rice/'

This can easily be extended with more words, or with boolean expressions other than "and".
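To make the "other boolean expressions" part concrete, here is a short sketch. The sample file and the extra word "chicken" are only illustrative assumptions.

```shell
# Hypothetical sample data, for illustration only.
tmp=$(mktemp)
printf '%s\n' \
    'lemon rice chicken' \
    'lemon rice' \
    'lemon chicken' \
    'rice' > "$tmp"

# AND of three words.
all_three=$(awk '/lemon/ && /rice/ && /chicken/' "$tmp")

# lemon AND rice, but NOT chicken.
no_chicken=$(awk '/lemon/ && /rice/ && !/chicken/' "$tmp")

# lemon AND (rice OR chicken).
either=$(awk '/lemon/ && (/rice/ || /chicken/)' "$tmp")

printf 'all three:  %s\n' "$all_three"
printf 'no chicken: %s\n' "$no_chicken"
printf 'either:\n%s\n' "$either"

rm -f "$tmp"
```

Each pattern is tested against the whole line, so the order of the words within a line does not matter.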

— David B
source

11

Another idea for finding the matches in any order is to use:

grep with the -P (Perl-compatible regex) option and a positive lookahead, (?=(regex)):

grep -P '(?=.*?lemon)(?=.*?rice)' infile

Or you can use the following instead:

grep -P '(?=.*?rice)(?=.*?lemon)' infile

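The .*? matches any character (.) zero or more times (*), followed by the pattern (rice or lemon). The trailing ? makes the quantifier lazy, so it consumes as few characters as possible before the pattern; a plain .* would select the same lines here.

(?=pattern): positive lookahead. The construct is a pair of parentheses in which the opening parenthesis is followed by a question mark and an equals sign.

So this returns all lines that contain both lemon and rice, in any order. It also avoids the |s and the doubled greps.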
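The lookahead form extends to more words by adding one group per word. The sketch below assumes a grep built with -P support (GNU grep usually is) and falls back to a message otherwise. The leading ^ anchor, the third word "chicken" and the sample lines are my own additions for illustration.

```shell
# Hypothetical sample data, for illustration only.
tmp=$(mktemp)
printf '%s\n' \
    'lemon rice chicken' \
    'chicken, rice, lemon' \
    'lemon rice' > "$tmp"

# Probe for -P support first, since not every grep provides it.
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    # Anchor at the start of the line, then require each word somewhere after it.
    matched=$(grep -P '^(?=.*lemon)(?=.*rice)(?=.*chicken)' "$tmp")
    printf '%s\n' "$matched"
else
    matched=unsupported
    echo 'this grep has no -P support'
fi

rm -f "$tmp"
```

With -P available, the first two lines are printed; the third is dropped because it lacks "chicken".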

External links: Advanced Grep Topics; Positive Lookahead – GREP for Designers

— αғsнιη
source

5

grep -e foo -e goo

will return matches for foo or goo
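Note that multiple -e options give an OR, not the AND the question asks for. A quick sketch with made-up sample lines, counting matches with -c:

```shell
# Hypothetical sample data, for illustration only.
tmp=$(mktemp)
printf '%s\n' 'lemon' 'rice' 'lemon and rice' 'neither' > "$tmp"

# Several -e patterns: a line matches if ANY of them matches.
either=$(grep -c -e lemon -e rice "$tmp")

# Chaining two greps: a line survives only if BOTH match.
both=$(grep -e lemon "$tmp" | grep -c -e rice)

echo "lines with lemon OR rice:  $either"
echo "lines with lemon AND rice: $both"

rm -f "$tmp"
```

Here the OR form counts three lines and the AND form counts one.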

— netskink
source

1

If we accept that answers not based on grep are acceptable, like the awk-based answer above, I would propose a simple line of perl such as:

$ perl -ne 'print if /lemon/ and /rice/' my_text_file

The search can ignore case for some or all of the words, e.g. /lemon/i and /rice/i. In any case, perl and awk are installed on most Unix/Linux machines.

— Gilles Maisonneuve
source

Rejected!!! ;) Because it makes no sense.. :)

— An0n

0

Here is a script that automates the grep pipeline solution:

#!/bin/bash

# Use filename if provided as environment variable, or "foo" as default
filename=${filename-foo}

grepand () {
    # disable word splitting and globbing
    IFS=
    set -f
    if [[ -n $1 ]]
    then
        # ${filename} is deliberately unquoted: once it is empty it expands to
        # nothing, so the inner greps read from the pipe instead of a file
        grep -i "$1" ${filename} | filename="" grepand "${@:2}"
    else
        # If there are no arguments, assume last command in pipe and print everything
        cat
    fi
}

grepand "$@"
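For comparison, the same idea can be sketched with a plain loop instead of recursion. This is only an illustration: grep_all is a hypothetical helper name, it takes the file as its first argument rather than from an environment variable, and its variables are not local to the function.

```shell
# grep_all FILE WORD...   (hypothetical helper, not part of the script above)
# Prints the lines of FILE that contain every WORD, ignoring case.
grep_all() {
    file=$1
    shift
    lines=$(cat -- "$file")
    for word in "$@"; do
        # Keep only the lines that also contain this word;
        # stop with status 1 as soon as nothing is left.
        lines=$(printf '%s\n' "$lines" | grep -i -e "$word") || return 1
    done
    printf '%s\n' "$lines"
}

# Usage with made-up sample data.
tmp=$(mktemp)
printf '%s\n' 'Lemon rice chicken' 'lemon rice' 'chicken rice' > "$tmp"
found=$(grep_all "$tmp" lemon rice chicken)
printf '%s\n' "$found"
rm -f "$tmp"
```

Holding the intermediate result in a variable keeps the function short, at the cost of reading the whole file into memory, so the pipeline version is probably the better fit for very large files.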

— Jeff
source

1

This should probably be implemented with a recursive function rather than by building and eval-ing a command string, which breaks easily

— muru

@muru Feel free to suggest an edit. I'd appreciate it.

— Jeff

1

Editing it would require a major rewrite, so I won't do that. If you'd like to add it, this is what I imagine it would look like: paste.ubuntu.com/23915379

— muru