How can I prevent grep from printing the same string multiple times?


15

If I grep a file that contains the following:

These are words
These are words
These are words
These are words

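...for the word These, it prints the string These are words four times.

How can I prevent grep from printing repeated strings more than once? Failing that, how can I process grep's output to remove the duplicate lines?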


Should the order of the matches be preserved in the output? If not, the command posted by John1024 will work.
kos, 2015

Answers:


23

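The Unix philosophy is to have tools that do one thing and do it well. In this case, grep is the tool that selects text from a file. To find out whether there are duplicates, sort the text. To remove the duplicates, use sort's -u option. Thus: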

grep These filename | sort -u

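sort has many options: see man sort. If you want to count duplicates, or need a more complicated scheme for deciding what is or is not a duplicate, pipe the sorted output to uniq: grep These filename | sort | uniq, and see man uniq for its options.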
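To try the sort -u and sort | uniq pipelines mentioned above, here is a minimal sketch. The temporary file and its contents are made up for illustration (they are not the asker's file), and -c is the standard uniq option that prefixes each line with its number of occurrences.

```shell
# Hypothetical sample data; the file contents are an assumption for illustration.
tmpfile=$(mktemp)
printf '%s\n' 'These are words' 'Those are words' \
              'These are words' 'These are other words' > "$tmpfile"

# One copy of each matching line, in sorted order (input order is lost).
unique_lines=$(grep 'These' "$tmpfile" | sort -u)
printf '%s\n' "$unique_lines"

# Same selection, but uniq -c also reports how often each line occurred.
counted_lines=$(grep 'These' "$tmpfile" | sort | uniq -c)
printf '%s\n' "$counted_lines"

rm -f "$tmpfile"
```

Note that uniq only collapses adjacent duplicates, which is why the output is sorted first.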


2

If you are only looking for a single string, use grep with one additional switch:

grep -m1 'These' filename

man grep

-m NUM, --max-count=NUM
        Stop reading a file after NUM matching lines.  If the input is
        standard input from a regular file, and NUM matching lines are
        output, grep ensures that the standard input is positioned  to
        just  after  the  last matching  line  before exiting, regardless
        of the presence of trailing context lines.  This enables a calling
        process to resume a search.  When grep stops after NUM matching
        lines, it outputs any trailing context lines.  When the -c or
        --count option is also used, grep does not output a count greater
        than NUM.  When the -v or --invert-match option is also used, grep
        stops after outputting NUM non-matching lines.
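A small sketch of the behaviour described in the excerpt, assuming a grep that supports -m (GNU grep does; the option is not part of POSIX). The sample file is hypothetical and simply mirrors the four identical lines from the question.

```shell
# Hypothetical sample data mirroring the question: four identical lines.
tmpfile=$(mktemp)
printf '%s\n' 'These are words' 'These are words' \
              'These are words' 'These are words' > "$tmpfile"

# -m1 makes grep stop reading after the first matching line.
first_match=$(grep -m1 'These' "$tmpfile")
printf '%s\n' "$first_match"

# Combined with -c, the reported count never exceeds NUM (2 here),
# even though four lines in the file match.
capped_count=$(grep -c -m2 'These' "$tmpfile")
printf '%s\n' "$capped_count"

rm -f "$tmpfile"
```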

Or use awk ;)

awk '/These/ {print; exit}' foo

IMHO the most appropriate answer is the -m flag. I suggest you put it at the top of your answer. Good answer!
Sergiy Kolodyazhnyy, 2015

3
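This will not work if you are using a regular expression: grep stops right after the first matching line, so it guarantees that only one line is printed, not that each distinct match is printed only once.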
csvan
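To illustrate the concern raised in this comment, and the earlier question about preserving order, here is a sketch that contrasts -m1 with an order-preserving filter. The awk idiom '!seen[$0]++' is not from any of the answers; it is a commonly used alternative, and the array name seen, the pattern and the sample file are all assumptions for illustration.

```shell
# Hypothetical sample data: two distinct lines match the pattern, each twice.
tmpfile=$(mktemp)
printf '%s\n' 'These are words' 'Those are words' \
              'These are words' 'Those are words' > "$tmpfile"

# -m1 stops at the first matching line, so 'Those are words' is never printed.
first_only=$(grep -m1 'Th[eo]se' "$tmpfile")
printf '%s\n' "$first_only"

# awk prints a line only the first time it sees it, keeping the input order.
each_once=$(grep 'Th[eo]se' "$tmpfile" | awk '!seen[$0]++')
printf '%s\n' "$each_once"

rm -f "$tmpfile"
```

Unlike sort -u, this keeps the lines in the order they first appear, at the cost of holding every distinct line in memory.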