How do you count every occurrence of a term in all files in the current directory?

10

How do you count every occurrence of a term in all files in the current directory (and in subdirectories?)

I have read that you would use grep to do this; what is the exact command?

Also, is it possible to do the above with some other command?

— TellMeWhy

12

Using grep + wc (this also handles multiple occurrences of the term on the same line):

grep -rFo foo | wc -l

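-r in grep: search recursively in the current directory hierarchy;
-F in grep: match a fixed string instead of a pattern;
-o in grep: print only the matches, one per output line;
-l in wc: print the number of lines.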

% tree                 
.
├── dir
│   └── file2
└── file1

1 directory, 2 files
% cat file1 
line1 foo foo
line2 foo
line3 foo
% cat dir/file2 
line1 foo foo
line2 foo
line3 foo
% grep -rFo foo | wc -l
8
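For readers who want to check the arithmetic without a shell, here is a minimal Python sketch (not part of the original answer) that rebuilds the same tree in a temporary directory and counts every match; `count_fixed` is a hypothetical helper name. Like `grep -o`, Python's `str.count` counts non-overlapping matches, so each file contributes 4 and the total is 8. It assumes all files are UTF-8 text.

```python
import os
import tempfile

def count_fixed(term, top):
    """Count every non-overlapping occurrence of `term` in all files under `top`."""
    total = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            with open(os.path.join(root, name), encoding="utf-8") as fh:
                total += fh.read().count(term)
    return total

# Rebuild the tree from the session above in a throwaway directory.
content = "line1 foo foo\nline2 foo\nline3 foo\n"
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w", encoding="utf-8") as fh:
            fh.write(content)
    print(count_fixed("foo", top))  # 8, matching `grep -rFo foo | wc -l`
```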

— kos

The best one, I think.

— Jacob Vlijm

1

@JacobVlijm Thanks! I like yours too (and have already upvoted it).

— kos

I think PCREs shouldn't be used, since they are experimental.

— Edward Torvalds

2

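PCREs are not "experimental", but they are not always compiled into grep either (which is why I use pcregrep when I need them). In this case, though, they are unnecessary: the question asks about a "term", which is probably a fixed string rather than any kind of pattern. So -F would probably be faster.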

— dannysauer 2015
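To see why fixed-string matching suits a literal "term", here is a small Python analogy (Python's `re` module is not grep's regex engine, so this only illustrates the idea): `str.count` treats the term literally, like `grep -F`, while an unescaped regex lets `.` match any character. The sample line and term are made up for illustration.

```python
import re

line = "a.b axb a.b"
term = "a.b"

# Fixed-string matching (the idea behind grep -F): "." is just a literal dot.
fixed = line.count(term)

# Pattern matching: unescaped, "." matches any character, so "axb" matches too.
as_pattern = len(re.findall(term, line))

# Escaping the term turns the pattern back into a literal match.
escaped = len(re.findall(re.escape(term), line))

print(fixed, as_pattern, escaped)  # 2 3 2
```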

2

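@dannysauer I used PCRE because, for some (wrong) reason, I thought it was needed to match multiple occurrences on the same line, but it isn't. I just hadn't tried -F instead of -P. Thanks for the great suggestion; I've updated the answer to use -F, which is indeed a better fit here.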

— kos 2015

8

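grep -Rc [term] * will do it. The -R flag means you want to search the current directory and all its subdirectories recursively. The * is a file selector meaning "all files". The -c flag makes grep output only a count, per file, of the lines that match. However, if the word appears more than once on a line, that line is still counted only once.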

From man grep:

  -r, --recursive
          Read all files under each directory, recursively, following symbolic links only if they are on the command line.
          This is equivalent to the -d recurse option.

   -R, --dereference-recursive
          Read all files under each directory, recursively.  Follow all symbolic links, unlike -r.

If there are no symbolic links in the directory, there is no difference.
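The gap between counting matching lines (what -c reports) and counting individual matches can be seen with a tiny Python sketch. The sample text is the file1 content from the earlier example, and the grep correspondences in the comments are approximate analogies, not grep's actual implementation.

```python
text = "line1 foo foo\nline2 foo\nline3 foo\n"
term = "foo"

# Roughly what `grep -c` reports: lines containing at least one match.
matching_lines = sum(1 for line in text.splitlines() if term in line)

# Roughly what `grep -o ... | wc -l` reports: every non-overlapping match.
occurrences = sum(line.count(term) for line in text.splitlines())

print(matching_lines, occurrences)  # 3 4
```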

— Jos

You can add the -c flag to grep. Then grep does the counting itself and you don't need wc.

— Wayne_Yux

You might want to put -- in front of the *.

— Edward Torvalds

2

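The * only expands to non-dot files, so you miss those. It makes more sense to just use "." since you are processing the arguments recursively anyway; that will pick up dot files. The bigger problem here is that this will probably give the number of lines rather than the number of occurrences of the word. If the term appears several times on one line, "grep -c" counts it only once.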

— dannysauer 2015
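The dot-file point can be illustrated with Python's `glob` module, which follows the same rule as the shell's `*` (names starting with "." are not matched), while listing the directory itself sees everything. This is a sketch with made-up file names in a temporary directory, not a model of grep's own traversal.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    for name in ("visible", ".hidden"):
        open(os.path.join(top, name), "w").close()

    # Like the shell's `*`, glob's "*" skips names that start with ".".
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(top, "*")))

    # Listing the directory itself (like passing "." to grep -R) sees both.
    listed = sorted(os.listdir(top))

print(star)    # ['visible']
print(listed)  # ['.hidden', 'visible']
```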

2

In a small Python script:

#!/usr/bin/env python3
import os
import sys

s = sys.argv[1]
n = 0
for root, dirs, files in os.walk(os.getcwd()):
    for f in files:
        f = root+"/"+f
        try:
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):  # skip unreadable or non-text files
            pass
print(n)

Save it as count_string.py.

Run it from the directory with the following command:

python3 /path/to/count_string.py <term>

Notes

If the term contains spaces, use quotes.
It counts every occurrence of the term recursively, including multiple occurrences on the same line.

Explanation:

import os
import sys

# get the current working directory
currdir = os.getcwd()
# get the term as argument
s = sys.argv[1]
# count occurrences, set start to 0
n = 0
# use os.walk() to read recursively
for root, dirs, files in os.walk(currdir):
    for f in files:
        # join the path(s) above the file and the file itself
        f = root+"/"+f
        # try to read the file (will fail if the file is unreadable for some reason)
        try:
            # add the number of found occurrences of <term> in the file
            n = n + open(f).read().count(s)
        except (OSError, UnicodeDecodeError):
            # skip files that cannot be opened or decoded as text
            pass
print(n)

— Jacob Vlijm

2

The Python guy ;) +1

— TellMeWhy 2015

1

By the way, what are root and f?

— TellMeWhy 2015

1

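root is the path of the directory that contains the file, including the current directory "above" it; f is the file name. Alternatively, os.path.join() could be used, but it is more verbose.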

— Jacob Vlijm
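A short sketch of what os.walk() hands back, using a temporary copy of the earlier dir/file2 + file1 tree: each root is a directory being visited and each f is a bare file name. The check that os.path.join(root, f) equals root + "/" + f assumes a POSIX system, where "/" is the path separator.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        open(os.path.join(top, rel), "w").close()

    paths = []
    for root, dirs, files in os.walk(top):
        for f in files:
            # root is the directory being visited, f is a bare file name.
            joined = os.path.join(root, f)
            # On POSIX systems this equals the script's root + "/" + f.
            assert joined == root + "/" + f
            paths.append(os.path.relpath(joined, top))

print(sorted(paths))  # ['dir/file2', 'file1']
```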

1

And what about n = n + open(f).read().count(s)?

— TellMeWhy 2015
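For reference, that line reads the whole file into one string, counts the non-overlapping occurrences of the term with str.count(), and adds the result to the running total n. A tiny sketch, with made-up strings standing in for file contents:

```python
# Made-up strings standing in for the contents of three files.
n = 0
for text in ("line1 foo foo\nline2 foo\n", "foofoo\n", "aaaa"):
    # Same idea as n = n + open(f).read().count(s), with s = "foo".
    n += text.count("foo")
print(n)  # 5

# count() is non-overlapping: "aa" starts at three offsets in "aaaa", but count() reports 2.
print("aaaa".count("aa"))  # 2
```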

2

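This seems to be the only answer that counts every occurrence, as the OP asked. AFAIK, all the grep-based solutions count the lines on which the term appears, so a line containing the term three times counts only once.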

— Joe

2

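As a variation on @kos's good answer, if you want a per-file count you can add grep's -c switch. Note that -c counts matching lines rather than individual matches (even together with -o), which is why each file reports 3 here although it contains foo four times: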

$ grep -rFoc foo
file1:3
dir/file2:3
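If you want per-file counts of every occurrence rather than of matching lines, a minimal Python sketch along the lines of the script above could look like this; `count_per_file` is a hypothetical helper, and it assumes UTF-8 text files. On the same tree it reports 4 per file where grep -c reports 3.

```python
import os
import tempfile

def count_per_file(term, top):
    """Map each file path (relative to top) to its number of occurrences of term."""
    counts = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as fh:
                counts[os.path.relpath(path, top)] = fh.read().count(term)
    return counts

# Rebuild the example tree in a throwaway directory.
content = "line1 foo foo\nline2 foo\nline3 foo\n"
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "dir"))
    for rel in ("file1", os.path.join("dir", "file2")):
        with open(os.path.join(top, rel), "w", encoding="utf-8") as fh:
            fh.write(content)
    per_file = count_per_file("foo", top)

for path, n in sorted(per_file.items()):
    print(f"{path}:{n}")
# dir/file2:4
# file1:4
```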

— emacs_ftw