How can I use grep to split output into two files?


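I have a script, mycommand.sh, that I can't run twice. I want to split its output into two different files: one with the lines that match a regular expression and one with the lines that don't. What I'd like is basically something like this: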

./mycommand.sh | grep -E 'some|very*|cool[regex].here;)' --match file1.txt --not-match file2.txt

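I know I could redirect the output to a file and then run grep on that file twice, once with the -v option and once without, redirecting each result to a different file. But I'd like to know whether this can be done with a single grep.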

So, is it possible to do what I want in one line?


— yukashima huksay


There are many ways to do this.

Using awk

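The following command sends all lines that match coolregex to file1. All other lines go to file2:

./mycommand.sh | awk '/coolregex/{print>"file1";next} 1' >file2

How it works:

/coolregex/{print>"file1";next}

Every line that matches the regex coolregex is printed to file1. Then next skips all remaining commands, and awk starts over with the next input line.

1

All other lines are sent to stdout. 1 is awk's cryptic shorthand for printing the line: it is a pattern that is always true, and with no action given awk applies the default action, print.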
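To make the control flow of the awk one-liner concrete, here is a minimal Python sketch of the same logic. It is an illustration only, not part of the answer: partition is a hypothetical name, the sample lines are made up, and Python's re syntax is not identical to awk's extended regular expressions.

```python
import re


def partition(lines, pattern):
    """Split lines into (matching, non_matching), mimicking
    awk '/pattern/{print>"file1";next} 1' >file2."""
    regex = re.compile(pattern)
    matching, non_matching = [], []
    for line in lines:
        if regex.search(line):
            matching.append(line)  # print>"file1"
            continue               # next: skip the remaining rules
        non_matching.append(line)  # 1: default print, redirected to file2
    return matching, non_matching


if __name__ == "__main__":
    hits, misses = partition(["coolregex here", "nothing", "more coolregex"], "coolregex")
    print(hits)    # ['coolregex here', 'more coolregex']
    print(misses)  # ['nothing']
```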

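Splitting into multiple streams is also possible. Because each pattern is tested independently, a line that matches several regexes is written to each of the corresponding files: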

./mycommand.sh | awk '/regex1/{print>"file1"} /regex2/{print>"file2"} /regex3/{print>"file3"}'
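Since every rule in this version is evaluated for every line (there is no next), overlapping regexes are fine: a line matching regex1 and regex2 ends up in both file1 and file2, and a line matching none of them is discarded. A rough Python model of that behavior follows; route is a hypothetical helper, and it assumes the pattern strings are distinct because they double as dictionary keys.

```python
import re


def route(lines, patterns):
    """Return {pattern: [lines matching it]}.

    Mimics awk '/re1/{print>"file1"} /re2/{print>"file2"} ...': every rule
    is checked for every line (there is no next), so a line can land in
    several buckets, and a line matching no pattern is dropped.
    """
    compiled = [(p, re.compile(p)) for p in patterns]
    buckets = {p: [] for p in patterns}
    for line in lines:
        for p, regex in compiled:
            if regex.search(line):
                buckets[p].append(line)
    return buckets


if __name__ == "__main__":
    result = route(["apple pie", "apple", "pie", "tea"], ["apple", "pie"])
    print(result)  # {'apple': ['apple pie', 'apple'], 'pie': ['apple pie', 'pie']}
```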

Using process substitution

This is not as elegant as the awk solution, but for completeness, we can also combine several greps with process substitution:

./mycommand.sh | tee >(grep 'coolregex' >File1) | grep -v 'coolregex' >File2

We can also split into multiple streams:

./mycommand.sh | tee >(grep 'coolregex' >File1) >(grep 'otherregex' >File3) >(grep 'anotherregex' >File4) | grep -v 'coolregex' >File2
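One point worth spelling out: tee hands every consumer the complete stream, so each grep filters independently. File2 therefore receives every line that does not match coolregex, including lines that also match otherregex and so land in File3 as well. The Python sketch below is not from the original answer; the function name tee_grep and the sample data are hypothetical. It models this with itertools.tee, which splits one iterator into several independent ones.

```python
import itertools
import re


def tee_grep(lines, cool, others):
    """Sketch of:
    tee >(grep cool >File1) >(grep other1 >...) | grep -v cool >File2

    Every consumer gets its own full copy of the input, as tee provides.
    Returns (file1, other_files, file2), each as a list of lines.
    """
    copies = itertools.tee(lines, len(others) + 2)
    cool_re = re.compile(cool)
    # grep 'coolregex' >File1
    file1 = [line for line in copies[0] if cool_re.search(line)]
    # one grep per extra pattern, e.g. grep 'otherregex' >File3
    other_files = [
        [line for line in copy if re.search(pattern, line)]
        for pattern, copy in zip(others, copies[1:-1])
    ]
    # grep -v 'coolregex' >File2 (the copy tee passes down the pipe)
    file2 = [line for line in copies[-1] if not cool_re.search(line)]
    return file1, other_files, file2
```

Because these consumers run one after another, itertools.tee has to buffer the whole input in memory; in the shell, tee and the grep processes run concurrently instead.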

— John1024

Very cool! Can this also split the output into multiple files, without running another awk in place of file2? I mean that the regexes can overlap.

— yukashima huksay


@aran Yes, awk is very flexible. Exactly how to do it depends on how the regexes overlap.

— John1024 '17

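I'd love to see a solution even if it doesn't support overlapping regexes. By overlapping, I mean that the intersections of the subsets are not necessarily empty.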

— yukashima huksay


@aran I've added examples of multiple streams for both approaches to the answer.

— John1024 '17


sed -n -e '/pattern_1/w file_1' -e '/pattern_2/w file_2' input.txt

w filename - write the current pattern space to filename.
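POSIX specifies that each wfile is created before processing begins, so with the command above file_2 exists (possibly empty) even if no line matches pattern_2, and a line that matches both patterns is written to both files. Below is a minimal Python sketch of those semantics, assuming the input lines keep their trailing newline; sed_w and its rules argument are made-up names for illustration.

```python
import re


def sed_w(lines, rules):
    """Sketch of: sed -n -e '/re1/w path1' -e '/re2/w path2' ...

    rules is a list of (pattern, path) pairs. Every output file is created
    (truncated) before any input is read, so a file with no matching lines
    still ends up existing but empty. Rules naming the same path share one
    file handle.
    """
    handles = {}
    for _, path in rules:
        if path not in handles:
            handles[path] = open(path, "w")
    compiled = [(re.compile(p), handles[path]) for p, path in rules]
    try:
        for line in lines:
            for regex, handle in compiled:
                if regex.search(line):
                    handle.write(line)  # like w: write the pattern space
    finally:
        for handle in handles.values():
            handle.close()
```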

If you want all matching lines to go to file_1 and all non-matching lines to go to file_2, you can do the following:

sed -n -e '/pattern/w file_1' -e '/pattern/!w file_2' input.txt

or

sed -n '/pattern/!{p;d}; w file_1' input.txt > file_2

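Explanation

/pattern/!{p;d};
- /pattern/! - negation: applies when the line does not contain pattern.
- p - print the current pattern space.
- d - delete the pattern space and start the next cycle.
- So if a line does not contain the pattern, sed prints it to standard output and moves on to the next line. In our case, standard output is redirected to file_2. For a line that does not match, the next part of the sed script (w file_1) is never reached.
w file_1 - if a line contains the pattern, the /pattern/!{p;d}; part is skipped (it runs only when the pattern does not match), so the line goes to file_1.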
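The same branch logic can be written as a small Python function that streams lines between file objects. This is only a sketch that assumes text-mode file objects; sed_split is a made-up name, and its stdout parameter stands for the stream the shell redirects to file_2.

```python
import re
import sys


def sed_split(stream, pattern, file_1, stdout=sys.stdout):
    """Sketch of: sed -n '/pattern/!{p;d}; w file_1' input.txt > file_2

    stream, file_1 and stdout are text file objects; in the shell
    command, stdout is redirected to file_2.
    """
    regex = re.compile(pattern)
    for line in stream:
        if not regex.search(line):  # /pattern/!
            stdout.write(line)      # p: print the pattern space
            continue                # d: end this cycle, w is never reached
        file_1.write(line)          # w file_1: only reached by matching lines
```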

— MiniMax

Could you add more explanation to the last solution?

— yukashima huksay

@aran Added an explanation. Also fixed the command: file_1 and file_2 were swapped into the correct order.

— MiniMax


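I like the sed solution because it doesn't rely on bashisms and treats both output files on the same footing. AFAIK, there is no standalone Unix tool that does exactly what you want, so you have to program it yourself. If we drop the Swiss-army-knife approach, any scripting language will do (Perl, Python, NodeJS).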

Here is how to do it in NodeJS:

  #!/usr/bin/env node

  const fs = require('fs');
  const {stderr, stdout, argv} = process;

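  // Arguments: PATTERN [MATCH_FILE] [NOMATCH_FILE]; the files default to stdout and stderr
  const pattern = new RegExp(argv[2] || '');
  const yes = argv[3] ? fs.createWriteStream(argv[3]) : stdout;
  const no = argv[4] ? fs.createWriteStream(argv[4]) : stderr;

  // out[0] receives non-matching lines, out[1] matching ones
  const out = [no, yes];

  // Write each line to out[1] if the predicate matches, otherwise to out[0]
  const partition = predicate => e => {
    const didMatch = Number(!!predicate(e));
    out[didMatch].write(e + '\n');
  };

  // Read all of stdin and drop one trailing newline, so split() does not yield an extra empty line
  const input = fs.readFileSync(process.stdin.fd).toString().replace(/\n$/, '');

  if (input !== '') {
    input.split('\n').forEach(partition(line => line.match(pattern)));
  }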

Example usage

# Using designated files
./mycommand.sh | ./partition.js pattern file1.txt file2.txt

# Using standard output streams
./mycommand.sh | ./partition.js pattern > file1.txt 2> file2.txt

— Elias


If you don't mind using Python and its slightly different regex syntax:

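#!/usr/bin/env python3
import re
import sys

pattern, match_path, nomatch_path = sys.argv[1:]
regex = re.compile(pattern)
with open(match_path, 'w') as match_file, open(nomatch_path, 'w') as nomatch_file:
    # Indexed by the search result: False (0) -> no match, True (1) -> match
    outputs = (nomatch_file, match_file)
    for line in sys.stdin:
        # Search only up to, not including, the trailing newline
        end = len(line) - line.endswith('\n')
        outputs[regex.search(line, 0, end) is not None].write(line)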

Usage

./match-split.py PATTERN FILE-MATCH FILE-NOMATCH

Example

printf '%s\n' foo bar baz | python3 match-split.py '^b' b.txt not-b.txt
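The line end = len(line) - line.endswith('\n') is easy to skim past. It passes an endpos to Pattern.search, so the trailing newline is never part of the searched text. The hypothetical helper search_line below isolates that step to show when it matters, for instance with a pattern such as \s$, which would otherwise match the newline itself.

```python
import re


def search_line(regex, line):
    """Search one input line the way match-split.py does: the trailing
    newline is left out of the searched range via search()'s endpos."""
    end = len(line) - line.endswith("\n")
    return regex.search(line, 0, end) is not None


if __name__ == "__main__":
    ends_in_space = re.compile(r"\s$")
    # Searching the whole line, \s can match the newline itself:
    print(ends_in_space.search("foo\n") is not None)  # True
    # With endpos excluding the newline, it no longer does:
    print(search_line(ends_in_space, "foo\n"))        # False
    print(search_line(ends_in_space, "foo \n"))       # True
```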

— David Foster