Why does the uniq -c command put spaces at the start of each line?

11

I have the following code in a shell script:

sort input | uniq -c | sort -nr > output

The input file has no leading spaces, but the output does. How can I fix this? This is bash.

command-line uniq

— Jeremy Wik

13

By default, uniq -c right-justifies the count in a field 7 characters wide, then separates the count from the item with a single space.

Source: https://www.thelinuxrain.com/articles/tweaking-uniq-c
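To make that layout concrete, here is a minimal Python sketch that reproduces the described format. The function name uniq_c_line is made up for illustration, and this mirrors the behavior described above rather than uniq's actual source code.

```python
def uniq_c_line(count: int, item: str) -> str:
    """Format a line as described above: count right-justified
    in a 7-character field, one space, then the item.
    (Hypothetical helper, not part of uniq.)"""
    return f"{count:>7d} {item}"


# repr() makes the leading padding visible.
print(repr(uniq_c_line(1, "test")))        # '      1 test'
print(repr(uniq_c_line(12345678, "big")))  # '12345678 big'
```

A count wider than 7 digits simply gets no padding, so the leading spaces only appear for counts below 10,000,000.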

Remove the leading spaces with sed:

$ sort input | uniq -c | sort -nr | sed 's/^\s*//' > output

— 古努

2

7 spaces, i.e. "one less than a tab".

— 克莱里斯

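You can then separate the count from the item with a tab, using something like perl -pe 's/ *(\d+) /$1\t/' (there are some alternatives). You can also pipe the result to xclip -selection c and paste it straight into a spreadsheet.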

— Pablo Bianchi

5

uniq -c adds leading whitespace. For example:

$ echo test
test
$ echo test | uniq -c
      1 test

You can add a command at the end of the pipeline to remove it. For example:

$ echo test | uniq -c | sed 's/^\s*//'
1 test

— wjandrea

1

FWIW, you can use a different tool for sorting and counting to get more flexibility. Python is one such tool.

Source

#!/usr/bin/python3
import sys, operator, collections

counter = collections.Counter(map(operator.methodcaller('rstrip', '\n'), sys.stdin))
for item, count in counter.most_common():
    print(count, item)

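In theory this can be even faster than the sort tool for large inputs, because the program above uses a hash table rather than a sorted list to identify duplicate lines. (Unfortunately, lines with equal counts are not placed in natural sorted order: on Python 3.7 and later they come out in the order first encountered, and in arbitrary order before that. This can be amended and would still be faster than two sort invocations.)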
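One possible way to amend the tie order is sketched below. It assumes you want ties broken by the line text in ascending order, and the function name count_lines is made up for illustration.

```python
import collections


def count_lines(lines):
    """Count duplicate lines, then order by descending count and,
    for equal counts, by the line text (hypothetical helper)."""
    counter = collections.Counter(line.rstrip("\n") for line in lines)
    # Negating the count sorts high counts first; the item breaks ties.
    return sorted(counter.items(), key=lambda pair: (-pair[1], pair[0]))


for item, count in count_lines(["b\n", "a\n", "b\n", "c\n"]):
    print(count, item)
# 2 b
# 1 a
# 1 c
```

The final sort only touches the distinct lines, not every input line, so it stays cheap when the input has many duplicates.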

Output format

If you want more flexibility in the output format, have a look at the print() and format() built-in functions.

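For example, if you want to print the count in octal (with up to 7 leading zeros), followed by a tab instead of a space character, and with a NUL line terminator, replace the last line with: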

    print(format(count, '08o'), item, sep='\t', end='\0')
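As a quick check of what that format produces, the sketch below builds the same record as a string for a sample count and item (both values are made up for illustration) and shows it with repr() so the tab and NUL are visible.

```python
# Sample values, chosen only for illustration.
count, item = 10, "test"

# '08o' means: octal, zero-padded to a total width of 8 digits.
record = format(count, "08o") + "\t" + item + "\0"

print(repr(record))  # '00000012\ttest\x00'
```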

Usage

Save the script in a file, e.g. sort_count.py, and invoke it with Python:

python3 sort_count.py < input

— 大卫·福斯特

0

uniq -c -i | tr -s ' ' | cut -c 2-

Use tr -s to squeeze the leading spaces into a single space, then use cut -c to print the output starting from the second character.

— 凯坦·盖德瓦勒

Your solution will squeeze every run of spaces, not just the leading one. Is that the desired effect?

— Marc Vanhoomissen
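To illustrate the difference raised in this comment, here is a rough Python emulation of the two approaches. The function names are made up, and the regular expressions only approximate what tr -s ' ' | cut -c 2- and sed 's/^\s*//' do to a single line.

```python
import re


def squeeze_and_cut(line: str) -> str:
    # Approximates: tr -s ' ' | cut -c 2-
    # Every run of spaces is squeezed, then the first character is dropped.
    return re.sub(r" +", " ", line)[1:]


def strip_leading(line: str) -> str:
    # Approximates: sed 's/^\s*//'
    # Only whitespace at the start of the line is removed.
    return re.sub(r"^\s*", "", line)


sample = "      2 hello   world"
print(squeeze_and_cut(sample))  # 2 hello world
print(strip_leading(sample))    # 2 hello   world
```

If the counted lines can contain runs of spaces that matter, the sed-style approach leaves them intact.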