Answers:
With awk:
awk '{total += $0; $0 = total}1'
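$0 is the current line. So, for each line, I add it to total, set the line to the new total, and then the trailing 1 is an awk shortcut: it prints the current line for every true condition, and 1 as a condition evaluates to true.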
Could print be used as well? That is, print total} instead of $0 = total}1
{print(total += $0)}
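For readers less familiar with awk, the same per-line logic can be sketched in Python. This is only an illustration, assuming every line holds exactly one integer; the function name running_totals and the sample values are made up here and are not part of the answer.

```python
def running_totals(lines):
    """Yield the cumulative sum after each input line (one integer per line)."""
    total = 0
    for line in lines:
        total += int(line)   # counterpart of awk's: total += $0
        yield total          # counterpart of awk's: $0 = total, then print


# Hypothetical sample input: 3, 4, 5, 8
print(list(running_totals(["3", "4", "5", "8"])))  # [3, 7, 12, 20]
```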
In a Python script:
#!/usr/bin/env python3
import sys
f = sys.argv[1]; out = sys.argv[2]
n = 0
with open(out, "wt") as wr:
    with open(f) as read:
        for l in read:
            n = n + int(l); wr.write(str(n)+"\n")
Save it as add_last.py and run it with the source file and the target output file as arguments:
python3 /path/to/add_last.py <input_file> <output_file>
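The code is fairly readable, but in detail:
Open the output file for writing the results:
with open(out, "wt") as wr:
Open the input file to read it line by line:
with open(f) as read:
    for l in read:
Read the lines, adding each new line's value to the total:
n = n + int(l)
Write the result to the output file:
wr.write(str(n)+"\n")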
Just for fun:
$ sed 'a+p' file | dc -e0 -
3
7
12
20
This works by appending a +p line after each line of the input (the a command of sed) and then passing the result to the dc calculator, where
+ Pops two values off the stack, adds them, and pushes the result.
The precision of the result is determined only by the values of
the arguments, and is enough to be exact.
and then
p Prints the value on the top of the stack, without altering the
stack. A newline is printed after the value.
The -e0 argument pushes 0 onto the dc stack to initialize the sum.
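The stack behavior described above can be traced with a small Python sketch. It is a simplified model of only the three dc features used here (pushing a number, +, and p), not a reimplementation of dc; the function name dc_running_sum and the sample values are hypothetical.

```python
def dc_running_sum(numbers):
    """Mimic `dc -e0` fed with a number line followed by a '+p' line, repeatedly."""
    stack = [0]                      # -e0 pushes the initial sum
    printed = []
    for n in numbers:
        stack.append(n)              # a number line pushes its value
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)          # '+' pops two values and pushes their sum
        printed.append(stack[-1])    # 'p' prints the top without popping it
    return printed


print(dc_running_sum([3, 4, 5, 8]))  # [3, 7, 12, 20]
```

Because p leaves the value on the stack, the running total is already in place when the next number arrives.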
real 0m4.234s
In Bash:
#! /bin/bash
file="YOUR_FILE.txt"
TOTAL=0
while IFS= read -r line
do
    TOTAL=$(( TOTAL + line ))
    echo "$TOTAL"
done <"$file"
real 0m53.116s
Almost a minute, on 1.3 million lines :)
To print the partial sums of the integers given on standard input, one per line:
#!/usr/bin/env python3
import sys
partial_sum = 0
for n in map(int, sys.stdin):
    partial_sum += n
    print(partial_sum)
If for some reason the command is too slow, you could use a C program:
#include <stdint.h>
#include <ctype.h>
#include <stdio.h>
int main(void)
{
    uintmax_t cumsum = 0, n = 0;
    for (int c = EOF; (c = getchar()) != EOF; ) {
        if (isdigit(c))
            n = n * 10 + (c - '0');
        else if (n) { // complete number (note: a value of 0 prints nothing)
            cumsum += n;
            printf("%ju\n", cumsum);
            n = 0;
        }
    }
    if (n) // last number when the input lacks a trailing newline
        printf("%ju\n", cumsum + n);
    return feof(stdin) ? 0 : 1;
}
To build it and run, type:
$ cc cumsum.c -o cumsum
$ ./cumsum < input > output
UINTMAX_MAX is 18446744073709551615.
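The value above is 2^64 - 1, which holds when uintmax_t is 64 bits wide (an assumption that is true on common platforms but is not guaranteed by the C standard). Unsigned arithmetic in C wraps around on overflow, so a cumulative sum that passes this limit would silently restart near zero. A small Python sketch of that behavior, with the helper name wrapping_add chosen here purely for illustration:

```python
UINTMAX_MAX = 2**64 - 1  # assumes a 64-bit uintmax_t


def wrapping_add(a, b):
    """Add like C unsigned arithmetic: the result is reduced modulo 2**64."""
    return (a + b) & UINTMAX_MAX


print(UINTMAX_MAX)                   # 18446744073709551615
print(wrapping_add(UINTMAX_MAX, 1))  # 0
```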
The C code is several times faster than the awk command on my machine for an input file generated by:
#!/usr/bin/env python3
import numpy.random
# randint(1, 101) stands in for the deprecated random_integers(100); both draw from 1..100
print(*numpy.random.randint(1, 101, size=2000000), sep='\n')
See also accumulate() from itertools.
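As a rough sketch of what using accumulate() from the standard itertools module could look like for this task (the function name partial_sums and the sample values are assumptions made for illustration):

```python
from itertools import accumulate


def partial_sums(lines):
    """Return the running totals of the integers found one per line."""
    return list(accumulate(map(int, lines)))


# Hypothetical sample input: 3, 4, 5, 8
print(*partial_sums(["3", "4", "5", "8"]), sep="\n")
```

For real input, sys.stdin could be passed in place of the sample list, since iterating over it yields one line at a time.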
You probably want something like this:
sort -n <filename> | uniq -c | awk 'BEGIN{print "Number\tFrequency"}{print $2"\t"$1}'
Explanation of the command:
sort -n <filename> | uniq -c
sorts the input and returns a frequency table
| awk 'BEGIN{print "Number\tFrequency"}{print $2"\t"$1}'
turns the output into a nicer format
Example:
Input file list.txt:
4
5
3
4
4
2
3
4
5
Command:
$ sort -n list.txt | uniq -c | awk 'BEGIN{print "Number\tFrequency"}{print $2"\t"$1}'
Number Frequency
2 1
3 2
4 4
5 2
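The same frequency table can be approximated in Python with collections.Counter. This is a minimal sketch rather than a replacement for the pipeline: it assumes one integer per line, and the function name frequency_table is invented for this example.

```python
from collections import Counter


def frequency_table(lines):
    """Return (number, count) pairs sorted numerically, like sort -n | uniq -c."""
    counts = Counter(int(line) for line in lines if line.strip())
    return sorted(counts.items())


sample = ["4", "5", "3", "4", "4", "2", "3", "4", "5"]  # contents of list.txt
print("Number\tFrequency")
for number, count in frequency_table(sample):
    print(f"{number}\t{count}")
```

Unlike uniq -c, Counter does not need sorted input; the sort happens only once, on the distinct values.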
You can do this in vim. Open the file and type the following keystrokes:
qaqqayiwj@"<C-a>@aq@a:wq<cr>
Note that <C-a> is actually ctrl-a, and <cr> is a carriage return, i.e. the enter key.
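Here is how it works. First, we want to clear register 'a' so that it has no side effects the first time it is used. This is simply qaq. Then we do the following: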
qa " Start recording keystrokes into register 'a'
yiw " Yank this current number
j " Move down one line. This will break the loop on the last line
@" " Run the number we yanked as if it was typed, and then
<C-a> " increment the number under the cursor *n* times
@a " Call macro 'a'. While recording this will do nothing
q " Stop recording
@a " Call macro 'a', which will call itself creating a loop
Once this recursive macro has finished running, we simply call :wq<cr> to save and quit.
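The effect of the macro, which rewrites the buffer in place one line at a time, can be modeled in Python as follows. This is a sketch of the idea only, assuming one integer per line; cumulate_in_place is a hypothetical name and the sample values are made up.

```python
def cumulate_in_place(values):
    """Turn a list of numbers into its running totals, modifying it in place."""
    for i in range(1, len(values)):
        # The macro yanks the number on line i-1, moves down one line,
        # and increments the number on line i by the yanked amount.
        values[i] += values[i - 1]
    return values


print(cumulate_in_place([3, 4, 5, 8]))  # [3, 7, 12, 20]
```

The loop stops at the last element for the same reason the macro does: there is no next line to move down to.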
Perl one-liner:
$ perl -lne 'print $sum+=$_' input.txt
3
7
12
20
With 2.5 million lines of numbers, it takes about 6.6 seconds to process:
$ time perl -lne 'print $sum+=$_' large_input.txt > output.txt
0m06.64s real 0m05.42s user 0m00.09s system
$ wc -l large_input.txt
2500000 large_input.txt
real 0m0.908s, pretty good.