Git blame commit statistics

198

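How can I "abuse" blame (or some better-suited function, and/or in conjunction with shell commands) to get a statistic of how many lines (of code) currently in the repository originate from each committer?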

Example output:

Committer 1: 8046 Lines
Committer 2: 4378 Lines

git

— Erik Aigner

11

There really should be a built-in command for this... some of the existing commands are far less commonly needed.

— Ciro Santilli 郝海东冠状病六四事件法轮功, 2014

@CiroSantilli, but it's easy to add a shell script that can be called from git.

— Alex

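Possible duplicate of How to count total lines changed by a specific author in a Git repository? This question can easily be reduced to that one: just iterate over all the authors.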

— Ciro Santilli 郝海东冠状病六四事件法轮功

1

code.google.com/p/gitinspector is pretty awesome, especially if you

— 要按

166

Update

git ls-tree -r -z --name-only HEAD -- '*/*.c' | xargs -0 -n1 git blame \
--line-porcelain HEAD |grep  "^author "|sort|uniq -c|sort -nr

I updated a few things along the way.

For convenience, you can also put this into its own command:

#!/bin/bash

# save as e.g. git-authors and set the executable flag
# "$@" passes every pathspec through quoted; with no arguments, all files are used
git ls-tree -r -z --name-only HEAD -- "$@" | xargs -0 -n1 git blame \
 --line-porcelain HEAD |grep  "^author "|sort|uniq -c|sort -nr

Save this somewhere on your PATH (or add its directory to your PATH) and use it like this:

git authors '*/*.c' # look for all files recursively ending in .c
git authors '*/*.[ch]' # look for all files recursively ending in .c or .h
git authors 'Makefile' # just count lines of authors in the Makefile

Original answer

While the accepted answer does the job, it is very slow.

$ git ls-tree --name-only -z -r HEAD|grep -zE '\.(cc|h|cpp|hpp|c|txt)$' \
  |xargs -0 -n1 git blame --line-porcelain|grep "^author "|sort|uniq -c|sort -nr

is almost instantaneous.

To get a list of the currently tracked files, you can use

git ls-tree --name-only -r HEAD

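For performance reasons, this solution does not call file to determine each file's type. Instead it uses grep to match the wanted extensions. If all files should be included, just remove this part from the line: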

grep -E '\.(cc|h|cpp|hpp|c)$' # for C/C++ files
grep -E '\.py$'               # for Python files

If file names can contain spaces, which shells handle badly, you can use:

git ls-tree -z --name-only -r HEAD | grep -z '\.py$' | xargs -0 ... # separates names with '\0' instead of newlines
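The following is a minimal sketch of that extension filter as a reusable bash function. It assumes a grep that supports -z, as GNU grep does. The name filter_ext is hypothetical and is not something git provides. The function reads NUL-separated paths and writes NUL-separated paths, so names with spaces reach xargs -0 intact:

```shell
#!/usr/bin/env bash
# filter_ext (hypothetical helper): keep only the NUL-separated paths whose
# extension matches the alternation given in $1, e.g. 'cc|h|cpp|hpp|c'.
# grep -z treats input and output records as NUL-terminated, so paths
# containing spaces or newlines are never split.
filter_ext() {
    grep -zE "\.($1)\$"
}

# Example: print the matching names one per line for inspection.
printf '%s\0' 'a.c' 'b.py' 'dir x/c.h' 'd.cpp.orig' \
    | filter_ext 'cc|h|cpp|hpp|c' | tr '\0' '\n'
```

In the pipeline it would sit where the grep stage is: git ls-tree -r -z --name-only HEAD | filter_ext 'py' | xargs -0 -n1 git blame --line-porcelain HEAD | ...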

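Given a list of files (through a pipe), you can use xargs to call a command and distribute the arguments to it. Commands that can process several files at once can omit -n1. Here we call git blame --line-porcelain, and every call gets exactly one argument.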

xargs -n1 git blame --line-porcelain
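You can check the effect of -n1 without a repository by substituting echo for git blame. The substitution matters because git blame accepts only one file per call. This sketch assumes an xargs that supports -0, as GNU and BSD xargs do:

```shell
#!/usr/bin/env bash
# echo stands in for git blame here, so no repository is needed.

# With -n1, xargs starts one command per NUL-separated input item:
printf '%s\0' 'a.c' 'dir with space/b.c' | xargs -0 -n1 echo blame:
# blame: a.c
# blame: dir with space/b.c

# Without -n1, xargs packs as many items as fit into a single command:
printf '%s\0' 'a.c' 'dir with space/b.c' | xargs -0 echo blame:
# blame: a.c dir with space/b.c
```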

Then we filter the output for "author " lines, sort them, and count the duplicate lines with:

grep "^author "|sort|uniq -c|sort -nr
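To try this counting stage on its own, the small bash sketch below feeds it hand-written lines in the --line-porcelain format. The name tally_authors is hypothetical. The trailing space in "^author " matters: it keeps the author-mail, author-time and author-tz headers, which are also printed for every line, out of the count.

```shell
#!/usr/bin/env bash
# tally_authors (hypothetical helper): read `git blame --line-porcelain`
# output on stdin and print "<count> <author>", highest count first.
tally_authors() {
    grep '^author ' | sed 's/^author //' | sort | uniq -c | sort -nr
}

# Hand-written sample: the "author-mail" line must not be counted.
printf 'author Alice\nauthor-mail <a@x>\nauthor Bob\nauthor Alice\n' | tally_authors
```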

Note

Other answers actually filter out lines that contain only whitespace:

grep -Pzo "author [^\n]*\n([^\n]*\n){10}[\w]*[^\w]"|grep "author "

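The command above prints the authors of lines that contain at least one non-whitespace character. You can also use the match \w*[^\w#], which additionally excludes lines whose first non-whitespace character is a # (a comment in many scripting languages).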
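If grep -P is not available (it is a GNU extension, missing from the BSD grep on macOS), the same whitespace filtering can be done in awk. This is a swapped-in technique, not the one the answer uses, and count_nonblank_authors is a hypothetical name. The sketch relies on the porcelain layout: each blamed line's header block contains an "author <name>" line, and the line's content follows, prefixed with a TAB.

```shell
#!/usr/bin/env bash
# count_nonblank_authors (hypothetical helper): read
# `git blame --line-porcelain` output on stdin and count, per author, only
# the blamed lines whose content has at least one non-whitespace character.
count_nonblank_authors() {
    awk '
        # "author <name>" header: remember the name for the upcoming line.
        # ("author-mail", "author-time" etc. do not match because of the space.)
        /^author / { author = substr($0, 8); next }
        # TAB-prefixed line: the actual source line being blamed.
        /^\t/ {
            content = substr($0, 2)
            if (content ~ /[^ \t\r]/) count[author]++
        }
        END { for (a in count) print count[a], a }
    ' | sort -nr
}

# Usage (inside a repository):
#   git blame --line-porcelain HEAD -- some/file.py | count_nonblank_authors
```

To also skip lines that start with #, the condition could be extended with && content !~ /^[ \t]*#/.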

— Alex

2

@nilbus: You can't. printf 'a\nb\nc\n' | xargs -n1 cmd expands to cmd a; cmd b; cmd c

— Alex

2

--line-porcelain doesn't seem to work with git 1.7.5.4; use --porcelain instead.
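There is one caveat if you fall back to --porcelain. According to the git blame documentation, plain --porcelain prints the commit information (including the author line) only the first time a commit appears. --line-porcelain, by contrast, repeats it for every line. Counting ^author lines from --porcelain output therefore yields roughly one entry per commit per file, not one per line. The bash/awk sketch below credits each TAB-prefixed content line to the author of the most recent commit header. It assumes 40-character SHA-1 object names, and porcelain_tally is a hypothetical name:

```shell
#!/usr/bin/env bash
# porcelain_tally (hypothetical helper): count blamed lines per author from
# plain `git blame --porcelain` output, where the "author" header appears only
# the first time a commit is mentioned. Assumes 40-hex SHA-1 object names.
porcelain_tally() {
    awk '
        # Content line (TAB-prefixed): credit it to the current commit author.
        /^\t/ { count[author_of[sha]]++; next }
        # "<sha> <orig-line> <final-line> [<num-lines>]": switch commit.
        length($1) == 40 && $1 ~ /^[0-9a-f]+$/ { sha = $1; next }
        # First mention of a commit: remember its author.
        /^author / { author_of[sha] = substr($0, 8) }
        END { for (a in count) print count[a], a }
    ' | sort -nr
}

# Usage (inside a repository):
#   git blame --porcelain HEAD -- some/file.c | porcelain_tally
```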

— isoiphone, 2013

4

OSX users, try the following (it still doesn't work for files with newlines in their names):

git ls-tree --name-only -r HEAD | grep -E '\.(cc|h|m|hpp|c)$' | xargs -n1 git blame --line-porcelain | grep "^author "|sort|uniq -c|sort -nr

— Wayne

3

If you just want everything under the current path, at any depth, use "./" as the path filter (where the answerer put '*/*.c').

— Ben Dilts, 2014

2

Maybe use "blame -w" to get better code ownership when code was only reformatted: stackoverflow.com/questions/4112410/...

— sleeplessnerd

124

I wrote a gem called git-fame that might be useful.

Installation and usage:

$ gem install git_fame
$ cd /path/to/gitdir
$ git fame

Output:

Statistics based on master
Active files: 21
Active lines: 967
Total commits: 109

Note: Files matching MIME type image, binary has been ignored

+----------------+-----+---------+-------+---------------------+
| name           | loc | commits | files | distribution (%)    |
+----------------+-----+---------+-------+---------------------+
| Linus Oleander | 914 | 106     | 21    | 94.5 / 97.2 / 100.0 |
| f1yegor        | 47  | 2       | 7     |  4.9 /  1.8 / 33.3  |
| David Selassie | 6   | 1       | 2     |  0.6 /  0.9 /  9.5  |
+----------------+-----+---------+-------+---------------------+

— Linus Oleander

5

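Finally, +1 for one that works and appears to give sensible numbers. The other command-line solutions either don't run on OSX because of utility incompatibilities or give tiny numbers on my repo. This is on OSX with Ruby 1.9.3 (brew).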

— KARTHIKŤ

9

Don't be silly, @tcaswell. Pointing to something useful isn't spam, even if you happen to be the one who wrote it.

— Wayne

5

Answering my own question: git fame --exclude=paths/to/files,paths/to/other/files

— Maciej Swic, 2014

2

@Adam: Are you still having this problem? It works fine for me on OS X 10.9.5.

— Sam Dutton, 2015

2

For any repo larger than a few commits, the time this gem needs to do its work is astronomical.

— Erik Aigner

48

git ls-tree -r HEAD|sed -re 's/^.{53}//'|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'|while read filename; do git blame -w "$filename"; done|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'|sort|uniq -c

Step-by-step explanation:

List all files under version control

git ls-tree -r HEAD|sed -re 's/^.{53}//'

Prune the list down to text files only

|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'

git blame all the text files, ignoring whitespace changes

|while read filename; do git blame -w "$filename"; done

Pull out the author names

|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'

Sort the list of authors, and have uniq count the consecutive repeated lines

|sort|uniq -c

Example output:

   1334 Maneater
   1924 Another guy
  37195 Brian Ruby
   1482 Anna Lambda
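The 53 in the first sed step is the width of everything before the path in a git ls-tree -r line for a blob. That prefix is a 6-character mode, a space, the 4-character type blob, a space, a 40-character SHA-1 and a tab: 6 + 1 + 4 + 1 + 40 + 1 = 53. The small sketch below checks that arithmetic on a sample line. The hash used is the well-known SHA-1 of the empty blob, and strip_ls_tree is a hypothetical name:

```shell
#!/usr/bin/env bash
# strip_ls_tree (hypothetical helper): remove the fixed 53-character prefix
# "<mode> <type> <sha1>\t" from `git ls-tree -r` lines, leaving the path.
# Assumes SHA-1 object names and blob entries ("100644 blob ...").
strip_ls_tree() {
    sed -E 's/^.{53}//'
}

# 6 (mode) + 1 + 4 ("blob") + 1 + 40 (SHA-1) + 1 (tab) = 53
printf '100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391\tsrc/my file.c\n' \
    | strip_ls_tree    # prints: src/my file.c
```

The fixed width only holds for SHA-1 blob entries. Submodule entries (type commit) have a 55-character prefix, and SHA-256 repositories use 64-character object names. git ls-tree -r --name-only HEAD avoids the magic number altogether.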

— nilbus

1

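It seems I'm using a different sed version. Mine doesn't understand the -r flag, and it has problems with the regex: it complains about unbalanced parentheses even when I remove the surplus (.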

— Erik Aigner

7

Never mind, sudo brew install gnu-sed solved it. Works like a charm!

— Erik Aigner

5

Or port install gsed for MacPorts users.

— Gavin Brock

I did a sudo brew install gnu-sed (which worked), but I still get errors that sed doesn't recognize -r. :(

— Adam Tuttle

1

After installing gsed via MacPorts, I ran the following command on OSX to make it work (replacing sed with gsed):

git ls-tree -r HEAD|gsed -re 's/^.{53}//'|while read filename; do file "$filename"; done|grep -E ': .*text'|gsed -r -e 's/: .*//'|while read filename; do git blame -w "$filename"; done|gsed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'|sort|uniq -c
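For the -r problems above, here is a portable sketch. Both GNU sed and the BSD sed shipped with macOS accept -E for extended regular expressions, so the author-extraction step can be written without -r and without installing gsed. The name blame_author is hypothetical. The pattern assumes git blame's default date format (YYYY-MM-DD HH:MM:SS +ZZZZ) and anchors on the whole timestamp plus line number, so a date inside the code is less likely to be mistaken for the blame date:

```shell
#!/usr/bin/env bash
# blame_author (hypothetical helper): read default `git blame` output, e.g.
#   ^1a2b3c4 (Jane Doe   2020-01-02 10:00:00 +0100  1) int x;
# and print only the author name of each line.
# -E (extended regex) works in both GNU sed and macOS/BSD sed, unlike -r.
# -n plus the trailing p prints only lines the pattern actually matched.
blame_author() {
    sed -nE 's/^[^(]*\((.*[^ ]) +[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [-+][0-9]{4} +[0-9]+\).*/\1/p'
}

# Usage (inside a repository), replacing the last sed stage:
#   git blame -w -- "$filename" | blame_author | sort | uniq -c
```

The other sed steps in that pipeline can likewise use -E in place of -r.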

— nerdherd

38

git summary, provided by the git-extras package, is exactly what you need. See the documentation at git-extras - git-summary:

git summary --line

It gives output like this:

project  : TestProject
lines    : 13397
authors  :
8927 John Doe            66.6%
4447 Jane Smith          33.2%
  23 Not Committed Yet   0.2%

— 阿迪乌斯

1

Nice, but it doesn't seem to support a path filter, or at least a subdirectory argument. That would make it better.

— spinkus

1

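Clean solution. For some reason, @Alex's answer produced very low line counts, while this just worked out of the box. It took about 30 seconds for roughly 200K lines spread over a few hundred files.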

— fgblomqvist

6

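Erik's solution was great, but I ran into two problems. Diacritics caused trouble even though my LC_* environment variables were ostensibly set correctly, and noise leaked through on lines of code that actually contained dates. My sed-fu is weak, so I ended up with this Frankenstein snippet with Ruby in it. It works perfectly for me on 200,000+ LOC, and it sorts the results: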

git ls-tree -r HEAD | gsed -re 's/^.{53}//' | \
while read filename; do file "$filename"; done | \
grep -E ': .*text' | gsed -r -e 's/: .*//' | \
while read filename; do git blame "$filename"; done | \
ruby -ne 'puts $1.strip if $_ =~ /^\w{8} \((.*?)\s*\d{4}-\d{2}-\d{2}/' | \
sort | uniq -c | sort -rg

Also note that this uses gsed instead of sed, because that is the name of the binary Homebrew installs, which leaves the system sed intact.

— gtd

4

git shortlog -sn

This shows the number of commits per author.

— 莫尼丁

17

This returns the number of commits per author, not the number of lines.

— v64, 2011

Very helpful for identifying the main contributors to a project, directory, or file.

— Ares

4

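Here is the core snippet from @Alex's answer, the part that actually aggregates the blame lines. I've cut it down to operate on a single file rather than a set of files.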

git blame --line-porcelain path/to/file.txt | grep  "^author " | sort | uniq -c | sort -nr

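I'm posting this here because I often come back to this answer, reread the post, and re-digest the examples to extract the part I find valuable. The original isn't generic enough for my use case, since its scope is a whole C project.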

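I like to list stats per file. I do that with a bash for loop instead of xargs, because I find xargs less readable and harder to use and memorize. The pros and cons of xargs vs. for belong in another discussion.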

Here is a practical snippet that shows the results for each file separately:

for file in $(git ls-files); do \
    echo $file; \
    git blame --line-porcelain $file \
        | grep  "^author " | sort | uniq -c | sort -nr; \
    echo; \
done

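I tested it: running this in an interactive bash shell is Ctrl+C safe. If you put it inside a bash script and want the user to be able to break out of the for loop, you may need to trap SIGINT and SIGTERM.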

— ThorSummoner

1

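With git blame -w -M -C -C --line-porcelain path/to/file.txt | grep -I '^author ' | sort | uniq -ic | sort -nr, a few tweaks to git blame give a more accurate picture of the statistics I wanted. The key parts are the -M and -C -C options (two Cs on purpose): -M detects lines moved within a file, and -C -C detects lines copied from other files. See the git blame documentation for details. For completeness, -w ignores whitespace.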

— John Lee

3

Check out the gitstats command, available from http://gitstats.sourceforge.net/

— Ivan

1

I have this solution, which counts the blamed lines in all text files (excluding binary files, even versioned ones):

IFS=$'\n'
for file in $(git ls-files); do
    git blame `git symbolic-ref --short HEAD` --line-porcelain "$file" | \
        grep  "^author " | \
        grep -v "Binary file (standard input) matches" | \
        grep -v "Not Committed Yet" | \
        cut -d " " -f 2-
    done | \
        sort | \
        uniq -c | \
        sort -nr

— Gabriel Diego

1

If you want to check a particular source module, this works from any directory in the repository's source tree:

find . -name '*.c' -print0 | xargs -0 -n1 git blame --line-porcelain | grep "^author "|sort|uniq -c|sort -nr

— Martin G

0

I adapted the top answer to PowerShell:

(git ls-tree -rz --name-only HEAD).Split(0x00) | where {$_ -Match '.*\.py'} |%{git blame -w --line-porcelain HEAD $_} | Select-String -Pattern '^author ' | Group-Object | Select-Object -Property Count, Name | Sort-Object -Property Count -Descending

As for whether to pass the -w switch to git blame: it's optional. I added it because it ignores whitespace changes.

The Bash solution also runs under WSL2, but PowerShell was faster on my machine (about 50 s vs. 65 s on the same repo).

— Matt M.

-1

I made my own script, which combines the approaches of @nilbus and @Alex:

#!/bin/sh

for f in $(git ls-tree -r  --name-only HEAD --);
do
    j=$(file "$f" | grep -E ': .*text'| sed -r -e 's/: .*//');
    if [ "$f" != "$j" ]; then
        continue;
    fi
    git blame -w --line-porcelain HEAD "$f" | grep  "^author " | sed 's/author //'
done | sort | uniq -c | sort -nr

— 沃斯曼77

For me, your "enter code here" caused problems.

— Menios

-1

A Bash function for a single source file, run on macOS:

function glac {
    # git_line_author_counts
    git blame -w "$1" |  sed -E "s/.*\((.*) +[0-9]{4}-[0-9]{2}.*/\1/g" | sort | uniq -c | sort -nr
}

— jxramos