78

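I need to find the largest files in a folder.
How can I scan a folder recursively and sort its contents by size?

I tried ls -R -S, but that lists directories as well.
I also tried find.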

— 用户名
source

1

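Do you want to list the files in each subdirectory separately, or do you want to find all files in all subdirectories and list them by size regardless of which subdirectory they are in? Also, what do you mean by "directory" and "folder"? You seem to be using them to describe different things.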

— terdon

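Are you saying you want to list only the files in a given directory and its subdirectories, without showing the subdirectories themselves? Please try to clean up your question; as it stands it is unclear.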

— slm

Related: unix.stackexchange.com/questions/158289/...

— 西罗桑蒂利新疆改造中心法轮功六四事件

92

You can also do this with just du. To be on the safe side, this is the version of du I'm using:

$ du --version
du (GNU coreutils) 8.5

The approach:

$ du -ah ..DIR.. | grep -v "/$" | sort -rh

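Breakdown

The command du -ah DIR produces a list of all the files and directories under the given directory DIR. The -h switch produces human-readable sizes, which I prefer. If you don't want them, drop that switch. I'm using head -6 just to limit the amount of output!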

$ du -ah ~/Downloads/ | head -6
4.4M    /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020_WirelessFrames_exUG_GLB_en.pdf
624K    /home/saml/Downloads/kodak_W820_wireless_frame/easyshare_w820.pdf
4.9M    /home/saml/Downloads/kodak_W820_wireless_frame/W820_W1020WirelessFrameExUG_GLB_en.pdf
9.8M    /home/saml/Downloads/kodak_W820_wireless_frame
8.0K    /home/saml/Downloads/bugs.xls
604K    /home/saml/Downloads/netgear_gs724t/GS7xxT_HIG_5Jan10.pdf

It is easy enough to sort it from smallest to largest:

$ du -ah ~/Downloads/ | sort -h | head -6
0   /home/saml/Downloads/apps_archive/monitoring/nagios/nagios-check_sip-1.3/usr/lib64/nagios/plugins/check_ldaps
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/index/write.lock
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/0/translog/translog-1365292480753
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/index/write.lock
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/1/translog/translog-1365292480946
0   /home/saml/Downloads/data/elasticsearch/nodes/0/indices/logstash-2013.04.06/2/index/write.lock

Reverse it, largest to smallest:

$ du -ah ~/Downloads/ | sort -rh | head -6
10G /home/saml/Downloads/
3.8G    /home/saml/Downloads/audible/audio_books
3.8G    /home/saml/Downloads/audible
2.3G    /home/saml/Downloads/apps_archive
1.5G    /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G    /home/saml/Downloads/digital_blasphemy

Don't show directories, just files:

$ du -ah ~/Downloads/ | grep -v "/$" | sort -rh | head -6 
3.8G    /home/saml/Downloads/audible/audio_books
3.8G    /home/saml/Downloads/audible
2.3G    /home/saml/Downloads/apps_archive
1.5G    /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
1.5G    /home/saml/Downloads/digital_blasphemy
835M    /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run

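If you want the list ordered from smallest to largest but still limited to the 6 biggest offenders, reverse the sort by dropping the -r switch and use tail -6 instead of head -6.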

$ du -ah ~/Downloads/ | grep -v "/$" | sort -h | tail -6
835M    /home/saml/Downloads/apps_archive/cad_cam_cae/salome/Salome-V6_5_0-LGPL-x86_64.run
1.5G    /home/saml/Downloads/digital_blasphemy
1.5G    /home/saml/Downloads/digital_blasphemy/db1440ppng.zip
2.3G    /home/saml/Downloads/apps_archive
3.8G    /home/saml/Downloads/audible
3.8G    /home/saml/Downloads/audible/audio_books
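Since du -a reports directories alongside files, and the grep filter only matches paths that end in a slash, here is a hedged sketch of a files-only variant. It is not part of the original answer: the function names files_by_disk_usage and human are made up for illustration, and it assumes a POSIX system where st_blocks is available and counts 512-byte units.

```python
import os
import stat


def files_by_disk_usage(root):
    """Return (bytes_on_disk, path) pairs for regular files under root, largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, FIFOs, sockets, devices
            # Assumption: st_blocks is in 512-byte units (POSIX; absent on Windows).
            results.append((info.st_blocks * 512, path))
    results.sort(reverse=True)
    return results


def human(size):
    """Format a byte count roughly the way du -h does (powers of 1024)."""
    if size < 1024:
        return f"{size}B"
    value = float(size)
    for unit in ("K", "M", "G", "T"):
        value /= 1024
        if value < 1024 or unit == "T":
            return f"{value:.1f}{unit}"
```

Printing human(size) and path for the first six entries of files_by_disk_usage(some_dir) gives a listing shaped like the du output above, minus the directory lines. The rounding will not match du exactly.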

— slm
source

14

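The grep -v "/$" part doesn't seem to do what you expect, because directories don't have a trailing slash appended. Does anyone know how to exclude directories from the results?

— Jan Warchoł, 2015

@JanekWarchol - which version of coreutils are you using?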

— slm

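I'm on 8.13. But in any case, the output in the answer has no trailing slashes either. For example, /home/saml/Downloads/audible appears to be a directory but has no slash. Only /home/saml/Downloads/ has one, and that is probably because it was written with a slash when it was passed as the initial argument to du.

— Jan Warchoł, 2015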

1

This finds dirs as well

— ekerner '17

1

This lists not only files but directories too :(

— Roman Gaufman

20

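If you want to find all files in the current directory and its subdirectories and list them by size (regardless of their path), and assuming that no file name contains a newline character, then with GNU find you can do this: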

find . -type f -printf "%s\t%p\n" | sort -n

From man find on a GNU system:

   -printf format
          True; print format  on  the  standard  output,
          interpreting  `\'  escapes and `%' directives.
          Field widths and precisions can  be  specified
          as  with the `printf' C function.  Please note
          that many of the  fields  are  printed  as  %s
          rather  than  %d, and this may mean that flags
          don't work as you  might  expect.   This  also
          means  that  the `-' flag does work (it forces
          fields to be  left-aligned).   Unlike  -print,
          -printf  does  not add a newline at the end of
          the string.  The escapes and directives are:

          %p     File's name.
          %s     File's size in bytes.

From man sort:

   -n, --numeric-sort
          compare according to string numerical value

— Terdon
source

Unfortunately this doesn't work on Mac; it shows: find: -printf: unknown primary or operator

— Roman Gaufman

@RomanGaufman yes, that is why the answer specifies GNU find. If you install the GNU tools on your Mac, it will work there too.

— terdon

11

Try the following command:

ls -1Rhs | sed -e "s/^ *//" | grep "^[0-9]" | sort -hr | head -n20

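It lists the 20 largest files under the current directory, recursively.

Note: the -h option of sort is not available on OSX/BSD, so you have to install sort from coreutils (e.g. via brew) and add the local bin path to PATH, e.g.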

export PATH="/usr/local/opt/coreutils/libexec/gnubin:$PATH" # Add a "gnubin" for coreutils.

Or use:

ls -1Rs | sed -e "s/^ *//" | grep "^[0-9]" | sort -nr | head -n20

For the largest directories, use du, e.g.:

du -ah . | sort -rh | head -20

or:

du -a . | sort -rn | head -20

— Kenorb
source

3

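Perfect, this is the first solution that works on Mac and doesn't show directories :) - thank you!

— Roman Gaufman

How can I filter to show only files with a line count >= X? (e.g. X = 0)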

— 矩阵

7

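This recursively finds all files and sorts them by size. It prints every file size in KB, rounded down, so you may see 0 KB files, but it is close enough for my purposes, and it works on OSX.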

find . -type f -print0 | xargs -0 ls -la | awk '{print int($5/1000) " KB\t" $9}' | sort -n -r -k1

— Brad Parks
source

Works on Ubuntu 14.04 too!

— David Lam

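This lists directories, not just files :(

— Roman Gaufman

@RomanGaufman - thanks for the feedback! In my testing, find . -type f only finds files... it does work recursively, you're right, but it lists all the files it finds, not the directories themselves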

— Brad Parks

xargs was the tool to use in the 1980s. It has been a bad idea since 1989, when David Korn introduced execplus.

— schily

5

With zsh, you can find the largest file (in terms of apparent size, like the size column in ls -l output, not disk usage) with:

ls -ld -- **/*(DOL[1])

For the 6 largest:

ls -ld -- **/*(DOL[1,6])

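To have ls itself sort the list by file size, you can use its -S option. Some ls implementations also have a -U option that tells ls not to sort the list (since here it has already been sorted by size by zsh).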
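For shells without zsh's glob qualifiers, the same "N largest by apparent size" idea can be sketched in Python. This is only an illustration under assumptions: the name largest_files is invented here, and unlike the glob above it considers regular files only, skipping directories and symlinks.

```python
import heapq
import os
import stat


def largest_files(root, count=6):
    """Return up to `count` (size_in_bytes, path) pairs, largest apparent size first."""
    def sizes():
        # os.walk also visits hidden files and directories.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    info = os.lstat(path)  # do not follow symlinks
                except OSError:
                    continue  # unreadable or vanished entry
                if stat.S_ISREG(info.st_mode):
                    yield info.st_size, path  # st_size is the apparent size
    # nlargest keeps only `count` items in memory instead of sorting everything.
    return heapq.nlargest(count, sizes())
```

largest_files(".", 1) would correspond to the single-file case and largest_files(".", 6) to the six-file case.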

— Stéphane Chazelas
source

3

A simple solution for Mac/Linux that skips directories:

find . -type f -exec du -h {} \; | sort -h

— 姆佩切拉
source

2

The equivalent on BSD or OSX is

$ du -ah simpl | sort -dr | head -6

— 韩雪
source

0

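This is an incredibly common need for a variety of reasons (I like to find the most recent backup in a directory), and it is a surprisingly simple task.

I'm going to provide a Linux solution that uses the find, xargs, stat, tail, awk, and sort utilities.

Most people have provided some unique answers, but I prefer mine because it handles file names properly and the use case can easily be changed (modify the stat and sort arguments).

I'll also provide a Python solution that should let you use this functionality even on Windows.

Linux command-line solution

Recursively return the full list of files only from a directory, sorted by file size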

find . -type f -print0 | xargs -0 -I{} stat -c '%s %n' {} | sort -n

Same as before, but this time return only the largest file.

# Each utility is split on a new line to help 
# visualize the concept of transforming our data in a stream
find . -type f -print0 | 
xargs -0 -I{} stat -c '%s %n' {} | 
sort -n | 
tail -n 1 |
awk '{print $2}'

The exact same pattern, but now selecting the newest file instead of the largest

# (Notice only the first argument of stat changed for new functionality!)
find . -type f -print0 | xargs -0 -I{} stat -c '%Y %n' {} | 
sort -n | tail -n 1 | awk '{print $2}'

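Explanation:

find: recursively finds all files under the current directory and prints them out separated by a null character
xargs: a utility that executes a command using arguments supplied on standard input. For each file name that find outputs, we run the stat utility on that file
stat: stat is a great command with many use cases. I'm printing two columns: the first is the file size in bytes (%s) and the second is the file name (%n)
sort: sorts the results using the numeric switch. Since the first field is an integer, our results are sorted correctly
tail: selects only the last line of output (since the list is sorted, this is the largest file!)
awk: selects the second column, which contains the file name; this is the largest file found by the recursive search.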
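The "only the stat format changed" idea carries over to Python by choosing which field of the stat result to compare. The sketch below is an assumption-laden illustration, not the author's script: pick_file is a hypothetical name, and field is expected to be an attribute of os.stat_result such as st_size or st_mtime. Because it returns the whole path, names containing spaces survive intact, whereas awk '{print $2}' prints only the text up to the first space.

```python
import os
import stat


def pick_file(root, field="st_size"):
    """Return the regular file under root with the largest value of the given stat field."""
    best_path, best_value = None, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry
            if not stat.S_ISREG(info.st_mode):
                continue  # regular files only, like find -type f
            value = getattr(info, field)
            if best_value is None or value > best_value:
                best_path, best_value = path, value
    return best_path  # None if no regular file was found
```

pick_file(".", "st_size") plays the role of the %s pipeline, and pick_file(".", "st_mtime") the role of the %Y one.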

Python solution

#!/usr/bin/env python
import os, sys
files = list()
for dirpath, dirname, filenames in os.walk(sys.argv[1]):
    for filename in filenames:
        realpath = os.path.join(dirpath, filename)
        files.append(realpath)
files_sorted_by_size = sorted(files, key = lambda x: os.stat(x).st_size)
largest_file = files_sorted_by_size[-1]
print(largest_file)

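This script takes a little longer to explain, but basically, if you save it as a script, it searches the directory given as the first argument on the command line and returns the largest file in that directory. The script does no error checking, but it should give you an idea of how to approach this in Python, which gives you a nice platform-independent way of solving the problem.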

— 卢克·帕福德
source

0

A variation on this answer from a similar question

find . -type f -exec du -ah {} + | sort -rh | more

— 克里兹·克雷格
source

0

Try the following command, with sort options, to list folder sizes in ascending order

du -sh * | sort -sh

— 达瓦尔·H·内娜
source

-1

Something that works on any platform except AIX and HP-UX is:

find . -ls | sort +6 | tail

— chi地
source

Sorting files by size recursively