Copy smallest files first?

I have a large directory containing subdirectories and files that I want to copy recursively.

Is there any way to tell cp to perform the copy in order of file size, so that the smallest files are copied first?

shell cp file-copy size

— nbubis

Just to make sure there's no XY problem involved, can you explain why you want to do this?

— goldilocks 2014

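@TAFKA'goldilocks' - I have a lot of video files, and I want to run a quality test on each directory. The smallest video lets me quickly see whether the rest of the files are also corrupted.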

— nbubis

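This does the whole job in one go: every subdirectory, all in a single stream, with no filename problems. It copies every file from smallest to largest. You'll need to mkdir ${DESTINATION} first if it doesn't already exist.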

find . ! -type d -print0 |
du -b0 --files0-from=/dev/stdin |
sort -zk1,1n | 
sed -zn 's/^[^0-9]*[0-9]*[^.]*//p' |
tar --hard-dereference --null -T /dev/stdin -cf - |
    tar -C"${DESTINATION}" --same-order -xvf -
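If you'd like to see the order before committing to the copy (handy when the first file to arrive is the one you plan to test), here is a minimal preview sketch. It is not part of the answer above: it assumes GNU find, sort and cut, swaps du for find's %s directive (the apparent size in bytes, the same number du -b reports for a regular file), and prints the paths instead of handing them to tar. The function name preview_smallest_first is made up for illustration.

```sh
# Hypothetical helper: print every non-directory under the current directory,
# smallest first, one path per line, without copying anything.
# Assumes GNU find (-printf), GNU sort (-z) and GNU cut (-z).
# Output is newline-separated for reading, so names containing newlines
# would be ambiguous here; the copy pipeline itself stays NUL-separated.
preview_smallest_first() (
    find . ! -type d -printf '%s\t%p\0' |
        sort -z -n -k1,1 |
        cut -z -f2- |
        tr '\0' '\n'
)
```

Run it from the source directory, e.g. cd /path/to/source && preview_smallest_first | head.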

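You know what? What this doesn't handle is empty subdirectories. I could do some redirection within that pipeline, but that's just a race condition waiting to happen. Simplest is probably best, so just do this afterwards: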

find . -type d -printf 'mkdir -p "'"${DESTINATION}"'/%p"\n' |
    . /dev/stdin
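A variant of the same step that doesn't generate shell code and source it, so directory names containing quotes or $ can't be reinterpreted by the shell. This is a sketch, assuming a find that supports -exec ... + (GNU and POSIX find both do), run from the source directory; mirror_dirs is a hypothetical name.

```sh
# Hypothetical helper: recreate every directory under "." inside DEST,
# including empty ones. Directory names are passed to sh as arguments,
# never pasted into a command string, so no quoting can break.
mirror_dirs() (
    dest=$1
    find . -type d -exec sh -c '
        dest=$1
        shift
        for d in "$@"; do
            mkdir -p -- "$dest/$d"
        done
    ' sh "$dest" {} +
)
```

Usage: cd /path/to/source && mirror_dirs "${DESTINATION}".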

Or, since Gilles makes a very good point in his answer about preserving directory permissions, I should try that too. I think this will do it:

find . -type d -printf '[ -d "'"${DESTINATION}"'/%p" ] ||
    cp -a --parents "%p" -t "'"${DESTINATION}"'"\n' |
. /dev/stdin

I'd bet this is faster than mkdir any day.
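If the goal is to carry over the permissions of every directory, not only the ones tar didn't create, a hedged sketch using GNU chmod --reference and touch -r could run as the very last step (a directory's timestamps change whenever something is added to it, so doing this earlier would be undone). It copies mode and timestamps only, not ownership. sync_dir_meta is a hypothetical name; run it from the source directory.

```sh
# Hypothetical helper: give each directory under DEST the mode and the
# timestamps of its counterpart under ".". Assumes GNU chmod and touch.
# -depth visits a directory's contents before the directory itself, so a
# restrictive mode on a parent is applied after its children are handled.
sync_dir_meta() (
    dest=$1
    find . -depth -type d -exec sh -c '
        dest=$1
        shift
        for d in "$@"; do
            [ -d "$dest/$d" ] || continue
            chmod --reference="$d" -- "$dest/$d"
            touch -r "$d" -- "$dest/$d"
        done
    ' sh "$dest" {} +
)
```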

— mikeserv

Dammit mikeserv! +1

— goldilocks 2014

@TAFKA'goldilocks' I'll take that as a compliment. Thanks very much.

— mikeserv

Here's a quick and dirty method using rsync. For this example, I'm treating anything under 10 MB as "small".

First transfer only the small files:

rsync -a --max-size=10m srcdir dstdir

Then transfer the remaining files. The small files transferred earlier won't be copied again unless they have been modified.

rsync -a srcdir dstdir

From man 1 rsync:

   --max-size=SIZE
          This  tells  rsync to avoid transferring any file that is larger
          than the specified SIZE. The SIZE value can be suffixed  with  a
          string  to  indicate  a size multiplier, and may be a fractional
          value (e.g. "--max-size=1.5m").

          This option is a transfer rule, not an exclude,  so  it  doesn’t
          affect  the  data  that  goes  into  the file-lists, and thus it
          doesn’t affect deletions.  It just limits  the  files  that  the
          receiver requests to be transferred.

          The  suffixes  are  as  follows:  "K"  (or  "KiB") is a kibibyte
          (1024), "M" (or "MiB") is a mebibyte (1024*1024),  and  "G"  (or
          "GiB")  is  a gibibyte (1024*1024*1024).  If you want the
          multiplier to be 1000 instead of 1024, use "KB", "MB", or "GB".
          (Note: lower-case is also accepted for all values.)  Finally, if
          the suffix ends in either "+1" or "-1", the value will be offset
          by one byte in the indicated direction.

          Examples:    --max-size=1.5mb-1    is    1499999    bytes,   and
          --max-size=2g+1 is 2147483649 bytes.

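Of course, within each pass the files are not transferred in strict smallest-to-largest order, but I think this may be the simplest solution that meets your requirements.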
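To get closer to a smallest-first order with the same approach, one option is to run several passes with increasing --max-size values before the final unrestricted one. This is a sketch, not part of the answer above: the tier values 1m, 10m and 100m are arbitrary, and copy_in_tiers is a made-up name. Files already copied in an earlier pass are skipped by rsync's usual size-and-modification-time check, which -a makes reliable by preserving modification times.

```sh
# Hypothetical multi-pass variant of the two rsync commands above.
# The size tiers are arbitrary examples; adjust them to your data.
# Each pass copies files up to its --max-size; files that are already up to
# date from an earlier pass are skipped by rsync's size + mtime quick check.
copy_in_tiers() (
    src=$1
    dst=$2
    for tier in 1m 10m 100m; do
        rsync -a --max-size="$tier" "$src" "$dst" || exit
    done
    # Final unrestricted pass picks up everything larger than the last tier.
    rsync -a "$src" "$dst"
)
```

Usage: copy_in_tiers srcdir dstdir. Within each pass the order is still rsync's own; only the size bands arrive in ascending order.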

— cpugeniusmv

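You get 2 copies of hard links here, and soft links are converted into actual files, with 2 copies of each. I think you'd do much better with --copy-dest=DIR and/or --compare-dest=DIR. The only reason I know is that, after posting my own answer, I had to add --hard-dereference to tar myself because I was missing links. I think rsync actually behaves differently with local filesystems than with other targets; I used to use it with USB keys, and it would flood the bus unless I set a bandwidth limit. I guess I should have been using one of the others instead.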

— mikeserv

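+1 for "quick and dirty method". Simpler is usually better, at least for automation and future maintainability. I actually think this is pretty clean. As design goals, "elegant" vs. "kludgy" and "robust" vs. "fragile" can sometimes conflict, but a good balance can be struck, and I think this is elegant and fairly robust.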

— 2015

Not with cp directly; that's well beyond its capabilities. But you can arrange to call cp on the files in the right order.

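Zsh conveniently lets you sort files by size with glob qualifiers. Here's a zsh snippet that copies files in increasing order of size from under /path/to/source-directory to under /path/to/destination-directory.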

cd /path/to/source-directory
for x in **/*(.oL); do
  mkdir -p /path/to/destination-directory/$x:h
  cp $x /path/to/destination-directory/$x:h
done
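For shells without zsh's glob qualifiers, roughly the same thing can be done with GNU tools. This is a sketch, not Gilles' code: find lists each regular file (what the zsh qualifier . selects) with its size, sort -z -n orders them by size (like oL), and cp --parents recreates each file's relative directory under the destination. Unlike zsh's default globbing, find also picks up dot files, and cp -p preserves mode and timestamps. copy_smallest_first is a hypothetical name; GNU find, sort, cut, xargs and cp are assumed.

```sh
# Hypothetical GNU-tools counterpart of the zsh loop (not Gilles' code).
# find lists regular files with their size, sort orders them by size,
# cut strips the size column, and cp --parents recreates each file's
# relative directory path under the destination before copying it.
# Everything stays NUL-separated, so odd file names are passed intact.
copy_smallest_first() (
    src=$1
    dest=$2
    mkdir -p -- "$dest" || exit
    # Make dest absolute, because we change into the source directory below.
    dest=$(cd "$dest" && pwd) || exit
    cd "$src" || exit
    find . -type f -printf '%s\t%p\0' |
        sort -z -n -k1,1 |
        cut -z -f2- |
        xargs -0 -r cp -p --parents -t "$dest"
)
```

Usage: copy_smallest_first /path/to/source-directory /path/to/destination-directory. If the list is long, xargs splits it across several cp runs, but those run one after another in list order.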

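Instead of a loop, you can use the zcp function. However, you need to create the destination directories first, which can be done in a cryptic one-liner.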

autoload -U zmv; alias zcp='zmv -C'
cd /path/to/source-directory
mkdir **/*(/e\''REPLY=/path/to/destination-directory/$REPLY'\')
zcp -Q '**/*(.oL)' '/path/to/destination-directory/$f'

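This doesn't preserve the ownership of the source directories. If you want that, you'll need to enlist a suitable copying program such as cpio or pax. If you do that, you don't need to call cp or zcp as well.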

cd /path/to/source-directory
print -rN **/*(^.) **/*(.oL) | cpio -0 -p /path/to/destination-directory
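A rough non-zsh analogue of that last pipeline, assuming GNU find, sort, cut and cpio: everything that isn't a regular file is listed first (find emits each directory before its contents), then the regular files by ascending size, all NUL-terminated and passed to cpio in pass-through mode. -d creates leading directories as needed and -m keeps file modification times. cpio_smallest_first is a hypothetical name; run it from the source directory.

```sh
# Hypothetical non-zsh analogue of: print -rN **/*(^.) **/*(.oL) | cpio -0 -p DEST
# Run from the source directory. First list everything that is not a regular
# file (find emits each directory before its contents), then the regular
# files by ascending size; all names are NUL-terminated for cpio -0.
cpio_smallest_first() (
    dest=$1
    {
        find . -mindepth 1 ! -type f -print0
        find . -type f -printf '%s\t%p\0' | sort -z -n -k1,1 | cut -z -f2-
    } | cpio -0 -pdm "$dest"
)
```

Usage: cd /path/to/source-directory && cpio_smallest_first /path/to/destination-directory.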

— Gilles 'SO- stop being evil'

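I don't think there's any way to get cp -r to do this directly. Since it may be an indeterminate amount of time before you get a wizardly find/awk solution, here's a quick perl script: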

#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;
use File::Basename;
use File::Path qw(make_path);

die "No (valid) source directory path given.\n"
    if (!$ARGV[0] || !-d -r "/$ARGV[0]");

die "No (valid) destination directory path given.\n"
    if (!$ARGV[1] || !-d -w "/$ARGV[1]");

my $len = length($ARGV[0]);
my @files;
find (
    sub {
        my $fpath = $File::Find::name;
        return if !-r -f $fpath;
        push @files, [
            substr($fpath, $len),
            (stat($fpath))[7],
        ]
    }, $ARGV[0]
);

foreach (sort { $a->[1] <=> $b->[1] } @files) {
    if ($ARGV[2]) {
        print "$_->[1] $ARGV[0]/$_->[0] -> $ARGV[1]/$_->[0]\n";
    } else {
        my $dest = "$ARGV[1]/$_->[0]";
        my $dir = dirname($dest);
        make_path($dir) if !-d $dir;   # also creates missing parent dirs
        `cp -a "$ARGV[0]/$_->[0]" "$dest"`;
    }
}

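Use it like this: ./whatever.pl /src/path /dest/path
The arguments should both be absolute paths; ~, or anything else the shell expands to an absolute path, is fine.
If you add a third argument (anything except a literal 0), then instead of copying it prints a report of what it would do, with file sizes in bytes, e.g.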
```
4523 /src/path/file.x -> /dest/path/file.x
12124 /src/path/file.z -> /dest/path/file.z
```
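Notice that these are in ascending order by size.
The cp command on line 34 is a literal shell command, so you can do whatever you want with the switches (I just used -a to preserve all attributes).
File::Find and File::Basename are both core modules, i.e., they are available in every installation of perl.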

— goldilocks

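Arguably, this is the only correct answer here. Or is it... did the title just change...? My browser window is titled "cp - copy smallest files first?", but the post's title is just "copy smallest files first?". Anyway, my philosophy is that options never hurt, but you and David are the only ones who used cp, and you're the only one who pulled it off.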

— mikeserv 2014

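@mikeserv The only reason I used cp is that it's the simplest way to preserve *nix file attributes in (cross-platform-oriented) perl. The reason "cp - " shows up in your browser bar is an (IMO goofy) SE feature, where the most popular of the question's tags is shown in front of the actual title.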

— goldilocks 2014

Well, consider it a compliment anyway. Not that you see much pearl come out of the woodwork around here.

— mikeserv

Another option is to use cp with the output of du:

oldIFS=$IFS
IFS='
'    # split du's output on newlines only (assumes no newlines in file names)
for i in $(du -sk -- *mpg | sort -n | cut -f 2)
do
    cp -- "$i" destination
done
IFS=$oldIFS

This could still be done on one line, but I split it up so it's readable.

— David Wilkins

Don't you at least need to do something with $IFS?

— mikeserv 2014

Yes... I always assume there are no newlines in file names.

— David Wilkins 2014

This also doesn't seem to handle recursing into the directory hierarchy the OP described.

— cpugeniusmv 2014

@cpugeniusmv Right... I somehow missed the recursive part. I'll leave this here in case it helps someone who comes across the question.

— David Wilkins 2014

@DavidWilkins - it is helpful.

— nbubis 2014