Delete everything except the newest files

8

Say I have a directory foo/ that contains many files in some directory structure. I need to keep some of them, but not all.

Is there a way to delete all but (say) the 500 newest?

linux filesystems

11

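I do this task regularly and use variants of the following. It is a pipeline that combines several simple tools: find all files, prepend each file's modification time, sort, strip the modification time again, print every line except the first 500, and delete those files: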

find foo/ -type f | perl -wple 'printf "%12u ", (stat)[9]' | \
    sort -r | cut -c14- | tail -n +501 | \
    while read file; do rm -f -- "$file"; done

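Some comments:

If you are using "bash", you should use "read -r file" rather than just "read file".
Using "perl" to delete the files is faster (and it handles "weird" characters in file names better than the while loop does, unless you are using "read -r file"):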
```
... | tail -n +501 | perl -wnle 'unlink() or warn "$_: unlink failed: $!\n"'
```
Some versions of "tail" do not support the "-n" option, so you have to use "tail +501" there. A portable way to skip the first 500 lines is
```
 ... | perl -wnle 'print if $. > 500' | ...
```
It will not work if your file names contain newlines.
It does not require GNU find.

Combining the above gives:

find foo/ -type f | perl -wple 'printf "%12u ", (stat)[9]' | \
    sort -r | cut -c14- | perl -wnle 'print if $. > 500' | \
    perl -wnle 'unlink() or warn "$_: unlink failed: $!\n"'
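For file names that contain newlines, which the line-based pipelines above cannot handle, the same steps (collect files, sort by modification time, skip the newest 500, delete the rest) can be sketched in Python, where nothing is parsed line by line. This is only an illustration and not part of the original answer: the helper name `prune_oldest` and its `keep` and `dry_run` parameters are assumptions, and by default it deletes nothing.

```python
import os
import stat

def prune_oldest(root, keep=500, dry_run=True):
    """Return regular files under root that are NOT among the `keep` newest.

    Hypothetical helper: files are only deleted when dry_run is False.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # the file vanished while walking
            if stat.S_ISREG(st.st_mode):  # regular files only, like find -type f
                entries.append((st.st_mtime, path))
    entries.sort(reverse=True)  # newest first
    doomed = [path for _mtime, path in entries[keep:]]
    if not dry_run:
        for path in doomed:
            try:
                os.remove(path)
            except OSError as exc:
                print(f"{path}: unlink failed: {exc}")
    return doomed
```

Calling `prune_oldest("foo/")` first and inspecting the returned list before passing `dry_run=False` keeps a mistake from being destructive.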

— Peter John Acklam

I would be careful with the rm -f.

— a CVn

Works like a charm! This should be turned into an alias that takes $path and $count parameters. Thanks a lot!

— Dalibor Karlović

4

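This is how I would do it in Python 3, which should also work on other operating systems. Once you have tested it, make sure to uncomment the line that actually deletes the files.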

import os,os.path
from collections import defaultdict

FILES_TO_KEEP = 500
ROOT_PATH = r'/tmp/'

tree = defaultdict(list)

# create a dictionary containing file names with their date as the key
for root, dirs, files in os.walk(ROOT_PATH):
    for name in files:
        fname = os.path.join(root,name)
        fdate = os.path.getmtime( fname )
        tree[fdate].append(fname)

# sort this dictionary by date
# locate where the newer files (that you want to keep) end
count = 0
# default: keep everything when there are fewer than FILES_TO_KEEP files
last_key = float('-inf')
inorder = sorted(tree.keys(),reverse=True)
for key in inorder:
    count += len(tree[key])
    if count >= FILES_TO_KEEP:
        last_key = key
        break

# now you know where the newer files end, older files begin within the dict
# act accordingly
for key in inorder:
    if key < last_key:
        for f in tree[key]:
            print("remove ", f)
            # uncomment this next line to actually remove files
            #os.remove(f)
    else:
        for f in tree[key]:
            print("keep    ", f)
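Because the script groups files by modification time, every file that shares the cutoff timestamp is kept, so slightly more than FILES_TO_KEEP files may survive. A possible alternative, sketched here with a hypothetical helper `split_newest` that is not part of the answer above, sorts (mtime, path) pairs and slices the list so that exactly the requested number is kept. Ties are then broken by path name, which is an arbitrary choice.

```python
def split_newest(files_with_mtime, keep):
    """Split (mtime, path) pairs into (kept, removed) lists of paths.

    Hypothetical helper: the `keep` newest paths go into the first list.
    """
    ordered = sorted(files_with_mtime, reverse=True)  # newest first
    kept = [path for _mtime, path in ordered[:keep]]
    removed = [path for _mtime, path in ordered[keep:]]
    return kept, removed

# Two files share the newest timestamp; only one of them is kept.
pairs = [(30.0, "c"), (10.0, "a"), (30.0, "d"), (20.0, "b")]
kept, removed = split_newest(pairs, keep=1)
print("keep  ", kept)
print("remove", removed)
```

The pairs could be built from the same os.walk loop as above by appending `(os.path.getmtime(fname), fname)` to a list instead of filling the dictionary.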

— ft

4

I don't know about "the newest 500", but with find you can delete everything older than X minutes or days. An example for files older than 2 days:

find foo/ -mtime +2 -a -type f -exec rm -fv \{\} \;

Test it first with:

find foo/ -mtime +2 -a -type f -exec ls -al \{\} \;

Note the backslashes and the space before "\;". See the find man page for more information.
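For comparison with the Python answer, the age-based selection can be approximated with a short dry-run sketch. The helper name `files_older_than` and its `now` parameter are hypothetical, and the cutoff is a plain comparison in seconds; find's -mtime +2 rounds file ages down to whole days, so files close to the boundary may be selected differently.

```python
import os
import stat
import time

def files_older_than(root, days, now=None):
    """List regular files under root last modified more than `days` days ago.

    Hypothetical helper: it only reports files and never deletes anything.
    """
    if now is None:
        now = time.time()
    cutoff = now - days * 86400  # 86400 seconds per day
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                old.append(path)
    return old
```

Printing the result of `files_older_than("foo/", 2)` plays the same role as the ls -al test run: it shows what would be removed before anything is deleted.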

— Andreas

"(say) the 500 newest" is the essence here, so I don't see how this answers the original question.

— Peter John Acklam

Sorry, I wasn't clear on that.

— AndreasM, 2011

3

If you can live with keeping files for x days/hours instead of keeping the newest x files, you could use tmpwatch --ctime 7d

— Sirex

2

I think the -mtime and -newer options of the find command will be useful to you. See man find for more information.

— Khaled

0

Why not use this simple code:

$ ls -t1 foo/| xargs -d '\n' rm --

— eppesuig

1

How does this delete all but the 500 newest files? And how does it handle subdirectories? I think you may have misunderstood the original post.

— Peter John Acklam