Bash - Check a directory for files against a list of partial file names


I have a server which receives one file per client into a directory every day. The file names are constructed as follows:

uuid_datestring_other-data

For example:

d6f60016-0011-49c4-8fca-e2b3496ad5a7_20160204_023-ERROR

uuid is a standard-format uuid.
datestring is the output of date +%Y%m%d.
other-data is variable in length but will never contain an underscore.

I have a file in this format:

#
d6f60016-0011-49c4-8fca-e2b3496ad5a7    client1
d5873483-5b98-4895-ab09-9891d80a13da    client2
be0ed6a6-e73a-4f33-b755-47226ff22401    another_client
...

Using bash, I need to check that every uuid listed in the file has a corresponding file in the directory.

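I've got this far, but I feel I'm heading in the wrong direction by using an if statement, and that I need to loop over the files in the source directory instead.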

The source_directory and uuid_list variables are assigned earlier in the script:

# Check the entries in the file list

while read -r uuid name; do
# Ignore comment lines
   [[ $uuid = \#* ]] && continue
   if [[ -f "${source_directory}/${uuid}*" ]]
   then
      echo "File for ${name} has arrived"
   else
      echo "PANIC! - No File for ${name}"
   fi
done < "${uuid_list}"

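How can I check whether the files in the list exist in the directory? I'd like to use bash features as far as possible, but I'm not averse to using external commands if need be.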

command-line bash scripts

— Arronical
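A small sketch of why the test in the loop above never succeeds. Inside `[[ ... ]]` bash performs no pathname expansion, and the `*` is quoted anyway, so `-f` only matches a file literally named `<uuid>*`. Letting the unquoted `*` expand first and then testing each result does work. The function names `op_test` and `glob_test` are hypothetical.

```shell
#!/bin/bash
# op_test DIR UUID: the question's test, unchanged. [[ ]] does no pathname
# expansion and the * is quoted, so -f looks for a file literally named "UUID*".
op_test() { [[ -f "$1/$2*" ]]; }

# glob_test DIR UUID: the unquoted * is expanded by the shell first, then each
# result is tested; succeeds on the first regular file that matches.
glob_test() {
    local f
    for f in "$1/$2"*; do
        # With no match, f is the literal pattern, so -f is false.
        [[ -f $f ]] && return 0
    done
    return 1
}
```

For example, `glob_test "$source_directory" "$uuid"` could stand in for the `[[ -f ... ]]` condition in the loop.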

Python? Is the server directory "flat"?

— Jacob Vlijm '02

Yes, it's flat, with no subdirectories. If possible, I'd rather use only bash.

— Arronical


OK, then I won't post it.

— Jacob Vlijm '16

unix.stackexchange.com/q/79301/70524，stackoverflow.com/q/6363441/2072269

— muru

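I don't really see where your problem is. You'll need to loop over either the UUIDs or the files, so why would one loop be better than the other?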

— terdon '16


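Loop over the files and build an associative array keyed on the uuid contained in each name (I use parameter expansion to extract the uuid). Then read the list, look up each uuid in the associative array, and report whether a file was recorded for it.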

#!/bin/bash
uuid_list=...

declare -A file_for
for file in *_*_* ; do
    uuid=${file%%_*}
    file_for[$uuid]=1
done

while read -r uuid name ; do
    [[ $uuid = \#* ]] && continue
    if [[ ${file_for[$uuid]} ]] ; then
        echo "File for $name has arrived."
    else
        echo "File for $name missing!"
    fi
done < "$uuid_list"

— Choroba


Nice (+1), but why is this better than what the OP was doing? You seem to be doing the same basic thing, just in two steps instead of one.

— terdon '16


@terdon: The main difference is this :-) The wildcard is expanded only once, not every time you read a line from the list, which is probably faster too.

— choroba

Yes, that's an important difference. Fair enough :)

— terdon '16

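That's brilliant, thanks, have a +1. Is there a way to include the path to the directory that holds the files? I know I could cd into the directory within the script, but I'm just wondering for the sake of knowledge.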

— Arronical

@Arronical: It's possible, but you have to strip the path from the string, which you can do with file=${file##*/}.

— choroba
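A sketch of choroba's approach with the directory path included, using the `${file##*/}` stripping from the comment above. The wrapper name `check_arrivals` and its argument order are hypothetical. `nullglob` is switched on temporarily so that an empty directory simply produces no iterations.

```shell
#!/bin/bash
# check_arrivals DIR LIST: hypothetical wrapper around choroba's approach
# that globs inside DIR instead of the current directory.
check_arrivals() {
    local dir=$1 list=$2 path file uuid name had_nullglob=0
    local -A file_for                # uuid -> 1 for every file present
    shopt -q nullglob && had_nullglob=1
    shopt -s nullglob                # no match: zero loop iterations
    for path in "$dir"/*_*_*; do
        file=${path##*/}             # strip the directory part
        file_for[${file%%_*}]=1      # key on the text before the first "_"
    done
    (( had_nullglob )) || shopt -u nullglob   # restore the caller's setting

    while read -r uuid name; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blanks/comments
        if [[ ${file_for[$uuid]} ]]; then
            echo "File for $name has arrived."
        else
            echo "File for $name missing!"
        fi
    done < "$list"
}
```

Usage: `check_arrivals "$source_directory" "$uuid_list"`. The directory is still scanned only once, so the speed advantage described above is kept.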


Here's a more "bashy" and concise approach:

#!/bin/bash

## Read the UUIDs into the array 'uuids'. Using awk
## lets us both skip comments and only keep the UUID
mapfile -t uuids < <(awk '!/^\s*#/{print $1}' uuids.txt)

## Iterate over each UUID
for uuid in "${uuids[@]}"; do
        ## Set the special array $@ (the positional parameters: $1, $2 etc)
        ## to the glob matching the UUID. This will be all file/directory
        ## names that start with this UUID.
        set -- "${source_directory}"/"${uuid}"*
        ## If no files matched the glob, no file named $1 will exist
        [[ -e "$1" ]] && echo "YES : $1" || echo  "PANIC $uuid" 
done

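Note that while the above is pretty and works well for a few files, its speed depends on the number of UUIDs, and it will be very slow if you need to process many. If that's the case, use @choroba's solution or, for something really fast, avoid the shell and call perl: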

#!/bin/bash

source_directory="."
perl -lne 'BEGIN{
            opendir(D,"'"$source_directory"'"); 
            foreach(readdir(D)){ /((.+?)_.*)/; $f{$2}=$1; }
           } 
           s/\s.*//; $f{$_} ? print "YES: $f{$_}" : print "PANIC: $_"' uuids.txt

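To illustrate the difference in timing, I tested my bash approach, choroba's, and the perl one on a file with 20000 UUIDs, 18001 of which had a corresponding file name. Each test was run with the script's output redirected to /dev/null.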

My bash (~3.5 minutes)

real   3m39.775s
user   1m26.083s
sys    2m13.400s

Choroba (bash, ~0.7 seconds)

real   0m0.732s
user   0m0.697s
sys    0m0.037s

My perl (~0.1 seconds):

real   0m0.100s
user   0m0.093s
sys    0m0.013s

— terdon
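To set up a comparison like this yourself, you could use a hypothetical generator along these lines. The identifiers are uuid-shaped strings derived from a counter, not real UUIDs. The file name suffix is copied from the question's example.

```shell
#!/bin/bash
# make_test_data DIR LIST TOTAL PRESENT: hypothetical benchmark setup.
# Writes TOTAL "uuid name" lines to LIST and creates a matching file in DIR
# for the first PRESENT of them, so TOTAL - PRESENT entries are missing.
make_test_data() {
    local dir=$1 list=$2 total=$3 present=$4 i id
    mkdir -p "$dir"
    for (( i = 1; i <= total; i++ )); do
        # uuid-shaped (8-4-4-4-12 hex digits) but derived from the counter,
        # so every run produces the same, unique identifiers
        printf -v id '%08x-0000-4000-8000-%012x' "$i" "$i"
        printf '%s client%d\n' "$id" "$i"
        if (( i <= present )); then
            : > "$dir/${id}_20160204_023-ERROR"   # create an empty file
        fi
    done > "$list"
}
```

For example, `make_test_data bench bench.txt 20000 18001` creates data of the size described above. You could then time each script with `time ./script.sh > /dev/null`, where `script.sh` stands for whichever script is being measured.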

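+1 for a very concise approach, though it has to be run from within the directory containing the files. I know I can cd into the directory in the script, but is there a way to include the file path in the search?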

— Arronical

@Arronical Sure, see the updated answer. You can use ${source_directory} just as you did in your script.

— terdon '16

Or use "$2" and pass the directory to the script as its second argument.

— Alexis
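Here is a minimal sketch of terdon's glob test that takes the list and the directory as arguments, along the lines of the comment above. `check_uuids` is a hypothetical name. A `read` loop that skips blank and `#` lines replaces the `mapfile`/`awk` pipeline.

```shell
#!/bin/bash
# check_uuids LIST DIR: hypothetical function using terdon's "set -- glob"
# test, with the uuid list as $1 and the directory as $2.
check_uuids() {
    local list=$1 dir=$2 uuid _
    while read -r uuid _; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blanks/comments
        # Replace the positional parameters with the glob's expansion; if
        # nothing matches, $1 is the unexpanded pattern, which won't exist.
        set -- "$dir/$uuid"*
        if [[ -e $1 ]]; then
            echo "YES : $1"
        else
            echo "PANIC $uuid"
        fi
    done < "$list"
}
```

Usage: `check_uuids uuids.txt "$source_directory"`. The per-uuid glob is still the slow part, so the timing caveat above applies unchanged.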

Check that it's fast enough for your purposes: a single directory scan would be much faster than lots of individual file lookups like this.

— alexis


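@alexis Yes, you're quite right. I did some testing, and this becomes very slow as the number of UUIDs/files grows. I've added a perl approach that is much faster. It can be run as a one-liner from within a bash script, so technically it's still bash if you're willing to accept some creative naming.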

— terdon '16


This is pure Bash (i.e. no external commands), and it's the most concise approach I can think of.

Performance-wise, though, it isn't really any better than what you have now.

It reads each line of path/to/file. For each line, it stores the first field in $uuid and prints a message if no file matching the pattern path/to/directory/$uuid* is found:

#! /bin/bash
[ -z "$2" ] && printf 'Not enough arguments.\n' && exit 1

# Store the first field of each line in $uuid and discard the rest into _
while read -r uuid _; do
    [ ! -f "$2/$uuid"* ] && printf '%s missing in %s\n' "$uuid" "$2"
done <"$1"

Call it as path/to/script path/to/file path/to/directory.

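Sample output on a test directory hierarchy containing the sample file name from the question, using the sample input file from the question: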

% tree
.
├── path
│   └── to
│       ├── directory
│       │   └── d6f60016-0011-49c4-8fca-e2b3496ad5a7_20160204_023-ERROR
│       └── file
└── script.sh

3 directories, 3 files
% ./script.sh path/to/file path/to/directory
d5873483-5b98-4895-ab09-9891d80a13da* missing in path/to/directory
be0ed6a6-e73a-4f33-b755-47226ff22401* missing in path/to/directory

— kos
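The test `[ ! -f "$2/$uuid"* ]` works when the glob matches zero files or one file. With two matches, for instance if files from several days accumulate, `[` receives too many arguments and reports a syntax error instead. Here is a sketch of a variant that avoids this with bash's `compgen -G` builtin, which succeeds if the pattern matches anything. The function name `missing_uuids` is hypothetical; the argument order follows the script above.

```shell
#!/bin/bash
# missing_uuids LIST DIR: hypothetical pure-bash variant; prints the uuids
# from LIST that have no file starting with that uuid in DIR.
missing_uuids() {
    local list=$1 dir=$2 uuid _
    while read -r uuid _; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blanks/comments
        # compgen -G succeeds whether the glob matches one file or several;
        # its list of matches is discarded, only the exit status is used.
        compgen -G "$dir/$uuid*" > /dev/null ||
            printf '%s missing in %s\n' "$uuid" "$dir"
    done < "$list"
}
```

Usage: `missing_uuids path/to/file path/to/directory`.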


unset IFS
set -f
set +f -- $(<uuid_file)
while  [ "${1+:}" ]
do     : < "$source_directory/$1"*  &&
       printf 'File for %s has arrived.\n' "$2"
       shift 2 || break
done

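The idea here is not to bother reporting errors the shell will report for you anyway. If you try to open a file that doesn't exist with <, your shell will complain. In fact, it will prefix the error output with the script's $0 and the line number where the error occurred. That's good information you already get by default, so don't bother.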

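You also don't need to read the file line by line like that, which can be very slow. This expands the whole file into a whitespace-separated array of arguments and processes them two at a time. If your data is consistent with your example, $1 will always be your uuid and $2 will be your $name. If bash can open a match for your uuid, and only one such match exists, the printf runs. Otherwise, the shell writes a diagnostic to stderr explaining why.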

— mikeserv
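The hypothetical `show_pairs` function below isolates the splitting-and-pairing step. Two caveats follow from how it works. First, a lone `#` header line, as in the question's list, is a word of its own and shifts every later pair by one. Second, when an odd number of words remains, bash's `shift 2` fails without shifting anything, so the sketch adds `|| break` to make sure the loop ends.

```shell
#!/bin/bash
# show_pairs FILE: hypothetical demo of the split-then-shift technique.
# The whole file is split on whitespace into positional parameters, which
# are then consumed two at a time ($1 = uuid, $2 = name).
show_pairs() {
    local IFS=$' \t\n'    # default splitting, same effect as unset IFS
    set -f                # the expansion on the next line is not globbed;
    set +f -- $(<"$1")    # +f only takes effect once set actually runs
    while [ "${1+:}" ]; do
        printf '%s -> %s\n' "$1" "${2-}"
        shift 2 || break  # bash leaves $@ unchanged if fewer than 2 remain
    done
}
```

With the question's list, which starts with a `#` line, the output pairs `#` with the first uuid, which shows why the comment line has to be removed before this technique is used.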


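@kos: Does the file exist? If not, it's behaving as expected. unset IFS ensures that $(cat <uuid_file) is split on whitespace. The shell splits differently when $IFS contains only whitespace or is unset. Such a split never produces empty fields, because any run of whitespace counts as a single field delimiter. I think it should work as long as each line holds exactly two whitespace-separated fields. In bash, anyway. set -f ensures the unquoted expansion isn't interpreted as a glob, and set +f ensures that later globs are.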

— mikeserv '16

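@kos: I've just fixed it. I shouldn't have been using <>, because that creates the file if it doesn't exist. < reports the error the way I meant it to. There is a possible problem, though, and it's the reason I wrongly used <> in the first place. If the match is a pipe file with no reader, or something like a line-buffered char dev, the redirection will hang. That could be avoided by handling the error output more explicitly and doing [ -f "$dir/$1"* ] instead. We're talking about uuids here, so the glob should never expand to more than one file. It's nice, though, that it reports the failed file names to stderr like that.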

— mikeserv '16

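@kos: Actually, I think I could use ulimit to stop it from creating any files, so <> could still be used this way. <> is also better if the glob expands to a directory, because on Linux the read/write open will fail and say exactly that: it's a directory.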

— mikeserv '16

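@kos: Oh! Sorry, I'm just being dense: you had two matches, so it did the right thing. What I mean is that it will error out this way if there can be two matches, and since these are supposed to be uuids, there should never be two similar names matching the same glob. That's entirely intentional: it's supposed to be ambiguous in that case. Do you see what I mean? Naming the file with a glob isn't the issue, and special characters don't matter here. The point is that bash only accepts a glob in a redirection if it matches exactly one file. See man bash under "Redirection".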

— mikeserv '16


The approach I would use is to first extract the uuids from the file and then use find:

awk '{print $1}' listfile.txt  | while read fileName;do find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null;done

Split up for readability:

awk '{print $1}' listfile.txt  | \
    while read fileName;do \
    find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null;
    done

Here is an example with a list of files in /etc/, looking for the file names passwd, group, fstab and THISDONTEXIST.

$ awk '{print $1}' listfile.txt  | while read fileName;do find /etc -name "$fileName*" -printf "%p FOUND\n" 2> /dev/null; done
/etc/pam.d/passwd FOUND
/etc/cron.daily/passwd FOUND
/etc/passwd FOUND
/etc/group FOUND
/etc/iproute2/group FOUND
/etc/fstab FOUND

Since you mentioned that the directory is flat, you could use the -printf "%f\n" option to print just the file name itself.

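What this doesn't do is list the missing files. A small drawback of find is that it only reports the files it does find, not the ones it doesn't. You can, however, check the output: if it is empty, the file is missing.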

awk '{print $1}' listfile.txt  | while read fileName;do RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; [ -z "$RESULT"  ] && echo "$fileName not found" || echo "$fileName found"  ;done

More readable:

awk '{print $1}' listfile.txt  | \
   while read fileName;do \
   RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; \
   [ -z "$RESULT"  ] && echo "$fileName not found" || \
   echo "$fileName found"  
   done
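Because the directory is flat, a sketch of the same check can tell `find` not to descend into subdirectories and to stop at the first match. This assumes GNU `find`, which provides `-maxdepth` and `-quit`. `report_with_find` is a hypothetical name.

```shell
#!/bin/bash
# report_with_find LIST DIR: hypothetical variant of the find approach for a
# flat directory (GNU find assumed for -maxdepth and -quit).
report_with_find() {
    local list=$1 dir=$2 uuid _ hit
    while read -r uuid _; do
        [[ -z $uuid || $uuid = \#* ]] && continue   # skip blanks/comments
        # -maxdepth 1: don't descend; -quit: stop after the first match.
        hit=$(find "$dir" -maxdepth 1 -name "$uuid*" -print -quit)
        if [[ -n $hit ]]; then
            echo "$uuid found"
        else
            echo "$uuid not found"
        fi
    done < "$list"
}
```

Usage: `report_with_find listfile.txt "$source_directory"`. A file that exists only in a subdirectory is reported as not found.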

Here's how it runs as a small script:

skolodya@ubuntu:$ ./listfiles.sh                                               
passwd found
group found
fstab found
THISDONTEXIST not found

skolodya@ubuntu:$ cat listfiles.sh                                             
#!/bin/bash
awk '{print $1}' listfile.txt  | \
   while read fileName;do \
   RESULT="$(find /etc -name "$fileName*" -printf "%p\n" 2> /dev/null )"; \
   [ -z "$RESULT"  ] && echo "$fileName not found" || \
   echo "$fileName found"  
   done

Since the directory is flat, stat is another option. Note, however, that the following code won't work recursively if you ever decide to add subdirectories:

$ awk '{print $1}' listfile.txt  | while read fileName;do  stat /etc/"$fileName"* 1> /dev/null ;done        
stat: cannot stat ‘/etc/THISDONTEXIST*’: No such file or directory

If we take this stat idea and run with it, we can use stat's exit code to indicate whether the file exists. In effect, we want to do this:

$ awk '{print $1}' listfile.txt  | while read fileName;do  if stat /etc/"$fileName"* &> /dev/null;then echo "$fileName found"; else echo "$fileName NOT found"; fi ;done

Sample run:

skolodya@ubuntu:$ awk '{print $1}' listfile.txt  | \                                                         
> while read FILE; do                                                                                        
> if stat /etc/"$FILE" &> /dev/null  ;then                                                                   
> echo "$FILE found"                                                                                         
> else echo "$FILE NOT found"                                                                                
> fi                                                                                                         
> done
passwd found
group found
fstab found
THISDONTEXIST NOT found

— Sergiy Kolodyazhnyy