How to wait in bash for several subprocesses to finish and return exit code != 0 when any subprocess ends with code != 0?

561

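How can I wait in a bash script for several subprocesses spawned from that script to finish, and return exit code != 0 when any of the subprocesses ends with code != 0?

Simple script: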

#!/bin/bash
for i in `seq 0 9`; do
  doCalculations $i &
done
wait

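The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with code != 0?

Is there any better solution than collecting the PIDs of the subprocesses, waiting for them in order, and summing the exit statuses?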

bash process wait

— tkokoszka
source

1

This could be significantly improved by touching on wait -n, available in modern bash, which returns as soon as the first/next command completes.

— Charles Duffy

If you are looking to test using Bash, try this: github.com/sstephenson/bats

— Alexander Mills

2

Active development of BATS has moved to github.com/bats-core/bats-core

— Potherca

3

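@CharlesDuffy wait -n has one small problem: if there are no child jobs remaining (aka a race condition), it returns a non-zero exit status (failure), which can be indistinguishable from a failed child process.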

— drevicko

5

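@CharlesDuffy - You have wonderful insight and you do a huge service to SO by sharing it. It seems that about 80% of the SO posts I read have you sharing wonderful little nuggets of knowledge in the comments, which must come from a vast ocean of experience. Many thanks!

— Brett Holman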

519

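wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background. Modify the loop to store the PID of each spawned subprocess in an array, and then loop again waiting on each PID.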

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids[${i}]=$!
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done
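
The two loops above can be extended to aggregate the exit statuses, which is what the question asks for. The following is only a minimal sketch, assuming bash; wait_all and demo_job are hypothetical names introduced for illustration, with demo_job standing in for the real work.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the real job: succeeds unless its argument is 3.
demo_job() {
    sleep 0.1
    [ "$1" -ne 3 ]
}

# Wait for every PID given as an argument; return 1 if any of them failed.
wait_all() {
    local pid rc=0
    for pid in "$@"; do
        wait "$pid" || rc=1
    done
    return "$rc"
}

# Launch the jobs and remember each PID in an array.
pids=()
for i in 0 1 2 3 4; do
    demo_job "$i" &
    pids+=("$!")
done

if wait_all "${pids[@]}"; then
    echo "all jobs succeeded"
else
    echo "at least one job failed"
fi
```

Because each PID is waited on individually, no status is lost even when a job finishes before its turn in the loop.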

— Luca Tettamanti
source

9

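Well, since you are going to wait for all the processes, it does not matter if, for example, you are waiting on the first one while the second has already finished (the second will be picked up at the next iteration anyway). It is the same approach you would use in C with wait(2).

— Luca Tettamanti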

7

Ah, I see - different interpretation :) I read the question as meaning "return exit code 1 immediately when any of the subprocesses exits".

— Alnitak

56

PIDs may indeed be reused, but you cannot wait for a process that is not a child of the current process (wait fails in that case).
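
This failure mode is easy to observe. A small sketch, assuming bash: PID 1 is used only because it can never be a child of the script.

```shell
#!/usr/bin/env bash

# PID 1 (init) is never a child of this script, so `wait` refuses it.
# Bash documents exit status 127 for a PID that is not a known child.
rc=0
wait 1 2>/dev/null || rc=$?
echo "wait on a non-child returned $rc"
```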

— tkokoszka

12

You can also use %n to refer to the n-th background job, and %% to refer to the most recent one.

— conny, 2010

30

@Nils_M: You're right, I'm sorry. So it would be something like: for i in $n_procs; do ./procs[${i}] & ; pids[${i}]=$!; done; for pid in ${pids[*]}; do wait $pid; done;, right?

— synack, 2014

284

http://jeremy.zawodny.com/blog/archives/010717.html:

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi

— HoverHell
source

103

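jobs -p gives the PIDs of subprocesses that are still executing. It skips a process that finished before jobs -p was called. So if any subprocess ends before jobs -p, that process's exit status is lost.

— tkokoszka, 2009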

15

Wow, this answer is way better than the top-rated one. :/

— e40, 2012

4

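@e40, the answer below is probably better. Even better would be to run each command with '(cmd; echo "$?" >> "$tmpfile")', use this wait, and then read the file for the failures. Also annotate the output. ... or just use this script when you don't care that much.

— HoverHell, 2012

I'd like to add that this answer is better than the accepted one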

— shurikk

2

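To be precise, @tkokoszka, jobs -p does not give the PIDs of subprocesses but GPIDs. The waiting logic seems to work anyway: it always waits on the group if such a group exists and on the pid if not, but it is good to be aware of this. The syntax differs depending on whether you have a PID or a GPID, i.e. kill -- -$GPID vs kill $PID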

— Timo

58

Here is a simple example using wait.

Run some processes:

$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &

Then wait for them with the wait command:

$ wait < <(jobs -p)

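Or just wait (without arguments) for all of them.

This will wait for all jobs in the background to complete.

If the -n option is supplied, wait waits for the next job to terminate and returns its exit status.

See help wait and help jobs for syntax.

The downside, however, is that this returns only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.

Or make your calculation function create some file on failure (empty or with a failure log), then check whether that file exists, e.g.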

$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.

— Kenorb
source

1

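For those not too familiar with bash, the two calculations in the example here are sleep 20 && true and sleep 20 && false, i.e. replace those with your function(s). To understand && and ||, run man bash and type '/' (search), then '^ *Lists' (a regex), then enter: man will scroll down to the description of && and ||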

— drevicko

1

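You should probably check that the file 'fail' does not exist at the start (or delete it). Depending on the application, it might also be a good idea to add '2>&1' before the || so that STDERR is captured in fail as well.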

— drevicko

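I like this one, any drawbacks? Actually, only when I want to list all subprocesses and take some action, e.g. send a signal, would I try bookkeeping pids or iterating over jobs. To wait for completion, just wait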

— xgwang

This will miss the exit status of jobs that failed before jobs -p is called

— Erik Aronesty

50

If you have GNU Parallel installed you can do:

# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}

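GNU Parallel will give you the exit code:

0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs.
254 - More than 253 jobs failed.
255 - Other error.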
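
A script can act on that status right after the parallel call. The helper below is a hypothetical sketch (describe_parallel_status is not part of GNU Parallel); it simply encodes the table above, so check man parallel for the ranges used by your installed version.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn a GNU Parallel exit status into a message,
# following the table above.
describe_parallel_status() {
    local rc=$1
    if (( rc == 0 )); then
        echo "all jobs ran without error"
    elif (( rc <= 253 )); then
        echo "$rc job(s) failed"
    elif (( rc == 254 )); then
        echo "more than 253 jobs failed"
    else
        echo "other error"
    fi
}

# Typical use (needs GNU Parallel, so it is left commented out here):
#   seq 0 9 | parallel doCalculations {}
#   describe_parallel_status "$?"
describe_parallel_status 3
```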

Watch the intro videos to learn more: http://pi.dk/1

— Ole Tange
source

1

Thanks! But you forgot to mention the "confusion" issue that I subsequently fell into: unix.stackexchange.com/a/35953

— nobar

1

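This looks like a great tool, but I don't think the above works as-is in a Bash script where doCalculations is a function defined in that same script (although the OP wasn't clear about this requirement). When I try, parallel says /bin/bash: doCalculations: command not found (it says this 10 times for the seq 0 9 example above). See here for a workaround.

— 2013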

3

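Also of interest: xargs has some capability to launch jobs in parallel via the -P option. From here: export -f doCalculations ; seq 0 9 |xargs -P 0 -n 1 -I{} bash -c "doCalculations {}". The limitations of xargs are enumerated in the man page for parallel.

— 2013

And if doCalculations relies on any other script-internal environment variables (custom PATH, etc.), they probably need to be explicitly exported before launching parallel.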

— nobar

4

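@nobar The confusion is due to some packagers messing things up for their users. If you install using wget -O - pi.dk/3 | sh you will get no confusion. If your packager has messed things up for you, I encourage you to raise the issue with your packager. Variables and functions should be exported (export -f) for GNU Parallel to see them (see man parallel: gnu.org/software/parallel/…)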

— Ole Tange

46

Simply:

#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...

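Update:

As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but it does not exit and fail if one of them fails. This can be done with the following modification suggested by @Bryan, @SamBrightman, and others: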

#!/bin/bash

pids=""
RESULT=0


for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ];
    then
       exit 1
fi

...code continued here ...

— patapouf_ai
source

1

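According to the wait man page, wait with multiple PIDs only returns the return value of the last process waited for. So you do need an extra loop that waits for each PID separately, as suggested in the accepted answer (in the comments).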
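
The difference is easy to reproduce. A small sketch, assuming bash; the subshells with fixed exit codes are placeholders for real jobs.

```shell
#!/usr/bin/env bash

# One wait with several PIDs: only the status of the last PID is reported,
# so the failure of the first job goes unnoticed.
(exit 3) & first=$!
(exit 0) & second=$!
rc_combined=0
wait "$first" "$second" || rc_combined=$?
echo "combined wait returned $rc_combined"

# Separate waits: each job's own status is visible.
(exit 3) & first=$!
(exit 0) & second=$!
rc_first=0
wait "$first" || rc_first=$?
rc_second=0
wait "$second" || rc_second=$?
echo "separate waits returned $rc_first and $rc_second"
```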

— Vlad Frolov

1

Because it doesn't seem to be stated anywhere else on this page, I'll add that the loop would be for pid in $pids; do wait $pid; done

— Bryan

1

@bisounours_tronconneuse yes, you do. See help wait - with multiple IDs, wait returns the exit code of the last one only, as @vlad-frolov said above.

— Sam Brightman

1

@Bryan, @SamBrightman OK. I modified it with your recommendations.

— patapouf_ai, 2016

4

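I had one obvious concern with this solution: what if a given process exits before the corresponding wait is called? It turns out that this is not a problem: if you wait on a process that has already exited, wait immediately returns with the status of the already-exited process. (Thank you, bash authors!)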
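
A quick way to convince yourself, sketched here assuming bash; the half-second pause is arbitrary and only guarantees that the child is gone before wait runs.

```shell
#!/usr/bin/env bash

(exit 7) &          # child that exits almost immediately
pid=$!
sleep 0.5           # give it ample time to finish before we wait

rc=0
wait "$pid" || rc=$?
echo "status collected after the fact: $rc"
```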

— Daniel Griscom

39

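Here is what I have come up with so far. I would like to see how to interrupt the sleep command when a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.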

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
   done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do 
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids

— Mark Edgar
source

One could possibly skip WAITALL_DELAY or set it very low; since no processes are started inside the loop, I don't think it is too expensive.

— Marian, 2010

21

To parallelize this...

for i in $(whatever_list) ; do
   do_something $i
done

Translate it to this...

for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }' 
   )

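If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs, but it isn't always installed by default.
The for loop isn't strictly necessary in this example, since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing - I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.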
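
The exit-code behaviour described in the first point can be checked with a short sketch. Here do_something is a hypothetical job that fails for one input only; GNU xargs documents status 123 when any invocation exits with a status from 1 to 125, and other xargs implementations may use a different non-zero value.

```shell
#!/usr/bin/env bash

# Hypothetical job: fails only for input 2.
do_something() {
    [ "$1" -ne 2 ]
}
export -f do_something    # make the function visible to the child bash

rc=0
printf '%s\n' 0 1 2 3 |
    xargs -I{} -P 2 bash -c 'do_something {}' || rc=$?
echo "xargs exit status: $rc"
```

A single failing input is enough to make the pipeline report failure, even though the other three invocations still run to completion.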

Here is a simplified working example...

for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'

— nobar
source

For --max-procs: How to obtain the number of CPUs/cores in Linux from the command line?

— nobar

7

I don't believe it is possible with Bash's builtin functionality.

You can get a notification when a child exits:

#!/bin/sh
set -o monitor        # enable script job control
trap 'echo "child died"' CHLD

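However, there is no apparent way to get the child's exit status in the signal handler.

Getting that child status is usually the job of the wait family of functions in the lower-level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.

What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.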
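
Newer bash releases (4.3 and later) add wait -n, which blocks until the next child finishes and returns its status, and so covers much of this gap. The following is only a sketch under that assumption; job and run_fail_fast are hypothetical names, and each job is described by a made-up "delay:status" spec.

```shell
#!/usr/bin/env bash
# Assumes bash 4.3 or newer, where `wait -n` is available.

# Hypothetical job: sleep for $1 seconds, then exit with status $2.
job() {
    sleep "$1"
    return "$2"
}

# Launch "delay:status" specs in parallel; stop at the first failure.
run_fail_fast() {
    local spec pids=() remaining rc=0
    for spec in "$@"; do
        job "${spec%%:*}" "${spec##*:}" &
        pids+=("$!")
    done
    remaining=${#pids[@]}
    # Call wait -n exactly once per launched job, so it is never invoked
    # with no children left (which would itself return 127).
    while (( remaining > 0 )); do
        wait -n || { rc=$?; break; }
        remaining=$((remaining - 1))
    done
    if (( rc != 0 )); then
        kill "${pids[@]}" 2>/dev/null || true   # stop whatever still runs
        wait 2>/dev/null                        # reap the stragglers
    fi
    return "$rc"
}

if run_fail_fast 0.1:0 0.2:3 1:0; then
    echo "all jobs succeeded"
else
    echo "a job failed with status $?"
fi
```

Counting the launched jobs is what keeps the "no children left" status apart from a genuine job failure.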

— Alnitak
source

7

I see lots of good examples listed here and wanted to throw mine in as well.

#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    status=$?   # save it now: the [ ] test below overwrites $?
    if [ $status -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $status"
    else
        echo "FAILED - Job $pid exited with a status of $status"
    fi
done

I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!

— Jason Slobotski
source

When I stop it with Ctrl+C, I still see processes running in the background.

— karsten

2

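@karsten - this is a different problem. Assuming you are using bash, you can trap an exit condition (including Ctrl+C) and have the current process and all child processes killed using trap "kill 0" EXIT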

— Phil

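@Phil is correct. Since these are background processes, killing the parent process just leaves any child processes running. My example does not trap any signals; they can be added if necessary, as Phil has stated.

— Jason Slobotski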

6

Here is something that I use:

#wait for jobs
for job in `jobs -p`; do wait ${job}; done

— Jplozier
source

5

The following code will wait for all calculations to complete and return exit status 1 if any of the doCalculations fails.

#!/bin/bash
for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1

— 错误
source

5

Just store the results outside the shell, e.g. in a file.

#!/bin/bash
tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required
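
A variation on the same idea, offered only as a sketch: it uses mktemp instead of a fixed /tmp path so that concurrent runs do not share a file, and it counts the failures with awk. The doCalculations body here is a hypothetical stand-in that fails for odd inputs.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for doCalculations: exits with status 1 for odd input.
doCalculations() {
    return $(( $1 % 2 ))
}

results=$(mktemp)    # private file instead of a fixed /tmp path

for i in 0 1 2 3 4 5; do
    # Each subshell appends one short "index:status" line.
    ( rc=0; doCalculations "$i" || rc=$?; echo "$i:$rc" >> "$results" ) &
done
wait

# Count the lines whose status field is not 0.
failed=$(awk -F: '$2 != 0 { n++ } END { print n + 0 }' "$results")
rm -f "$results"
echo "$failed job(s) failed"
```

Comparing the status field itself, rather than grepping for ':0', avoids mismatches on statuses that merely begin with a zero.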

— 埃斯塔尼
source

5

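Here is my version, which works for multiple pids, logs a warning if execution takes too long, and stops the subprocesses if execution takes longer than a given value.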

function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors       

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once 
    local log_ttime=0 # local time instance for comparison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert

            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to replace with yours
function Logger {
    local value="${1}"

    echo $value
}

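For example, wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.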

function something {

    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something

— 奥西里斯·德·琼
source

4

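If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code", as well as task names and their pids. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.

The script launches all tasks in the first loop and consumes the results in the second one.

This is a bit overkill for simple cases, but it allows for pretty neat stuff. For example, one can store error messages for each task in another associative array and print them after everything has settled down.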

#! /bin/bash

main () {
    local -A pids=()
    local -A tasks=([task1]="echo 1"
                    [task2]="echo 2"
                    [task3]="echo 3"
                    [task4]="false"
                    [task5]="echo 5"
                    [task6]="false")
    local max_concurrent_tasks=2

    for key in "${!tasks[@]}"; do
        while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
            sleep 1 # gnu sleep allows floating point here...
        done
        ${tasks[$key]} &
        pids+=(["$key"]="$!")
    done

    errors=0
    for key in "${!tasks[@]}"; do
        pid=${pids[$key]}
        local cur_ret=0
        if [ -z "$pid" ]; then
            echo "No Job ID known for the $key process" # should never happen
            cur_ret=1
        else
            wait $pid
            cur_ret=$?
        fi
        if [ "$cur_ret" -ne 0 ]; then
            errors=$(($errors + 1))
            echo "$key (${tasks[$key]}) failed."
        fi
    done

    return $errors
}

main
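
The polling loop used for rate limiting above can also be written with wait -n. This is an alternative sketch rather than the answer's method, and it assumes a reasonably recent bash (4.3 or later); run_limited and task are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumes bash 4.3 or newer, where `wait -n` is available.

# Hypothetical task: fails only when its argument is the word "bad".
task() {
    sleep 0.1
    [ "$1" != bad ]
}

# run_limited MAX FUNC ARG...: run FUNC once per ARG, at most MAX at a time.
# Returns 1 if any invocation failed, 0 otherwise.
run_limited() {
    local max=$1 fn=$2
    shift 2
    local running=0 failures=0 arg
    for arg in "$@"; do
        if (( running >= max )); then
            # All slots are busy: block until one job finishes.
            wait -n || failures=$((failures + 1))
            running=$((running - 1))
        fi
        "$fn" "$arg" &
        running=$((running + 1))
    done
    # Drain the jobs that are still running.
    while (( running > 0 )); do
        wait -n || failures=$((failures + 1))
        running=$((running - 1))
    done
    return "$(( failures > 0 ))"
}

if run_limited 2 task a b bad c; then
    echo "all tasks succeeded"
else
    echo "some task failed"
fi
```

Unlike the sleep-based loop, this version starts the next task as soon as a slot frees up, at the cost of no longer knowing which task produced which status.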

— Stefanct
source

4

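I have just been modifying a script to background and parallelise a process.

I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it is not zero, or a list of jobs that returned a non-zero exit when no PID argument is provided. E.g.

Bash: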

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]-  Exit 2                  sleep 20 && exit 2
[2]+  Exit 1                  sleep 10 && exit 1

Ksh:

$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+  Done(2)                  sleep 20 && exit 2
[2]+  Done(1)                  sleep 10 && exit 1

This output is written to stderr, so a simple solution to the OP's example could be:

#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
  exit 1
fi

While this:

wait 2> >(wc -l)

will also return a count, but without the tmp file. It might also be used this way, for example:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

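But this isn't very much more useful than the tmp file, IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.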

— 胡说
source

3

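I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.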

#!/bin/bash

set -o monitor

sleep 2 &
sleep 4 && exit 1 &
sleep 6 &

pids=`jobs -p`

checkpids() {
    for pid in $pids; do
        if kill -0 $pid 2>/dev/null; then
            echo $pid is still alive.
        elif wait $pid; then
            echo $pid exited with zero exit status.
        else
            echo $pid exited with non-zero exit status.
        fi
    done
    echo
}

trap checkpids CHLD

wait

— 迈克尔特
source

3

#!/bin/bash
set -m
for i in `seq 0 9`; do
  doCalculations $i &
done
while fg; do true; done

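set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status

Unfortunately this won't handle the case where a process in the background exits with a non-zero exit status. (The loop won't terminate immediately. It will wait for the previous processes to complete.)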

— 杰恩
source

3

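There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here is what I did - this might be useful to some in the future.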

n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done 


# collect launched jobs

for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done

— 用户名
source

3

Here is a version that should work well, if not better than @HoverHell's answer!

#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
     echo "CHLD exit code is $1"
     echo "CHLD pid is $2"
     echo $(jobs -l)

     for job in `jobs -p`; do
         echo "PID => ${job}"
         wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
     done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    "{ echo "foo" && exit 4; }"
    "{ echo "bar" && exit 3; }"
    "{ echo "baz" && exit 5; }"
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end

Of course, I have immortalized this script in an NPM project that allows you to run bash commands in parallel, which is useful for testing:

https://github.com/ORESoftware/generic-subshell

— Alexander Mills
source

trap $? $$ seems to set the exit code to 0 and the PID to that of the currently running bash shell, every time for me

— inetknght

Are you absolutely sure about that? Not sure if that makes sense.

— Alexander Mills

2

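trap is your friend. You can trap on ERR in a lot of systems. You can trap EXIT, or trap on DEBUG to perform a piece of code after every command.

This is in addition to all the standard signals.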
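
A minimal sketch of the ERR and EXIT traps in bash, for illustration only; demo_traps is a hypothetical name, and its body runs in a subshell so that the traps do not leak into the calling script.

```shell
#!/usr/bin/env bash

# Runs in a subshell (note the parentheses) so the traps stay local to it.
demo_traps() (
    trap 'echo "ERR: a command failed with status $?"' ERR
    trap 'echo "EXIT: leaving with status $?"' EXIT
    true      # succeeds, so no trap fires
    false     # fails, so the ERR trap fires
    exit 3    # the EXIT trap fires on the way out
)

# The trailing `true` only keeps the command substitution's status at 0.
output=$(demo_traps; true)
printf '%s\n' "$output"
```

Note that bash skips the ERR trap for commands tested by if, while, && or ||, just as it does for set -e.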

— Paul Hodges
source

1

Could you please elaborate on your answer with some examples?

— ϹοδεMεδιϲ 19/12/17

2

set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect

The set -e at the top makes your script stop on failure.

expect will return 1 if any subjob failed.

— 矢城
source

2

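I wrote a bash function called :for for exactly this purpose.

Note: :for not only preserves and returns the exit code of the failing function, but also terminates all parallel running instances. That might not be needed in this case.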

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
    local pids=("$@")
    [ ${#pids} -eq 0 ] && return $?

    trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
    trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

    for pid in "${pids[@]}"; do
        wait "${pid}" || return $?
    done

    trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
    local f="${1}" && shift

    local i=0
    local pids=()
    for arg in "$@"; do
        ( ${f} "${arg}" ) &
        pids+=("$!")
        if [ ! -z ${FOR_PARALLEL+x} ]; then
            (( i=(i+1)%${FOR_PARALLEL} ))
            if (( i==0 )) ;then
                :wait "${pids[@]}" || return $?
                pids=()
            fi
        fi
    done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}

Usage

for.sh:

#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"

$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1


— 恩特
source

1

I used this recently (thanks to Alnitak):

#!/bin/bash
# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid
echo

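From there one can easily extrapolate: add a trigger (touch a file, send a signal) and change the counting criterion (count the files touched, or whatever) to respond to that trigger. Or, if you just want "any" non-zero rc, simply kill the lock from save_status.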

— 洛基
source

1

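I needed this, but the target process was not a child of the current shell, in which case wait $PID does not work. I did find the following alternative instead: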

while [ -e /proc/$PID ]; do sleep 0.1 ; done

That relies on the presence of procfs, which may not be available (Mac does not provide it, for example). So for portability, you could use this instead:

while ps -p $PID >/dev/null ; do sleep 0.1 ; done
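
Both loops poll forever if the process never exits. A sketch with a timeout, assuming bash; wait_for_pid is a hypothetical helper, and it uses kill -0, which only tests whether the PID can be signalled, so a process owned by another user would be reported as gone.

```shell
#!/usr/bin/env bash

# Hypothetical helper: poll until PID $1 is gone, giving up after $2 seconds.
# Returns 0 once the process has disappeared, 1 on timeout.
wait_for_pid() {
    local pid=$1 timeout=$2 deadline
    deadline=$(( SECONDS + timeout ))
    # kill -0 sends no signal; it only checks that the PID exists and that
    # we are allowed to signal it.
    while kill -0 "$pid" 2>/dev/null; do
        (( SECONDS < deadline )) || return 1
        sleep 0.1
    done
    return 0
}

# The script itself is certainly still running, so this call times out.
if wait_for_pid "$$" 1; then
    echo "process is gone"
else
    echo "still running after the timeout"
fi
```

As with the loops above, no exit status is available this way; polling can only tell that the process has ended.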

— Troelskn
source

1

Trapping the CHLD signal may not work, because you can lose some signals if they arrive simultaneously.

#!/bin/bash

trap 'rm -f $tmpfile' EXIT

tmpfile=$(mktemp)

doCalculations() {
    echo start job $i...
    sleep $((RANDOM % 5)) 
    echo ...end job $i
    exit $((RANDOM % 10))
}

number_of_jobs=10

for i in $( seq 1 $number_of_jobs )
do
    ( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done

wait 

i=0
while read res; do
    echo "$res"
    let i++
done < "$tmpfile"

echo $i jobs done !!!

— 杯子896
source

1

The solution for waiting on several subprocesses and exiting when any one of them exits with a non-zero status code is to use 'wait -n'

#!/bin/bash
wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )) do
        wait -n $@
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

The status code '127' is for a non-existent process, which means the child might have already exited.

— 视觉艺术
source

1

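Wait for all jobs and return the exit code of the last failing job. Unlike the solutions above, this does not require saving pids. Just background the jobs, and wait.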

function wait_ex {
    # this waits for all jobs and returns the exit code of the last failing job
    ecode=0
    while true; do
        wait -n
        err="$?"
        [ "$err" == "127" ] && break
        [ "$err" != "0" ] && ecode="$err"
    done
    return $ecode
}

— Erik Aronesty
source

This will work and reliably give the first error code from your executed commands, unless it happens to be "command not found" (code 127).

— drevicko

0

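There can be cases where the process completes before we wait for it. If we call wait on a process that has already finished, it triggers an error such as "pid is not a child of this shell". To avoid such cases, the following function can be used to find out whether the process has completed: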

isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running"
    sleep 5
done
echo "Process $PID has finished"
}

— Anju Prasannan
source

0

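I think the most straightforward way to run jobs in parallel and check their status is to use temporary files. There are already a couple of similar answers (e.g. Nietzche-jou and mug896).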

#!/bin/bash
rm -f fail
for i in `seq 0 9`; do
  doCalculations $i || touch fail &
done
wait 
! [ -f fail ]

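The above code is not thread safe. If you are concerned that the code above will be running at the same time as itself, it is better to use a more unique file name, like fail.$$. The last line fulfils the requirement: "return exit code 1 when any of the subprocesses ends with code != 0". I threw in an extra requirement to clean up. It may be clearer to write it like this: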

#!/bin/bash
trap 'rm -f fail.$$' EXIT
for i in `seq 0 9`; do
  doCalculations $i || touch fail.$$ &
done
wait 
! [ -f fail.$$ ]

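Here is a similar snippet for collecting results from multiple jobs: I create a temporary directory, store the output of each subtask in a separate file, and then dump them for review. This doesn't really match the question - I'm throwing it in as a bonus: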

#!/bin/bash
trap 'rm -fr $WORK' EXIT

WORK=/tmp/$$.work
mkdir -p $WORK
cd $WORK

for i in `seq 0 9`; do
  doCalculations $i >$i.result &
done
wait 
grep $ *  # display the results with filenames and contents

— 标记
source

0

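I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.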

#!/usr/bin/env bash

sleeper() {
    echo "Sleeper $1"
    sleep $2
    echo "Exiting $1"
    return $3
}

start_sleepers() {
    sleeper 1 1 0 &
    sleeper 2 2 $1 &
    sleeper 3 5 0 &
    sleeper 4 6 0 &
    sleep 4
}

echo "Using jobs"
start_sleepers 1

pids=( $(jobs -p) )

echo "PIDS: ${pids[*]}"

for pid in "${pids[@]}"; do
    wait "$pid"
    echo "Exit code $?"
done

echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"

echo "Waiting for N processes"
start_sleepers 2

for ignored in $(seq 1 4); do
    wait -n
    echo "Exit code $?"
done

Output:

Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0

— Daniel C. Sobral
source