How to monitor a GlusterFS volume

12

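GlusterFS is a nice distributed file system, but it provides almost no way to monitor its integrity. Servers can come and go, bricks can go stale or fail, and I am afraid I will only find out when it is too late.

Recently we had a strange failure: everything appeared to be working, yet a bunch of bricks had dropped out of the volume (we found out purely by coincidence).

Is there a simple and reliable way (a cron script?) to learn the health of my GlusterFS 3.2 volume?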

monitoring glusterfs

— Arie Skliarouk

For now we use a monitoring setup built on a quick-and-dirty script: check_gluster.sh

— Arie Skliarouk, 2011

Take a look at glfs-health.sh.

— 量子

1

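I looked at glfs-health.sh; it appears to target older versions of glusterfs, which were controlled by configuration files. I will clarify that the question is about glusterfs 3.2.

— 2011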

3

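This has been a request to the GlusterFS developers for a while now, and there is no out-of-the-box solution you can use. With a few scripts, however, it is not impossible.

Almost the entire Gluster system is managed by a single gluster command, and with a few of its options you can write your own health-monitoring scripts. For listing information about bricks and volumes, see http://gluster.org/community/documentation/index.php/Gluster_3.2:_Displaying_Volume_Information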
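As a rough illustration of that scripting approach, the sketch below reads `gluster volume info` output on standard input and prints every volume whose status is not Started. The function name `stopped_volumes` is made up for this example, and the `Volume Name:` / `Status:` line format is an assumption based on the 3.2 documentation linked above, so check it against the output of your own installation.

```shell
#!/bin/bash
# Hypothetical helper (not part of Gluster): reads `gluster volume info`
# output on stdin and prints the name of every volume that is not in the
# "Started" state.
# Assumption: the output contains lines of the form
#   Volume Name: NAME
#   Status: STATE
stopped_volumes() {
    awk -F': ' '
        /^Volume Name: /               { name = $2 }
        /^Status: / && $2 != "Started" { print name }
    '
}

# Typical use on a Gluster server (needs root):
#   gluster volume info | stopped_volumes
```

A cron job could run the pipeline from the last comment and send mail whenever the output is non-empty.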

To monitor performance, see http://gluster.org/community/documentation/index.php/Gluster_3.2:_Monitoring_your_GlusterFS_Workload

Update: consider upgrading to http://gluster.org/community/documentation/index.php/About_GlusterFS_3.3

It is always better to run the latest release, since it tends to have more bug fixes and is well supported. Of course, run your own tests before upgrading to a new release: http://vbellur.wordpress.com/2012/05/31/upgrading-to-glusterfs-3-3/ :)

For 3.3 installations, the administration guide has a dedicated section on monitoring GlusterFS in chapter 10: http://www.gluster.org/wp-content/uploads/2012/05/Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf

For another Nagios script, see http://code.google.com/p/glusterfs-status/

— Chida

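Thanks, Chida. I think what confuses me is that some people (github.com/semiosis/puppet-gluster) monitor gluster through the process table ('--with-brick' etc.) and the log files (egrep 'E' for errors), while others use the CLI, and I don't know which is more likely to report gluster's state accurately.

— r_2

I would suggest the CLI, since that is the interface GlusterFS recommends and it is bound to be up to date.

— Chida, 2012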

2

There is a Nagios plugin available for monitoring. You may need to edit it for your version, though.

— 尚丹克

2

Please check the script attached at https://www.gluster.org/pipermail/gluster-users/2012-June/010709.html, written for gluster 3.3; it can probably be adapted to gluster 3.2 without much effort.

#!/bin/bash

# This Nagios script was written against version 3.3 of Gluster.  Older
# versions will most likely not work at all with this monitoring script.
#
# Gluster currently requires elevated permissions to do anything.  In order to
# accommodate this, you need to allow your Nagios user some additional
# permissions via sudo.  The line you want to add will look something like the
# following in /etc/sudoers (or something equivalent):
#
# Defaults:nagios !requiretty
# nagios ALL=(root) NOPASSWD:/usr/sbin/gluster peer status,/usr/sbin/gluster volume list,/usr/sbin/gluster volume heal [[\:graph\:]]* info
#
# That should give us all the access we need to check the status of any
# currently defined peers and volumes.

# define some variables
ME=$(basename -- "$0")
SUDO="/usr/bin/sudo"
PIDOF="/sbin/pidof"
GLUSTER="/usr/sbin/gluster"
PEERSTATUS="peer status"
VOLLIST="volume list"
VOLHEAL1="volume heal"
VOLHEAL2="info"
peererror=
volerror=

# check for commands
for cmd in $SUDO $PIDOF $GLUSTER; do
    if [ ! -x "$cmd" ]; then
        echo "$ME UNKNOWN - $cmd not found"
        exit 3
    fi
done

# check for glusterd (management daemon)
if ! $PIDOF glusterd &>/dev/null; then
    echo "$ME CRITICAL - glusterd management daemon not running"
    exit 2
fi

# check for glusterfsd (brick daemon)
if ! $PIDOF glusterfsd &>/dev/null; then
    echo "$ME CRITICAL - glusterfsd brick daemon not running"
    exit 2
fi

# get peer status
peerstatus="peers: "
for peer in $($SUDO $GLUSTER $PEERSTATUS | grep '^Hostname: ' | awk '{print $2}'); do
    state=
    state=$($SUDO $GLUSTER $PEERSTATUS | grep -A 2 "^Hostname: $peer$" | grep '^State: ' | sed -nre 's/.* \(([[:graph:]]+)\)$/\1/p')
    if [ "$state" != "Connected" ]; then
        peererror=1
    fi
    peerstatus+="$peer/$state "
done

# get volume status
volstatus="volumes: "
for vol in $($SUDO $GLUSTER $VOLLIST); do
    thisvolerror=0
    entries=
    for entries in $($SUDO $GLUSTER $VOLHEAL1 $vol $VOLHEAL2 | grep '^Number of entries: ' | awk '{print $4}'); do
        if [ "$entries" -gt 0 ]; then
            volerror=1
            thisvolerror=$((thisvolerror + entries))
        fi
    done
    volstatus+="$vol/$thisvolerror unsynchronized entries "
done

# drop extra space
peerstatus=${peerstatus:0:${#peerstatus}-1}
volstatus=${volstatus:0:${#volstatus}-1}

# set status according to whether any errors occurred
if [ "$peererror" ] || [ "$volerror" ]; then
    status="CRITICAL"
else
    status="OK"
fi

# actual Nagios output
echo "$ME $status $peerstatus $volstatus"

# exit with appropriate value
if [ "$peererror" ] || [ "$volerror" ]; then
    exit 2
else
    exit 0
fi
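The inner loop of the script above adds up the per-brick "Number of entries" lines from the heal report. A minimal sketch of the same tally done in a single awk pass is shown here; the function name `count_unsynced` is hypothetical, and the `Number of entries: N` line format is taken from the grep pattern the script itself uses.

```shell
#!/bin/bash
# Hypothetical helper: reads `gluster volume heal VOLNAME info` output on
# stdin and prints the total of all "Number of entries: N" lines.
# Prints 0 when no such line is present.
count_unsynced() {
    awk '/^Number of entries: / { total += $4 } END { print total + 0 }'
}

# Typical use (needs the same sudo permissions as the script above):
#   sudo gluster volume heal myvol info | count_unsynced
# "myvol" is a placeholder volume name.
```

In a Nagios check, a non-zero total would map to exit code 2 (CRITICAL) and a zero total to exit code 0 (OK), as the script above does.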

— S19N

1

I was able to configure Nagios monitoring for glusterfs as described here:

http://gopukrish.wordpress.com/2014/11/16/monitor-glusterfs-using-nagios-plugin/

— user173141

1

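Since links rot over time, we would prefer it if you could include the substance of the answer here on ServerFault.

— Ladadadada, 2014

1

@Arie Skliarouk, your check_gluster.sh has a typo: on the last line you grep for exitst instead of exist. I went ahead and rewrote it to be more compact and to remove the need for temporary files.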

#!/bin/bash

# Ensure that all peers are connected
gluster peer status | grep -q Disconnected && echo "Peer disconnected." && exit 1

# Ensure that all bricks have a running log file (i.e., are sending/receiving)
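# "exitst" is a misspelling that appears in Gluster's own message, so match both.
# The test runs inside the loop so that $vol and $brick are still set when
# the message is printed (a loop piped into grep runs in a subshell).
for vol in $(gluster volume list); do
  for brick in $(gluster volume info "$vol" | awk '/^Brick[0-9]*:/ {print $2}'); do
    if gluster volume log locate "$vol" "$brick" | grep -qE "does not (exist|exitst)"; then
      echo "Log file missing - $vol/$brick ."
      exit 1
    fi
  done
done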
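The first check in the script above only reports that some peer is disconnected. As a sketch under stated assumptions, the function below also names the affected hosts. `disconnected_peers` is a hypothetical name, and the `Hostname:` / `State: ... (Connected)` layout of `gluster peer status` is assumed from the patterns used by the Nagios script earlier in this thread.

```shell
#!/bin/bash
# Hypothetical helper: reads `gluster peer status` output on stdin and prints
# the hostname of every peer whose State line does not end in "(Connected)".
# Assumption: each peer is reported as
#   Hostname: HOST
#   Uuid: ...
#   State: Peer in Cluster (Connected)
disconnected_peers() {
    awk '
        /^Hostname: /                        { host = $2 }
        /^State: / && $0 !~ /\(Connected\)$/ { print host }
    '
}

# Typical use (needs root):
#   gluster peer status | disconnected_peers
```

An empty result means every listed peer reported itself as connected.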

— BMDan

1

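The "exitst" typo is in the log message itself. I don't buy the "compactness" advantage: when lines are overloaded, the script is hard to understand. Temporary files are a cheap price to pay for code that is easy to understand.

— Arie Skliarouk

@ArieSkliarouk: Updated to cover both cases, but note that the message in question was removed in November 2011; see git.gluster.org/…. So this probably will not work with newer Gluster. If you find shorter code harder to understand, that is fine, but it is far more robust than using temporary files, so consider refactoring it for readability rather than dismissing it for a perceived lack of that property.

— BMDan, 2013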

1

An anonymous editor pointed out that gluster volume info | awk ... can be shortened to gluster volume list.

— Lekensteyn '16