Determining when a PostgreSQL database was last changed


10

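I'm looking at changing how our backups are done, and I'm wondering whether there is a way to determine which databases in a PostgreSQL cluster have not changed recently.

Instead of using pg_dumpall, I'd like to use pg_dump and only dump the databases that have changed since the last backup (some databases are not updated very often). The idea is that if nothing has changed, the current backup should still be good.

Does anyone know of a way to determine when a specific database was last updated/changed?

Thanks...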

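Update:

I'm hoping not to have to write triggers everywhere, as I have no control over the creation of databases in one particular cluster (let alone the creation of db objects within those databases).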

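Digging further, it looks like there is a correlation between the contents of the $PGDATA/global/pg_database file (specifically the second field) and the directory names under $PGDATA/base.

Going out on a limb, I'm guessing that the second field of the pg_database file is the database oid, and that each database has its own subdirectory under $PGDATA/base (named after its oid). Is that correct? If so, is it reasonable to use the timestamps of the files under $PGDATA/base/* as the trigger for a backup?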
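The guess about the directory layout matches how PostgreSQL stores databases: each database in the default tablespace has a directory $PGDATA/base/<oid>, where <oid> is pg_database.oid (objects in other tablespaces live under $PGDATA/pg_tblspc instead). Rather than parsing the binary catalog file, the mapping can be read with `select oid, datname from pg_database`. A minimal sketch, assuming that query's output has been captured from `psql -Atc` (tuples only, pipe-separated); the helper name oid_map and the OIDs in the example are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the output of
#   psql -Atc "select oid, datname from pg_database"
# (one "oid|datname" pair per line) into a hash of oid => datname.
sub oid_map {
    my @lines = @_;
    my %map;
    for my $line (@lines) {
        chomp $line;
        my ($oid, $datname) = split /\|/, $line, 2;
        next unless defined $datname && $oid =~ /^\d+$/;   # skip blank/odd lines
        $map{$oid} = $datname;
    }
    return %map;
}

# Example input as psql -At would print it (OIDs are illustrative only).
my %by_oid = oid_map("1|template1\n", "11563|template0\n", "11564|postgres\n");
for my $oid (sort { $a <=> $b } keys %by_oid) {
    print "base/$oid => $by_oid{$oid}\n";
}
```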

...or is there a better way?

Thanks again...



Never assume that the current backup is good. You always want to take fresh backups on your regular schedule.
mrdenny

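Sonu Singh - I can't control the addition of databases, let alone tables, in this cluster, so triggers won't work; plus (as far as I know) triggers won't catch DDL changes. mrdenny♦ - Correct. However, I'd like to avoid generating redundant incremental backups between the regular full backups.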

Answers:


9

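While `select datname, xact_commit from pg_stat_database;`, as suggested by @Jack Douglas, doesn't quite work (apparently because of autovacuum), `select datname, tup_inserted, tup_updated, tup_deleted from pg_stat_database` does work. Both DML and DDL changes alter the values of the tup_* columns, whereas a `vacuum` does not (`vacuum analyze`, on the other hand...).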
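The core of this approach is comparing the tup_* counters between two snapshots. A minimal sketch of just that decision, assuming each snapshot has already been parsed into a hash keyed by database name; the function name changed_databases and the database names are hypothetical. A database absent from the previous snapshot is treated as changed, so new databases always get dumped:

```perl
use strict;
use warnings;

# Hypothetical helper: given two snapshots of pg_stat_database
# (datname => { tup_inserted => N, tup_updated => N, tup_deleted => N }),
# return the sorted names of databases whose tup_* counters differ,
# plus any database missing from the previous snapshot.
sub changed_databases {
    my ($last, $curr) = @_;
    my @cols = qw(tup_inserted tup_updated tup_deleted);
    my @changed;
    for my $db (sort keys %$curr) {
        if (!exists $last->{$db}) {
            push @changed, $db;    # new database, or no earlier snapshot
            next;
        }
        push @changed, $db
            if grep { $last->{$db}{$_} != $curr->{$db}{$_} } @cols;
    }
    return @changed;
}

my %last = (
    sales   => { tup_inserted => 10, tup_updated => 5, tup_deleted => 0 },
    archive => { tup_inserted => 99, tup_updated => 0, tup_deleted => 0 },
);
my %curr = (
    sales   => { tup_inserted => 12, tup_updated => 5, tup_deleted => 0 },
    archive => { tup_inserted => 99, tup_updated => 0, tup_deleted => 0 },
    newdb   => { tup_inserted => 0,  tup_updated => 0, tup_deleted => 0 },
);
print join(', ', changed_databases(\%last, \%curr)), "\n";    # newdb, sales
```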

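In case it is useful to others, I'm including the backup script I've put in place. It works with Pg 8.4.x but not with 8.2.x; YMMV depending on the version of Pg you use.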

#!/usr/bin/env perl
=head1 Synopsis

pg_backup -- selectively backup a postgresql database cluster

=head1 Description

Perform backups (pg_dump*) of postgresql databases in a cluster on an
as-needed basis.

For some database clusters, there may be databases that are:

 a. rarely updated/changed and therefore shouldn't require dumping as 
    often as those databases that are frequently changed/updated.

 b. large enough that dumping them unnecessarily is undesirable.

The global data is always dumped, regardless of whether any
individual databases need backing up.

=head1 Usage

pg_backup [OPTION]...

General options:

  -F, --format=c|t|p    output file format for data dumps 
                          (custom, tar, plain text) (default is custom)
  -a, --all             backup (pg_dump) all databases in the cluster 
                          (default is to only pg_dump databases that have
                          changed since the last backup)
  --backup-dir          directory to place backup files in 
                          (default is ./backups)
  -v, --verbose         verbose mode
  --help                show this help, then exit

Connection options:

  -h, --host=HOSTNAME   database server host or socket directory
  -p, --port=PORT       database server port number
  -U, --username=NAME   connect as specified database user
  -d, --database=NAME   connect to database name for global data

=head1 Notes

This utility has been developed against PostgreSQL version 8.4.x. Older 
versions of PostgreSQL may not work.

`vacuum` does not appear to trigger a backup unless there is actually 
something to vacuum whereas `vacuum analyze` appears to always trigger a 
backup.

=head1 Copyright and License

Copyright (C) 2011 by Gregory Siems

This library is free software; you can redistribute it and/or modify it 
under the same terms as PostgreSQL itself, either PostgreSQL version 
8.4 or, at your option, any later version of PostgreSQL you may have 
available.

=cut

use strict;
use warnings;
use Getopt::Long;
use Data::Dumper;
use POSIX qw(strftime);

my %opts = get_options();

my $connect_options = '';
$connect_options .= "--$_=$opts{$_} " for (qw(username host port));

my $shared_dump_args = ($opts{verbose})
    ? $connect_options . ' --verbose '
    : $connect_options;

my $backup_prefix = (exists $opts{host} && $opts{host} ne 'localhost')
    ? $opts{backup_dir} . '/' . $opts{host} . '-'
    : $opts{backup_dir} . '/';

do_main();


########################################################################
sub do_main {
    backup_globals();

    my $last_stats_file = $backup_prefix . 'last_stats';

    # get the previous pg_stat_database data
    my %last_stats;
    if ( -f $last_stats_file) {
        %last_stats = parse_stats (split "\n", slurp_file ($last_stats_file));
    }

    # get the current pg_stat_database data
    my $cmd = 'psql ' . $connect_options;
    $cmd .= " $opts{database} " if (exists $opts{database});
    $cmd .= "-Atc \"
        select date_trunc('minute', now()), datid, datname, 
            xact_commit, tup_inserted, tup_updated, tup_deleted 
        from pg_stat_database 
        where datname not in ('template0','template1','postgres'); \"";
    $cmd =~ s/\n\s+/ /g;    # collapse the multi-line query onto one line
    my @stats = `$cmd`;
    my %curr_stats = parse_stats (@stats);

    # do a backup if needed
    foreach my $datname (sort keys %curr_stats) {
        my $needs_backup = 0;
        if ($opts{all}) {
            $needs_backup = 1;
        }
        elsif ( ! exists $last_stats{$datname} ) {
            $needs_backup = 1;
            warn "no last stats for $datname\n" if ($opts{debug});
        }
        else {
            for (qw (tup_inserted tup_updated tup_deleted)) {
                if ($last_stats{$datname}{$_} != $curr_stats{$datname}{$_}) {
                    $needs_backup = 1;
                    warn "$_ stats do not match for $datname\n" if ($opts{debug});
                }
            }
        }
        if ($needs_backup) {
            backup_db ($datname);
        }
        else {
            chitchat ("Database \"$datname\" does not currently require backing up.");
        }
    }

    # update the pg_stat_database data
    # "or" (not "||") so that die applies to open's result; $! holds the OS error
    open my $fh, '>', $last_stats_file or die "Could not open $last_stats_file for output: $!\n";
    print $fh @stats;
    close $fh;
}

sub parse_stats {
    my @in = @_;
    my %stats;
    chomp @in;
    foreach my $line (@in) {
        my @ary = split /\|/, $line;
        my $datname = $ary[2];
        next unless ($datname);
        foreach my $key (qw(tmsp datid datname xact_commit tup_inserted tup_updated tup_deleted)) {
            my $val = shift @ary;
            $stats{$datname}{$key} = $val;
        }
    }
    return %stats;
}

sub backup_globals {
    chitchat ("Backing up the global data.");

    my $backup_file = $backup_prefix . 'globals-only.backup.gz';
    my $cmd = 'pg_dumpall --globals-only ' . $shared_dump_args;
    $cmd .= " --database=$opts{database} " if (exists $opts{database});

    do_dump ($backup_file, "$cmd | gzip");
}

sub backup_db {
    my $database = shift;
    chitchat ("Backing up database \"$database\".");

    my $backup_file = $backup_prefix . $database . '-schema-only.backup.gz';
    do_dump ($backup_file, "pg_dump --schema-only --create --format=plain $shared_dump_args $database | gzip");

    $backup_file = $backup_prefix . $database . '.backup';
    do_dump ($backup_file, "pg_dump --format=". $opts{format} . " $shared_dump_args $database");
}

sub do_dump {
    my ($backup_file, $cmd) = @_;

    my $temp_file = $backup_file . '.new';
    warn "Command is: $cmd > $temp_file" if ($opts{debug});

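    chitchat (`$cmd > $temp_file`);
    # The shell redirect always creates $temp_file, so also require a zero
    # exit status and a non-empty file before replacing the previous backup.
    # (With "| gzip", $? reflects gzip's exit status, not pg_dump's.)
    if ( $? == 0 && -s $temp_file ) {
        chitchat (`mv $temp_file $backup_file`);
    }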
}

sub chitchat {
    my @ary = @_;
    return unless (@ary);
    chomp @ary;
    my $first   = shift @ary;
    my $now     = strftime "%Y%m%d-%H:%M:%S", localtime;
    print +(join "\n                  ", "$now $first", @ary), "\n";
}

sub get_options {
    Getopt::Long::Configure('bundling');

    my %opts = ();
    GetOptions(
        "a"             => \$opts{all},
        "all"           => \$opts{all},
        "p=s"           => \$opts{port},
        "port=s"        => \$opts{port},
        "U=s"           => \$opts{username},
        "username=s"    => \$opts{username},
        "h=s"           => \$opts{host},
        "host=s"        => \$opts{host},
        "F=s"           => \$opts{format},
        "format=s"      => \$opts{format},
        "d=s"           => \$opts{database},
        "database=s"    => \$opts{database},
        "backup-dir=s"  => \$opts{backup_dir},
        "help"          => \$opts{help},
        "v"             => \$opts{verbose},
        "verbose"       => \$opts{verbose},
        "debug"         => \$opts{debug},
        );

    # Does the user need help?
    if ($opts{help}) {
        show_help();
    }

    $opts{host}         ||= $ENV{PGHOSTADDR} || $ENV{PGHOST}     || 'localhost';
    $opts{port}         ||= $ENV{PGPORT}     || '5432';
    $opts{username}     ||= $ENV{PGUSER}     || $ENV{USER}       || 'postgres';
    $opts{database}     ||= $ENV{PGDATABASE} || $opts{username};
    $opts{backup_dir}   ||= './backups';

    my %formats = (
        c       => 'custom',
        custom  => 'custom',
        t       => 'tar',
        tar     => 'tar',
        p       => 'plain',
        plain   => 'plain',
    );
    $opts{format} = (defined $opts{format})
        ? $formats{$opts{format}} || 'custom'
        : 'custom';

    warn Dumper \%opts if ($opts{debug});
    return %opts;
}

sub show_help {
    print `perldoc -F $0`;
    exit;
}

sub slurp_file { local (*ARGV, $/); @ARGV = shift; <> }

__END__

Update: the script has been put on GitHub here.


Very nice code, thanks for sharing. BTW, it could be github'ed, don't you think? :-)
poige 2012

2

It looks like you can use pg_stat_database to get a transaction count and check whether it changes from one backup run to the next:

select datname, xact_commit from pg_stat_database;

  datname  | xact_commit 
-----------+-------------
 template1 |           0
 template0 |           0
 postgres  |      136785

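If someone calls `pg_stat_reset`, you can't be certain whether a database has changed, but you may consider it unlikely enough that this would happen and then be followed by exactly the right number of transactions to match your last reading.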
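One way to narrow the `pg_stat_reset` gap, offered here as an assumption rather than part of the answer: the counters in pg_stat_database are cumulative, so a reading lower than the previous one means the statistics were reset (or lost) and the database should be backed up anyway. A minimal sketch; the helper name classify_counter is hypothetical, and both 'changed' and 'reset' would call for a backup. A reset followed by exactly the same count still goes undetected:

```perl
use strict;
use warnings;

# Hypothetical helper: classify two readings of a cumulative
# pg_stat_database counter such as xact_commit.
# Returns 'unchanged', 'changed' or 'reset'.
sub classify_counter {
    my ($last, $curr) = @_;
    return 'changed' unless defined $last;   # no earlier reading to compare
    return 'reset'   if $curr < $last;       # counters only grow until reset
    return 'changed' if $curr > $last;
    return 'unchanged';
}

for my $pair ([136785, 136785], [136785, 136790], [136785, 12], [undef, 5]) {
    my ($last, $curr) = @$pair;
    printf "%-7s -> %-6s : %s\n",
        defined $last ? $last : 'none', $curr, classify_counter($last, $curr);
}
```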

-- EDIT

See this question for why this might not work. I'm not sure why that happens, but enabling logging might shed some light.


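If someone calls `pg_stat_reset`, the probability of the xact_commit value matching the previous one would be pretty low, wouldn't it? So this certainly seems to catch DML changes. Now all I need is a way to tell whether DDL changes have occurred.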
gsiems

DDL is transactional in postgres, so I'd expect the commit count to increase in that case too. Haven't checked though...
Jack says try topanswers.xyz 2011

You, sir, are correct. I'd forgotten that Pg DDL is transactional, and a quick `create table ...` test does appear to increment xact_commit.
gsiems

Further testing shows that xact_commit increases even when there is no user activity going on. Autovacuum, perhaps?
gsiems

This definitely does not work for backup purposes. xact_commit increases very frequently, even when nobody is connected to the database.
mivk

1

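From digging through the postgres docs and newsgroups:

`txid_current()` will give you a new xid. If you call the function again later and get an xid exactly one higher, you know that no transactions committed between the two calls. You may get false positives, though, e.g. if someone else calls `txid_current()`.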
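The comparison itself is simple. A minimal sketch, assuming the two values come from running `select txid_current();` in separate transactions at different times; the helper name quiet_between is hypothetical. Because the call assigns an xid to its own transaction, an idle interval shows up as a difference of exactly one. Note that xids are allocated cluster-wide, so this detects write activity anywhere in the cluster rather than in a particular database:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two values returned by
#   select txid_current();
# taken in separate transactions. Each call consumes an xid itself, so a
# second reading exactly one higher means no other transaction was
# assigned an xid in between (anywhere in the cluster).
sub quiet_between {
    my ($first, $second) = @_;
    return $second == $first + 1;
}

print quiet_between(5000, 5001) ? "no intervening xids\n" : "activity\n";
print quiet_between(5000, 5042) ? "no intervening xids\n" : "activity\n";
```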


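Thanks for the suggestion. I don't believe this will work, however, as `txid_current()` appears to operate at the cluster level rather than the database level.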
gsiems

I was looking for documentation on this but couldn't find any. Do you have a link?
Jack says try topanswers.xyz 2011

No link. I tested by switching between databases, running `select current_database(), txid_current();` and comparing the results.
gsiems

0

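Record the timestamps of the files containing the DB data and check whether they have changed. If they have, there was a write.

Edit after the WAL hint: you should only do this after outstanding writes have been flushed.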
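A minimal sketch of recording the newest file timestamp for a directory, assuming the per-database directory is $PGDATA/base/<oid> and that the script can read it (typically as the postgres OS user). Given the WAL caveat, the result would only be meaningful after a `CHECKPOINT` has written dirty buffers to the data files, and background activity such as autovacuum can touch the files too, so expect false positives. The helper name newest_mtime is hypothetical; the example uses a temporary directory as a stand-in for a real database directory:

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: newest modification time (epoch seconds) of any
# regular file below $dir, or undef if there are none.
sub newest_mtime {
    my ($dir) = @_;
    my $newest;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a full path
            wanted   => sub {
                return unless -f $File::Find::name;    # skip dirs, vanished files
                my $mtime = (stat _)[9];               # reuse the -f stat buffer
                $newest = $mtime if !defined $newest || $mtime > $newest;
            },
        },
        $dir,
    );
    return $newest;
}

# Stand-in for $PGDATA/base/<oid>: a temp dir with two files whose
# mtimes are set explicitly so the result is predictable.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "mkdir: $!";
for my $spec (["$dir/16384", 1_000_000_000], ["$dir/sub/16385", 1_100_000_000]) {
    my ($path, $when) = @$spec;
    open my $fh, '>', $path or die "open $path: $!";
    close $fh;
    utime $when, $when, $path or die "utime $path: $!";
}
print newest_mtime($dir), "\n";    # 1100000000
```

In real use, the value of `newest_mtime("$ENV{PGDATA}/base/$oid")` would be saved at each backup and compared on the next run.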


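That's not reliable. There may be changes that have not yet been written (flushed) to the data files, i.e. they have only been written to the WAL.
a_horse_with_no_name 2011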

Licensed under cc by-sa 3.0 with attribution required.