How do I find the last time a PostgreSQL database was updated?

Time: 2021-03-22 07:33:01

I am working with a PostgreSQL database that gets updated in batches. I need to know when the database (or a table in the database) was last updated or modified; either will do.

I saw that someone on the PostgreSQL forum had suggested using logging and querying the logs for the time. This will not work for me because I do not have control over the client's codebase.

5 answers

#1


22  

You can write a trigger that runs every time an insert/update is made on a particular table. The common usage is to set a "created" or "last_updated" column of the row to the current time, but you could also record the time in a central location if you don't want to change the existing tables.

So, for example, a typical way is the following:

CREATE FUNCTION stamp_updated() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  NEW.last_updated := now();
  RETURN NEW;
END
$$;
-- repeat for each table you need to track:
ALTER TABLE sometable ADD COLUMN last_updated TIMESTAMP;
CREATE TRIGGER sometable_stamp_updated
  BEFORE INSERT OR UPDATE ON sometable
  FOR EACH ROW EXECUTE PROCEDURE stamp_updated();

Then, to find the last update time, select MAX(last_updated) from each table you are tracking and take the greatest of those, e.g.:

SELECT MAX(max_last_updated) FROM (
  SELECT MAX(last_updated) AS max_last_updated FROM sometable
  UNION ALL
  SELECT MAX(last_updated) FROM someothertable
) updates

For tables with a serial (or similarly generated) primary key, you can try to avoid a sequential scan when finding the latest update time by using the primary key index, or you can create indexes on last_updated.

-- get timestamp of row with highest id
SELECT last_updated FROM sometable ORDER BY sometable_id DESC LIMIT 1

Note that this can give slightly wrong results if the IDs are not quite sequential, and it will miss an update made to an older row, but how much accuracy do you need? (Bear in mind that transactions mean that rows can become visible to you in a different order from the one in which they were created.)

An alternative approach that avoids adding 'updated' columns to each table is to have a central table in which to store update timestamps. For example:

CREATE TABLE update_log(table_name text NOT NULL, updated timestamp NOT NULL DEFAULT now());
CREATE FUNCTION stamp_update_log() RETURNS TRIGGER LANGUAGE 'plpgsql' AS $$
BEGIN
  -- one row per modifying statement, so table_name must not be unique
  INSERT INTO update_log(table_name) VALUES(TG_TABLE_NAME);
  -- the return value of a statement-level AFTER trigger is ignored
  RETURN NULL;
END
$$;
-- Repeat for each table you need to track:
CREATE TRIGGER sometable_stamp_update_log
 AFTER INSERT OR UPDATE ON sometable
 FOR EACH STATEMENT EXECUTE PROCEDURE stamp_update_log();

This will give you a table with a row for each table update; you can then just do:

SELECT MAX(updated) FROM update_log

to get the last update time. (You could split this out by table if you wanted.) This table will of course just keep growing: either create an index on 'updated' (which should make getting the latest one pretty fast) or truncate it periodically if that fits your use case. For example, if you need to check periodically whether changes have been made, take an exclusive lock on the table, get the latest update time, then truncate it.

An alternative approach, which might be what the folks on the forum meant, is to set 'log_statement = mod' in the database configuration (either globally for the cluster, or on the database or user you need to track). All statements that modify the database will then be written to the server log. You'll then need to write something outside the database to scan the server log, filtering out tables you aren't interested in, etc.
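The log-scanning step could look roughly like the sketch below. It is written in Python, which the answer does not prescribe, and it assumes a log_line_prefix of '%m [%p] ', giving the line shape shown in the comment. The function name last_modification and the word-matching filter are illustrative only.

```python
import re
from datetime import datetime

# Assumed line shape (it depends on log_line_prefix; here '%m [%p] '):
# 2021-03-22 07:33:01.123 UTC [4242] LOG:  statement: UPDATE sometable SET ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ \[\d+\] "
    r"LOG:\s+statement:\s+(?P<sql>.*)$"
)

def last_modification(log_lines, tables):
    """Return the newest timestamp of a logged statement naming one of `tables`.

    Returns None when no matching statement is found. The time zone label
    is ignored, so all lines are assumed to use the same zone.
    """
    wanted = {t.lower() for t in tables}
    latest = None
    for line in log_lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not the first line of a logged statement
        # Crude filter: does any identifier-like word equal a tracked table?
        words = set(re.findall(r"[a-z_][a-z0-9_]*", m.group("sql").lower()))
        if not (words & wanted):
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if latest is None or ts > latest:
            latest = ts
    return latest
```

With log_statement = mod only modifying statements are logged, so the sketch does not check the statement type itself. Matching on words is deliberately simple: it would also match a table name that appears inside a string literal, and it only looks at the first line of a multi-line statement.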

#2


4  

It looks like you can use pg_stat_database to get a transaction count and check whether it changes from one backup run to the next; see this dba.se answer and its comments for more details.
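A minimal sketch of the "compare from one run to the next" idea, in Python. The state file, the function name changed_since_last_run and the choice of counters are assumptions, not something taken from the linked answer. The query is shown only as a string; running it and building the `current` dictionary is left to whatever client library you use. The sketch reads the tuple counters instead of xact_commit because, as far as I know, the commit count also rises for read-only transactions.

```python
import json
from pathlib import Path

# Query to run at each check (not executed here); the columns come from
# the pg_stat_database view.
STATS_QUERY = """
SELECT tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = current_database()
"""

def changed_since_last_run(current, state_file):
    """Compare `current` counters with the previous run's, then save them.

    `current` is a dict such as {"tup_inserted": 10, ...}. Returns True when
    the counters differ from the saved snapshot, or when no snapshot exists.
    """
    path = Path(state_file)
    previous = json.loads(path.read_text()) if path.exists() else None
    path.write_text(json.dumps(current))
    # Without an earlier snapshot we cannot tell, so report a change.
    return previous is None or previous != current
```

This tells you whether something changed between two checks, not the exact time of the change; the time is only bounded by the two check times.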

#3


3  

I like Jack's approach. You can query the table stats to learn the number of inserts, updates, deletes and so on:

select n_tup_upd from pg_stat_user_tables  where relname = 'YOUR_TABLE';

Each updated row increases the count by 1.

Bear in mind that this method is viable when you have a single DB; multiple instances will probably require a different approach.
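To see which tables changed between two checks, the per-table counters can be compared as in the sketch below. It assumes each snapshot was built from a query such as SELECT relname, n_tup_ins, n_tup_upd, n_tup_del FROM pg_stat_user_tables; the function name tables_changed and the snapshot layout are hypothetical.

```python
def tables_changed(before, after):
    """Return the sorted names of tables whose write counters differ.

    Each snapshot maps relname -> (n_tup_ins, n_tup_upd, n_tup_del).
    A table present only in `after` counts as changed.
    """
    changed = []
    for name, counters in after.items():
        if before.get(name) != counters:
            changed.append(name)
    return sorted(changed)
```

For example, tables_changed({"sometable": (10, 2, 0)}, {"sometable": (10, 3, 0)}) returns ["sometable"], because the update counter moved from 2 to 3.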

#4


2  

See the following article:

MySQL versus PostgreSQL: Adding a 'Last Modified Time' Column to a Table http://www.pointbeing.net/weblog/2008/03/mysql-versus-postgresql-adding-a-last-modified-column-to-a-table.html

#5


0  

You can write a stored procedure in an "untrusted language" (e.g. plpythonu): this allows access to the files in the postgres "base" directory. Return the largest mtime of these files in the stored procedure.

But this is only approximate, since vacuum will also change these files and their mtime.
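The file-scanning part could be sketched in plain Python as follows. This is only the body such a procedure might wrap, not a complete stored procedure; the function name latest_mtime is hypothetical, and the directory to pass in (the "base" directory under your data directory) depends on your installation.

```python
import os
from datetime import datetime, timezone

def latest_mtime(directory):
    """Return the newest file modification time under `directory` (UTC).

    Returns None if the tree contains no files.
    """
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            try:
                mtime = os.stat(os.path.join(root, name)).st_mtime
            except FileNotFoundError:
                continue  # the file was removed while we were scanning
            if newest is None or mtime > newest:
                newest = mtime
    if newest is None:
        return None
    return datetime.fromtimestamp(newest, tz=timezone.utc)
```

The process running this needs read access to the directory, which is why the answer suggests an untrusted language running inside the server.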
