Extracting data from a PostgreSQL DB without using pg_dump

Time: 2021-08-19 02:53:27

There is a PostgreSQL database to which I have only limited access (e.g., I can't use pg_dump). I am trying to create a local "mirror" by exporting certain tables from the database. I do not have the permissions needed to just dump a table as SQL from within psql. Right now, I just have a Python script that iterates through my table_names, selects all fields and then exports them as a CSV:

import os
for table_name, file_name in zip(table_names, file_names):
    # Keep the whole \copy command inside the echo quotes so psql receives it as one line.
    cmd = """echo "\\\\copy (select * from %s) to stdout WITH CSV HEADER" | psql -d remote_db | gzip > ./%s/%s.gz""" % (table_name, dir_name, file_name)
    os.system(cmd)

I would like to avoid CSV if possible, as I lose the field types and the encoding can get messed up. The best option would probably be some way of getting the SQL code that generates the table using \copy. Next best would be XML, ideally with some way of preserving the field types. If that doesn't work, I think the final option might be two queries: one to get the field data types, the other to get the actual data.

Any thoughts or advice would be greatly appreciated - thanks!

3 solutions

#1


3  

The bit about "I do not have the permissions needed to just dump a table as SQL from within psql" puzzles me. pg_dump runs standalone, outside psql (both are clients), and if you have permission to connect to the database and select from a table, I'd guess you'd also be able to dump it using pg_dump -t <table>. Am I missing something?
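
A minimal sketch of what this answer suggests, driven from Python like the script in the question. The helper name `build_pg_dump_command` and the table name `my_table` are hypothetical; `remote_db` is the database name used in the question. Whether the dump actually succeeds depends on the permissions granted on the server, so treat this as something to try, not a guaranteed fix.

```python
import shlex


def build_pg_dump_command(table_name, db_name="remote_db", out_file=None):
    """Build a pg_dump command line that dumps a single table as SQL."""
    cmd = ["pg_dump", "-t", table_name]
    if out_file is not None:
        # -f writes the dump to a file instead of stdout.
        cmd += ["-f", out_file]
    # The database name is passed as the final positional argument.
    cmd.append(db_name)
    return cmd


if __name__ == "__main__":
    cmd = build_pg_dump_command("my_table", out_file="my_table.sql")
    # Show the command; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids the shell quoting problems that come with building one long command string.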

#2


2  

If you use psycopg2, you can use cursor.description to get the column names, and use the Python type of each fetched value to convert it to a string in an acceptable format.

This code creates INSERT statements that you can use not only with PostgreSQL, but also with other databases (in which case you will probably have to change the date format):

import datetime

# Assumes `cursor` is an open psycopg2 cursor and `table_name` is a trusted identifier.
cursor.execute("SELECT * FROM %s" % (table_name))
# cursor.description holds one entry per column; element 0 is the column name.
column_names = [c[0] for c in cursor.description]
insert_prefix = 'insert into %s (%s) values ' % (table_name, ', '.join(column_names))
for row in cursor.fetchall():
    row_data = []
    for rd in row:
        if rd is None:
            row_data.append('NULL')
        elif isinstance(rd, datetime.datetime):
            row_data.append("'%s'" % (rd.strftime('%Y-%m-%d %H:%M:%S')))
        else:
            # repr() follows Python quoting rules, so strings containing quotes
            # or backslashes may need extra escaping before they are valid SQL.
            row_data.append(repr(rd))
    print('%s (%s);' % (insert_prefix, ', '.join(row_data)))

psycopg2 even has support for COPY. See the COPY-related methods in its documentation.
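
As a rough sketch of the COPY route, psycopg2's `cursor.copy_expert(sql, file)` can stream the output of a `COPY ... TO STDOUT` statement into a file-like object. The helpers `quote_ident` and `export_table_csv` below are hypothetical names, not part of psycopg2, and the example assumes an unqualified table name given in its exact stored case (quoting makes the name case-sensitive).

```python
import gzip


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def export_table_csv(cursor, table_name, out_path):
    """Stream one table into a gzip-compressed CSV file via COPY ... TO STDOUT."""
    sql = "COPY (SELECT * FROM %s) TO STDOUT WITH CSV HEADER" % quote_ident(table_name)
    with gzip.open(out_path, "wb") as out:
        # copy_expert writes the server's COPY output into the file object.
        cursor.copy_expert(sql, out)
    return sql


# Hypothetical usage, assuming psycopg2 is installed and the database is reachable:
#   import psycopg2
#   conn = psycopg2.connect(dbname="remote_db")
#   with conn.cursor() as cur:
#       export_table_csv(cur, "my_table", "my_table.csv.gz")
```

This replaces the echo, psql and gzip pipeline from the question with a single Python call per table, which also makes errors easier to catch.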

If you prefer using metadata, you can use my recipe, "Dump PostgreSQL db schema to text". It is based on "Extracting META information from PostgreSQL" by Lorenzo Alberton.

#3


1  

You could use these queries (obtained by running "psql --echo-hidden" and "\d ") to get the base metadata:

-- GET OID
SELECT oid FROM pg_class WHERE relname = '<YOUR_TABLE_NAME>';

-- GET METADATA
SELECT a.attname,
  pg_catalog.format_type(a.atttypid, a.atttypmod),
  (SELECT substring(pg_catalog.pg_get_expr(d.adbin, d.adrelid) for 128)
   FROM pg_catalog.pg_attrdef d
   WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum AND a.atthasdef),
   a.attnotnull, a.attnum
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = <YOUR_TABLES_OID_FROM_PG_CLASS> AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
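
The rows returned by the metadata query above can be turned into a rough CREATE TABLE statement, which is close to the "generating SQL" the question asks for. The function `build_create_table` is a hypothetical helper, and this is only a sketch: it assumes each row is (attname, formatted type, default expression, attnotnull, attnum), it does not quote identifiers, and it does not reproduce constraints, indexes or sequences. Defaults longer than 128 characters are already truncated by the query itself.

```python
def build_create_table(table_name, columns):
    """Build a CREATE TABLE statement from rows of the metadata query.

    Each row is (attname, formatted_type, default_expr, attnotnull, attnum).
    """
    defs = []
    # Order columns by attnum so they match the original table layout.
    for name, type_name, default, not_null, _attnum in sorted(columns, key=lambda c: c[4]):
        parts = [name, type_name]
        if default is not None:
            parts.append("DEFAULT " + default)
        if not_null:
            parts.append("NOT NULL")
        defs.append("    " + " ".join(parts))
    return "CREATE TABLE %s (\n%s\n);" % (table_name, ",\n".join(defs))


if __name__ == "__main__":
    # Made-up example rows, in the shape returned by cursor.fetchall().
    rows = [
        ("name", "character varying(50)", None, False, 2),
        ("id", "integer", "nextval('t_id_seq'::regclass)", True, 1),
    ]
    print(build_create_table("t", rows))
```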

This gives you the name, data type, default, null flag and field order within the row. To get the actual data, your best bet is still CSV: the built-in COPY table TO STDOUT WITH CSV HEADER is very robust. But if you are worried about encoding, be sure to get the values of server_encoding and client_encoding just before dumping the CSV data. That, combined with the metadata from the above query, should give enough information to properly interpret a CSV dump.
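
One way to record the two encodings right before a dump is to issue `SHOW server_encoding` and `SHOW client_encoding` on the same connection. The sketch below assumes a DB-API cursor such as the one psycopg2 provides; `fetch_encodings` is a hypothetical helper name.

```python
def fetch_encodings(cursor):
    """Return the server and client encodings of the current session."""
    encodings = {}
    for setting in ("server_encoding", "client_encoding"):
        # SHOW returns a single row with a single column holding the value.
        cursor.execute("SHOW " + setting)
        encodings[setting] = cursor.fetchone()[0]
    return encodings


# Hypothetical usage, assuming psycopg2 is installed and the database is reachable:
#   import psycopg2
#   conn = psycopg2.connect(dbname="remote_db")
#   with conn.cursor() as cur:
#       print(fetch_encodings(cur))
```

Storing the result next to each CSV file, together with the column metadata, keeps everything needed to decode the dump in one place.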
