I'm using Postgres and would like to run a big UPDATE query that takes its values from a CSV file. Let's say I have a table with the columns (id, banana, apple).
I'd like to run an update that changes the bananas but not the apples. Each new banana and its ID would be in the CSV file.
I tried looking at the Postgres site but the examples are killing me.
1 answer
I would COPY the file to a temporary table and update the actual table from there. It could look like this:
CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- but see below
COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE tbl
SET banana = tmp_x.banana
FROM tmp_x
WHERE tbl.id = tmp_x.id;
DROP TABLE tmp_x; -- else it is dropped at end of session automatically
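The question describes a CSV holding only the ID and the new banana value, so a narrower staging table may fit better. The following is a minimal sketch under that assumption: the column order (id, banana) and the presence of a header row are guesses about the file, not something stated in the question.

```sql
-- Staging table with only the columns assumed to be in the CSV.
CREATE TEMP TABLE tmp_x (id int, banana text);

-- The column list maps CSV fields to table columns in order.
-- HEADER skips the first line; remove it if the file has no header row.
COPY tmp_x (id, banana) FROM '/absolute/path/to/file' (FORMAT csv, HEADER);

-- Only banana is assigned, so apple keeps its current value.
UPDATE tbl
SET    banana = tmp_x.banana
FROM   tmp_x
WHERE  tbl.id = tmp_x.id;
```

Rows in tbl without a matching id in the file are left untouched, since the join in the WHERE clause only selects matching rows.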
If the imported table matches the table to be updated exactly, this may be convenient:
CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;
Creates an empty temporary table matching the structure of the existing table, without constraints.
Privileges
SQL COPY requires superuser privileges for this (since Postgres 11, membership in the role pg_read_server_files also suffices; the quote below predates that change). The manual:
"COPY naming a file or command is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access."
The psql meta-command \copy works for any db role. The manual:
Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
The scope of temporary tables is limited to a single session of a single role, so the above has to be executed in the same psql session:
CREATE TEMP TABLE ...;
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE ...;
If you are scripting this in a bash command, be sure to wrap it all in a single psql call. Like:
echo 'CREATE TEMP TABLE tmp_x ...; \copy tmp_x FROM ...; UPDATE ...;' | psql
Normally, you need the meta-command \\ to switch between psql meta-commands and SQL commands in psql, but \copy is an exception to this rule. The manual again:
"special parsing rules apply to the \copy meta-command. Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments."
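Since \copy takes the rest of its line as arguments, one way to sidestep the parsing question is a script file with each command on its own line. This is only a sketch: the file name update_bananas.sql is hypothetical, and connection options for psql are omitted.

```sql
-- update_bananas.sql (hypothetical name); run with: psql -f update_bananas.sql
-- psql executes the whole file in one session, so the temp table stays visible.
CREATE TEMP TABLE tmp_x (id int, apple text, banana text);

-- Meta-command: must sit on a single line and needs no trailing semicolon.
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv)

UPDATE tbl
SET    banana = tmp_x.banana
FROM   tmp_x
WHERE  tbl.id = tmp_x.id;
```

The file path here is resolved on the client machine running psql, not on the database server.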
Big tables
If the import table is big, it may pay to increase temp_buffers temporarily for the session (first thing in the session):
SET temp_buffers = '500MB'; -- example value
Add an index to the temporary table:
CREATE INDEX tmp_x_id_idx ON tmp_x(id);
And run ANALYZE manually, since temporary tables are not covered by autovacuum / auto-analyze.
ANALYZE tmp_x;
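With a big import, it may also help to skip rows that already hold the new value, because Postgres writes a new row version for every updated row even when nothing changes. A hedged variant of the UPDATE, assuming the same tbl and tmp_x as above:

```sql
UPDATE tbl
SET    banana = tmp_x.banana
FROM   tmp_x
WHERE  tbl.id = tmp_x.id
AND    tbl.banana IS DISTINCT FROM tmp_x.banana;  -- NULL-safe "not equal"
```

IS DISTINCT FROM is used instead of <> so that rows where either side is NULL are still compared correctly.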
Related answers:
- Best way to delete millions of rows by ID
- How can I insert common data into a temp table from disparate schemas?
- How to delete duplicate entries?