How to copy from a CSV file to a PostgreSQL table using the headers in the CSV file?

Date: 2021-08-09 23:26:01

I want to copy a CSV file to a Postgres table. There are about 100 columns in this table, so I do not want to rewrite them if I don't have to.

I am using the command \copy table from 'table.csv' delimiter ',' csv; but if the table has not been created I get ERROR: relation "table" does not exist. If I create an empty table first I get no error, but nothing happens. I ran the command two or three times with no output or messages, and the table had not been updated when I checked it through pgAdmin.

Is there a way to import a table with headers included like I am trying to do?

4 solutions

#1


76  

This worked. The first row had column names in it.

COPY wheat FROM 'wheat_crop_data.csv' DELIMITER ';' CSV HEADER

#2


25  

With the Python library pandas, you can easily create column names and infer data types from a CSV file.

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('postgresql://user:pass@localhost/db_name')
df = pd.read_csv('/path/to/csv_file')
df.to_sql('pandas_db', engine)

The if_exists parameter can be set to replace or append to an existing table, e.g. df.to_sql('pandas_db', engine, if_exists='replace'). This also works for other input file types; see the pandas documentation.
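To make "infer data types" concrete, here is a rough standard-library sketch of the idea. It is not pandas' actual algorithm, and the function names (classify, infer_column_types) and the mapping to bigint / double precision / text are assumptions made for illustration only.

```python
import csv
import io


def classify(value):
    """Return the narrowest illustrative PostgreSQL type that fits one cell."""
    try:
        int(value)
        return "bigint"
    except ValueError:
        pass
    try:
        float(value)
        return "double precision"
    except ValueError:
        return "text"


def infer_column_types(csv_file, delimiter=","):
    """Map each header name to the widest type seen in that column."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)  # first row holds the column names
    rank = {"bigint": 0, "double precision": 1, "text": 2}
    types = ["bigint"] * len(header)
    for row in reader:
        for i, value in enumerate(row[:len(header)]):
            if value == "":
                continue  # treat empty cells as NULL; they do not widen the type
            found = classify(value)
            if rank[found] > rank[types[i]]:
                types[i] = found
    return dict(zip(header, types))


sample = io.StringIO("id,yield,name\n1,3.2,wheat\n2,4,rye\n")
print(infer_column_types(sample))
```

Python's int() and float() accept a few spellings that PostgreSQL may not (for example "1_000"), so a real loader should validate more strictly than this sketch does.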

#3


6  

Alternative from the terminal, without server-side file permissions

The Notes section of the PostgreSQL COPY documentation says:

The path will be interpreted relative to the working directory of the server process (normally the cluster's data directory), not the client's working directory.

So, generally, when you use psql or any other client, even against a local server, a file path in COPY causes problems. And if you are writing a COPY command for other users, e.g. in a GitHub README, the reader will have the same problems.

With a plain COPY command, the only way to use a relative path with the client's permissions is STDIN (psql's \copy meta-command does this for you). As the documentation says:

When STDIN or STDOUT is specified, data is transmitted via the connection between the client and the server.

For example:

psql -h remotehost -d remote_mydb -U myuser -c \
   "copy mytable (column1, column2) from STDIN with delimiter as ','" \
   < ./relative_path/file.csv
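The same STDIN approach can also be driven from Python. The sketch below assumes the psycopg2 driver is installed; build_copy_sql and copy_csv are hypothetical helper names, not part of any library, and the table name is expected to be a bare (not schema-qualified) identifier. Including HEADER tells the server to skip the first line of the file.

```python
def build_copy_sql(table, columns=None, delimiter=",", header=True):
    """Build a COPY ... FROM STDIN statement for CSV input."""

    def quote_ident(name):
        # Double-quote identifiers so mixed-case or spaced names survive.
        return '"' + name.replace('"', '""') + '"'

    target = quote_ident(table)
    if columns:
        target += " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    options = ["FORMAT csv", "DELIMITER '%s'" % delimiter.replace("'", "''")]
    if header:
        options.append("HEADER true")
    return "COPY %s FROM STDIN WITH (%s)" % (target, ", ".join(options))


def copy_csv(dsn, table, csv_path, **kwargs):
    """Stream a client-side file to the server over the connection."""
    import psycopg2  # assumption: psycopg2 is installed

    sql = build_copy_sql(table, **kwargs)
    with open(csv_path, "r", newline="") as f, psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            # The file is read by the client, so a relative path works here.
            cur.copy_expert(sql, f)


print(build_copy_sql("mytable", ["column1", "column2"]))
```

Because the file is opened by the Python process, the path is resolved against the client's working directory and needs only the client's file permissions.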

#4


3  

I have been using this function for a while with no problems. You just need to provide the number of columns in the CSV file, and it will take the header names from the first row and create the table for you:

create or replace function data.load_csv_file
    (
        target_table  text, -- name of the table that will be created
        csv_file_path text,
        col_count     integer
    )

    returns void

as $$

declare
    iter      integer; -- dummy integer to iterate columns with
    col       text; -- to keep column names in each iteration
    col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'data';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format ('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format ('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_file_path);

    iter := 1;
    col_first := (select col_1
                  from temp_table
                  limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format ('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format ('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row // using quote_ident or %I does not work here!?
    execute format ('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length (target_table) > 0 then
        execute format ('alter table temp_table rename to %I', target_table);
    end if;
end;

$$ language plpgsql;
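A similar idea can be sketched on the client side in Python: read only the header row and generate a CREATE TABLE statement with every column typed as text, as the function above does. The helper names (quote_ident, create_table_sql) and the sample data are hypothetical; this is a minimal sketch, not a replacement for the function.

```python
import csv
import io


def quote_ident(name):
    """Double-quote an identifier so spaces and mixed case are preserved."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(table, csv_file, delimiter=","):
    """Return a CREATE TABLE statement built from the CSV header row."""
    reader = csv.reader(csv_file, delimiter=delimiter)
    header = next(reader)  # raises StopIteration if the file is empty
    cols = ",\n".join("    %s text" % quote_ident(h.strip()) for h in header)
    return "CREATE TABLE %s (\n%s\n);" % (quote_ident(table), cols)


sample = io.StringIO("id,Crop Name,yield\n1,wheat,3.2\n")
print(create_table_sql("wheat", sample))
```

Once the table exists, the data can be loaded with COPY ... CSV HEADER as in answer #1, which skips the header row.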
