Postgres error on insert - ERROR: invalid byte sequence for encoding "UTF8": 0x00

Date: 2021-11-16 11:46:19

I get the following error when inserting data from mysql into postgres.

Do I have to manually remove all null characters from my input data? Is there a way to get postgres to do this for me?

ERROR: invalid byte sequence for encoding "UTF8": 0x00

4 Answers

#1


40  

PostgreSQL doesn't support storing the NUL character (\0, byte 0x00) in text fields. This is different from the database NULL value, which is fully supported.

Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

If you need to store the NUL character, you must use a bytea field, which will store any bytes you want but won't support text operations on them.

Given that PostgreSQL doesn't support NUL in text values, there's no good way to get the server to strip it for you. You could import your data into bytea and later convert it to text using a custom function (in Perl or something, maybe?), but it's likely going to be easier to remove the NUL characters in a preprocessing step before you load the data.
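
As one way to do that preprocessing, here is a minimal Python sketch that copies a binary stream (for example an exported data file) while dropping every 0x00 byte. The function name and chunk size are hypothetical, and it assumes the NUL bytes are junk rather than meaningful data. Working on raw bytes is safe for UTF-8 input, because 0x00 never occurs inside a multi-byte UTF-8 sequence.

```python
import io


def strip_nul_bytes(src, dst, chunk_size=64 * 1024):
    """Copy binary stream src to dst, dropping 0x00 bytes.

    Returns the number of bytes removed. Both arguments are
    file-like objects opened in binary mode.
    """
    removed = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        cleaned = chunk.replace(b"\x00", b"")
        removed += len(chunk) - len(cleaned)
        dst.write(cleaned)
    return removed


if __name__ == "__main__":
    # In-memory demo; with real files, open them with "rb" and "wb".
    source = io.BytesIO("caf\u00e9\x00 au lait\x00\n".encode("utf-8"))
    target = io.BytesIO()
    count = strip_nul_bytes(source, target)
    print(count, target.getvalue().decode("utf-8"))
```

Reading in chunks keeps memory use flat for large exports; for small inputs a single `data.replace(b"\x00", b"")` would do the same job.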

#2


15  

Just strip the null bytes with a regex substitution:

s/\x00//g;
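
That one-liner is Perl/sed-style substitution syntax. For comparison, a rough Python equivalent applied to row values just before the insert might look like the sketch below. `clean_row` and the tuple-shaped rows are illustrative assumptions, not part of the original answer.

```python
import re

# Matches the NUL character (U+0000), same as \x00 in the substitution above.
NUL_RE = re.compile(r"\x00")


def clean_row(row):
    """Return a copy of row with NUL characters removed from str values.

    Non-string values (numbers, None, dates, ...) are passed through as-is.
    """
    return tuple(
        NUL_RE.sub("", value) if isinstance(value, str) else value
        for value in row
    )


if __name__ == "__main__":
    print(clean_row(("ab\x00c", 42, None)))
```

Leaving non-string values untouched means the same helper can be applied to every row regardless of column types.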

#3


4  

If you are using Java, you can just strip the 0x00 characters before the insert, like this:

myValue.replaceAll("\u0000", "")

The solution was provided and explained by Csaba in the following post:

https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

To quote the post:

in Java you can actually have a "0x0" character in your string, and that's valid unicode. So that's translated to the character 0x0 in UTF8, which in turn is not accepted because the server uses null terminated strings... so the only way is to make sure your strings don't contain the character '\u0000'.
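
To make that point concrete, here is a small Python sketch (not from the quoted post; `utf8_bytes` and `reject_nul` are hypothetical helpers). It shows that U+0000 is a legal character in a string and encodes to the single byte 0x00 in UTF-8, and it rejects such strings before they reach the server.

```python
NUL = "\u0000"


def utf8_bytes(value: str) -> bytes:
    """Encode a string as UTF-8; U+0000 becomes the single byte 0x00."""
    return value.encode("utf-8")


def reject_nul(value: str) -> str:
    """Return value unchanged, or raise if it contains U+0000."""
    position = value.find(NUL)
    if position != -1:
        raise ValueError(f"NUL character at index {position}")
    return value


if __name__ == "__main__":
    print(utf8_bytes("a" + NUL + "b"))
    try:
        reject_nul("a" + NUL + "b")
    except ValueError as exc:
        print("rejected:", exc)
```

Whether to reject or silently strip is a design choice: rejecting surfaces bad upstream data early, while stripping keeps a bulk migration moving.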

#4


1  

You can first insert the data into a bytea (blob) field and then copy it to a text field with the following function:

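CREATE OR REPLACE FUNCTION blob2text() RETURNS void AS $$
DECLARE
    ref record;
    i integer;
BEGIN
    -- my_table, blob_field and field are placeholder names
    -- (the bare word "table" is reserved and cannot be used unquoted)
    FOR ref IN SELECT id, blob_field FROM my_table LOOP

        -- find each 0x00 byte and replace it with a space (0x20 = 32 decimal)
        i := position(E'\\000'::bytea in ref.blob_field);
        WHILE i > 0 LOOP
            -- position() is 1-based, set_byte() offsets are 0-based
            ref.blob_field := set_byte(ref.blob_field, i - 1, 32);
            i := position(E'\\000'::bytea in ref.blob_field);
        END LOOP;

        -- note: encode(..., 'escape') writes high-bit bytes as octal escapes;
        -- convert_from(ref.blob_field, 'UTF8') may suit UTF-8 data better
        UPDATE my_table SET field = encode(ref.blob_field, 'escape') WHERE id = ref.id;
    END LOOP;

END; $$ LANGUAGE plpgsql;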

SELECT blob2text();
