Multibyte character problem in Redshift

Time: 2022-03-26 23:10:13

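I am unable to get a case-insensitive match on multibyte characters in Redshift. With the table and query below, ILIKE returns only the row whose case matches exactly.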

create table temp2 (city varchar);

insert into temp2 values('г. красноярск');  -- lowercase value

insert into temp2 values('Г. Красноярск');  -- uppercase value

select * from temp2 where city ilike 'Г. Красноярск';

city          
------------- 
Г. Красноярск 

I tried the following, and lower() does convert the UTF-8 characters to lowercase:

select lower('Г. Красноярск')

lower         
------------- 
г. красноярск 

In Vertica this works fine using the lowerb() function.

1 Solution

#1


Internally the LIKE and ILIKE operators use PostgreSQL's regular expression support.

Support for proper handling of UTF-8 multibyte characters in regular expressions was added in PostgreSQL 9.2. Redshift is based on PostgreSQL 8.0.2, and it looks like that support has not been backported into its forked version.

See "Postgresql regex to match uppercase, Unicode-aware".

You can work around this, with limitations, by lowercasing both sides and using LIKE instead of ILIKE, i.e. lower(city) LIKE lower('Г. Красноярск'). An expression index may be useful where the database supports one.
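
As a minimal sketch of that workaround, assuming the temp2 table from the question and that lower() handles Cyrillic the way the question's output shows, both sides of the comparison can be folded to lowercase before a plain LIKE:

```sql
-- Case-insensitive match without ILIKE: lowercase both the column and
-- the search term, then compare with plain LIKE.
-- Assumes the temp2 table and city column from the question.
select city
from temp2
where lower(city) like lower('Г. Красноярск');
```

Under those assumptions this should match both the lowercase and the uppercase row. One limitation to keep in mind: any % or _ in the search term is still treated as a LIKE wildcard.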
