Equivalent of SQL Server's isNumeric() in Amazon Redshift

Date: 2021-08-16 23:08:03
  • I'm using Amazon Redshift as my data warehouse.
  • I have a field (field1) of type string. Some of the strings start with four digits and others start with letters:

'test alpha'
'1382 test beta'

  • I want to filter out rows where the string does not start with four digits.
  • Looking at the Redshift documentation, I don't believe isnumber or isnumeric are functions. It seems that the 'like' operator is the best possibility.
  • I tried

    where left(field1, 4) like '[0-9][0-9][0-9][0-9]'

This did not work, and from the link below it seems that Redshift may not support that:

https://forums.aws.amazon.com/message.jspa?messageID=439850

Is there an error in the 'where' clause? If not, and that clause isn't supported in Redshift, is there a way to filter? I was thinking of using cast

cast(left(field1,4) as integer)

and then skipping the row if it generated an error, but I'm not sure how to do this in Amazon Redshift. Or is there some other proxy for the isnumeric filter?

Thanks

6 Solutions

#1


7  

Although a long time has passed since this question was asked, I have not found an adequate response, so I feel obliged to share my solution, which works fine on my Redshift cluster today (March 2016).

The UDF is:

create or replace function isnumeric (aval VARCHAR(20000))
  returns bool
IMMUTABLE
as $$
    # int() raises an exception when aval is not a valid integer literal
    try:
        int(aval)
    except Exception:
        return False
    return True
$$ language plpythonu;

Usage would be:

select isnumeric(mycolumn), * from mytable
    where isnumeric(mycolumn)=false
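
For readers who want to check how the UDF body behaves outside the cluster, here is a minimal Python sketch of the same int()-based test. The name is_numeric and the sample values are assumptions made for illustration. Also note that plpythonu runs Python 2.7, so edge cases may differ from a local Python 3 interpreter.

```python
def is_numeric(aval):
    """Return True when int() accepts the value, mirroring the UDF body."""
    try:
        int(aval)
    except Exception:
        return False
    return True


# Sample values modelled on the question (illustrative, not from a real table)
samples = ["1382", "1382 test beta", "test alpha", "12.5"]
results = {s: is_numeric(s) for s in samples}
```

Because int() has to accept the whole string, a value such as '1382 test beta' is rejected, and so is a decimal such as '12.5'. For the original question the UDF would presumably be applied to left(field1, 4) rather than to field1 itself.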

#2


5  

Try something like:

where field1 ~ '^[0-9]{4}'

It will match any string that starts with 4 digits.
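
As a rough way to see what this pattern accepts, the same expression can be tried in Python. This is only a sketch: Python's re module is not the POSIX regex engine Redshift uses, although this particular pattern should behave the same in both. The helper name starts_with_four_digits is hypothetical, and the rows are the two sample strings from the question.

```python
import re

# Same pattern as in the WHERE clause: four digits at the start of the string
FOUR_DIGIT_PREFIX = re.compile(r"^[0-9]{4}")


def starts_with_four_digits(value):
    """Return True when the string begins with four ASCII digits."""
    return FOUR_DIGIT_PREFIX.search(value) is not None


rows = ["test alpha", "1382 test beta"]
kept = [r for r in rows if starts_with_four_digits(r)]
```

The pattern is anchored only at the start, so anything may follow the four digits, and there is no need to wrap the column in left().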

#3


3  

It seems that Redshift doesn't support any of the following:

where left(field1,4) like '[0-9][0-9][0-9][0-9]'
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'

What does seem to work is:

where left(field1,4) between 0 and 9999

This returns all rows that start with four numeric characters.

It seems that even though field1 is of type string, the 'between' condition interprets left(field1,4) as a single integer when the characters are numeric (and does not give an error when they are not). I'll follow up if I find a problem. For instance, I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.

#4


2  

It looks like what you are looking for is the SIMILAR TO operator (see the Redshift documentation):

where left(field,4) similar to '[0-9]{4}'

#5


2  

where regexp_instr(field1,'^[0-9]{4}') = 0

This will remove rows starting with 4 digits (regexp_instr returns 1 for rows where field1 starts with 4 digits). To keep only those rows, compare against 1 instead.

#6


1  

We tried the following, and it worked for most of our scenarios:

columnn ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'

This will match positive and negative numbers, both integer and float.
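
To get a feel for which inputs this expression accepts, it can be tried in Python. This is a sketch only: Python's re module stands in for Redshift's POSIX regex engine, the helper name looks_numeric is hypothetical, and the test values are made up for illustration.

```python
import re

# Optional minus sign, one or more digits, optional dot, optional fraction digits
NUMERIC = re.compile(r"^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$")


def looks_numeric(value):
    """Return True when the whole string matches the numeric pattern."""
    return NUMERIC.search(value) is not None


candidates_ok = ["42", "-7", "3.14", "-0.5", "3."]
candidates_bad = [".5", "+3", "1e5", "12.5.1", "abc", ""]

accepted = [v for v in candidates_ok if looks_numeric(v)]
rejected = [v for v in candidates_bad if not looks_numeric(v)]
```

Inputs such as '.5', '+3', or '1e5' are not matched, while a trailing dot as in '3.' is. That is one plausible reading of "most of our scenarios", so the pattern may need adjusting if such values occur in the data.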
