I have a dataset whose data is formatted like this:
Date        | exec_time
------------+---------------
Today       | 99999 ms
Yesterday   | 1 ms
Tomorrow    | 50000 ms
Another Day | None Recorded
Last Day    | ms
What I need to do is write a query that returns all of the exec_time values that are >= 60000.
The way I've tried to write it is like this:
select exec_time
from myTable
where exec_time not like '%N%'
and cast(split_part(exec_time,' ', 1) as int) >= 60000
order by len(exec_time) desc, exec_time desc
limit 10
However, when I run this, I get this error:
ERROR: Invalid digit, Value '2', Pos 0, Type: Integer
Detail:
-----------------------------------------------
error: Invalid digit, Value '2', Pos 0, Type: Integer
code: 1207
context:
query: 2780081
location: :0
process: query0_61 [pid=0]
-----------------------------------------------
Any ideas how I can get around this?
2 Answers
#1
2
The error occurs because WHERE conditions are not evaluated in any guaranteed order, so the cast can run on rows that the LIKE filter was meant to exclude. Use a CASE expression to make sure the check happens before the cast.
SELECT exec_time
FROM myTable
WHERE CASE WHEN exec_time NOT LIKE '%N%' THEN split_part(exec_time,' ', 1)::int >= 60000 ELSE FALSE END
ORDER BY length(exec_time) desc, exec_time desc
LIMIT 10;
While you are at it, if 'None Recorded' is the only case to rule out, use a faster left-anchored check:
exec_time NOT LIKE 'N%'
If the above still errors out, check with this to find any offending rows you may have missed:
SELECT DISTINCT exec_time
FROM myTable
WHERE exec_time NOT LIKE '%N%'
AND exec_time !~ '^\\d+ ' -- not all digits before the first space
In modern Postgres you only need a single backslash: '^\d+ '. Redshift seems to require doubled backslashes, because it apparently still uses the outdated POSIX escape syntax for strings by default, even without an explicit declaration (E'^\\d+ ').
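One way to combine the guard with the diagnostic above is to have the CASE condition test positively for the expected shape, rather than excluding known bad values. This is only a sketch: it assumes valid rows always start with digits followed by a space (as in '99999 ms'), and it uses the [0-9] character class so the backslash-escaping question does not come up. It has not been run against the original table.

```sql
-- Only cast rows whose value starts with digits followed by a space.
-- Rows such as 'None Recorded' or a bare 'ms' fall through to ELSE FALSE.
SELECT exec_time
FROM   myTable
WHERE  CASE WHEN exec_time ~ '^[0-9]+ '
            THEN split_part(exec_time, ' ', 1)::int >= 60000
            ELSE FALSE
       END
ORDER  BY length(exec_time) DESC, exec_time DESC
LIMIT  10;
```

A value with more digits than an integer can hold would still fail the cast, so very large values may need bigint instead.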
Generally, it's not a good idea to mix data this way. You should have an integer column to store execution time. Much cheaper, cleaner and faster.
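A rough sketch of what that could look like, assuming a new column named exec_time_ms (a hypothetical name, not from the question) and that valid values have the form '<digits> ms'. Rows that do not match are left NULL:

```sql
-- exec_time_ms is a hypothetical column name chosen for this example.
ALTER TABLE myTable ADD COLUMN exec_time_ms integer;

-- Populate it from the text column; non-numeric rows stay NULL.
UPDATE myTable
SET    exec_time_ms = CASE WHEN exec_time ~ '^[0-9]+ '
                           THEN split_part(exec_time, ' ', 1)::int
                      END;

-- The original requirement then needs no casting or string checks.
SELECT exec_time_ms
FROM   myTable
WHERE  exec_time_ms >= 60000
ORDER  BY exec_time_ms DESC
LIMIT  10;
```

With a numeric column, the ORDER BY no longer has to sort by string length to get numeric ordering.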
#2
1
I think the problem is the "None Recorded" value. I don't know whether the first WHERE condition is guaranteed to run before the cast, so those rows may not be excluded in time. Try filtering them out in a subquery first:
SELECT exec_time
FROM (SELECT exec_time FROM myTable WHERE exec_time NOT LIKE 'N%') as foo
WHERE cast(split_part(foo.exec_time, ' ', 1) as int) >= 60000
ORDER by length(foo.exec_time) desc, foo.exec_time desc
limit 10