Combining dbplyr and case_when in SQL Server

Date: 2021-05-20 09:50:16

I am using dbplyr to write and run queries in SQL Server, and I want to apply a conditional mutate. This can be done using either ifelse or case_when. The query works when using ifelse but throws an exception when using case_when.

The issue appears to be the SQL syntax that the two commands get translated into. The case_when syntax does not appear to be valid SQL. Can you tell me why and how to fix it? Or is this a bug?

# libraries
library(DBI)
library(dplyr)
library(dbplyr)

# establish connection to database table
connection_string = "database.specific.string"
# mine looks something like "DRIVER=...; Trusted_Connection=...; DATABASE=...; SERVER=..."
db_connection = dbConnect(odbc::odbc(), .connection_string = connection_string)
my_table = tbl(db_connection, from = my_table_name)

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 != col2 ~ "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0)) THEN ('a')
       WHEN CONVERT(BIT, IIF(col1 <> col2, 1.0, 0.0)) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Running this code throws an error:

An expression of non-boolean type specified in a context where a condition is expected, near 'THEN'

However, the ifelse query works as expected:

# attempted query
tmp = my_table %>%
    mutate(new_col = ifelse(col1 == col2, "a", "b"))

# check SQL code for query
show_query(tmp)

The resulting SQL query is:

SELECT 
    col1, col2,
    CASE
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0)) = TRUE) THEN ('a')
       WHEN (CONVERT(BIT, IIF(col1 = col2, 1.0, 0.0)) = FALSE) THEN ('b')
    END AS new_col
FROM my_database.my_table_name

Note that in both cases the SQL has been produced using show_query. Using translate_sql on the same expressions consistently produced much cleaner SQL, but that is not the SQL that gets run on the server.
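The difference between the two functions can be checked without a live server. The following is a minimal sketch, assuming a dbplyr version that provides simulate_mssql(); fake_table is a hypothetical stand-in for my_table, and the exact SQL printed will depend on the installed dbplyr version.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for my_table: a lazy frame on the simulated
# SQL Server backend, so no database connection is needed.
fake_table = lazy_frame(col1 = 1L, col2 = 2L, con = simulate_mssql())

# Same case_when as in the question. show_query() prints the SQL that
# dbplyr would generate for the whole pipeline.
query = fake_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 != col2 ~ "b"))
show_query(query)

# translate_sql() translates a single expression. Passing con makes it
# use the SQL Server translation rules rather than the generic ones.
translated = translate_sql(ifelse(col1 == col2, "a", "b"),
                           con = simulate_mssql())
print(translated)
```

Comparing the two outputs side by side should show whether the backend-specific translation is what introduces the extra CONVERT(BIT, ...) wrapping.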

Does anyone else get these SQL queries? Any suggestions as to what is wrong and how to fix this?

Update

I posted this as an issue on the tidyverse and was informed that a solution has already been developed so that case_when(..., TRUE ~ "b") is translated to ELSE 'b' (here).

However, this does not address the syntax that is causing the exception, so I have edited the question to focus on that syntax.
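For reference, the TRUE ~ "b" form mentioned in that reply can be inspected as follows. This is only a sketch: fake_table is a hypothetical stand-in for my_table on the simulated SQL Server backend, and whether an ELSE clause appears in the output depends on the installed dbplyr version.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for my_table (no real connection required).
fake_table = lazy_frame(col1 = 1L, col2 = 2L, con = simulate_mssql())

# case_when with a catch-all TRUE branch instead of a second condition.
query_else = fake_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               TRUE ~ "b"))

# Inspect how the catch-all branch is rendered in the generated SQL.
show_query(query_else)
```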

Update 2

I posted this as an issue on dbplyr. A response from Christophe Dervieux (cderv) identifies the likely cause: SQL Server needs a special translation for case_when, like the one it already has for ifelse.

In the meantime, users can use multiple ifelse or if_else statements.

2 Answers

#1


Is it just that you have got your dplyr syntax slightly wrong?

Try this:

# attempted query
tmp = my_table %>%
    mutate(new_col = case_when(col1 == col2 ~ "a",
                               col1 == 'TRUE' ~ "b"
                               # alternatively: col1 == 1 ~ "b"
    ))

#2


See Update 2 above:

This appears to be an issue affecting case_when for SQL Server.

The work-around at present is to use multiple ifelse or if_else statements:

data %>%
    mutate(new_col = ifelse(condition1, val1, NA)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition2, val2, new_col)) %>%
    mutate(new_col = ifelse(is.na(new_col) & condition3, val3, new_col))
    # etc
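The same idea can also be written as a single mutate with nested if_else calls. This is a sketch rather than a tested fix: fake_table is a hypothetical stand-in on the simulated SQL Server backend, and the third outcome ("c" for the remaining rows) is added purely for illustration.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for the real table (no connection required).
fake_table = lazy_frame(col1 = 1L, col2 = 2L, con = simulate_mssql())

# Each nested if_else plays the role of one case_when branch; the
# innermost false value acts as the fallback.
query_nested = fake_table %>%
    mutate(new_col = if_else(col1 == col2, "a",
                     if_else(col1 > col2, "b", "c")))

# Inspect the generated SQL before running it against a real server.
show_query(query_nested)
```

Nesting keeps the branch order explicit and avoids relying on is.na(new_col) to detect rows that earlier conditions have not yet matched.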
