PySpark: multiple conditions in a when clause

Time: 2022-06-22 02:31:45

I would like to fill in the values of a DataFrame column (Age) where it is currently blank, but only in rows where another column (Survived) has the value 0. If Survived is 1 and Age is blank, I want to keep Age as null.

I tried to use the && operator, but it didn't work. Here is my code:

tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()

Any suggestions on how to handle that? Thanks.

Error Message:

SyntaxError: invalid syntax
  File "<ipython-input-33-3e691784411c>", line 1
    tdata.withColumn("Age",  when((tdata.Age == "" && tdata.Survived == "0"), mean_age_0).otherwise(tdata.Age)).show()
                                                    ^

2 answers

#1 (score: 33)

You get a SyntaxError because Python has no && operator. It has the keyword and and the operator &; the latter is the correct choice for building boolean expressions on Column (use | for logical disjunction and ~ for logical negation).
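To illustrate why & works where and cannot, here is a minimal sketch using a hypothetical FakeColumn class. It is an assumption made for illustration and is not part of PySpark; it only mimics how an expression object can overload &, | and ~. Python lets a class define those operators, whereas and, or and not always go through truth-value conversion, which an expression object has no sensible answer for. To my knowledge PySpark's Column rejects that conversion in a similar way, but the error text below is invented.

```python
class FakeColumn:
    """Hypothetical stand-in for an expression object; NOT PySpark's Column."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # Build a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} = {other!r})")

    def __and__(self, other):
        # Called for: self & other
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __or__(self, other):
        # Called for: self | other
        return FakeColumn(f"({self.expr} OR {other.expr})")

    def __invert__(self):
        # Called for: ~self
        return FakeColumn(f"(NOT {self.expr})")

    def __bool__(self):
        # The keywords and/or/not end up here and cannot be overloaded.
        raise ValueError("cannot convert expression to bool; use &, | or ~")


age = FakeColumn("Age")
survived = FakeColumn("Survived")

# & is dispatched to __and__, so the two comparisons are combined lazily.
combined = (age == "") & (survived == "0")
print(combined.expr)

# The keyword form asks for the truth value of the left operand and fails.
try:
    (age == "") and (survived == "0")
except ValueError as exc:
    print("and failed:", exc)
```

The design point is that & builds a description of the condition to be evaluated later per row, while and needs a single True or False immediately.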

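The condition you wrote is also invalid because it ignores operator precedence. In Python, & has higher precedence than ==, so each comparison has to be parenthesized. Without the parentheses, tdata.Age == "" & tdata.Survived == "0" is parsed as tdata.Age == ("" & tdata.Survived) == "0".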

from pyspark.sql.functions import col

(col("Age") == "") & (col("Survived") == "0")
## Column<b'((Age = ) AND (Survived = 0))'>
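The precedence rule can be checked without Spark at all. The following sketch uses only the standard-library ast module to show how Python groups the two spellings; the names a and b are placeholders for the two columns and are never evaluated.

```python
import ast

# Parse both spellings as expressions; nothing is executed, only parsed.
unparenthesized = ast.parse('a == "" & b == "0"', mode="eval").body
parenthesized = ast.parse('(a == "") & (b == "0")', mode="eval").body

# Without parentheses the top-level node is a single chained comparison
# whose middle operand is the bitwise-and ("" & b).
print(type(unparenthesized).__name__)
print(type(unparenthesized.comparators[0]).__name__)

# With parentheses the top-level node is the & operation, and each side
# is its own comparison.
print(type(parenthesized).__name__)
print(type(parenthesized.left).__name__)
```

The first pair of lines prints Compare and BinOp, the second pair BinOp and Compare, which is the grouping the parentheses are there to force.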

On a side note, the when function is equivalent to a CASE expression, not a WHEN clause. The same rules still apply. Conjunction:

df.where((col("foo") > 0) & (col("bar") < 0))

Disjunction:

df.where((col("foo") > 0) | (col("bar") < 0))

You can of course define conditions separately to avoid brackets:

cond1 = col("Age") == "" 
cond2 = col("Survived") == "0"

cond1 & cond2

#2 (score: -2)

It should be:

when(((tdata.Age == "") & (tdata.Survived == "0")), mean_age_0)
