I am facing a peculiar concatenation problem in a PySpark SQL query:
spark.sql("select *,rtrim(IncomeCat)+' '+IncomeCatDesc as trimcat from Dim_CMIncomeCat_handled").show()
In this query both the IncomeCat and IncomeCatDesc fields hold String values, so logically I expected them to concatenate, but the resulting field is null. The expected result would be '14100abcd', where 14100 is the IncomeCat part and abcd is the IncomeCatDesc part. I have also tried explicit casting on the IncomeCat field:
spark.sql("select *,cast(rtrim(IncomeCat) as string)+' '+IncomeCatDesc as IncomeCatAndDesc from Dim_CMIncomeCat_handled").show()
but I am getting the same result. Am I missing something here? Kindly help me solve this. Thanks, Kalyan
1 solution
#1
0
Spark doesn't override the + operator for strings, so the query you wrote doesn't express concatenation. If you take a look at a basic example you'll see what is going on:
spark.sql("SELECT 'a' + 'b'").explain()
== Physical Plan ==
*Project [null AS (CAST(a AS DOUBLE) + CAST(b AS DOUBLE))#48]
+- Scan OneRowRelation[]
Both arguments are assumed to be numeric, and in the general case the result will be undefined. Of course it will work for strings that can be cast to numerics:
spark.sql("SELECT '1' + '2'").show()
+---------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(2 AS DOUBLE))|
+---------------------------------------+
| 3.0|
+---------------------------------------+
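This is exactly why the original query returns null: a non-numeric string like 'abcd' fails the implicit CAST to DOUBLE, the failed cast yields NULL, and NULL + anything is NULL. A minimal Python sketch of this semantics (illustrative only, not Spark itself):

```python
def cast_double(s):
    """Mimics Spark SQL's CAST(x AS DOUBLE): returns None on failure."""
    try:
        return float(s)
    except (TypeError, ValueError):
        return None

def plus(a, b):
    """Mimics Spark SQL's + on two string operands: cast both, then add."""
    x, y = cast_double(a), cast_double(b)
    # NULL propagates: if either cast failed, the sum is NULL
    return None if x is None or y is None else x + y

print(plus("a", "b"))  # None -- 'a' and 'b' cannot be cast to DOUBLE
print(plus("1", "2"))  # 3.0  -- both casts succeed, so they are added
```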
To concatenate strings you can use concat:
spark.sql("SELECT CONCAT('a', 'b')").show()
+------------+
|concat(a, b)|
+------------+
| ab|
+------------+
or concat_ws:
spark.sql("SELECT CONCAT_WS('*', 'a', 'b')").show()
+------------------+
|concat_ws(*, a, b)|
+------------------+
| a*b|
+------------------+
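Applied to the original query, the fix would be something like `spark.sql("SELECT *, concat_ws(' ', rtrim(IncomeCat), IncomeCatDesc) AS IncomeCatAndDesc FROM Dim_CMIncomeCat_handled")` (table and column names taken from the question). Note the difference in NULL handling: CONCAT returns NULL if any argument is NULL, while CONCAT_WS simply skips NULL arguments. A plain-Python sketch of those documented semantics:

```python
def concat(*args):
    """Mimics Spark SQL CONCAT: NULL if any argument is NULL."""
    if any(a is None for a in args):
        return None
    return "".join(args)

def concat_ws(sep, *args):
    """Mimics Spark SQL CONCAT_WS: NULL arguments are skipped."""
    return sep.join(a for a in args if a is not None)

print(concat("14100", " ", "abcd"))     # 14100 abcd
print(concat("14100", None))            # None -- one NULL poisons CONCAT
print(concat_ws(" ", "14100", "abcd"))  # 14100 abcd
print(concat_ws(" ", "14100", None))    # 14100 -- the NULL is ignored
```

So if IncomeCatDesc can be NULL and you still want the IncomeCat part to survive, concat_ws is the safer choice.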