String concatenation problem in Spark SQL when using rtrim()

Time: 2021-12-01 23:09:18

I am facing a peculiar concatenation problem in a PySpark SQL query:

spark.sql("select *,rtrim(IncomeCat)+' '+IncomeCatDesc as trimcat from Dim_CMIncomeCat_handled").show() 

In this query both IncomeCat and IncomeCatDesc are string columns, so I expected the expression to concatenate them, but the resulting column is null.


The expected result is '14100abcd', where 14100 is the IncomeCat part and abcd is the IncomeCatDesc part. I have also tried an explicit cast on the IncomeCat field:

spark.sql("select *,cast(rtrim(IncomeCat) as string)+' '+IncomeCatDesc as IncomeCatAndDesc from Dim_CMIncomeCat_handled").show() 

but I get the same result. Am I missing something here? Kindly help me solve this. Thanks, Kalyan

1 solution

#1

Spark doesn't override the + operator for strings, so the query you wrote doesn't express concatenation. If you look at a basic example you'll see what is going on:

spark.sql("SELECT 'a' + 'b'").explain()
== Physical Plan ==
*Project [null AS (CAST(a AS DOUBLE) + CAST(b AS DOUBLE))#48]
+- Scan OneRowRelation[]

Both arguments are assumed to be numeric, and in the general case the result will be undefined. Of course it will work for strings that can be cast to numerics:

spark.sql("SELECT '1' + '2'").show()
+---------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(2 AS DOUBLE))|
+---------------------------------------+
|                                    3.0|
+---------------------------------------+

To concatenate strings you can use concat:

spark.sql("SELECT CONCAT('a', 'b')").show()
+------------+
|concat(a, b)|
+------------+
|          ab|
+------------+

or concat_ws:

spark.sql("SELECT CONCAT_WS('*', 'a', 'b')").show()
+------------------+
|concat_ws(*, a, b)|
+------------------+
|               a*b|
+------------------+
