I have been reading many articles about Spark 2.0 "whole-stage code generation". Since the technique optimizes the code at the compile stage, I have several questions about it:
Q1. Can Python or R take advantage of this technique?
Q2. In Scala/Java, how do I take advantage of this technique? Do I have to express the whole query through Spark's API, or is a string query good enough? For example, can each of the following programs take advantage of "whole-stage code generation"?
case 1:
sparksession.sql("select * from a join b on a.id = b.id")
case 2:
val table_a = sparksession.sql("select * from a")
val table_b = sparksession.sql("select * from b")
val table_c = table_a.join(table_b, table_a(COL_ADID) === table_b(COL_ADID))
Q3. If Q2 case 1 is able to utilize "whole-stage code generation", what happens when we read the query string from an external file, like this:
val query = scala.io.Source.fromFile(queryfile).mkString
sparksession.sql(query)
In the above code, the compiler doesn't know what the query string looks like at compile time. Can it still utilize the "whole-stage code generation" technique?
1 Answer
- All languages using the Spark SQL API can benefit from codegen as long as they don't use language-specific extensions (Python UDFs, or dapply and gapply in R).
- Both the SQL and DataFrame APIs are supported, and the way you provide the query doesn't matter. Codegen is an internal process applied between user input and query execution.
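One way to check this yourself is to print the physical plan for both styles of query. The sketch below is only an illustration under stated assumptions: the temp views a and b, their columns, and the local session settings are hypothetical stand-ins, and the query string is assembled at run time to mimic text read from a file. It is written as statements that can be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation (assumption); in spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("codegen-check")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the tables a and b from the question.
Seq((1, "x"), (2, "y")).toDF("id", "va").createOrReplaceTempView("a")
Seq((1, "p"), (3, "q")).toDF("id", "vb").createOrReplaceTempView("b")

// Case 1: a query string whose text is only assembled at run time,
// much like a string read from an external file.
val query = Seq("select * from a", "join b on a.id = b.id").mkString(" ")
val viaSql = spark.sql(query)

// Case 2: the same join expressed through the DataFrame API.
val tableA = spark.table("a")
val tableB = spark.table("b")
val viaApi = tableA.join(tableB, tableA("id") === tableB("id"))

// Print the physical plans so the two can be compared side by side.
viaSql.explain()
viaApi.explain()
```

In the printed plans, operators that run inside whole-stage codegen are marked with an asterisk (a bare `*` in Spark 2.0, `*(n)` with a stage id in later releases). Both plans should show the same operators marked, since Spark generates and compiles the code while the application is running, after it has planned the query, not when the Scala program itself is compiled.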