I am executing a statement on the Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements, with the following body:
{
"code": "spark.sql(\"select * from test_table limit 10\")"
}
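For reference, this is how the call can be made from Python using only the standard library; this is a minimal sketch that assumes a Livy server on localhost:8998 with an already-open session 0:

```python
import json
import urllib.request

LIVY = "http://localhost:8998"  # assumed Livy host and port

def build_statement(code):
    # The JSON body Livy expects for POST /sessions/{id}/statements
    return json.dumps({"code": code}).encode("utf-8")

def submit(session_id, code):
    req = urllib.request.Request(
        f"{LIVY}/sessions/{session_id}/statements",
        data=build_statement(code),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Execution is asynchronous: the caller must then poll
        # GET /sessions/{id}/statements/{n} until state is "available"
        # before reading the statement's output.
        return json.loads(resp.read())
```

Note that the POST only enqueues the statement; the result shown below only appears once the statement has finished and is fetched with a GET.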
I would like an answer in the following format:
(...)
"data": {
"application/json": "[
{"id": "123", "init_date": 1481649345, ...},
{"id": "133", "init_date": 1481649333, ...},
{"id": "155", "init_date": 1481642153, ...},
]"
}
(...)
but what I'm getting is:
(...)
"data": {
"text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]"
}
(...)
which is the toString() representation of the DataFrame.
Is there some way to return a dataframe as JSON using the Livy Server?
EDIT
Found a JIRA issue that addresses the problem: https://issues.cloudera.org/browse/LIVY-72
Judging by the comments there, does that mean Livy does not and will not support such a feature?
3 Answers
#1
2
I don't have a lot of experience with Livy, but as far as I know this endpoint is used as an interactive shell, so the output is the string that a shell would print. With that in mind, I can think of a way to emulate the result you want, though it may not be the best way to do it:
{
"code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))"
}
Then you will have the JSON wrapped in a string, which your client can parse.
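As a sketch of the client side, assuming the statement above succeeded, the JSON array arrives under the statement's output.data["text/plain"] field and can be decoded like this:

```python
import json

def rows_from_statement(statement):
    # Extract the rows printed by the println/toJSON trick from a
    # finished Livy statement object (its state must be "available").
    text = statement["output"]["data"]["text/plain"]
    return json.loads(text)

# Abbreviated shape of a finished statement, reduced to the relevant fields:
example = {
    "output": {
        "status": "ok",
        "data": {"text/plain": '[{"id": "123"}, {"id": "133"}]'},
    }
}
```

The exact nesting of `output` and `data` follows the statement object Livy returns; only the printed string itself comes from the workaround above.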
#2
3
I recommend using the built-in (albeit hard to find documentation for) magics %json and %table:
%json
import json
import textwrap
import requests

# host and headers are assumed to be defined elsewhere
session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
val e = d.collect
%json e
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
%table
import json
import textwrap
import requests

# host and headers are assumed to be defined elsewhere
session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
'code': textwrap.dedent("""\
val x = List((1, "a", 0.12), (3, "b", 0.63))
%table x
""")}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
Related: Apache Livy: query Spark SQL via REST: possible?
#3
0
I think in general your best bet is to write your output to a database of some kind. If you write to a randomly named table, your code could read it back after the script is done.
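One hedged way to sketch that idea: generate a unique table name per request, submit code through Livy that saves the result there, and query the table afterwards with any SQL client. The helper names and the use of saveAsTable here are illustrative, not part of any Livy API:

```python
import uuid

def make_result_table_name(prefix="livy_result"):
    # Unique, SQL-safe table name for one request
    return f"{prefix}_{uuid.uuid4().hex}"

def save_statement(query, table):
    # Scala code to submit via Livy: persist the query result
    # under the generated table name instead of printing it.
    return f'spark.sql("{query}").write.saveAsTable("{table}")'

# e.g. table = make_result_table_name(); submit save_statement(...) via Livy,
# then read the table back once the statement finishes.
```

This trades the extra round trip and cleanup of temporary tables for getting structured rows instead of shell output.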