I am getting some strange errors that are difficult to debug. I am running a simple UDF JavaScript mapper which maps the JSON data and imports it into BigQuery. I've run other UDF functions previously and never encountered such errors.
Is there any way to debug (with the actual debugger or at least with console.log or similar) the Dataflow templates UDF errors?
The error in question:
exception: "java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException: java.lang.RuntimeException: org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
  at com.google.cloud.dataflow.worker.GroupAlsoByWindowsParDoFn$1.output(GroupAlsoByWindowsParDoFn.java:183)
  at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner$1.outputWindowedValue(GroupAlsoByWindowFnRunner.java:101)
  at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:54)
  at com.google.cloud.dataflow.worker.util.BatchGroupAlsoByWindowReshuffleFn.processElement(BatchGroupAlsoByWindowReshuffleFn.java:37)
  at com.google.cloud.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:114)
  ...
It is hard to tell what this error refers to: is it the input data that is malformed, or the JSON returned by the UDF?
So far I have tried the following:
- Unit tested the UDF locally with sample data
- Ran the integration tests with the exact same file I am trying to analyse in the real environment
- Used an empty JSON object ({}) as the input
- Used a UDF function that returns an empty JSON object
Any tips on debugging Dataflow UDF JavaScript would be highly appreciated.
Is the source code of these Java classes available anywhere online?
1 solution
#1
1
In this case the culprit turned out to be the BigQuery schema, which needs to be wrapped in a JSON object under the "BigQuery Schema" key:
{
"BigQuery Schema": [
... schema goes here
]
}
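The error text is consistent with a JSON object parser being handed text that starts with `[` rather than `{`, which is what a schema file containing only the bare field array would look like. A minimal sketch of a local pre-flight check, runnable with Node.js, follows. The helper name `checkSchemaText` and the sample field definitions are hypothetical; only the `BigQuery Schema` key is taken from the answer.

```javascript
// Hypothetical helper: checks that schema text has the wrapped shape
// { "BigQuery Schema": [ ...fields... ] } before it is given to the template.
// Returns a description of the problem, or null if none was found.
function checkSchemaText(text) {
  const trimmed = text.trim();
  if (!trimmed.startsWith("{")) {
    return "schema must be a JSON object, but the text starts with '" +
        trimmed.charAt(0) + "'";
  }
  let parsed;
  try {
    parsed = JSON.parse(trimmed);
  } catch (e) {
    return "schema is not valid JSON: " + e.message;
  }
  if (!Array.isArray(parsed["BigQuery Schema"])) {
    return 'schema object must contain a "BigQuery Schema" array';
  }
  return null;
}

// Sample field definitions, made up for illustration.
const bareArray = '[{"name": "id", "type": "STRING"}]';
const wrapped = '{"BigQuery Schema": [{"name": "id", "type": "STRING"}]}';

console.log(checkSchemaText(bareArray)); // reports the leading '['
console.log(checkSchemaText(wrapped));   // null
```

Running a check like this on the schema file before launching the job separates schema problems from problems in the input data or the UDF.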
The following code could be useful for debugging: TextIOToBigQuery.java
See the repo: https://github.com/GoogleCloudPlatform/DataflowTemplates
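On the more general question of debugging the UDF itself, one option is to have the UDF fail with a message that names the stage that broke, so that input problems and mapping problems can be told apart. This is only a sketch, assuming the UDF receives one JSON string and returns one JSON string. The names `transform` and `mapRecord` and the field names are hypothetical, and how the message appears in the Dataflow worker logs has not been verified here.

```javascript
// Hypothetical mapping logic; replace with the real field mapping.
function mapRecord(record) {
  return { id: String(record.id), name: record.name };
}

// UDF entry point (the name "transform" is an assumption).
// Each stage is wrapped separately so a failure says whether the
// input line or the mapping step was at fault.
function transform(inJson) {
  var record;
  try {
    record = JSON.parse(inJson);
  } catch (e) {
    throw new Error("UDF input is not valid JSON: " + e.message +
        " | input starts with: " + String(inJson).slice(0, 80));
  }
  var mapped;
  try {
    mapped = mapRecord(record);
  } catch (e) {
    throw new Error("UDF mapping failed: " + e.message);
  }
  return JSON.stringify(mapped);
}

// Local check with Node.js, using made-up sample data.
console.log(transform('{"id": 1, "name": "a"}'));
```

If the UDF passes this kind of local check with the real input file and the job still fails, the problem most likely lies outside the UDF, as it did here with the schema.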