I'm running a simple DataFlow pipeline w/ the Python SDK for counting keywords. The job runs fine for pre-processing the input data, but it fails for grouping/output steps with the following error.
我正在运行一个带有Python SDK的简单DataFlow管道来计算关键字。该作业可以正常运行以预处理输入数据,但是对于具有以下错误的分组/输出步骤失败。
I guess the logs says the worker is having an issue accessing the temp folder, but the storage bucket in our project exists with proper permissions. What could be a possible issue for this?
我想日志说工人在访问临时文件夹时遇到问题,但我们项目中的存储桶存在适当的权限。这可能是一个什么问题?
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line
606, in write raise self.upload_thread.last_error # pylint:
disable=raising-bad-type HttpError: HttpError accessing
<https://www.googleapis.com/resumable/upload/storage/v1/b/[PROJECT-NAME-REDACTED]-temp-2016-08-07_04-42-52/o?uploadType=resumable&alt=json&name=0015bf8d-fa87-4c9a-82d6-8ffcd742d770>:
response: <{'status': '404', 'alternate-protocol': '443:quic',
'content-length': '165', 'vary': 'Origin, X-Origin', 'server':
'UploadServer', 'x-guploader-uploadid':
'AEnB2UoYRPUwhz-OXlJ437k0J8Uxd1lJvTsFbfVJF_YMP2GQEvmdDpo7e-3DVhuqNd9b1A_RFPbfIcK6hCsFcar-hdI94rqJZUvATcDmGRRIvHecAt5CTrg',
'date': 'Sun, 07 Aug 2016 04:43:23 GMT', 'alt-svc': 'quic=":443";
ma=2592000; v="36,35,34,33,32,31,30"', 'content-type':
'application/json; charset=UTF-8'}>, content <{ "error": { "errors": [
{ "domain": "global", "reason": "notFound", "message": "Not Found" }
], "code": 404, "message": "Not Found" } } >
1 个解决方案
#1
0
This is https://issues.apache.org/jira/browse/BEAM-539, which doesn't allow root buckets as outputs for TextFileSink. As a workaround, please use a subdirectory path (e.g. gs://foo/bar) as output locations.
这是https://issues.apache.org/jira/browse/BEAM-539,它不允许根存储桶作为TextFileSink的输出。作为解决方法,请使用子目录路径(例如gs:// foo / bar)作为输出位置。
#1
0
This is https://issues.apache.org/jira/browse/BEAM-539, which doesn't allow root buckets as outputs for TextFileSink. As a workaround, please use a subdirectory path (e.g. gs://foo/bar) as output locations.
这是https://issues.apache.org/jira/browse/BEAM-539,它不允许根存储桶作为TextFileSink的输出。作为解决方法,请使用子目录路径(例如gs:// foo / bar)作为输出位置。