Can't recover from checkpointing to Azure blob storage with a wasbs://... url
Using Standalone Spark 2.0.2 in cluster mode.
val ssc = StreamingContext.getOrCreate(checkpointPath, () => createSSC(), hadoopConf)
I set fs.azure and fs.azure.account.key.$account.blob.core.windows.net on the hadoopConf via hadoopConf.set, and redundantly inside the createSSC function via sparkSession.sparkContext.hadoopConfiguration.set.
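For context, the setup looks roughly like this. It's only a sketch: the account name, checkpoint path, key source (an environment variable here), batch interval, and stream definitions are placeholders, not the real job; the property names mirror the ones described above.

import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder values standing in for the real deploy-time settings.
val checkpointPath = "wasbs://checkpoints@myaccount.blob.core.windows.net/my-app"
val account = "myaccount"
val accountKey = sys.env("AZURE_STORAGE_KEY")

// Hadoop configuration handed to getOrCreate so the checkpoint can be read back.
val hadoopConf = new Configuration()
hadoopConf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
hadoopConf.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)

def createSSC(): StreamingContext = {
  val sparkSession = SparkSession.builder.appName("checkpointed-stream").getOrCreate()

  // The same keys, set redundantly on the SparkContext's Hadoop configuration.
  val hc = sparkSession.sparkContext.hadoopConfiguration
  hc.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  hc.set(s"fs.azure.account.key.${account}.blob.core.windows.net", accountKey)

  val ssc = new StreamingContext(sparkSession.sparkContext, Seconds(10))
  ssc.checkpoint(checkpointPath)
  // ... stream definitions go here ...
  ssc
}

// The call shown above: recover from the checkpoint if it exists, otherwise build a new context.
val ssc = StreamingContext.getOrCreate(checkpointPath, () => createSSC(), hadoopConf)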
While running, the job writes checkpoint files successfully and keeps going until I stop it.
When I restart it, the context created from the checkpoint data doesn't have the hadoopConf info needed to re-access the wasbs:// storage, and it throws an error saying it can't create the container with anonymous access.
What am I missing? I've found a couple of similar posts about S3, but no clear solution.
The error:
More details: this happens after restarting from the checkpoint, inside the Kafka 0.10.1.1 connector, and I've confirmed that the sparkContext.hadoopConf attached to that RDD does have the correct key.
1 Answer
Workaround:
Put the key in the Spark core-site.xml. I was trying to avoid this because the credentials are a deploy-time setting; I won't set them at compile time or docker image build time.
Before my container calls spark-submit, it now creates the /opt/spark/conf/core-site.xml file from the template below:
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.azure</name>
    <value>org.apache.hadoop.fs.azure.NativeAzureFileSystem</value>
  </property>
  <property>
    <name>fs.azure.account.key.[CHECKPOINT_BLOB_ACCOUNT].blob.core.windows.net</name>
    <value>[CHECKPOINT_BLOB_KEY]</value>
  </property>
</configuration>