I currently have some confusion about the credentials/configuration used by Dataflow...
From my experimentation, it seems that Dataflow always uses the default configuration instead of the active configuration. Is that correct? For example, in my gcloud config, if my default configuration points at project A while my active configuration points at project B, my Dataflow job always seems to submit to project A. The job also seems to ignore whatever is set via options.setProject(), so I'm left wondering: when does Dataflow actually use options.getProject()?
I'm also wondering whether there is any way to submit a Dataflow job with a customized configuration. Say I want to submit multiple jobs to different projects with different credentials in the same run, without manually changing my gcloud config.
By the way, in case it makes a difference: I am running the Dataflow job on the Cloud Dataflow service, but I submit the job from a non-GCE Cloud Services account.
2 Answers
#1 (score: 4)
By default, Google Cloud Dataflow uses the application default credentials library to obtain credentials when none are specified. The library currently only supports getting credentials from the gcloud default configuration. Similarly, for the project, Google Cloud Dataflow uses the gcloud default configuration.
To run jobs under a different project, you can specify it manually on the command line (for example --project=myProject, if using PipelineOptionsFactory.fromArgs) or set the option explicitly via GcpOptions.setProject.
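As a rough sketch, the two routes look like the following (class names assume the Beam-style Dataflow SDK layout, and myProject is a placeholder; adjust imports for the SDK version you are on):

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class SubmitToProject {
  public static void main(String[] args) {
    // Route 1: parse --project=myProject (and other flags) from the
    // command line instead of falling back to the gcloud default config.
    DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
        .as(DataflowPipelineOptions.class);
    // Route 2: set the project explicitly in code.
    options.setProject("myProject");
  }
}
```

Either route takes precedence over whatever the gcloud default configuration says, which is why setProject appeared to be ignored only when the option was never actually set.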
To run jobs with different credentials, you can construct a credentials object and set it explicitly via GcpOptions.setGcpCredential, or you can rely on one of the mechanisms the application default credentials library supports for generating a credentials object automatically, which Google Cloud Dataflow is tied into. One example is the environment variable GOOGLE_APPLICATION_CREDENTIALS, as explained here.
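Combining the two, here is an untested sketch of submitting jobs to different projects with different credentials in the same run, without touching the gcloud config. The helper name, key paths, and project IDs are all hypothetical, and the class names again assume the Beam-style SDK:

```java
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import com.google.auth.oauth2.ServiceAccountCredentials;

public class MultiProjectSubmit {
  // Build per-job options carrying their own project and key file.
  static DataflowPipelineOptions optionsFor(String project, String keyPath)
      throws IOException {
    DataflowPipelineOptions opts =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    opts.setRunner(DataflowRunner.class);
    opts.setProject(project);
    opts.setGcpCredential(ServiceAccountCredentials.fromStream(
        new FileInputStream(keyPath))
        .createScoped("https://www.googleapis.com/auth/cloud-platform"));
    return opts;
  }

  public static void main(String[] args) throws IOException {
    // Two jobs, two projects, two credentials, one run.
    // (Add transforms to each pipeline before run().)
    Pipeline.create(optionsFor("project-a", "keyA.json")).run();
    Pipeline.create(optionsFor("project-b", "keyB.json")).run();
  }
}
```

Because each options object carries its own credential, neither job consults the gcloud default configuration at submission time.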
#2 (score: 0)
The code I used to have Dataflow populate its workers with the service account we wanted (in addition to Lukas's answer above):
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.List;
import com.google.auth.oauth2.ServiceAccountCredentials;

// OAuth scopes granted to the job's credentials.
final List<String> SCOPES = Arrays.asList(
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/devstorage.full_control",
    "https://www.googleapis.com/auth/userinfo.email",
    "https://www.googleapis.com/auth/datastore",
    "https://www.googleapis.com/auth/pubsub");
// Submit with the key's credentials; run the workers as the given account.
options.setGcpCredential(ServiceAccountCredentials.fromStream(
    new FileInputStream("key.json")).createScoped(SCOPES));
options.setServiceAccount("xxx@yyy.iam.gserviceaccount.com");