What steps will reproduce the problem?
Add a DatastoreIO read operation for a kind within a specific namespace. Size estimation and the subsequent split algorithm, which rely on the Datastore system stats table, fail.
What is the expected output? What do you see instead?
DatastoreIO.queryLatestStatisticsTimestamp uses the system table "Stat_Total" to retrieve the timestamp of the latest stats run. It goes through the shared method DatastoreIO.makeRequest, which applies the namespace of the kind being read. Because "Stat_Total" resides in the default namespace, nothing is returned, which results in an error saying that stats cannot be read for the kind. DatastoreIO then falls back to using the number of workers to split the query, which is not ideal in our case. This appears to be a defect: the namespace should not be applied to the query that retrieves stats.
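To make the failure mode concrete, here is a toy in-memory model of a namespace-scoped lookup. It does not use the Datastore client or the Dataflow SDK; the class `NamespacedStatsLookup`, its methods, the namespace `tenant-a`, and the timestamp value are all hypothetical and exist only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy in-memory model; NOT the Datastore client or the Dataflow SDK.
final class NamespacedStatsLookup {
    // Keys are "<namespace>/<kind>"; the empty namespace stands for the default one.
    private final Map<String, Long> timestampsByKey = new HashMap<>();

    void put(String namespace, String kind, long timestamp) {
        timestampsByKey.put(namespace + "/" + kind, timestamp);
    }

    // Mimics a shared request helper that always scopes the lookup
    // to the namespace of the kind being read.
    Optional<Long> latestTimestamp(String namespace, String kind) {
        return Optional.ofNullable(timestampsByKey.get(namespace + "/" + kind));
    }

    public static void main(String[] args) {
        NamespacedStatsLookup store = new NamespacedStatsLookup();
        // The stats entity exists only in the default namespace.
        store.put("", "Stat_Total", 1000L);

        // Scoped to the kind's namespace: nothing is found.
        System.out.println(store.latestTimestamp("tenant-a", "Stat_Total").isPresent());
        // Scoped to the default namespace: the entry is found.
        System.out.println(store.latestTimestamp("", "Stat_Total").isPresent());
    }
}
```

In this model the first lookup comes back empty, which corresponds to the "stats cannot be read" error described above.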
What version of the product are you using? On what operating system?
Version 1.6 / Default GCE Dataflow Service VMs
1 solution
#1
2
Thanks for reporting the problem. You are right, we need to use "Stat_Ns_Total" when a namespace is provided. I will submit a fix, and it should be available in the next release (1.7.0).
Update: This has been fixed and released in 1.7.0.
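For readers who want to see the shape of the change, a minimal sketch of the kind selection might look like the following. This is not the actual patch: the class `StatsKindChooser` and the method `statsKindFor` are hypothetical names, and the sketch assumes the statistics kinds are queried under their double-underscore names (`__Stat_Total__`, `__Stat_Ns_Total__`), whereas the thread uses the short forms.

```java
// Hypothetical helper; not part of the Dataflow SDK.
final class StatsKindChooser {
    // Picks the statistics kind to query for the latest stats timestamp.
    // A null or empty namespace is treated as the default namespace (an assumption).
    static String statsKindFor(String namespace) {
        boolean defaultNamespace = namespace == null || namespace.isEmpty();
        return defaultNamespace ? "__Stat_Total__" : "__Stat_Ns_Total__";
    }

    public static void main(String[] args) {
        System.out.println(statsKindFor(null));       // default namespace
        System.out.println(statsKindFor("tenant-a")); // namespace provided
    }
}
```

With this choice, the namespace can stay on the request, because the per-namespace statistics kind lives in the same namespace as the kind being read.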