I tried the SortValues transform example code with the DirectRunner on my local machine (Windows):
PCollection<KV<String, KV<String, Integer>>> input = ...
PCollection<KV<String, Iterable<KV<String, Integer>>>> grouped =
    input.apply(GroupByKey.<String, KV<String, Integer>>create());
PCollection<KV<String, Iterable<KV<String, Integer>>>> groupedAndSorted =
    grouped.apply(SortValues.<String, String, Integer>create(BufferedExternalSorter.options()));
but I got this error:
PipelineExecutionException: java.lang.NoClassDefFoundError: org/apache/hadoop/io/Writable
Does this mean the transform only works in a Hadoop environment?
1 solution
#1
As of today, if you use a Beam release earlier than 2.0.0, you have to add two Hadoop dependencies to your Maven pom file for the SortValues module to work:
- add hadoop-common, version 2.7.3 or later
- add hadoop-mapreduce-client-core, version 2.7.3 or later
Otherwise, you just need to use Beam release 2.0.0 or later.
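For a Beam release earlier than 2.0.0, the two Hadoop dependencies listed above might be declared roughly as follows. This is a minimal sketch, not a verified configuration: the group ID org.apache.hadoop is the usual Maven coordinate for these artifacts, and 2.7.3 is simply the minimum version mentioned above, so adjust both to match your environment.

```xml
<!-- Sketch only: coordinates assumed to be the standard org.apache.hadoop ones;
     2.7.3 is the minimum version named in the answer, newer versions may also work. -->
<dependencies>
  <!-- Expected to provide org.apache.hadoop.io.Writable, the class named in the error -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
  </dependency>
  <!-- Second dependency the answer says the SortValues module needs -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
  </dependency>
</dependencies>
```

If your pom already has a dependencies section, only the two dependency entries need to be added to it.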