Using TextIO.Write with a complex PCollection type in Google Cloud Dataflow

Date: 2021-04-23 15:37:47

I have a PCollection that looks like this:

PCollection<KV<KV<String, EventSession>, Long>> windowed_counts

My goal is to write this out as a text file. I thought I could use something like:

windowed_counts.apply( TextIO.Write.to( "output" ));

but I am having a hard time getting the Coders set up correctly. This is what I thought would work:

    KvCoder kvcoder = KvCoder.of(KvCoder.of(StringUtf8Coder.of(), AvroDeterministicCoder.of(EventSession.class) ), TextualLongCoder.of());
    TextIO.Write.Bound io = TextIO.Write.withCoder( kvcoder );
    windowed_counts.apply( io.to( "output" ));

where TextualLongCoder is my own subclass of AtomicCoder, analogous to TextualIntegerCoder. The EventSession class is annotated to use AvroDeterministicCoder as its default coder.

But with this I get garbled output that includes non-textual characters. Can anybody advise on how to write this particular PCollection out as text? I'm sure there's something obvious I'm missing here...

1 answer

#1


4  

Did you try creating a transform that will convert a PCollection of KV<KV<String, EventSession>, Long> to a PCollection of Strings and then writing it into a text file?

I found this to be the most flexible approach for my needs.
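
As a rough sketch of that idea: the formatting step is just a function from one element to one line of text. The snippet below is plain Java so it runs without the SDK. `Map.Entry` stands in for `KV`, and the `EventSession` class, its `sessionId` field, the `SessionCountFormatter` name and the tab-separated layout are all made up for illustration, since the question does not show the real class.

```java
import java.util.Map;

// Hypothetical stand-in for the asker's EventSession; the real fields are not shown.
final class EventSession {
    final String sessionId;

    EventSession(String sessionId) {
        this.sessionId = sessionId;
    }

    @Override
    public String toString() {
        return sessionId;
    }
}

final class SessionCountFormatter {
    // Turns one ((key, session), count) element into a tab-separated line.
    // Map.Entry is used here in place of the SDK's KV type (an assumption
    // made so the sketch compiles on its own).
    static String formatLine(Map.Entry<Map.Entry<String, EventSession>, Long> element) {
        Map.Entry<String, EventSession> keyAndSession = element.getKey();
        return keyAndSession.getKey()
            + "\t" + keyAndSession.getValue()
            + "\t" + element.getValue();
    }

    // In a pipeline, the same logic would presumably live inside a DoFn that
    // takes KV<KV<String, EventSession>, Long> and outputs String, applied
    // with ParDo before windowed_counts reaches TextIO.Write.to("output").
}
```

Since the resulting PCollection holds plain Strings, TextIO.Write should be able to write it without a custom coder, and the file layout is whatever the formatting function produces.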
