I need to read an Avro file from local storage or GCS in Java. I followed the example from the docs at https://beam.apache.org/documentation/sdks/javadoc/2.0.0/index.html?org/apache/beam/sdk/io/AvroIO.html
Pipeline p = ...;

// A Read from a GCS file (runs locally and using remote execution):
Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
PCollection<GenericRecord> records =
    p.apply(AvroIO.readGenericRecords(schema)
        .from("gs://my_bucket/path/to/records-*.avro"));
But when I try to process it through a DoFn, there doesn't appear to be any data there. The Avro file does have data, and I was able to run a function to generate a schema from it. If anybody has advice, please share.
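For context, here is a minimal, self-contained sketch (not the original project code) of how such a read can be wired to a logging DoFn with the Beam 2.x Java SDK. The bucket path and schema file are the placeholders from the snippet above; the main thing the sketch highlights is that nothing executes until the pipeline is actually run and waited on.

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.AvroIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class ReadAvroSketch {
  public static void main(String[] args) throws IOException {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Parse the Avro schema from a local file (placeholder name from the question).
    Schema schema = new Schema.Parser().parse(new File("schema.avsc"));

    p.apply("ReadAvro", AvroIO.readGenericRecords(schema)
            .from("gs://my_bucket/path/to/records-*.avro"))
     .apply("LogRecords", ParDo.of(new DoFn<GenericRecord, Void>() {
       @ProcessElement
       public void processElement(ProcessContext c) {
         // Print each record so it is easy to confirm that data is actually read.
         System.out.println(c.element());
       }
     }));

    // The pipeline only executes here; forgetting run()/waitUntilFinish() (or not
    // waiting for completion) can look like "no data" inside a DoFn.
    p.run().waitUntilFinish();
  }
}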
2 Answers
#1
1
I absolutely agree with Andrew; more information would be required. However, I think you should consider using AvroIO.Read, which is a more appropriate transform for reading records from one or more Avro files.
https://cloud.google.com/dataflow/model/avro-io#reading-with-avroio
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline p = Pipeline.create(options);

Schema schema = new Schema.Parser().parse(new File("schema.avsc"));
PCollection<GenericRecord> records =
    p.apply(AvroIO.Read.named("ReadFromAvro")
        .from("gs://my_bucket/path/records-*.avro")
        .withSchema(schema));
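Note that AvroIO.Read.named(...) is the older Dataflow 1.x style. If you are on the Beam 2.x SDK that the question's javadoc link points to, the rough equivalent (a sketch, assuming the same p and schema as above) is to pass the transform name to apply() and use readGenericRecords:

// Beam 2.x style: the name goes to apply() instead of .named().
PCollection<GenericRecord> records =
    p.apply("ReadFromAvro",
        AvroIO.readGenericRecords(schema)
            .from("gs://my_bucket/path/records-*.avro"));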
#2
0
Hey guys, thanks for looking into this. I can't share any code because it belongs to clients. I did not receive any error messages, and the debugger did see data, but we were not able to see the data from the Avro file (via the ParDo).
I did manage to fix the issue by recreating the Dataflow project using the Eclipse wizard, even with the same code. I wonder why I did not receive any error messages.