I'm trying to convert a string column that contains null values and a few numbers stored as strings into an integer column in Google Cloud Dataflow. Could anyone help me out with Python code to do that?
1 solution
#1
Looks like this has been sitting out here for a while. It would be helpful if you could post some example text or code showing what you have tried so far, or what the data looks like. Here is the best I can do with limited information:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def to_int_or_none(w):
    # int() raises ValueError on 'null', '' and '60.75'; map those to None
    try:
        return int(w.strip())
    except ValueError:
        return None

with beam.Pipeline(options=PipelineOptions()) as p:
    # Read the data; each line arrives as one string: '11139422, null, null, 60.75'
    your_data = p | 'Your_Data' >> beam.io.ReadFromText('/path/to/data.csv')
    # Split each row so that every value becomes its own string element:
    # '11139422', ' null', ' null', ' 60.75'
    split_your_data = your_data | 'split' >> beam.FlatMap(lambda x: x.split(","))
    # Convert each value to int; values that are not integers become None
    your_data_to_int = split_your_data | 'String_to_Int' >> beam.Map(to_int_or_none)
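Since the question is about a single column, it may help to keep each row together and convert only that column. Below is a minimal sketch in plain Python, assuming comma-separated lines like the sample row above. The names `to_int_or_none`, `convert_column`, and `column_index` are hypothetical, and mapping non-integer strings such as '60.75' to None is an assumption about the desired behaviour, not something the question specifies.

```python
import csv


def to_int_or_none(value):
    """Return int(value), or None when the string is not a valid integer.

    Covers 'null', empty strings, and decimals such as '60.75' (assumed
    here to count as "not an integer").
    """
    try:
        return int(value.strip())
    except ValueError:
        return None


def convert_column(line, column_index=0):
    """Parse one CSV line and convert a single column to int.

    The other columns are left as strings. column_index is a hypothetical
    parameter naming the column to convert.
    """
    # skipinitialspace drops the blank that follows each comma in the sample row
    row = next(csv.reader([line], skipinitialspace=True))
    row[column_index] = to_int_or_none(row[column_index])
    return row


if __name__ == "__main__":
    sample = "11139422, null, null, 60.75"
    print(convert_column(sample, column_index=0))
    print(convert_column(sample, column_index=1))
```

In a Beam pipeline, a function like this could be applied per line with `beam.Map(convert_column, column_index=1)`, since `beam.Map` forwards extra keyword arguments to the callable.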