Expanding a frequency dataset for a single variable

I have 2 columns of data: one with temperature values and the other with the frequency at which each temperature was observed. I have been trying to write Python code that takes this 2-column frequency data and creates an expanded array of the temperature data. Essentially, I want to reverse the process of "counting" the values and expose all the raw data values in a single array.
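Expanding is the inverse of tallying: if you count the expanded array again, you get the two original columns back. Below is a minimal sketch with NumPy. The temperatures and counts are hypothetical values, not taken from the question's file.

```python
import numpy as np

# Hypothetical frequency table: each temperature and how often it was observed
temps = np.array([10.0, 20.0, 30.0])
counts = np.array([1, 2, 3])

# Expand: repeat each temperature as many times as it was observed
raw = np.repeat(temps, counts)
print(raw)

# Count again: np.unique recovers the table (values come back sorted)
values, recovered = np.unique(raw, return_counts=True)
print(values, recovered)
```

A temperature with a count of zero does not survive this round trip, because `np.unique` only reports values that actually appear in the expanded array.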

The way I am currently reading in the data is as follows:

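import numpy as np

# Column 0: temperature, column 1: number of times it was observed
f = np.genfromtxt('playground_sum.txt', usecols=(0, 1))

temp = f[:, 0]
freq = f[:, 1].astype(int)  # np.repeat needs integer counts

# np.repeat expands the whole column at once, so no loop is needed
new = np.repeat(temp, freq)
print(new)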

This works! Other methods are welcome.
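If NumPy isn't needed, the standard library's `collections.Counter` already includes the inverse of counting. `Counter.elements()` yields each key as many times as its count and skips counts below one. The sketch below assumes the two columns have already been read into Python lists. The values are hypothetical.

```python
from collections import Counter

# Hypothetical columns as read from the file: temperatures and their frequencies
temps = [10.0, 20.0, 30.0, 40.0]
freqs = [1, 2, 3, 0]

# Accumulate counts; a temperature listed on two lines gets its counts summed
counts = Counter()
for t, n in zip(temps, freqs):
    counts[t] += n

# elements() repeats each temperature by its count; the zero-count 40.0 is skipped
expanded = list(counts.elements())
print(expanded)  # [10.0, 20.0, 20.0, 30.0, 30.0, 30.0]
```

Unlike `np.repeat`, this merges lines that repeat the same temperature. It also orders the output by each temperature's first appearance.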

2 Solutions

#1

Similar to @Totem's solution, except that I think you should convert the frequency to an integer.

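array = []
with open('test.csv') as f:
    for line in f:
        try:
            temp, freq = line.split(',')
            temp = float(temp)  # convert the temperature too, not just the count
            freq = int(freq)    # int() tolerates the trailing newline
        except ValueError:
            continue            # skip headers, blank or malformed lines

        # Append the temperature once per observation
        array.extend([temp] * freq)

print(array)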
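The manual `split(',')` above works for simple files. The standard library's `csv` module handles quoting and line endings for you. The sketch below assumes a two-column file with an optional header row. `io.StringIO` stands in for `open('test.csv', newline='')` so the example runs on its own, and the sample rows are made up.

```python
import csv
import io

# Stand-in for open('test.csv', newline=''); the contents are hypothetical
f = io.StringIO("temp,freq\n10,1\n20,2\n30,3\n")

array = []
for row in csv.reader(f):
    try:
        temp, freq = float(row[0]), int(row[1])
    except (ValueError, IndexError):
        continue  # skip the header row and blank or short rows
    array.extend([temp] * freq)

print(array)  # [10.0, 20.0, 20.0, 30.0, 30.0, 30.0]
```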

#2

Try this:

array = []
with open('myfile.txt') as f:
    for line in f:
        line = line.strip()[1:-1]  # '[10, 0]' -> '10, 0'
        try:
            # Split on the comma and convert both fields to int
            temp, freq = [int(i) for i in line.split(',')]
        except ValueError:
            continue  # skip blank or malformed lines
        array.extend([temp] * freq)

print(array)

This assumes that each line in the file looks like this: [10, 0]

The resulting list looks like this: [10, 20, 20, 30, 30, 30]
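Slicing off the first and last characters only works if each line is exactly `[a, b]`. A more forgiving variant, swapped in here and not part of the original answer, parses each line as a Python list literal with `ast.literal_eval`. The sketch below uses hypothetical sample lines chosen to reproduce the output above, with a list of strings standing in for the file.

```python
import ast

# Hypothetical file contents in the "[temperature, frequency]" format
lines = ["[10, 1]\n", "[20, 2]\n", "  [30, 3]  \n", "[40, 0]\n", "\n"]

array = []
for line in lines:
    try:
        # Parse the literal safely (no eval) and unpack the two fields
        temp, freq = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError, TypeError):
        continue  # skip blank or malformed lines
    array.extend([temp] * freq)

print(array)  # [10, 20, 20, 30, 30, 30]
```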

#1