Converting column elements into column names in pandas

Date: 2022-10-10 22:58:11

I have a large .csv file which is constantly being updated in real time with several thousand lines displayed as follows:

 time1,stockA,bid,1
 time2,stockA,ask,1.1
 time3,stockB,ask,2.1
 time4,stockB,bid,2.0
 time5,stockA,bid,1.1
 time6,stockA,ask,1.2

What is the fastest way to read this into a dataframe that looks like this:

   time     stock       bid    ask
   time1    stockA      1      
   time2    stockA             1.1
   time3    stockB             2.1
   time4    stockB      2.0    
   time5    stockA      1.1
   time6    stockA             1.2

Any help is appreciated

2 solutions

#1


You can use read_csv and specify header=None and pass the column names as a list:

In [124]:

import io
import pandas as pd

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0"""

# The data has no header row, so supply the column names explicitly
df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[124]:
     time   stock  bid  ask
0   time1  stockA  bid  1.0
1   time2  stockA  ask  1.1
2   time3  stockB  ask  2.1
3   time4  stockB  bid  2.0

You'll have to re-encode the bid column to 1 or 2:

In [126]:

# Map both labels to their numeric codes in one pass
df['bid'] = df['bid'].map({'bid': 1, 'ask': 2})
df
Out[126]:
     time   stock  bid  ask
0   time1  stockA    1  1.0
1   time2  stockA    2  1.1
2   time3  stockB    2  2.1
3   time4  stockB    1  2.0

EDIT

Based on your updated sample data and desired output, the following works:

In [29]:

t = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""

df = pd.read_csv(io.StringIO(t), header=None, names=['time', 'stock', 'bid', 'ask'])
df
Out[29]:
     time   stock  bid  ask
0   time1  stockA  bid  1.0
1   time2  stockA  ask  1.1
2   time3  stockB  ask  2.1
3   time4  stockB  bid  2.0
4   time5  stockA  bid  1.1
5   time6  stockA  ask  1.2
In [30]:

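# Rows labelled 'bid': copy the price into the 'bid' column
df.loc[df['bid'] == 'bid', 'bid'] = df['ask']
# Those same rows now hold a number in 'bid', so blank out their 'ask' cell
df.loc[df['bid'] != 'ask', 'ask'] = ''
# Rows labelled 'ask': the price is already in 'ask', so blank out 'bid'
df.loc[df['bid'] == 'ask', 'bid'] = ''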
df
Out[30]:
     time   stock  bid  ask
0   time1  stockA    1     
1   time2  stockA       1.1
2   time3  stockB       2.1
3   time4  stockB    2     
4   time5  stockA  1.1     
5   time6  stockA       1.2
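
As a variation on the steps above, here is a minimal sketch that splits the price by a boolean mask instead of overwriting cells. It keeps both columns numeric, so the gaps become NaN rather than empty strings. The column names `side` and `price` and the variable names are my own choices for illustration, and the sketch assumes every row is either `bid` or `ask`.

```python
import io
import pandas as pd

# Sample rows copied from the question
raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""

# 'side' and 'price' are illustrative names, not part of the original data
df = pd.read_csv(io.StringIO(raw), header=None,
                 names=['time', 'stock', 'side', 'price'])

# True for bid rows; every other row is treated as an ask (assumption)
is_bid = df['side'] == 'bid'

out = df[['time', 'stock']].copy()
# Series.where keeps the value where the mask is True and puts NaN elsewhere
out['bid'] = df['price'].where(is_bid)
out['ask'] = df['price'].where(~is_bid)
```

If the empty-string look is needed for display, something like `out.fillna('')` can be applied at the very end, at the cost of turning the columns into object dtype.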

#2


Here is what I think is a more concise way.

df = pd.read_csv('prices.csv', header=None,
                 names=['time', 'stock', 'type', 'prices'],
                 index_col=['time', 'stock', 'type'])

In [1062]:

df
Out[1062]:
                    prices
time   stock   type
time1  stockA  bid     1.0
time2  stockA  ask     1.1
time3  stockB  ask     2.1
time4  stockB  bid     2.0
time5  stockA  bid     1.1
time6  stockA  ask     1.2
time7  stockA  high    1.5
time8  stockA  low     0.5

I think that's what the DataFrame should look like. Then do:

In [1064]:

df.unstack()
Out[1064]:
               prices
type              ask  bid  high  low
time   stock
time1  stockA     NaN  1.0   NaN  NaN
time2  stockA     1.1  NaN   NaN  NaN
time3  stockB     2.1  NaN   NaN  NaN
time4  stockB     NaN  2.0   NaN  NaN
time5  stockA     NaN  1.1   NaN  NaN
time6  stockA     1.2  NaN   NaN  NaN
time7  stockA     NaN  NaN   1.5  NaN
time8  stockA     NaN  NaN   NaN  0.5

You can fill the NaNs with whatever you prefer using df.fillna. Generally speaking, converting a column's values into column headers is called pivoting. .unstack pivots a level of a MultiIndex. You can check .pivot as well.
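
For comparison, a rough sketch of the `.pivot` route with a `fillna` at the end. It uses the six rows from the question rather than `prices.csv`, the variable names are only illustrative, and passing a list to `index` assumes pandas 1.1 or newer.

```python
import io
import pandas as pd

# Sample rows copied from the question; column names follow this answer
raw = """time1,stockA,bid,1
time2,stockA,ask,1.1
time3,stockB,ask,2.1
time4,stockB,bid,2.0
time5,stockA,bid,1.1
time6,stockA,ask,1.2"""

long_df = pd.read_csv(io.StringIO(raw), header=None,
                      names=['time', 'stock', 'type', 'prices'])

# pivot: each distinct value in 'type' becomes its own column of 'prices'
wide = long_df.pivot(index=['time', 'stock'], columns='type', values='prices')

# Turn the index levels back into ordinary columns and drop the 'type' axis label
wide = wide.reset_index().rename_axis(columns=None)

# Fill the gaps with a placeholder of your choice (0 is only an example)
filled = wide.fillna(0)
```

Unlike `df.unstack()` on a one-column frame, selecting `values='prices'` gives flat column names (`ask`, `bid`) instead of a two-level column header.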
