My dataset has some information by location for n dates. The problem is that each date is a separate column header. For example, the CSV looks like:
location name Jan-2010 Feb-2010 March-2010
A "test" 12 20 30
B "foo" 18 20 25
What I would like is for it to look like:
location name Date Value
A "test" Jan-2010 12
A "test" Feb-2010 20
A "test" March-2010 30
B "foo" Jan-2010 18
B "foo" Feb-2010 20
B "foo" March-2010 25
The problem is that I don't know how many date columns there are (though I know they will always start after name).
2 solutions
#1
67
You can use pd.melt to get most of the way there, and then sort:
>>> df
  location  name  Jan-2010  Feb-2010  March-2010
0        A  test        12        20          30
1        B   foo        18        20          25
>>> df2 = pd.melt(df, id_vars=["location", "name"],
...               var_name="Date", value_name="Value")
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25
>>> df2 = df2.sort_values(["location", "name"])
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
2        A  test    Feb-2010     20
4        A  test  March-2010     30
1        B   foo    Jan-2010     18
3        B   foo    Feb-2010     20
5        B   foo  March-2010     25
(You might want to add .reset_index(drop=True) as well, just to keep the output clean.)
Note: pd.DataFrame.sort was deprecated and later removed from pandas; use pd.DataFrame.sort_values instead.
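For reference, here is a minimal, self-contained sketch of the same approach with sort_values and reset_index. The sample frame is typed in by hand from the question, and the name long_df is only illustrative. Because pd.melt melts every column not listed in id_vars, the number of date columns does not need to be known in advance.

```python
import pandas as pd

# Sample data copied by hand from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Columns not named in id_vars are treated as value columns,
# so any number of date columns is handled the same way.
long_df = pd.melt(
    df,
    id_vars=["location", "name"],
    var_name="Date",
    value_name="Value",
)

# Group the rows by location and name, then renumber the index.
long_df = long_df.sort_values(["location", "name"]).reset_index(drop=True)

print(long_df)
```

Note that the Date column holds plain strings here, so sorting on it directly would be alphabetical rather than chronological.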
#2
2
I think I found a simpler solution:
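# pd.melt melts every column not listed in id_vars, so drop the other identifier column first
temp1 = pd.melt(df1.drop(columns="name"), id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1.drop(columns="location"), id_vars=["name"], var_name='Date', value_name='Value')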
Then add temp2's name column to temp1:
temp1['new_column'] = temp2['name']
You now have what you asked for.
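As a quick check of this two-melt idea, the sketch below compares its result with a single melt that uses both identifier columns. It assumes df1 is the question's sample frame, and it drops the other identifier column before each melt, since pd.melt would otherwise melt that column along with the dates. The names direct and same are only illustrative.

```python
import pandas as pd

# Sample data copied by hand from the question (assumed to be df1).
df1 = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

# Drop the other identifier column so that only the date columns are melted.
temp1 = pd.melt(df1.drop(columns="name"), id_vars=["location"],
                var_name="Date", value_name="Value")
temp2 = pd.melt(df1.drop(columns="location"), id_vars=["name"],
                var_name="Date", value_name="Value")

# Both frames share the same row order and index, so this aligns row by row.
temp1["name"] = temp2["name"]

# Single melt with both identifier columns, for comparison.
direct = pd.melt(df1, id_vars=["location", "name"],
                 var_name="Date", value_name="Value")

# Reorder temp1's columns to match before comparing.
same = temp1[direct.columns].equals(direct)
print(same)
```

The single melt with both columns in id_vars produces the same rows in one call, which is why it is usually the shorter route.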