I have a DataFrame as below:
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})
I want to get this 3D array:
array([[[  1,  10,  80],
        [  2,  20, 120]],
       [[  1,  20,  80],
        [  2,  40, 120]],
       [[  1,  30,  80],
        [  2,  60, 120]]], dtype=int64)
I tried this:
values = df.values
values.reshape(3, 2, 3)
but it gives an incorrect array. How can I get the expected array?
2 solutions
#1
3
Get the array data, then reshape it so that the first axis is split into two, the first of which has length 2. This gives a 3D array. Then swap those two axes:
df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
Sample run:
In [711]: df
Out[711]:
   a   b    c
0  1  10   80
1  1  20   80
2  1  30   80
3  2  20  120
4  2  40  120
5  2  60  120
In [713]: df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
Out[713]:
array([[[  1,  10,  80],
        [  2,  20, 120]],
       [[  1,  20,  80],
        [  2,  40, 120]],
       [[  1,  30,  80],
        [  2,  60, 120]]])
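To make the two steps easier to follow, here is a minimal sketch that runs them one at a time on the DataFrame from the question. The intermediate names `grouped`, `result`, and `naive` are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2],
                   'b': [10, 20, 30, 20, 40, 60],
                   'c': [80, 80, 80, 120, 120, 120]})

values = df.values                             # shape (6, 3)

# Step 1: split the row axis into (2 groups, 3 rows per group).
grouped = values.reshape(2, -1, df.shape[1])   # shape (2, 3, 3)

# Step 2: swap the group axis and the row axis,
# so that result[i, j] is row i of group j.
result = grouped.swapaxes(0, 1)                # shape (3, 2, 3)

# The attempt from the question only pairs up consecutive rows,
# so its first block holds rows 0 and 1 of the same group.
naive = values.reshape(3, 2, 3)

print(result[0])   # first row of each group
print(naive[0])    # rows 0 and 1 of the first group
```

The difference is that `reshape` never reorders elements in memory order; only the axis swap interleaves the two groups.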
This gives us a view into the original data without making a copy, so its runtime is small and essentially constant, regardless of the array size.
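The view claim can be checked with `np.shares_memory`. The sketch below uses a plain NumPy array as a stand-in for `df.values` (an assumption made for illustration, since whether `df.values` itself copies depends on the DataFrame's dtypes and the pandas version).

```python
import numpy as np

arr = np.arange(18).reshape(6, 3)   # stand-in for df.values

# reshape + swapaxes only change shape/stride metadata here.
view = arr.reshape(2, -1, arr.shape[1]).swapaxes(0, 1)

# np.stack always allocates a new array.
stacked = np.stack(np.split(arr, 2), axis=1)

shares_view = np.shares_memory(arr, view)
shares_stacked = np.shares_memory(arr, stacked)
print(shares_view, shares_stacked)

# Writing to the source shows up in the view but not in the stacked copy.
arr[0, 0] = -1
print(view[0, 0, 0], stacked[0, 0, 0])
```

One consequence worth keeping in mind: because the result is a view, modifying it in place also modifies the array it was taken from. Call `.copy()` on the result if an independent array is needed.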
Runtime test
Case #1:
In [730]: df = pd.DataFrame(np.random.randint(0,9,(2000,100)))
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [731]: %timeit np.stack(np.split(df.values, 2), axis=1)
10000 loops, best of 3: 109 µs per loop
In [732]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.55 µs per loop
Case #2:
In [733]: df = pd.DataFrame(np.random.randint(0,9,(2000,2000)))
# @cᴏʟᴅsᴘᴇᴇᴅ's soln
In [734]: %timeit np.stack(np.split(df.values, 2), axis=1)
100 loops, best of 3: 4.3 ms per loop
In [735]: %timeit df.values.reshape(2,-1,df.shape[1]).swapaxes(0,1)
100000 loops, best of 3: 8.37 µs per loop
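The `%timeit` magic is only available in IPython. Outside of it, a rough comparison can be sketched with the standard `timeit` module as below. The function names `with_stack` and `with_reshape` are hypothetical, the repetition count is arbitrary, and absolute timings will differ by machine and library version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 9, (2000, 100)))

def with_stack():
    # Copies the data into a new array.
    return np.stack(np.split(df.values, 2), axis=1)

def with_reshape():
    # Reshapes and swaps axes without stacking.
    return df.values.reshape(2, -1, df.shape[1]).swapaxes(0, 1)

n = 200  # arbitrary number of repetitions
t_stack = timeit.timeit(with_stack, number=n) / n
t_reshape = timeit.timeit(with_reshape, number=n) / n

print(f"split + stack:      {t_stack * 1e6:.1f} µs per call")
print(f"reshape + swapaxes: {t_reshape * 1e6:.1f} µs per call")
```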
#2
2
Try np.split + np.stack:
np.stack(np.split(df.values, 2), axis=1)
array([[[  1,  10,  80],
        [  2,  20, 120]],
       [[  1,  20,  80],
        [  2,  40, 120]],
       [[  1,  30,  80],
        [  2,  60, 120]]])
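As a rough sketch of how this generalizes, the same idea can be wrapped in a small function that takes the number of groups. The helper name `stack_groups` is hypothetical and not part of NumPy or of the original answer. It assumes the rows are already ordered group by group and that every group has the same number of rows.

```python
import numpy as np

def stack_groups(values, n_groups):
    """Hypothetical helper: split rows into n_groups equal blocks
    and interleave them along a new second axis."""
    # np.split raises ValueError if the rows do not divide evenly.
    return np.stack(np.split(values, n_groups), axis=1)

values = np.array([[1, 10, 80],
                   [1, 20, 80],
                   [1, 30, 80],
                   [2, 20, 120],
                   [2, 40, 120],
                   [2, 60, 120]])

via_stack = stack_groups(values, 2)
via_reshape = values.reshape(2, -1, values.shape[1]).swapaxes(0, 1)

# Both approaches produce the same (3, 2, 3) array.
same = np.array_equal(via_stack, via_reshape)
print(via_stack.shape, same)
```

Unlike the reshape approach, this always returns a new array, which is safer to modify but costs a copy.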