I have the following pandas dataframe:
df = pd.read_csv('path/file/file.csv',
header=0, sep=',', names=['PhraseId', 'SentenceId', 'Phrase', 'Sentiment'])
I would like to plot it with andrews_curves. I tried the following:
andrews_curves(df, 'Name')
Any idea how to plot this? This is the content of the CSV:
PhraseId, SentenceId, Phrase, Sentiment
1, 1, A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 1
2, 1, A series of escapades demonstrating the adage that what is good for the goose, 2
3, 1, A series, 2
4, 1, A, 2
5, 1, series, 2
6, 1, of escapades demonstrating the adage that what is good for the goose, 2
7, 1, of, 2
8, 1, escapades demonstrating the adage that what is good for the goose, 2
9, 1, escapades, 2
10, 1, demonstrating the adage that what is good for the goose, 2
11, 1, demonstrating the adage, 2
12, 1, demonstrating, 2
13, 1, the adage, 2
14, 1, the, 2
15, 1, adage, 2
16, 1, that what is good for the goose, 2
17, 1, that, 2
18, 1, what is good for the goose, 2
19, 1, what, 2
20, 1, is good for the goose, 2
21, 1, is, 2
22, 1, good for the goose, 3
23, 1, good, 3
24, 1, for the goose, 2
25, 1, for, 2
26, 1, the goose, 2
27, 1, goose, 2
28, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 2
29, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story, 2
1 solution
#1
In the doc page that you linked to, the Iris dataset has a column called 'Name'. When you call
andrews_curves(data, 'Name')
the rows of data are grouped by the value of Name. That's why for the Iris dataset you get three different colors for the lines.
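To make the grouping concrete, here is a minimal sketch. It does not load the real Iris file; the feature columns f1 to f4 and the random values are hypothetical stand-ins, and only the 'Name' column mirrors the docs example. It assumes a recent pandas where andrews_curves lives in pandas.plotting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
from pandas.plotting import andrews_curves

# Hypothetical stand-in for the Iris data: four numeric features plus a 'Name' label.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(30, 4)), columns=["f1", "f2", "f3", "f4"])
data["Name"] = np.repeat(["setosa", "versicolor", "virginica"], 10)

# One curve is drawn per row; rows sharing a Name share a color.
ax = andrews_curves(data, "Name")

# The legend carries one entry per distinct Name value.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
n_curves = len(ax.get_lines())
```

With 30 rows and three distinct names, this should draw 30 curves in three colors. Remove the Agg line and call plt.show() from matplotlib.pyplot to view the figure interactively.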
In your dataset you have three columns: A, B, C. To call andrews_curves on your df, you first need to identify the value you want to group by. If, for example, it is the value of the C column, then call
andrews_curves(data, 'C')
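The same idea can be sketched on the CSV from the question, under some assumptions: that Sentiment is the column to group by, and that the text column Phrase is dropped first, since the curves are computed from numeric values. Only four rows from the posted data are used here, built inline rather than read from the file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# A few rows copied from the CSV in the question.
df = pd.DataFrame({
    "PhraseId": [21, 22, 23, 24],
    "SentenceId": [1, 1, 1, 1],
    "Phrase": ["is", "good for the goose", "good", "for the goose"],
    "Sentiment": [2, 3, 3, 2],
})

# Assumption: keep only numeric feature columns and use Sentiment as the class column.
numeric = df.drop(columns=["Phrase"])
ax = andrews_curves(numeric, "Sentiment")

sentiment_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Whether PhraseId and SentenceId are meaningful features is a separate question; the sketch only shows which column plays the role of the grouping value.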
If, on the other hand, you want to group by the column names A, B, C, then first melt your DataFrame to convert it from wide format to long format, and then call andrews_curves on the variable column (which holds the value A, B, or C for each row):
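import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# pandas.tools.plotting is gone in current pandas; the function lives in pandas.plotting
from pandas.plotting import andrews_curves

# Three numeric columns sampled on the same grid
x = np.linspace(-1, 1, 1000)
df = pd.DataFrame({'A': np.sin(x**2)/x,
                   'B': np.sin(x)*np.exp(-x),
                   'C': np.cos(x)*x})
# pd.melt stacks the columns into 'variable' (column name) and 'value'
andrews_curves(pd.melt(df), 'variable')
plt.show()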
yields a plot in which the curves are colored by column name (A, B, or C).
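For readers unfamiliar with melt, a small sketch of the wide-to-long conversion may help. The two-row frame below is a made-up toy input, not the data from the answer.

```python
import pandas as pd

# Toy wide-format frame: one column per series.
wide = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

# Long format: 'variable' holds the former column name, 'value' the number.
long = pd.melt(wide)
print(long)
```

Each original column becomes a block of rows labeled with its name, which is why 'variable' can then serve as the class column for andrews_curves.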