How to plot a pandas DataFrame with andrews_curves?

Time: 2022-08-22 23:44:11

I have the following pandas dataframe:

df = pd.read_csv('path/file/file.csv',
                 header=0, sep=',', names=['PhraseId', 'SentenceId', 'Phrase', 'Sentiment'])

I would like to plot it with andrews_curves. I tried the following:

andrews_curves(df, 'Name')

Any idea how to plot this? This is the content of the CSV:

PhraseId, SentenceId, Phrase, Sentiment
1, 1, A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 1
2, 1, A series of escapades demonstrating the adage that what is good for the goose, 2
3, 1, A series, 2
4, 1, A, 2
5, 1, series, 2
6, 1, of escapades demonstrating the adage that what is good for the goose, 2
7, 1, of, 2
8, 1, escapades demonstrating the adage that what is good for the goose, 2
9, 1, escapades, 2
10, 1, demonstrating the adage that what is good for the goose, 2
11, 1, demonstrating the adage, 2
12, 1, demonstrating, 2
13, 1, the adage, 2
14, 1, the, 2
15, 1, adage, 2
16, 1, that what is good for the goose, 2
17, 1, that, 2
18, 1, what is good for the goose, 2
19, 1, what, 2
20, 1, is good for the goose, 2
21, 1, is, 2
22, 1, good for the goose, 3
23, 1, good, 3
24, 1, for the goose, 2
25, 1, for, 2
26, 1, the goose, 2
27, 1, goose, 2
28, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 2
29, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story, 2

1 solution

#1

In the doc page that you linked to, the Iris dataset has a column called 'Name'. When you call

andrews_curves(data, 'Name')

the rows of data are grouped by the value in the Name column, and each group gets its own color. That's why the Iris dataset, which contains three species, gives you lines in three different colors.
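As a minimal sketch of that grouping behaviour, assuming a recent pandas where the function lives in pandas.plotting: the toy frame below is hypothetical (a few illustrative measurements plus a Name column), not the real Iris file. andrews_curves draws one curve per row and colors the curves by the value in the class column, so the legend ends up with one entry per distinct Name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Hypothetical toy data: two numeric feature columns plus a 'Name' class column,
# laid out like the Iris example in the pandas docs.
data = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidth":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "Name": ["setosa", "setosa", "versicolor", "versicolor",
             "virginica", "virginica"],
})

fig, ax = plt.subplots()
# Every column except 'Name' is used as a feature; 'Name' only picks the color.
ax = andrews_curves(data, "Name", ax=ax)

# One curve per row, but only one legend entry (one color) per distinct Name.
n_curves = len(ax.get_lines())
legend_labels = sorted(t.get_text() for t in ax.get_legend().get_texts())
print(n_curves, legend_labels)
```

Six rows give six curves, yet the legend lists each of the three names only once.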

In your dataset you have three columns: A, B, C. To call andrews_curves on your df, you first need to identify the column whose values you want to group by. If, for example, it is the C column, then call

andrews_curves(data, 'C')
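Applied to the data in the question, one plausible choice (an assumption, since the question does not say what to group by) is the Sentiment column. andrews_curves uses every column other than the class column as a numeric feature, so the free-text Phrase column has to be dropped first. The sketch below uses only four rows copied from the question's CSV to stay self-contained; with the real file you would call drop on the df returned by read_csv instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import andrews_curves

# Four rows taken from the question's CSV (PhraseId 3, 4, 22, 23).
df = pd.DataFrame({
    "PhraseId":   [3, 4, 22, 23],
    "SentenceId": [1, 1, 1, 1],
    "Phrase":     ["A series", "A", "good for the goose", "good"],
    "Sentiment":  [2, 2, 3, 3],
})

# Phrase is text and cannot be turned into a curve, so keep only numeric columns.
numeric = df.drop(columns="Phrase")

fig, ax = plt.subplots()
# Group (color) the curves by Sentiment; PhraseId and SentenceId are the features.
ax = andrews_curves(numeric, "Sentiment", ax=ax)

n_curves = len(ax.get_lines())
legend_labels = sorted(t.get_text() for t in ax.get_legend().get_texts())
print(n_curves, legend_labels)
```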

If, on the other hand, you want to group by the column names A, B and C, first melt your DataFrame to convert it from wide format to long format, and then call andrews_curves on the variable column (which holds A, B or C for each row):

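import numpy as np
import pandas as pd
import pandas.plotting as pdplt  # plotting helpers moved here from pandas.tools.plotting in pandas 0.20
import matplotlib.pyplot as plt

# 1000 points in [-1, 1]; 0 is not among them, so sin(x**2)/x is safe
x = np.linspace(-1, 1, 1000)
df = pd.DataFrame({'A': np.sin(x**2)/x,
                   'B': np.sin(x)*np.exp(-x),
                   'C': np.cos(x)*x})
# melt: wide (columns A, B, C) -> long ('variable' = column name, 'value' = number)
pdplt.andrews_curves(pd.melt(df), 'variable')
plt.show()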
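To see what the melt step does, here is a small sketch on a hypothetical two-row frame with the same column names. With no id_vars, pd.melt stacks every column into two new columns: variable (the original column name) and value (the number that was in it).

```python
import pandas as pd

# Hypothetical tiny wide-format frame with the same column names as above.
wide = pd.DataFrame({"A": [0.1, 0.2], "B": [1.0, 2.0], "C": [10.0, 20.0]})

# No id_vars: all columns are stacked, one column after another.
long = pd.melt(wide)
print(long)
```

After melting, value is the only feature column besides variable, so andrews_curves receives a single number per row and colors each curve by the column it came from.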

This yields a plot of Andrews curves with one color for each value of the variable column (A, B and C).
