
Time: 2022-08-22 23:44:11

I have the following pandas dataframe:


df = pd.read_csv('path/file/file.csv',
                 header=0, sep=',', names=['PhraseId', 'SentenceId', 'Phrase', 'Sentiment'])

I would like to plot it with andrews_curves. I tried the following:


andrews_curves(df, 'Name')

Any idea how to plot this? This is the content of the CSV:


PhraseId, SentenceId, Phrase, Sentiment
1, 1, A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 1
2, 1, A series of escapades demonstrating the adage that what is good for the goose, 2
3, 1, A series, 2
4, 1, A, 2
5, 1, series, 2
6, 1, of escapades demonstrating the adage that what is good for the goose, 2
7, 1, of, 2
8, 1, escapades demonstrating the adage that what is good for the goose, 2
9, 1, escapades, 2
10, 1, demonstrating the adage that what is good for the goose, 2
11, 1, demonstrating the adage, 2
12, 1, demonstrating, 2
13, 1, the adage, 2
14, 1, the, 2
15, 1, adage, 2
16, 1, that what is good for the goose, 2
17, 1, that, 2
18, 1, what is good for the goose, 2
19, 1, what, 2
20, 1, is good for the goose, 2
21, 1, is, 2
22, 1, good for the goose, 3
23, 1, good, 3
24, 1, for the goose, 2
25, 1, for, 2
26, 1, the goose, 2
27, 1, goose, 2
28, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ., 2
29, 1, is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story, 2

1 Solution



In the doc page that you linked to, the Iris dataset has a column called 'Name'. When you call


andrews_curves(data, 'Name')

the rows of data are grouped by the value of Name. That's why for the Iris dataset you get three different colors for the lines.
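To see that grouping in isolation, here is a minimal sketch on a made-up frame. The feature columns x1, x2, x3 and the class labels a, b, c are hypothetical, not taken from the Iris data, and the sketch assumes a pandas version that exposes pandas.plotting.andrews_curves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# Hypothetical toy data: three numeric features plus a class column 'Name'.
data = pd.DataFrame({
    "x1": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "x2": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "x3": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "Name": ["a", "a", "b", "b", "c", "c"],
})

# One curve is drawn per row; rows sharing a 'Name' value share a color.
ax = andrews_curves(data, "Name")

# The legend carries one entry per distinct class, not one per row.
_, labels = ax.get_legend_handles_labels()
n_curves = len(ax.lines)
```

In an interactive session you would drop the Agg line and call plt.show() (from matplotlib.pyplot) to display the figure.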


In your dataset you have three columns: A, B, C. To call andrews_curves on your df, you first need to identify the value you want to group by. If, for example, it is the value of the C column, then call


andrews_curves(data, 'C')
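For the CSV shown in the question, one plausible choice is to group by Sentiment. The sketch below rests on a few assumptions: it embeds only four of the posted rows (ones whose phrase contains no comma, since an unquoted comma inside Phrase would split the row into extra fields), and it drops the text column Phrase, because Andrews curves are computed from numeric values.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
from pandas.plotting import andrews_curves

# A few rows copied from the question; phrases without embedded commas only.
csv_text = """PhraseId, SentenceId, Phrase, Sentiment
3, 1, A series, 2
4, 1, A, 2
22, 1, good for the goose, 3
23, 1, good, 3
"""

df = pd.read_csv(io.StringIO(csv_text), header=0, sep=",",
                 skipinitialspace=True,  # strip the space after each comma
                 names=["PhraseId", "SentenceId", "Phrase", "Sentiment"])

# Keep the numeric columns plus the class column; 'Phrase' is text.
numeric = df[["PhraseId", "SentenceId", "Sentiment"]]

# Group (and color) the curves by the value of 'Sentiment'.
ax = andrews_curves(numeric, "Sentiment")
_, labels = ax.get_legend_handles_labels()
```

Whether PhraseId and SentenceId are meaningful features to plot is a separate question; they are used here only because they are the numeric columns available.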

If, on the other hand, you want to group by the column names A, B, C, then first melt your DataFrame to convert it from wide format to long format, and then call andrews_curves with the variable column as the class column (it holds the value A, B, or C for each row):


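import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# pandas.tools.plotting no longer exists in current pandas; use pandas.plotting
from pandas.plotting import andrews_curves

x = np.linspace(-1, 1, 1000)
df = pd.DataFrame({'A': np.sin(x**2)/x,
                   'B': np.sin(x)*np.exp(-x),
                   'C': np.cos(x)*x})
# melt to long format: one row per (column name, value) pair
andrews_curves(pd.melt(df), 'variable')
plt.show()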
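To make the wide-to-long step concrete, here is a small sketch of what pd.melt returns for a tiny frame. The two-row frame is made up for illustration; only the column names A, B, C follow the example above.

```python
import pandas as pd

# Hypothetical wide frame: one column per series.
wide = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]})

# With no arguments, melt stacks every column into two columns:
# 'variable' (the original column name) and 'value' (the cell value).
long = pd.melt(wide)
print(long)
```

The 'variable' column is what gets passed to andrews_curves as the class column, so every row that came from column A is grouped under the label A, and likewise for B and C.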






