Reading columns from a CSV file doesn't seem to work

Time: 2022-09-03 14:04:39

I have a .csv dataset of news articles that should have columns for publication, date, title, etc. When I open the file with Numbers, it displays correctly and every column is accounted for. However, when I try to use the file in a Jupyter Notebook, the columns don't seem to work properly. Here is what I have:

%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

import pandas as pd

data = pd.read_table("filename.csv",encoding="utf-8")

data.columns

This gives me:

Index(['SEARCH_ROW,PUBLICATION,DATE,TITLE,EDITION,BYLINE,LANGUAGE,SECTION,JOURNAL-CODE,NYT,PUBLICATION-TYPE,LENGTH,LOAD-DATE,TEXT'], dtype='object')

Opening the file with Microsoft Excel gives me the same problem; every column is named:

SEARCH_ROW,PUBLICATION,DATE,TITLE,EDITION,BYLINE,LANGUAGE,SECTION,JOURNAL-CODE,NYT,PUBLICATION-TYPE,LENGTH,LOAD-DATE,TEXT

Is there some way to split this one big column back into the original multiple columns?

2 answers

#1


pd.read_table(...) uses tab ('\t') as the separator by default.

So try to specify comma as a separator explicitly:

pd.read_table(filename, sep=',')

or use pd.read_csv(), which uses comma as the separator by default.
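
To make the difference concrete, here is a minimal sketch. The sample text is a made-up stand-in for filename.csv (only three of the question's column names, with invented rows), read from memory via io.StringIO so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical sample standing in for filename.csv (invented rows).
sample = (
    "PUBLICATION,DATE,TITLE\n"
    "Paper A,2016-01-01,First headline\n"
    "Paper B,2016-01-02,Second headline\n"
)

# read_table splits on tabs by default, so each line stays one field.
wrong = pd.read_table(io.StringIO(sample))

# read_csv splits on commas by default.
right = pd.read_csv(io.StringIO(sample))

# read_table with an explicit comma separator behaves the same way.
also_right = pd.read_table(io.StringIO(sample), sep=",")

print(list(wrong.columns))  # ['PUBLICATION,DATE,TITLE']
print(list(right.columns))  # ['PUBLICATION', 'DATE', 'TITLE']
```

With the real file, replacing io.StringIO(sample) with the file path should give the same kind of result, assuming the file is plain comma-separated text.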

#2


You can use:

data = np.genfromtxt('filename.csv', delimiter=',')
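
One caveat worth checking before relying on this: with its default settings np.genfromtxt parses every field as a float, so a header row and text columns come back as nan. The sketch below uses a made-up two-column sample (not the question's actual data) and shows one possible way around that, using the names, dtype and encoding parameters:

```python
import io

import numpy as np

# Hypothetical sample standing in for filename.csv (invented rows).
sample = "PUBLICATION,LENGTH\nPaperA,100\nPaperB,250\n"

# Default dtype is float: the header and the text column become nan.
plain = np.genfromtxt(io.StringIO(sample), delimiter=",")

# names=True takes field names from the first row; dtype=None infers a
# type per column; encoding lets text columns come back as str.
data = np.genfromtxt(
    io.StringIO(sample),
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8",
)

print(plain.shape)       # (3, 2)
print(data.dtype.names)  # ('PUBLICATION', 'LENGTH')
print(data["LENGTH"])    # [100 250]
```

For a file of news articles, where the text fields are likely to contain commas inside quotes, pd.read_csv is probably the safer choice, since np.genfromtxt splits on every delimiter it sees.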
