How to read a strange csv file in Pandas?

Date: 2022-02-20 20:29:50

I would like to read the sample csv file shown below:

--------------
 |A|B|C| 
--------------
 |1|2|3| 
--------------
 |4|5|6| 
--------------
 |7|8|9| 
--------------

I tried

pd.read_csv("sample.csv",sep="|")

but it didn't work well.

How can I read this csv?

3 solutions

#1


11  

You can pass the comment parameter to read_csv so that the dashed lines are skipped, and then remove the all-NaN columns with dropna:

import pandas as pd
import io

temp=u"""--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep="|", comment='-').dropna(axis=1, how='all')

print (df)
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
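
To see why the dropna step is needed, here is a small sketch that inspects the frame before the empty columns are dropped. It reuses the sample data from above; the variable name raw is only for illustration.

```python
import io

import pandas as pd

temp = """--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------"""

# The leading and trailing | on each line produce two extra, empty fields,
# so pandas creates two unnamed columns that contain only NaN.
raw = pd.read_csv(io.StringIO(temp), sep="|", comment="-")
print(list(raw.columns))

# Dropping the columns in which every value is NaN leaves A, B and C.
df = raw.dropna(axis=1, how="all")
print(df)
```

Note that comment='-' treats everything from the first '-' on a line as a comment, so this approach relies on the data itself containing no '-' characters (negative numbers, for example).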

A more general solution:

import pandas as pd
import io

temp=u"""--------------
|A|B|C|
--------------
|1|2|3|
--------------
|4|5|6|
--------------
|7|8|9|
--------------"""
# after testing, replace io.StringIO(temp) with the filename
# the separator must be a character that does NOT occur in the csv
df = pd.read_csv(io.StringIO(temp), sep="^", comment='-')

# remove the first and last | in the data and in the column names
df.iloc[:,0] = df.iloc[:,0].str.strip('|')
df.columns = df.columns.str.strip('|')
# split the column names
cols = df.columns.str.split('|')[0]
# split the data
df = df.iloc[:,0].str.split('|', expand=True)
df.columns = cols
print (df)
   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
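
One thing to keep in mind with the str.split approach is that every cell comes back as a string, even though the printed frame looks numeric. A minimal sketch of converting the columns afterwards, assuming all of them hold numbers; the frame below is a hand-built stand-in for the split result:

```python
import pandas as pd

# Stand-in for the result of str.split('|', expand=True): all cells are strings.
df = pd.DataFrame(
    [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"]],
    columns=["A", "B", "C"],
)

# Convert each column to a numeric dtype; to_numeric raises by default
# if a value cannot be parsed as a number.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```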

#2


1  

Try the csv module rather than using pandas directly.

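import csv

easy_csv = []

# Python 3: open in text mode; newline='' is what the csv module expects
with open('sample.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        # dashed lines contain no |, so they parse as a single field; skip them
        if len(row) < 3:
            continue
        # drop the empty first and last fields produced by the outer | characters
        easy_csv.append([field.strip() for field in row[1:-1]])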

After this preprocessing, you can save the rows to a comma-separated csv file, which pandas can then read easily.
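
As a rough sketch of the same idea, the cleanup can also be done on the raw lines and the result handed to pandas through an in-memory buffer instead of a second file. The helper name read_boxed_csv and the sample data (with padded spaces and a negative value) are made up for illustration.

```python
import io

import pandas as pd


def read_boxed_csv(lines):
    """Hypothetical helper: clean pipe-boxed lines, then parse them with pandas."""
    cleaned = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and separator lines made only of dashes.
        if not line or set(line) == {"-"}:
            continue
        # Remove exactly one leading and one trailing pipe.
        if line.startswith("|"):
            line = line[1:]
        if line.endswith("|"):
            line = line[:-1]
        cleaned.append(line)
    return pd.read_csv(io.StringIO("\n".join(cleaned)), sep="|")


sample = io.StringIO(
    "--------------\n"
    " |A|B|C| \n"
    "--------------\n"
    " |1|2|3| \n"
    "--------------\n"
    " |-4|5|6| \n"
    "--------------\n"
    " |7|8|9| \n"
    "--------------\n"
)
df = read_boxed_csv(sample)
print(df)
```

For a real file, an open file object can be passed in the same way, for example with open("sample.csv") as f: df = read_boxed_csv(f). Because the separator lines are filtered explicitly, values containing '-' are left untouched.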

#3


0  

I tried this code and it works:

import pandas as pd
import numpy as np
a = pd.read_csv("a.csv",sep="|")
print(a)
for i in a:
    print(i)
