I have a CSV file with 3 columns, where each row of column 3 holds a list of values, as the following table shows:
Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
3,a3,"['Proj4', 'Proj1']"
4,a4,"['Proj3', 'Proj4']"
5,a5,"['Proj5', 'Proj2']"
Whenever I read this CSV, Col3 is read as a str object rather than a list. I tried to change the dtype of that column to list, but got an AttributeError, as shown below:
df = pd.read_csv("inputfile.csv")
df.Col3.dtype = list
AttributeError Traceback (most recent call last)
<ipython-input-19-6f9ec76b1b30> in <module>()
----> 1 df.Col3.dtype = list
C:\Python27\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value)
1953 object.__setattr__(self, name, value)
1954 except (AttributeError, TypeError):
-> 1955 object.__setattr__(self, name, value)
1956
1957 #----------------------------------------------------------------------
AttributeError: can't set attribute
It would be really great if you could guide me on how to go about this.
2 Solutions
#1
8
You could use the ast module from the standard library:
from ast import literal_eval
df.Col3 = df.Col3.apply(literal_eval)
print(df.Col3[0][0])
Proj1
You can also do it when you create the DataFrame from the CSV, using converters:
df = pd.read_csv("in.csv",converters={"Col3": literal_eval})
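For a quick check of the converters approach without a file on disk, here is a minimal sketch. It assumes Python 3 and feeds two of the question's sample rows through io.StringIO; the csv_text name is only for illustration.

```python
import io
from ast import literal_eval

import pandas as pd

# Two sample rows from the question, held in memory instead of in.csv
csv_text = '''Col1,Col2,Col3
1,a1,"['Proj1', 'Proj2']"
2,a2,"['Proj3', 'Proj2']"
'''

# literal_eval is applied to every Col3 cell while the file is parsed
df = pd.read_csv(io.StringIO(csv_text), converters={"Col3": literal_eval})

print(type(df.Col3[0]))  # <class 'list'>
print(df.Col3[0][0])     # Proj1
```

The column's dtype is still object; what changes is that each cell now holds a real Python list.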
If you are sure the format is the same for all strings, stripping and splitting will be a lot faster:
df = pd.read_csv("in.csv",converters={"Col3": lambda x: x.strip("[]").split(", ")})
But you will end up with each string still wrapped in its quotes, e.g. "'Proj1'" rather than "Proj1".
#2
-1
Try removing the '[' and ']' brackets from the column, then use Python's string split function to convert it into a list.
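# regex=False treats the brackets as literal characters, not regex syntax
df['Col3'] = df['Col3'].str.replace(']', "", regex=False)
df['Col3'] = df['Col3'].str.replace('[', "", regex=False)
# split on ", " so no commas are left in the items (the quotes remain)
df['Col3'] = df['Col3'].str.split(", ")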