dict objects are converted to strings when reading a CSV into a pandas DataFrame in Python

I have a CSV file with many columns. One column contains both dict objects and plain strings.

For example, the column contains values such as: {"a":5,"b":6,"c":8}, "usa", "india", {"a":9,"b":10,"c":11}

When I read this CSV into a DataFrame using:

df = pd.read_csv(path)

every value in this column comes back as a string. I checked this with df.applymap(type), which shows the type of each stored element.

The data has no quotes around it, either in the CSV or in the DataFrame, yet the dict objects are still converted to strings when stored in the DataFrame.

The dtype of the column is object.

Please suggest how to read the CSV into a DataFrame so that, in this particular column, dicts are recognised as dicts and strings stay strings.

1 Answer

#1

You can convert the strings that should be dicts (or other literal types) using ast.literal_eval:

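from ast import literal_eval

def try_literal_eval(s):
    # Return the parsed Python literal (dict, list, number, ...) if s is one,
    # otherwise return s unchanged.
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        # ValueError: not a literal (e.g. "usa");
        # SyntaxError: not valid Python at all (e.g. "new york").
        return s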
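
A caveat worth checking for the helper above: literal_eval does not report every unparsable string with the same exception. The short sketch below records which exception each kind of input raises; the helper name literal_eval_error and the sample strings are made up for illustration.

```python
from ast import literal_eval


def literal_eval_error(s):
    """Return the name of the exception literal_eval raises for s, or None."""
    try:
        literal_eval(s)
    except Exception as exc:
        return type(exc).__name__
    return None


single_word = literal_eval_error("usa")       # a bare name, not a literal
two_words = literal_eval_error("new york")    # not a valid Python expression
dict_text = literal_eval_error('{"a": 5}')    # a valid dict literal

print(single_word, two_words, dict_text)
```

So a fallback helper that should leave arbitrary text untouched probably needs to catch both ValueError and SyntaxError.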

Now you can apply this to your DataFrame (on pandas 2.1 and later, applymap is deprecated and DataFrame.map is the equivalent call):

In [11]: df = pd.DataFrame({'A': ["hello","world",'{"a":5,"b":6,"c":8}',"usa","india",'{"d":9,"e":10,"f":11}']})

In [12]: df.loc[2, "A"]
Out[12]: '{"a":5,"b":6,"c":8}'

In [13]: df
Out[13]:
                       A
0                  hello
1                  world
2    {"a":5,"b":6,"c":8}
3                    usa
4                  india
5  {"d":9,"e":10,"f":11}


In [14]: df.applymap(try_literal_eval)
Out[14]:
                            A
0                       hello
1                       world
2    {'a': 5, 'b': 6, 'c': 8}
3                         usa
4                       india
5  {'d': 9, 'e': 10, 'f': 11}

In [15]: df.applymap(try_literal_eval).loc[2, "A"]
Out[15]: {'a': 5, 'b': 6, 'c': 8}
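
Since the question asks about doing this while reading the file, one option is to pass the helper through the converters argument of read_csv, so the conversion happens during parsing of that one column. This is a minimal sketch, not the only way to do it: the CSV text and the column names id and A are invented for the example, and the dict cells are assumed to be quoted in the file so that their commas are not treated as field separators. The helper is repeated so the block runs on its own.

```python
import io
from ast import literal_eval

import pandas as pd


def try_literal_eval(s):
    # Parse s as a Python literal if possible, otherwise return it unchanged.
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        return s


# Hypothetical CSV content standing in for the real file.
csv_text = '''id,A
1,hello
2,"{""a"":5,""b"":6,""c"":8}"
3,usa
'''

# converters maps a column name to a function applied to each raw cell string.
df = pd.read_csv(io.StringIO(csv_text), converters={"A": try_literal_eval})

types = df["A"].map(type).tolist()
print(types)
```

Keep in mind that literal_eval converts any literal, so a cell containing 123 or None would also stop being a string. If only dicts should be converted, the helper could check isinstance(result, dict) before returning the parsed value.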

Note: This is fairly slow compared with vectorised pandas operations. More generally, storing dictionaries in a DataFrame/Series means falling back to plain Python objects, so anything done with that column will be relatively slow. It is probably a good idea to denormalize, i.e. expand the dict keys into columns of their own, e.g. using json_normalize.
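
As a rough illustration of the denormalizing idea, the sketch below expands the dict rows of a mixed column into one column per key. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize; the Series contents are taken from the question's example and the variable names are arbitrary.

```python
import pandas as pd

# A column mixing already-parsed dicts and plain strings.
s = pd.Series([{"a": 5, "b": 6, "c": 8}, "usa", {"a": 9, "b": 10, "c": 11}])

# Boolean mask selecting only the rows that actually hold dicts.
is_dict = s.map(lambda v: isinstance(v, dict))
dict_rows = s[is_dict]

# One column per key; restore the original row labels so the result
# can be joined back onto the source DataFrame if needed.
expanded = pd.json_normalize(dict_rows.tolist())
expanded.index = dict_rows.index

print(expanded)
```

The string rows are left out of expanded here; how to treat them (drop them, keep them in a separate column, and so on) depends on what the data means.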
