Python pandas: writing to SQL with NaN values

Time: 2021-04-05 23:46:01

I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. It seems easy to do with pandas, but I hit an error that doesn't make sense to me:

I have a data frame of 8 columns. Here is the column list/index:

metricDF.columns

Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)

I then use to_sql to append the data to MySQL:

metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')

I get a strange error about a column being "nan":

OperationalError: (1054, "Unknown column 'nan' in 'field list'")

As you can see, all my columns have names. I realize that MySQL/SQL write support in pandas appears to be under development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.

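For reference, here is a minimal sketch that reproduces the problem; the connection details, table name, and two-column frame are placeholders, not the real data:

import numpy as np
import pandas as pd
import MySQLdb

# Illustrative credentials only
con = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='test')

# A small frame with a missing value, standing in for the 8-column data
metricDF = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# On pandas <= 0.14 with the legacy MySQL flavor, the NaN row raises
# OperationalError: (1054, "Unknown column 'nan' in 'field list'")
metricDF.to_sql(con=con, name='metric_table', if_exists='append', flavor='mysql')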

3 Answers

#1


20  

Update: starting with pandas 0.15, to_sql supports writing NaN values (they will be written as NULL in the database), so the workaround described below should not be needed anymore (see https://github.com/pydata/pandas/pull/8208).
Pandas 0.15 will be released this coming October, and the feature is already merged in the development version.

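A minimal sketch of the post-0.15 behavior (the engine URL, credentials, and table name below are assumptions):

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Illustrative connection string; adjust for your own server
engine = create_engine('mysql://user:pass@localhost/test')

df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# With pandas >= 0.15, the NaN is written to MySQL as NULL automatically
df.to_sql('metric_table', engine, if_exists='append', index=False)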

This is probably due to NaN values in your table, and this is a known shortcoming at the moment: the pandas sql functions don't handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).

As a workaround at this moment (for pandas versions 0.14.1 and lower), you can manually convert the NaN values to None with:

df2 = df.astype(object).where(pd.notnull(df), None)

and then write the dataframe to SQL. However, this converts all columns to object dtype. Because of this, you have to create the database table based on the original dataframe, e.g. if your first row does not contain NaNs:

df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')
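
To make the conversion step concrete, a small sketch (the frame below is made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', 'y']})

# Casting to object first stops .where from coercing None back to NaN:
# each NaN becomes a real Python None, which the driver sends as NULL
df2 = df.astype(object).where(pd.notnull(df), None)

print(df2.dtypes)    # both columns are now object dtype
print(df2['a'][1])   # None, not nan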

#2


2  

Using the previous solution will change the column dtype from float64 to object_.

I have found a better solution: just add the following _write_mysql function:

import numpy as np
from pandas.io import sql

def _write_mysql(frame, table, names, cur):
    # Backtick-quote column names so reserved words don't break the INSERT
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    wildcards = ','.join([r'%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)

    # Replace float NaNs with None so the driver sends NULL instead of 'nan'
    data = [[None if type(y) == float and np.isnan(y) else y
             for y in x] for x in frame.values]

    cur.executemany(insert_query, data)

And then override its implementation in pandas as below:

sql._write_mysql = _write_mysql

With this code, NaN values will be saved correctly in the database without altering the column type.

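A hedged usage sketch, assuming the legacy (pre-SQLAlchemy) MySQL write path of pandas 0.14 and earlier and a MySQLdb connection (both assumptions, not stated in the answer):

import numpy as np
import pandas as pd
import MySQLdb
from pandas.io import sql

sql._write_mysql = _write_mysql  # apply the override defined above

con = MySQLdb.connect(host='localhost', user='user', passwd='pass', db='test')
df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# The float64 dtype is preserved; the NaN arrives in MySQL as NULL
df.to_sql(con=con, name='metric_table', if_exists='append', flavor='mysql')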

#3


-1  

NaT to MySQL is still not handled in pandas 0.15.2.

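One possible workaround, not from the original answer, is to null out NaT the same way as NaN before writing, since pd.notnull treats NaT as missing:

import pandas as pd

df = pd.DataFrame({'ts': pd.to_datetime(['2014-01-01', None])})

# NaT is caught by pd.notnull just like NaN, so the same object/None trick applies
df2 = df.astype(object).where(pd.notnull(df), None)

print(df2['ts'][1])  # None instead of NaT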
