I have a pandas DataFrame that is created dynamically, so its column names vary. I'm trying to push it to SQL, but I don't want the columns to land in MS SQL Server as the default datatype "text". (Can anyone explain why this is the default? Wouldn't it make sense to use a more common datatype?)
Does anyone know how I can specify a datatype for all columns?
column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = #Data type for all columns#)
The dtype argument takes a dict, and since I don't know what the columns will be, it is hard to set them all to 'sqlalchemy.types.NVARCHAR'.
This is what I would like to do:
column_errors.to_sql('load_errors',push_conn, if_exists = 'append', index = False, dtype = 'sqlalchemy.types.NVARCHAR')
Any help/understanding of how best to specify all column types would be much appreciated!
2 solutions
#1
18
You can create this dict dynamically if you do not know the column names in advance:
from sqlalchemy.types import NVARCHAR
df.to_sql(...., dtype={col_name: NVARCHAR for col_name in df})
Note that you have to pass the sqlalchemy type object itself (or an instance, to specify parameters, like NVARCHAR(length=10)) and not a string as in your example.
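As a minimal sketch of the same idea, the comprehension can be wrapped in a small helper. The helper name nvarchar_dtypes, the default length of 255, and the sample column names are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd
from sqlalchemy.types import NVARCHAR


def nvarchar_dtypes(frame, length=255):
    """Map every column label of `frame` to an NVARCHAR(length) instance.

    Hypothetical helper: the name and the default length are illustrative.
    """
    return {col_name: NVARCHAR(length=length) for col_name in frame.columns}


# Sample frame standing in for the dynamically built one (column names invented).
column_errors = pd.DataFrame({"source_file": ["a.csv"], "message": ["bad date"]})

dtype_map = nvarchar_dtypes(column_errors)
print(dtype_map)
```

The resulting dict can then be passed as dtype=dtype_map in the to_sql call from the question.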
#2
10
To use dtype, pass a dictionary that maps each data frame column name to the corresponding sqlalchemy type. Change the keys to your actual data frame column names:
import sqlalchemy
import pandas as pd
...
column_errors.to_sql('load_errors', push_conn,
                     if_exists='append',
                     index=False,
                     dtype={'datefld': sqlalchemy.DateTime(),
                            'intfld': sqlalchemy.types.INTEGER(),
                            'strfld': sqlalchemy.types.NVARCHAR(length=255),
                            'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                            'booleanfld': sqlalchemy.types.Boolean})
You may even be able to create this dtype dictionary dynamically if you do not know the column names or types beforehand:
def sqlcol(dfparam):
    # Build a {column name: sqlalchemy type} dict from the pandas dtypes.
    dtypedict = {}
    for i, j in zip(dfparam.columns, dfparam.dtypes):
        if "object" in str(j):
            dtypedict.update({i: sqlalchemy.types.NVARCHAR(length=255)})
        if "datetime" in str(j):
            dtypedict.update({i: sqlalchemy.types.DateTime()})
        if "float" in str(j):
            dtypedict.update({i: sqlalchemy.types.Float(precision=3, asdecimal=True)})
        if "int" in str(j):
            dtypedict.update({i: sqlalchemy.types.INT()})
    return dtypedict

# Build the mapping from the same frame that is being written.
outputdict = sqlcol(column_errors)
column_errors.to_sql('load_errors',
                     push_conn,
                     if_exists='append',
                     index=False,
                     dtype=outputdict)
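A possible variation on the function above, sketched here as an alternative and not as what the answer itself does, is to test the dtypes with the predicates in pandas.api.types instead of matching substrings of str(dtype). The function name infer_sql_dtypes, the NVARCHAR fallback for anything unrecognised, and the sample column names are assumptions for illustration.

```python
import pandas as pd
from pandas.api import types as ptypes
from sqlalchemy import types as satypes


def infer_sql_dtypes(frame, text_length=255):
    """Return {column label: sqlalchemy type instance} for `frame`.

    Hypothetical helper; anything that is not bool, integer, float or
    datetime falls back to NVARCHAR(text_length).
    """
    mapping = {}
    for name, dtype in frame.dtypes.items():
        if ptypes.is_bool_dtype(dtype):
            mapping[name] = satypes.Boolean()
        elif ptypes.is_integer_dtype(dtype):
            mapping[name] = satypes.Integer()
        elif ptypes.is_float_dtype(dtype):
            mapping[name] = satypes.Float()
        elif ptypes.is_datetime64_any_dtype(dtype):
            mapping[name] = satypes.DateTime()
        else:
            mapping[name] = satypes.NVARCHAR(length=text_length)
    return mapping


# Sample frame with one column per branch (names and values invented).
sample = pd.DataFrame({
    "strfld": ["a", "b"],
    "intfld": [1, 2],
    "floatfld": [1.5, 2.5],
    "datefld": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "booleanfld": [True, False],
})
print(infer_sql_dtypes(sample))
```

The elif chain guarantees each column is assigned exactly once, and boolean columns get their own branch. The returned dict is used the same way as outputdict above.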