How to write a Pandas DataFrame to a Django model

Time: 2021-01-03 23:48:17

I have been using pandas in Python, and I usually write a dataframe to my db table as shown below. I am now migrating to Django. How can I write the same dataframe to a table through a model called MyModel? Any assistance is really appreciated.

    # Original pandas code
    from sqlalchemy import create_engine

    engine = create_engine('postgresql://myuser:mypassword@localhost:5432/mydb', echo=False)
    mydataframe.to_sql('mytable', engine, if_exists='append', index=True)

2 Answers

#1


9  

Use your own pandas code alongside a Django model that is mapped to the same SQL table

I am not aware of any explicit support for writing a pandas dataframe to a Django model. However, in a Django app, you can still use your own code to read from or write to the database, in addition to using the ORM (e.g. through your Django model).

And given that you most likely have data in the database previously written by pandas' to_sql, you can keep using the same database and the same pandas code, and simply create a Django model that can access that table.

For example, if your pandas code was writing to the SQL table mytable, simply create a model like this:

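from django.db import models

class MyModel(models.Model):
    class Meta:
        db_table = 'mytable'  # This tells Django where the SQL table is
        managed = False  # Use this if the table already exists
                         # and doesn't need to be managed by Django

    # Replace the placeholders with field definitions matching the table's columns
    field_1 = ...
    field_2 = ...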

Now you can use this model from Django alongside your existing pandas code (possibly in a single Django app).

Django database settings

To pass the same DB credentials to the pandas SQL functions, simply read the fields from the Django settings, e.g.:

from django.conf import settings
from sqlalchemy import create_engine

user = settings.DATABASES['default']['USER']
password = settings.DATABASES['default']['PASSWORD']
database_name = settings.DATABASES['default']['NAME']
# Host and port are hard-coded in the URL below; uncomment to read them from settings instead
# host = settings.DATABASES['default']['HOST']
# port = settings.DATABASES['default']['PORT']

database_url = 'postgresql://{user}:{password}@localhost:5432/{database_name}'.format(
    user=user,
    password=password,
    database_name=database_name,
)

engine = create_engine(database_url, echo=False)
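One thing the snippet above does not handle is a password containing characters such as @ or /, which would break the URL. A minimal sketch of one way to deal with that, using only the standard library, is below. The helper name build_database_url, the example_settings dict and its values are made up for illustration; the dict only mimics the shape of settings.DATABASES['default'], and the localhost/5432 fallbacks are an assumption carried over from the hard-coded URL above.

```python
from urllib.parse import quote


def build_database_url(db_settings, driver="postgresql"):
    """Build a database URL from a dict shaped like Django's DATABASES['default'].

    Hypothetical helper: not part of Django, pandas, or SQLAlchemy.
    """
    # Percent-encode credentials so characters like '@' or '/' cannot break the URL.
    user = quote(db_settings["USER"], safe="")
    password = quote(db_settings["PASSWORD"], safe="")
    # HOST and PORT may be empty strings; fall back to the values
    # that were hard-coded in the original snippet.
    host = db_settings.get("HOST") or "localhost"
    port = db_settings.get("PORT") or 5432
    name = db_settings["NAME"]
    return f"{driver}://{user}:{password}@{host}:{port}/{name}"


# Stand-in for settings.DATABASES['default']; values are invented for the example.
example_settings = {
    "USER": "myuser",
    "PASSWORD": "p@ss/word",
    "NAME": "mydb",
    "HOST": "",
    "PORT": "",
}

database_url = build_database_url(example_settings)
print(database_url)
```

In a real project you would pass settings.DATABASES['default'] instead of the example dict and hand the result to create_engine as before.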

The alternative is not recommended as it's inefficient

I don't really see another way besides reading the dataframe row by row, creating a model instance for each row, and saving it, which is really slow. You might get away with some batch insert operation, but why bother when pandas' to_sql already does that for us? And reading Django querysets into a pandas dataframe is just inefficient when pandas can do that faster for us too.

# Doing it like this is slow
for index, row in df.iterrows():
    model = MyModel()
    model.field_1 = row['field_1']
    model.save()

#2


6  

I'm just going through the same exercise at the moment. The approach I've taken is to create a list of new objects from the DataFrame and then bulk create them:

bulk_create(objs, batch_size=None)

This method inserts the provided list of objects into the database in an efficient manner (generally only 1 query, no matter how many objects there are).
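The batch_size argument in the signature above caps how many objects go into each insert; left as None, Django normally sends everything in one batch. The splitting itself can be pictured with plain Python. This is only an illustration of the idea, not Django's implementation, and chunked is a hypothetical helper name.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for the list of model instances; plain integers keep the sketch runnable.
objs = list(range(7))

batches = list(chunked(objs, batch_size=3))
print(batches)
```

With seven objects and a batch size of three, the work is split into three groups, which corresponds to three insert statements instead of one.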

An example might look like this:

# Iterating over a DataFrame directly yields column names, so convert the rows to dicts first
df_records = df.to_dict('records')

model_instances = [MyModel(
    field_1=record['field_1'],
    field_2=record['field_2'],
) for record in df_records]

MyModel.objects.bulk_create(model_instances)
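If the DataFrame has missing values, the dicts produced by df.to_dict('records') will contain float NaN, whereas a nullable Django field generally expects None. A small cleaning step before building the model instances might look like the sketch below. The helper name clean_record and the sample rows are invented for illustration, and whether you need this at all depends on your data and field definitions.

```python
import math


def clean_record(record):
    """Return a copy of one row dict with float NaN values replaced by None."""
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


# Stand-in for df.to_dict('records'); the rows are made up for the example.
df_records = [
    {"field_1": 1.5, "field_2": "a"},
    {"field_1": float("nan"), "field_2": "b"},
]

cleaned_records = [clean_record(record) for record in df_records]
print(cleaned_records)
```

The cleaned dicts can then be used in place of df_records in the list comprehension that builds model_instances.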
