How to write a Pandas DataFrame to an existing Django model

Date: 2021-01-03 23:47:53

I am trying to insert data from a Pandas DataFrame into an existing Django model, Agency, that uses a SQLite backend. However, following the answers to "How to write a Pandas Dataframe to Django model" and "Saving a Pandas DataFrame to a Django Model" replaces the whole SQLite table and breaks the Django code. Specifically, the Django auto-generated id primary key column is replaced by an index column, which causes errors when rendering templates (no such column: agency.id).

Here is the code, and the result of calling Pandas to_sql on the SQLite table agency.

In models.py:

from django.db import models

class Agency(models.Model):
    name = models.CharField(max_length=128)

In myapp/management/commands/populate.py:

import pandas as pd
from django.conf import settings
from django.core.management.base import BaseCommand
from sqlalchemy import create_engine


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Build a SQLAlchemy engine for the project's SQLite database
        database_name = settings.DATABASES['default']['NAME']
        database_url = 'sqlite:///{}'.format(database_name)
        engine = create_engine(database_url, echo=False)

        # Insert the data (if_exists="replace" drops and recreates the table)
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})
        agencies.to_sql("agency", con=engine, if_exists="replace")

Calling 'python manage.py populate' successfully adds the three agencies to the table:

index    name
0        Agency 1
1        Agency 2
2        Agency 3

However, doing so has changed the DDL of the table from:

CREATE TABLE "agency" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(128) NOT NULL)

to:

CREATE TABLE agency (
  "index" BIGINT, 
  name TEXT
);
CREATE INDEX ix_agency_index ON agency ("index")

How can I add the DataFrame to the model managed by Django and keep the Django ORM intact?

2 Answers

#1


4  

To answer my own question (I now import data into Django with Pandas quite often): my mistake was using Pandas' built-in SQLAlchemy integration, which was modifying the underlying database table definition. In the context above, you can simply use the Django ORM to connect and insert the data:

import pandas as pd
from django.core.management.base import BaseCommand

from myapp.models import Agency


class Command(BaseCommand):

    def handle(self, *args, **options):

        # Process data with Pandas
        agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

        # Iterate over the DataFrame rows and create one object per row
        for row in agencies.itertuples():
            Agency.objects.create(name=row.name)

However, you may often want to import data using an external script rather than a management command, as above, or Django's shell. In this case you must first connect to the Django ORM by calling the setup method:

import os, sys

import django
import pandas as pd

sys.path.append('../..') # add path to project root dir
os.environ["DJANGO_SETTINGS_MODULE"] = "myproject.settings"

# for more sophisticated setups, if you need to change connection settings (e.g. when using django-environ):
#os.environ["DATABASE_URL"] = "postgres://myuser:mypassword@localhost:54324/mydb"

# Connect to Django ORM
django.setup()

# process data
from myapp.models import Agency
Agency.objects.create(name='MyAgency')

  • Here I have set the DJANGO_SETTINGS_MODULE environment variable to my settings module, myproject.settings, so that django.setup() can pick up the project settings.

  • Depending on where you run the script from, you may need to add the project root to the system path so Django can find the settings module. In this case, I run my script two directories below my project root.

  • You can modify any settings before calling setup, for example if your script needs to connect to the DB differently from what is configured in settings, such as when running a script locally against Django/postgres Docker containers.
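
One caveat about the path handling: a relative entry such as '../..' is resolved against the current working directory, not against the script's location, so it only works when the script is started from its own folder. A minimal sketch of an alternative, assuming a hypothetical helper named add_project_root and the same "two directories below the project root" layout:

```python
import sys
from pathlib import Path


def add_project_root(script_file, levels_up=2):
    """Append the directory `levels_up` levels above the script's folder to sys.path.

    `add_project_root` is a hypothetical helper for illustration, not a Django API.
    """
    # parents[0] is the script's own folder, so parents[levels_up]
    # is `levels_up` directories above it.
    root = Path(script_file).resolve().parents[levels_up]
    if str(root) not in sys.path:
        sys.path.append(str(root))
    return root
```

In the script this would be called as add_project_root(__file__) before DJANGO_SETTINGS_MODULE is set and django.setup() is called.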

Note: the commented-out DATABASE_URL line in the script above assumes django-environ is used to specify the DB settings.
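
If you would rather keep using to_sql, the schema change in the question is consistent with if_exists="replace", which drops and recreates the table, together with the default index=True, which writes the DataFrame index as an extra column. A minimal sketch, assuming a plain sqlite3 connection and a hand-created table standing in for the one Django's migration made (DDL copied from the question), suggests that if_exists="append" with index=False leaves the table definition alone:

```python
import sqlite3

import pandas as pd

# Stand-in for the Django-managed table; in a real project it already exists.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "agency" ('
    '"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, '
    '"name" varchar(128) NOT NULL)'
)

ddl_query = "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'agency'"
ddl_before = conn.execute(ddl_query).fetchone()[0]

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# "append" inserts into the existing table instead of recreating it;
# index=False stops the DataFrame index from being written as a column.
agencies.to_sql("agency", con=conn, if_exists="append", index=False)

ddl_after = conn.execute(ddl_query).fetchone()[0]
rows = conn.execute("SELECT id, name FROM agency ORDER BY id").fetchall()
```

This route bypasses the Django ORM entirely, so model validation, save() overrides and signals do not run, and the DataFrame columns have to match the table's columns.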

#2


1  

There is an error in the itertuples call: it is missing its round brackets.

It should be:

for agency in agencies.itertuples():
    agency = Agency.objects.create(name=agency.name)

Thank you for sharing your answer.

Reference: pandas 0.22.0 documentation for pandas.DataFrame.itertuples.
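
To illustrate the point about the brackets, here is a small sketch using the same DataFrame as in the question. Strictly speaking, leaving the parentheses off is not rejected by the parser; the loop fails at run time because it receives the bound method itself rather than an iterator.

```python
import pandas as pd

agencies = pd.DataFrame({"name": ["Agency 1", "Agency 2", "Agency 3"]})

# With the parentheses, itertuples() returns an iterator of namedtuples,
# one per row, with the column names as attributes.
names = [row.name for row in agencies.itertuples(index=False)]

# Without them, the for loop is handed a method object, which is not iterable.
try:
    for row in agencies.itertuples:
        pass
    error = None
except TypeError as exc:
    error = exc
```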
