Django ORM performs poorly compared to raw SQL

Time: 2022-09-17 23:14:23

I'm using the Django ORM for data queries, and this table has almost 2 million rows. I tried

app_count = App.objects.count()

and

from django.db import connection

cursor = connection.cursor()
cursor.execute('''SELECT count(*) FROM app''')
app_count = cursor.fetchone()[0]  # read the result back
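
Both paths can be timed side by side from the Django shell. Here is a minimal sketch, assuming App is the model mapped to the app table (the import is illustrative, not from the question):

import time

from django.db import connection
# from app_scrapy.models import AppMain as App  # illustrative import

start = time.time()
orm_count = App.objects.count()              # ORM path
print('ORM: %.3fs' % (time.time() - start))

start = time.time()
with connection.cursor() as cursor:          # raw SQL path
    cursor.execute('SELECT count(*) FROM app')
    (raw_count,) = cursor.fetchone()
print('raw: %.3fs' % (time.time() - start))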

The mysql slow_query log gave me

Time: 2017-04-27T09:18:38.809498Z
User@Host: www[www] @ [172.19.0.3] Id: 5
Query_time: 4.107433 Lock_time: 0.004405 Rows_sent: 1 Rows_examined: 0
use app_platform; SET timestamp=1493284718; SELECT count(*) FROM app;

This query took more than 4 seconds on average, but when I used the mysql client and mysql shell to run the same query:

mysql> select count(*) from app;

+----------+
| count(*) |
+----------+
|  1870019 |
+----------+

1 row in set (0.41 sec)

it took me just 0.4 seconds, a 10X difference. Why, and how can I improve it?

EDIT

Here is my model:

from django.db import models

class AppMain(models.Model):
    store = models.ForeignKey("AppStore", related_name="main_store")
    name = models.CharField(max_length=256)
    version = models.CharField(max_length=256, blank=True)
    developer = models.CharField(db_index=True, max_length=256, blank=True)
    md5 = models.CharField(max_length=256, blank=True)
    type = models.CharField(max_length=256, blank=True)
    size = models.IntegerField(blank=True)
    download = models.CharField(max_length=1024, blank=True)
    download_md5 = models.CharField(max_length=256, blank=True)
    download_times = models.BigIntegerField(blank=True)
    snapshot = models.CharField(max_length=2048, blank=True)
    description = models.CharField(max_length=5000, blank=True)
    app_update_time = models.DateTimeField(blank=True)
    create_time = models.DateTimeField(db_index=True, auto_now_add=True)
    update_time = models.DateTimeField(auto_now=True)

    class Meta:
        unique_together = ("store", "name", "version")

EDIT 2

And I'm using Docker and docker-compose for my project:

version: '2'
services:
  mysqldb:
    restart: always
    image: mysql:latest
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: just_for_test
      MYSQL_USER: www
      MYSQL_PASSWORD: www
      MYSQL_DATABASE: app_platform
    volumes:
      - mysqldata:/var/lib/mysql
      - ./config/:/etc/mysql/conf.d
      - ./log/mysql/:/var/log/mysql/
  web:
    restart: always
    build: ./app_platform/app_platform
    env_file: .env
    environment:
      PYTHONPATH: '/usr/src/app/app_platform'
    command: bash -c "gunicorn --chdir /usr/src/app/app_platform app_platform.wsgi:application  -k gevent  -w 6 -b :8000 --timeout 8000 --reload"
    volumes:
      - ./app_platform:/usr/src/app
      - ./sqldata:/usr/src/sqldata
      - /usr/src/app/static
    ports:
      - "8000"
    dns:
        - 114.114.114.114
        - 8.8.8.8
    links:
      - mysqldb
  nginx:
    restart: always
    build: ./nginx/
    ports:
      - "80:80"
    volumes:
      - ./app_platform:/usr/src/app
      - ./nginx/sites-enabled/:/etc/nginx/sites-enabled
    links:
      - web:web
volumes:
    mysqldata:

And my Django settings look like this:

import os
from django.utils.translation import ugettext_lazy as _

LANGUAGES = (
    ('en', _('English')),
    ('zh-CN', _('Chinese')),
)


LANGUAGE_CODE = 'zh-CN'

BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

LOCALE_PATHS = (
    os.path.join(BASE_DIR, "locale"),
)

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'just_for_test'

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'app_scrapy',
    'app_user',
    'app_api',
    'app_check',
    'common',
    'debug_toolbar',
]


MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

AUTH_USER_MODEL = 'app_user.MyUser'

AUTHENTICATION_BACKENDS = (
    'app_user.models.CustomAuth', 'django.contrib.auth.backends.ModelBackend')


ROOT_URLCONF = 'app_platform.urls'


TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': ["/usr/src/app/app_platform/templates"],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.template.context_processors.i18n',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'app_platform.wsgi.application'

LOGIN_REDIRECT_URL = '/'
LOGIN_URL = '/login/'
# Database
# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
# Password validation
# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder'
)

# Internationalization
# https://docs.djangoproject.com/en/1.9/topics/i18n/

TIME_ZONE = 'Asia/Shanghai'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.9/howto/static-files/

STATIC_ROOT = "/static/"

STATIC_URL = '/static/'

STATICFILES_DIRS = (
    'public/static/',
)


DEBUG = True

ALLOWED_HOSTS = []

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework.authentication.BasicAuthentication',
        'rest_framework.authentication.SessionAuthentication',
    ),
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.AllowAny',
    ),
    'DEFAULT_PAGINATION_CLASS':
        'rest_framework.pagination.LimitOffsetPagination',
    'PAGE_SIZE': 5,
}

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app_platform',
        'USER': 'www',
        'PASSWORD': 'www',
        'HOST': 'mysqldb',   # Or an IP Address that your DB is hosted on
        'PORT': '3306',
    }
}

DEBUG_TOOLBAR_CONFIG = {
    "SHOW_TOOLBAR_CALLBACK": lambda request: True,
}

My app table info

CREATE TABLE `app` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(256) NOT NULL,
  `version` varchar(256) NOT NULL,
  `developer` varchar(256) NOT NULL,
  `md5` varchar(256) NOT NULL,
  `type` varchar(256) NOT NULL,
  `size` int(11) NOT NULL,
  `download` varchar(1024) NOT NULL,
  `download_md5` varchar(256) NOT NULL,
  `download_times` bigint(20) NOT NULL,
  `snapshot` varchar(2048) NOT NULL,
  `description` varchar(5000) NOT NULL,
  `app_update_time` datetime(6) NOT NULL,
  `create_time` datetime(6) NOT NULL,
  `update_time` datetime(6) NOT NULL,
  `store_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `app_store_id_6822fab1_uniq` (`store_id`,`name`,`version`),
  KEY `app_7473547c` (`store_id`),
  KEY `app_developer_b74bcd8e_uniq` (`developer`),
  KEY `app_create_time_a071d977_uniq` (`create_time`),
  CONSTRAINT `app_store_id_aef091c6_fk_app_scrapy_appstore_id` FOREIGN KEY (`store_id`) REFERENCES `app_scrapy_appstore` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1870020 DEFAULT CHARSET=utf8;

EDIT 3

Here is EXPLAIN SELECT COUNT(*) FROM app;

mysql> EXPLAIN SELECT COUNT(*) FROM `app`;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)

EDIT 4

Here is my mysql.cnf

innodb_read_io_threads=12
innodb_write_io_threads=12
innodb_io_capacity=300
innodb_read_io_threads=12
innodb_write_io_threads=12  #To stress the double write buffer
innodb_buffer_pool_size=3G
innodb_log_file_size = 32M #Small log files, more page flush
innodb_log_buffer_size=8M
innodb_flush_method=O_DIRECT

My Docker settings are 2 CPUs and 4 GB of memory.

EDIT 5

When I ran the ORM query inside the Django shell, it took only 0.5-1 second. So is the problem with the Docker settings, or maybe the gunicorn settings?
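
One way to narrow that down is to log the query's wall time inside the gunicorn-served code itself and compare it with the shell timing. A minimal sketch; the view below is illustrative, not from the project:

import logging
import time

from django.db import connection
from django.http import JsonResponse

logger = logging.getLogger(__name__)

def app_count_view(request):  # illustrative view
    start = time.time()
    with connection.cursor() as cursor:
        cursor.execute('SELECT count(*) FROM app')
        (count,) = cursor.fetchone()
    # ~4s here but ~0.5s in the shell would point at the container or
    # worker setup rather than at MySQL itself.
    logger.warning('count query took %.3fs', time.time() - start)
    return JsonResponse({'count': count})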

1 Answer

#1


6  

10X -- I like it. That exactly matches my Rule of Thumb: "If the data is not cached, the query will take 10 times as long as if it is cached." (Rick's RoTs)

But, let's move on to the real question: "4.1s is too slow; what can I do about it?"

  • Change your app so you don't need the number of rows. Have you noticed that search engines no longer say "out of 12345678 hits"?

  • Keep an estimate, rather than recomputing. (See the sketch after this list.)

  • Let's see EXPLAIN SELECT COUNT(*) FROM app; it might give some more clues. (One place you say app, another you say app_scrapy_appmain; are they the same??)

  • As long as you never DELETE any rows, this would give you the same answer: SELECT MAX(id) FROM app, and run "instantly". (Once a DELETE, ROLLBACK, etc. happens, id(s) will be lost, so the COUNT will be less than the MAX.)
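
A minimal sketch of the estimate and MAX(id) ideas, using Django's cache framework; the cache key, timeout, and App import are illustrative, not from the question:

from django.core.cache import cache
from django.db.models import Max
# from app_scrapy.models import AppMain as App  # illustrative import

COUNT_KEY = 'app_count'   # illustrative cache key
COUNT_TTL = 300           # recompute at most every 5 minutes

def cached_app_count():
    # "Keep an estimate": serve a slightly stale count from the cache and
    # pay for the full scan at most once per TTL.
    count = cache.get(COUNT_KEY)
    if count is None:
        count = App.objects.count()
        cache.set(COUNT_KEY, count, COUNT_TTL)
    return count

def approx_app_count():
    # The MAX(id) trick: valid only while rows are never deleted; it reads
    # one end of the primary-key index instead of scanning ~1.8M rows.
    return App.objects.aggregate(Max('id'))['id__max'] or 0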

More

innodb_buffer_pool_size=3G is probably too much on only 4GB of RAM. If MySQL swaps, performance becomes really bad. Suggest only 2G, unless you can see that it is not swapping.

Note: Scanning 1.8M rows is destined to take at least 0.4s on that hardware, or probably any hardware. It takes time to do the task. Also, the 'long' query interferes with other tasks in two ways: it consumes CPU and/or I/O while performing the query, plus it may be bumping other blocks out of cache, causing them to slow down. So, I really think the 'right' thing to do is to take heed of my hints on avoiding COUNT(*). Here's another one:

  • Build and maintain a "Summary Table" of daily subtotals of this (and other) tables. Include in it the daily COUNT(*) and whatever else you might like to have. This will even shrink the 0.4s timing by using SUM(subtotal) from this table. (A sketch follows.) More on Summary Tables.
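
A hedged sketch of such a summary table in Django terms, building on the AppMain model above; the AppDailyCount model and the helper functions are illustrative:

import datetime

from django.db import models
from django.db.models import Sum
from django.utils import timezone

class AppDailyCount(models.Model):  # illustrative summary model
    day = models.DateField(unique=True)
    daily_count = models.IntegerField(default=0)

def refresh_today():
    # Run nightly (cron or a management command). create_time is indexed,
    # so this counts one day's rows instead of the whole table.
    start = timezone.localtime(timezone.now()).replace(
        hour=0, minute=0, second=0, microsecond=0)
    count = AppMain.objects.filter(
        create_time__gte=start,
        create_time__lt=start + datetime.timedelta(days=1)).count()
    AppDailyCount.objects.update_or_create(
        day=start.date(), defaults={'daily_count': count})

def total_apps():
    # SUM over roughly one row per day replaces COUNT(*) over ~1.8M rows.
    total = AppDailyCount.objects.aggregate(Sum('daily_count'))
    return total['daily_count__sum'] or 0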
