I'm using the Django ORM to query data, and this table has almost 2 million rows. I tried
app_count = App.objects.count()
and
from django.db import connection
cursor = connection.cursor()
cursor.execute('''SELECT count(*) FROM app''')
The MySQL slow_query log gave me:
Time: 2017-04-27T09:18:38.809498Z
User@Host: www[www] @ [172.19.0.3] Id: 5
Query_time: 4.107433 Lock_time: 0.004405 Rows_sent: 1 Rows_examined: 0
use app_platform; SET timestamp=1493284718; SELECT count(*) FROM app;
This query took more than 4 seconds on average, but when I used the mysql client and mysql shell to run the same query:
mysql> select count(*) from app;
+----------+
| count(*) |
+----------+
|  1870019 |
+----------+
1 row in set (0.41 sec)
It took just 0.4 seconds, a 10x difference. Why, and how can I improve it?
EDIT
Here is my model:
class AppMain(models.Model):
    """
    """
    store = models.ForeignKey("AppStore", related_name="main_store")
    name = models.CharField(max_length=256)
    version = models.CharField(max_length=256, blank=True)
    developer = models.CharField(db_index=True, max_length=256, blank=True)
    md5 = models.CharField(max_length=256, blank=True)
    type = models.CharField(max_length=256, blank=True)
    size = models.IntegerField(blank=True)
    download = models.CharField(max_length=1024, blank=True)
    download_md5 = models.CharField(max_length=256, blank=True)
    download_times = models.BigIntegerField(blank=True)
    snapshot = models.CharField(max_length=2048, blank=True)
    description = models.CharField(max_length=5000, blank=True)
    app_update_time = models.DateTimeField(blank=True)
    create_time = models.DateTimeField(db_index=True, auto_now_add=True)
    update_time = models.DateTimeField(auto_now=True)

    class Meta:
        unique_together = ("store", "name", "version")
EDIT 2
And I'm using Docker and docker-compose for this project:
version: '2'
services:
  mysqldb:
    restart: always
    image: mysql:latest
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: just_for_test
      MYSQL_USER: www
      MYSQL_PASSWORD: www
      MYSQL_DATABASE: app_platform
    volumes:
      - mysqldata:/var/lib/mysql
      - ./config/:/etc/mysql/conf.d
      - ./log/mysql/:/var/log/mysql/
  web:
    restart: always
    build: ./app_platform/app_platform
    env_file: .env
    environment:
      PYTHONPATH: '/usr/src/app/app_platform'
    command: bash -c "gunicorn --chdir /usr/src/app/app_platform app_platform.wsgi:application -k gevent -w 6 -b :8000 --timeout 8000 --reload"
    volumes:
      - ./app_platform:/usr/src/app
      - ./sqldata:/usr/src/sqldata
      - /usr/src/app/static
    ports:
      - "8000"
    dns:
      - 114.114.114.114
      - 8.8.8.8
    links:
      - mysqldb
  nginx:
    restart: always
    build: ./nginx/
    ports:
      - "80:80"
    volumes:
      - ./app_platform:/usr/src/app
      - ./nginx/sites-enabled/:/etc/nginx/sites-enabled
    links:
      - web:web
volumes:
  mysqldata:
And my Django settings look like this:
import os
from django.utils.translation import ugettext_lazy as _

LANGUAGES = (
    ('en', _('English')),
    ('zh-CN', _('Chinese')),
)
LANGUAGE_CODE = 'zh-CN'
BASE_DIR = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
LOCALE_PATHS = (
    os.path.join(BASE_DIR, "locale"),
)
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'just_for_test'
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'app_scrapy',
    'app_user',
    'app_api',
    'app_check',
    'common',
    'debug_toolbar',
]
MIDDLEWARE_CLASSES = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
AUTH_USER_MODEL = 'app_user.MyUser'
AUTHENTICATION_BACKENDS = (
    'app_user.models.CustomAuth', 'django.contrib.auth.backends.ModelBackend')
ROOT_URLCONF = 'app_platform.urls'
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': ["/usr/src/app/app_platform/templates"],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.template.context_processors.i18n',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'app_platform.wsgi.application'
LOGIN_REDIRECT_URL = '/'
LOGIN_URL = '/login/'
# Database
# https://docs.djangoproject.com/en/1.9/ref/settings/#databases
# Password validation
# https://docs.djangoproject.com/en/1.9/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]
STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder'
)
# Internationalization
# https://docs.djangoproject.com/en/1.9/topics/i18n/
TIME_ZONE = 'Asia/Shanghai'
USE_I18N = True
USE_L10N = True
USE_TZ = True
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.9/howto/static-files/
STATIC_ROOT = "/static/"
STATIC_URL = '/static/'
STATICFILES_DIRS = (
    'public/static/',
)
DEBUG = True
ALLOWED_HOSTS = []
REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': (
        'rest_framework.authentication.BasicAuthentication',
        'rest_framework.authentication.SessionAuthentication',
    ),
    'DEFAULT_PERMISSION_CLASSES': (
        'rest_framework.permissions.AllowAny',
    ),
    'DEFAULT_PAGINATION_CLASS':
        'rest_framework.pagination.LimitOffsetPagination',
    'PAGE_SIZE': 5,
}
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app_platform',
        'USER': 'www',
        'PASSWORD': 'www',
        'HOST': 'mysqldb',  # Or an IP Address that your DB is hosted on
        'PORT': '3306',
    }
}
DEBUG_TOOLBAR_CONFIG = {
    "SHOW_TOOLBAR_CALLBACK": lambda request: True,
}
My app table info:
CREATE TABLE `app` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(256) NOT NULL,
`version` varchar(256) NOT NULL,
`developer` varchar(256) NOT NULL,
`md5` varchar(256) NOT NULL,
`type` varchar(256) NOT NULL,
`size` int(11) NOT NULL,
`download` varchar(1024) NOT NULL,
`download_md5` varchar(256) NOT NULL,
`download_times` bigint(20) NOT NULL,
`snapshot` varchar(2048) NOT NULL,
`description` varchar(5000) NOT NULL,
`app_update_time` datetime(6) NOT NULL,
`create_time` datetime(6) NOT NULL,
`update_time` datetime(6) NOT NULL,
`store_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `app_store_id_6822fab1_uniq` (`store_id`,`name`,`version`),
KEY `app_7473547c` (`store_id`),
KEY `app_developer_b74bcd8e_uniq` (`developer`),
KEY `app_create_time_a071d977_uniq` (`create_time`),
CONSTRAINT `app_store_id_aef091c6_fk_app_scrapy_appstore_id` FOREIGN KEY (`store_id`) REFERENCES `app_scrapy_appstore` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1870020 DEFAULT CHARSET=utf8;
EDIT 3
Here is EXPLAIN SELECT COUNT(*) FROM app;
mysql> EXPLAIN SELECT COUNT(*) FROM `app`;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
1 row in set, 1 warning (0.00 sec)
EDIT 4
Here is my mysql.cnf:
innodb_read_io_threads=12
innodb_write_io_threads=12
innodb_io_capacity=300
innodb_read_io_threads=12
innodb_write_io_threads=12 #To stress the double write buffer
innodb_buffer_pool_size=3G
innodb_log_file_size = 32M #Small log files, more page flush
innodb_log_buffer_size=8M
innodb_flush_method=O_DIRECT
My Docker settings are 2 CPUs and 4GB of memory.
EDIT 5
When I ran the ORM query inside the Django shell, it took only 0.5-1 second. So is the problem with my Docker settings, or maybe the gunicorn settings?
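For reference, here is a minimal timing sketch of the two approaches, assuming it is run inside ./manage.py shell in the web container; the import path app_scrapy.models for the AppMain model shown above is a guess based on INSTALLED_APPS (the question refers to the model as App elsewhere):

import time

from django.db import connection

from app_scrapy.models import AppMain  # hypothetical import path for the model above


def timed(label, fn):
    # Run fn(), print the wall-clock time it took, and return its result.
    start = time.time()
    result = fn()
    print("%s: %.3fs -> %r" % (label, time.time() - start, result))
    return result


# ORM COUNT(*) -- the same query App.objects.count() sends to MySQL.
timed("ORM count", lambda: AppMain.objects.count())


# Raw COUNT(*) over the same Django connection, bypassing the ORM layer.
def raw_count():
    with connection.cursor() as cursor:
        cursor.execute("SELECT count(*) FROM app")
        return cursor.fetchone()[0]

timed("raw count", raw_count)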
1 Answer
#1
10X -- I like it. That exactly matches my Rule of Thumb: "If the data is not cached, the query will take 10 times as long as if it is cached." (Rick's RoTs)
But, let's move on to the real question: "4.1s is too slow, what can I do about it."
- Change your app so you don't need the number of rows. Have you noticed that Search engines no longer say "out of 12345678 hits"?
- Keep an estimate, rather than recomputing.
- Let's see EXPLAIN SELECT COUNT(*) FROM app; it might give some more clues. (One place you say app, another you say app_scrapy_appmain; are they the same??)
- As long as you never DELETE any rows, this would give you the same answer: SELECT MAX(id) FROM app, and run "instantly". (Once a DELETE, ROLLBACK, etc. happens, id(s) will be lost, so the COUNT will be less than MAX.) A sketch of this and of the cached-estimate idea follows after this list.
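A rough sketch of the "keep an estimate" and MAX(id) suggestions above, assuming the AppMain model from the question (hypothetical import path) and Django's cache framework with whatever backend is configured:

from django.core.cache import cache
from django.db.models import Max

from app_scrapy.models import AppMain  # hypothetical import path


def approx_app_count():
    # MAX(id) is an instant upper bound on the row count, accurate only as
    # long as rows are never deleted or rolled back.
    return AppMain.objects.aggregate(max_id=Max('id'))['max_id'] or 0


def cached_app_count(ttl=600):
    # Recompute the exact COUNT(*) at most once per `ttl` seconds and serve
    # the cached estimate in between.
    count = cache.get('app_count')
    if count is None:
        count = AppMain.objects.count()
        cache.set('app_count', count, ttl)
    return count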
More
innodb_buffer_pool_size=3G is probably too much on only 4GB of RAM. If MySQL swaps, performance becomes really bad. Suggest only 2G, unless you can see that it is not swapping.
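Whether the count is actually being served from the buffer pool (the caching Rule of Thumb above) can be checked from the InnoDB status counters. A small sketch through the Django connection; the same SHOW GLOBAL STATUS statement can also be run directly in the mysql client:

from django.db import connection


def buffer_pool_miss_rate():
    # Innodb_buffer_pool_reads = logical reads that had to go to disk;
    # Innodb_buffer_pool_read_requests = all logical read requests.
    with connection.cursor() as cursor:
        cursor.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
        stats = dict(cursor.fetchall())
    reads = float(stats.get('Innodb_buffer_pool_reads', 0))
    requests = float(stats.get('Innodb_buffer_pool_read_requests', 1))
    return 100.0 * reads / max(requests, 1)


print("buffer pool miss rate: %.4f%%" % buffer_pool_miss_rate())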
Note: Scanning 1.8M rows is destined to take at least 0.4s on that hardware, or probably any hardware. It takes time to do the task. Also, doing the 'long' query interferes in two ways with other tasks: it consumes CPU and/or I/O while performing the query, plus it may be bumping other blocks out of cache, causing them to slow down. So, I really think the 'right' thing to do is to take heed of my hints on avoiding COUNT(*). Here's another one:
- Build and maintain a "Summary Table" of daily subtotals of this (and other) table. Include in it the daily COUNT(*) and whatever else you might like to have. This will even shrink the 0.4s timing by using SUM(subtotal) from this table. More on Summary Tables. (A sketch follows below.)