Large data volumes are hurting MySQL performance

Date: 2022-12-04 16:54:55

I have a software project at work that has been driving me crazy. Here's our problem: we have a series of data contacts that need to be logged every second. Each record needs to include time, bearing (an array of 360-1080 bytes), range, and a few other fields. Our system also needs the capability to store this data for up to 30 days. In practice, there can be up to 100 different contacts, so at a maximum there can be anywhere from around 150,000,000 points to about 1,000,000,000 points in 30 days.

I'm trying to think of the best method for storing all of this data and retrieving it later on. My first thought was to use an RDBMS like MySQL. Being an embedded C/C++ programmer, I have very little experience working with MySQL on such large data sets. I've dabbled with it on small data sets, but nothing nearly this large. I generated the schema below for the tables that will store some of the data:

CREATE TABLE IF NOT EXISTS `HEADER_TABLE` (
  `header_id` tinyint(3) unsigned NOT NULL auto_increment,
  `sensor` varchar(10) NOT NULL,
  `bytes` smallint(5) unsigned NOT NULL,
  PRIMARY KEY  (`header_id`),
  UNIQUE KEY `header_id_UNIQUE` (`header_id`),
  UNIQUE KEY `sensor_UNIQUE` (`sensor`)
) ENGINE=MyISAM AUTO_INCREMENT=0 DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `RAW_DATA_TABLE` (
  `internal_id` bigint(20) NOT NULL auto_increment,
  `time_sec` bigint(20) unsigned NOT NULL,
  `time_nsec` bigint(20) unsigned NOT NULL,
  `transverse` bit(1) NOT NULL default b'0',
  `data` varbinary(1080) NOT NULL,
  PRIMARY KEY  (`internal_id`,`time_sec`,`time_nsec`),
  UNIQUE KEY `internal_id_UNIQUE` (`internal_id`),
  KEY `time` (`time_sec`),
  KEY `internal_id` (`internal_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `rel_RASTER_TABLE` (
  `internal_id` bigint(20) NOT NULL auto_increment,
  `raster_id` int(10) unsigned NOT NULL,
  `time_sec` bigint(20) unsigned NOT NULL,
  `time_nsec` bigint(20) unsigned NOT NULL,
  `header_id` tinyint(3) unsigned NOT NULL,
  `data_id` bigint(20) unsigned NOT NULL,
  PRIMARY KEY  (`internal_id`, `raster_id`,`time_sec`,`time_nsec`),
  KEY `raster_id` (`raster_id`),
  KEY `time` (`time_sec`),
  KEY `data` (`data_id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

The header table only contains 10 rows and is static. It just records which sensor the raw data came from and the number of bytes output by that type of sensor. The RAW_DATA_TABLE essentially stores the raw bearing data (an array of 360-1080 bytes; it represents up to three samples per degree). The rel_RASTER_TABLE holds metadata for the RAW_DATA_TABLE; there can be multiple contacts that refer to the same raw data row. The data_id found in rel_RASTER_TABLE points to the internal_id of some row in the RAW_DATA_TABLE. I did this to decrease the number of writes needed.

Obviously, as you can probably tell, I'm having performance issues when reading from and deleting from this database. An operator of our software can see real-time data as it comes across, and can also go into reconstruction mode and overlay a data range from the past, the past week for example. Our backend logging server grabs the history rows and sends them to a display via a CORBA interface. While all of this is happening, I have a worker thread that deletes 1000 rows at a time for data older than 30 days. This is there in case a session runs longer than 30 days, which can happen.
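
For reference, the purge described above can be sketched as a batched DELETE like the one below. This is an assumption about how the worker thread is written (the cutoff arithmetic and the batch size of 1000 are illustrative, taken from the description above):

```sql
-- Delete expired rows 1000 at a time so each statement finishes and
-- releases its locks quickly, instead of one enormous delete that
-- starves the real-time writer. The worker thread loops on this
-- statement until it affects 0 rows.
DELETE FROM RAW_DATA_TABLE
WHERE time_sec < UNIX_TIMESTAMP(NOW() - INTERVAL 30 DAY)
LIMIT 1000;
```

Note that without an index usable for the `time_sec` predicate, each batch still scans a large range of rows, which is consistent with the slow deletes described below.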

The system we currently have implemented works well for smaller sets of data, but not for large sets. Our SELECT and DELETE statements can take upwards of 2 minutes to return results. This completely kills the performance of our real-time consumer thread. I suspect we're not designing our schemas correctly, picking the wrong keys, not optimizing our SQL queries correctly, or some combination of each. Our writes don't seem to be affected unless the other operations take too long to run.

Here is an example SQL query we use to get history data:

SELECT 
  rel_RASTER_TABLE.time_sec, 
  rel_RASTER_TABLE.time_nsec, 
  RAW_DATA_TABLE.transverse, 
  HEADER_TABLE.bytes, 
  RAW_DATA_TABLE.data 
FROM 
  RASTER_DB.HEADER_TABLE, 
  RASTER_DB.RAW_DATA_TABLE, 
  RASTER_DB.rel_RASTER_TABLE 
WHERE 
  rel_RASTER_TABLE.raster_id = 2952704 AND 
  rel_RASTER_TABLE.time_sec >= 1315849228 AND 
  rel_RASTER_TABLE.time_sec <= 1315935628 AND 
  rel_RASTER_TABLE.data_id = RAW_DATA_TABLE.internal_id AND 
  rel_RASTER_TABLE.header_id = HEADER_TABLE.header_id;

I apologize in advance for this being such a long question, but I've tapped out other resources and this is my last resort, so I figured I'd try to be as descriptive as possible. Do you guys see any way I can improve upon our design at first glance? Or any way we can optimize our SELECT and DELETE statements for such large data sets? We're currently running RHEL as the OS and unfortunately can't change our hardware configuration on the server (4 GB RAM, quad core). We're using C/C++ and the MySQL API. ANY speed improvements would be EXTREMELY beneficial. If you need me to clarify anything, please let me know. Thanks!

EDIT: BTW, if you can't provide specific help, maybe you can link me to some excellent tutorials you've come across for optimizing SQL queries, schema design, or MySQL tuning?

5 Answers

#1


4  

The first thing you could try is de-normalizing the data. On a data set of that size, doing a join, even if you have indexes, is going to require very intense computation. Turn those three tables into one table. Sure, there will be duplicate data, but without joins it will be much easier to work with. Second, see if you can get a machine with enough memory to fit the whole table in memory. It doesn't cost much ($1000 or less) for a machine with 24 GB of RAM. I'm not sure if that will hold your entire data set, but it will help tremendously. Get an SSD as well; for anything that isn't stored in memory, an SSD should help you access it at high speed. And thirdly, look into other data storage technologies such as BigTable that are designed to deal with very large data sets.
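
A de-normalized version of the three tables might look something like the sketch below. Column names are carried over from the original schema; treat this as an illustration of the suggestion, not a definitive design:

```sql
-- One wide table: contact metadata, sensor info, and the raw bearing
-- payload together. This duplicates the payload whenever several
-- contacts share one raster row, but every history query becomes a
-- single-table range scan with no joins.
CREATE TABLE IF NOT EXISTS `CONTACT_LOG` (
  `internal_id` bigint(20) unsigned NOT NULL auto_increment,
  `raster_id`   int(10) unsigned NOT NULL,
  `time_sec`    bigint(20) unsigned NOT NULL,
  `time_nsec`   bigint(20) unsigned NOT NULL,
  `sensor`      varchar(10) NOT NULL,
  `bytes`       smallint(5) unsigned NOT NULL,
  `transverse`  bit(1) NOT NULL default b'0',
  `data`        varbinary(1080) NOT NULL,
  PRIMARY KEY (`internal_id`),
  KEY `raster_time` (`raster_id`, `time_sec`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
```

The trade-off is write amplification (up to ~1 KB of duplicated payload per extra contact on the same raster) in exchange for join-free reads.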

#2


2  

I would say partitioning is an absolute must in a case like this:

  • large amount of data
  • new data coming in continuously
  • implicit: old data getting deleted continuously.

Check out this for mySQL.

Looking at your SELECT statement (which filters on time), I'd say partition on the time column.

Of course, you might want to add a few indexes based on the frequent queries you run.

--edit--

I see that many have suggested indexes. My experience has been that having an index on a table with a really large number of rows either kills performance (eventually) or requires a lot of resources (CPU, memory, ...) to keep the indexes up to date. So although I also suggest adding indexes, please note that it's absolutely useless unless you partition the table first. Finally, follow symcbean's advice (optimize your indexes in number and keys) when you add indexes.

--edit end--

A quick primer on partitioning, if you're new to it:

  • Usually a single table translates to a single data file. A partitioned table translates to one file per partition.
  • Advantages
    • Insertions are faster, as physically the row is inserted into a smaller file (partition).
    • Deleting a large number of rows usually translates to dropping a partition (much, much cheaper than 'delete from xxx where time > 100 and time < 200').
    • Queries with a WHERE clause on the key by which the table is partitioned are much faster.
    • Index building is faster.
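
As an illustration of the advice above, a range-partitioned variant of RAW_DATA_TABLE might be declared roughly as follows. The partition boundaries are placeholder example values; also note that MySQL requires every unique key (including the primary key) to contain the partitioning column, so the original `UNIQUE KEY` on `internal_id` alone cannot be kept:

```sql
-- Daily range partitions on time_sec. Expiring a day of data becomes a
-- cheap DROP PARTITION instead of a slow row-by-row DELETE.
CREATE TABLE IF NOT EXISTS `RAW_DATA_TABLE_P` (
  `internal_id` bigint(20) NOT NULL auto_increment,
  `time_sec`    bigint(20) unsigned NOT NULL,
  `time_nsec`   bigint(20) unsigned NOT NULL,
  `transverse`  bit(1) NOT NULL default b'0',
  `data`        varbinary(1080) NOT NULL,
  PRIMARY KEY (`internal_id`, `time_sec`)   -- includes the partition column
) ENGINE=MyISAM DEFAULT CHARSET=latin1
PARTITION BY RANGE (time_sec) (
  PARTITION p20110912 VALUES LESS THAN (1315872000),  -- example day boundary
  PARTITION p20110913 VALUES LESS THAN (1315958400),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- The 30-day purge then becomes, per expired day:
-- ALTER TABLE RAW_DATA_TABLE_P DROP PARTITION p20110912;
```

A new daily partition would be split off from `pmax` (via ALTER TABLE ... REORGANIZE PARTITION) by a housekeeping job.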

#3


1  

I don't have much experience with MySQL, but here are some a priori thoughts that jump to mind.

Is your select in a stored procedure?

The select's predicate is usually searched in the order it's asked in. If the data on the disk is reordered to match the primary key, then filtering on raster_id first is fine. You would be paying the cost of reordering on every insert, though. If the data is stored in time order on disk, you would probably want to search on time_sec before raster_id.

WHERE 
  rel_RASTER_TABLE.raster_id = 2952704 AND 
  rel_RASTER_TABLE.time_sec >= 1315849228 AND 
  rel_RASTER_TABLE.time_sec <= 1315935628 AND 
  rel_RASTER_TABLE.data_id = RAW_DATA_TABLE.internal_id AND 
  rel_RASTER_TABLE.header_id = HEADER_TABLE.header_id;

Your indexes don't follow the search predicates.

It will create indexes based on the keys, generally.

  PRIMARY KEY  (`internal_id`, `raster_id`,`time_sec`,`time_nsec`),
  KEY `raster_id` (`raster_id`),
  KEY `time` (`time_sec`),
  KEY `data` (`data_id`)

It may not be using the primary index because you aren't filtering on internal_id. You may want to set internal_id as the primary key and create a separate index based on your search parameters, at least on raster_id and time_sec.
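
That suggestion could be applied with something like the statement below. The index name `raster_time` is arbitrary, and rebuilding the primary key on a table this size rewrites every row, so it should be done offline with a backup in hand:

```sql
-- Make internal_id the sole primary key and add a composite index
-- matching the history query's filter: raster_id equality first,
-- then the time_sec range.
ALTER TABLE rel_RASTER_TABLE
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`internal_id`),
  ADD KEY `raster_time` (`raster_id`, `time_sec`);
```

With this index, the `raster_id = ... AND time_sec BETWEEN ...` predicate can be resolved by a single index range scan.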

Are the joins too loose?

This may be my inexperience with MySQL, but I expect to see conditions on the joins. Does using FROM here do a natural join? I don't see any foreign keys specified, so I don't know how it would join these tables rationally.

FROM 
  RASTER_DB.HEADER_TABLE, 
  RASTER_DB.RAW_DATA_TABLE, 
  RASTER_DB.rel_RASTER_TABLE 
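
For what it's worth, the comma-separated FROM list above is just an implicit inner join; the join conditions live in the WHERE clause shown earlier. The same query written in explicit JOIN ... ON form is equivalent, but it makes the join keys obvious:

```sql
-- Same history query, with the join conditions moved out of WHERE
-- into ON clauses. MySQL treats both forms as inner joins.
SELECT
  r.time_sec, r.time_nsec, d.transverse, h.bytes, d.data
FROM RASTER_DB.rel_RASTER_TABLE AS r
JOIN RASTER_DB.RAW_DATA_TABLE   AS d ON r.data_id   = d.internal_id
JOIN RASTER_DB.HEADER_TABLE     AS h ON r.header_id = h.header_id
WHERE r.raster_id = 2952704
  AND r.time_sec BETWEEN 1315849228 AND 1315935628;
```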

Usually when developing something like this, I would work with a smaller set and remove predicates to make sure that each step meets what I expect. If you accidentally cast a wide net up front and then narrow it down later, you may mask some inefficiencies.

Most query optimizers have a way to output how the query was optimized; make sure it meets your expectations. One of the comments mentions EXPLAIN plans; I assume that is what it is called.

#4


1  

Without knowing what all the queries are, it's difficult to give specific advice. However, looking at the single query you have provided, there are no indexes which are ideally suited to resolving it.

In fact, the structure is a bit messy: if internal_id is an auto-increment value then it is unique, so why add other stuff to the primary key? It looks as if a more sensible structure for rel_RASTER_TABLE would be:

PRIMARY KEY  (`internal_id`),
KEY (`raster_id`,`time_sec`,`time_nsec`),

As for RAW_DATA_TABLE, it should be blindingly obvious that its indexes are far from optimal. They should probably be:

PRIMARY KEY  (`internal_id`,`time_sec`,`time_nsec`),
KEY `time` (`time_sec`, `time_nsec`)

Note that removing redundant indexes will speed up inserts/updates. Capturing slow queries should help, and learn how to use EXPLAIN to see which indexes are redundant or needed.
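
To see which indexes a query actually uses, prefix it with EXPLAIN. Applied to the history query from the question, that looks like this; in the output, the `key` column shows the chosen index per table and `rows` estimates how many rows are examined:

```sql
EXPLAIN SELECT
  rel_RASTER_TABLE.time_sec,
  rel_RASTER_TABLE.time_nsec,
  RAW_DATA_TABLE.transverse,
  HEADER_TABLE.bytes,
  RAW_DATA_TABLE.data
FROM
  RASTER_DB.HEADER_TABLE,
  RASTER_DB.RAW_DATA_TABLE,
  RASTER_DB.rel_RASTER_TABLE
WHERE
  rel_RASTER_TABLE.raster_id = 2952704 AND
  rel_RASTER_TABLE.time_sec >= 1315849228 AND
  rel_RASTER_TABLE.time_sec <= 1315935628 AND
  rel_RASTER_TABLE.data_id = RAW_DATA_TABLE.internal_id AND
  rel_RASTER_TABLE.header_id = HEADER_TABLE.header_id;
```

A `key` of NULL or a `rows` estimate in the millions on rel_RASTER_TABLE would confirm the missing composite index.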

You may also get a performance boost by tuning the MySQL instance, particularly by increasing the sort and join buffers; try running mysqltuner.

#5


0  

First, I would try to create a view with only the necessary info that needs to be selected across the different tables.

By the way, MySQL is not necessarily the most optimized database system for what you are trying to accomplish... Look into other solutions such as Oracle, Microsoft SQL Server, PostgreSQL, etc. Also, the performance will vary depending on the server being used.