MySql - WAMP - 巨大的表非常慢(2000万行)

时间:2021-12-12 02:46:47

So I posted this! yesterday and got a perfect answer, which required running this code first: ALTER TABLE mytable AUTO_INCREMENT=10000001;

所以我贴了这个!昨天得到了一个完美的答案,需要先运行此代码:ALTER TABLE mytable AUTO_INCREMENT = 10000001;

I ran it several times, but restarted WAMP after a couple of hours of it not working. After running overnight (12 hours), the code still hadn't run.

我运行了好几次,但是在几个小时不工作后重新启动了WAMP。过夜(12小时)后,代码仍未运行。

I am wondering if my database table size is past the limits of mysql or my computer or both.

我想知道我的数据库表大小是否超过了mysql或我的计算机或两者的限制。

However, I have a sneaky suspicion that proper indexing or some other factor could greatly impact my performance. I know 20 million is a lot of rows, but is it too much?

但是,我有一种偷偷摸摸的怀疑,正确的索引或其他因素可能会对我的表现产生很大影响。我知道2000万行是很多行,但它太多了吗?

I don't know much about indexes, except that they are important. I attempted to add them to the name and state fields, which I believe I did successfully.

我对索引知之甚少,除了它们很重要。我试图将它们添加到名称和州字段中,我相信我成功了。

Incidentally, I am trying to add a unique ID field, which is what my post yesterday was all about.

顺便说一句,我正在尝试添加一个唯一的ID字段,这就是我昨天的帖子。

So, the question is: Is 20 million rows outside the scope of MySql? If not, am I missing an index or some other setting that would help better work with this 20 million rows? Can I put indexes on all the columns and make it super fast?

所以,问题是:在MySql范围之外是否有2000万行?如果没有,我是否缺少一个索引或其他一些有助于更好地使用这2000万行的设置?我可以在所有列上放置索引并使其超快吗?

As always, thanks in advance...

一如既往,提前谢谢......

Here are the specs:

以下是规格:

My PC is XP, running WAMPSERVER, Win32 NTFS, Intel Duo Core, T9300 @ 2.50GHz, 1.17 GHz, 1.98 GB or RAM

我的电脑是XP,运行WAMPSERVER,Win32 NTFS,Intel Duo Core,T9300 @ 2.50GHz,1.17 GHz,1.98 GB或RAM

DB: 1 table, 20 million rows The size of the tables is: Data 4.4 Gigs, Indexes 1.3 Gigs, Total 5.8 Gigs

DB:1表,2000万行表的大小是:数据4.4 Gigs,Indexes 1.3 Gigs,Total 5.8 Gigs

The indexes are set up on the 'BUSINESS NAME' and 'STATE' fields

索引在“BUSINESS NAME”和“STATE”字段中设置

The table fields are like this:

表格字段如下:

`BUSINESS NAME` TEXT NOT NULL, 
`ADDRESS` TEXT NOT NULL, 
`CITY` TEXT NOT NULL, 
`STATE` TEXT NOT NULL, 
`ZIP CODE` TEXT NOT NULL, 
`COUNTY` TEXT NOT NULL, 
`WEB ADDRESS` TEXT NOT NULL, 
`PHONE NUMBER` TEXT NOT NULL, 
`FAX NUMBER` TEXT NOT NULL, 
`CONTACT NAME` TEXT NOT NULL, 
`TITLE` TEXT NOT NULL, 
`GENDER` TEXT NOT NULL, 
`EMPLOYEE` TEXT NOT NULL, 
`SALES` TEXT NOT NULL, 
`MAJOR DIVISION DESCRIPTION` TEXT NOT NULL, 
`SIC 2 CODE DESCRIPTION` TEXT NOT NULL, 
`SIC 4 CODE` TEXT NOT NULL, 
`SIC 4 CODE DESCRIPTION` TEXT NOT NULL 

3 个解决方案

#1


8  

Some answers:

  • 20 million rows is well within the capability of MySQL. I work on a database that has over 500 million rows in one of its tables. It can take hours to restructure a table, but ordinary queries aren't a problem as long as they're assisted by an index.

    2000万行完全在MySQL的能力范围内。我在一个表中工作的数据库超过5亿行。重组表可能需要数小时,但普通查询只要它们由索引辅助就不是问题。

  • Your laptop is pretty out of date and underpowered to use as a high-scale database server. It's going to take a long time to do a table restructure. The low amount of memory and typically slow laptop disk is probably constraining you. You're probably using default settings for MySQL too, which are designed to work on very old computers.

    您的笔记本电脑已经过时,并且不能用作高级数据库服务器。这需要很长时间才能进行表格重组。低内存量和通常较慢的笔记本电脑磁盘可能会限制您。您可能也使用MySQL的默认设置,这些设置适用于非常旧的计算机。

  • I wouldn't recommend using TEXT data type for every column. There's no reason you need TEXT for most of those columns.

    我不建议为每列使用TEXT数据类型。大多数这些列都没有理由需要TEXT。

  • Don't create an index on every column, especially if you insist on using TEXT data types. You can't even index a TEXT column unless you define a prefix index. In general, choose indexes to support specific queries.

    不要在每个列上创建索引,特别是如果您坚持使用TEXT数据类型。除非定义前缀索引,否则甚至无法索引TEXT列。通常,选择索引以支持特定查询。

You probably have many other questions based on the above, but there's too much to cover in a single * post. You might want to take training or read a book if you're going to work with databases.
I recommend High Performance MySQL, 2nd Edition.

基于上述内容,您可能还有许多其他问题,但在单个*帖子中有太多内容无法涵盖。如果您要使用数据库,您可能需要接受培训或阅读书籍。我推荐高性能MySQL,第2版。


Re your followup questions:

重新提出你的后续问题:

For MySQL tuning, here's a good place to start: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

对于MySQL调优,这里是一个很好的起点:http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

Many ALTER TABLE operations cause a table restructure, which means basically lock the table, make a copy of the whole table with the changes applied, and then rename the new and old tables and drop the old table. If the table is very large, this can take a long time.

许多ALTER TABLE操作会导致表重组,这意味着基本上锁定表,使用所应用的更改生成整个表的副本,然后重命名新旧表并删除旧表。如果表非常大,这可能需要很长时间。

A TEXT data type can store up to 64KB, which is overkill for a phone number or a state. I would use CHAR(10) for a typical US phone number. I would use CHAR(2) for a US state. In general, use the most compact and thrifty data type that supports the range of data you need in a given column.

TEXT数据类型最多可以存储64KB,这对于电话号码或状态来说是过度的。我会使用CHAR(10)作为典型的美国电话号码。我会将CHAR(2)用于美国州。通常,使用最紧凑和节俭的数据类型,它支持给定列中所需的数据范围。

#2


2  

It's going to take a long time because you've only got 2GB RAM and 6GB of data/indexes and it's going to force a ton of swapping in/out between RAM and disk. There's not much you can do about that, though.

这需要很长时间,因为你只有2GB内存和6GB数据/索引,这将迫使RAM和磁盘之间进行大量的交换。但是,你无能为力。

You could try running this in batches.

您可以尝试批量运行它。

Create a separate empty table with the auto_increment column included in it. Then insert your records a certain amount at a time (say, 1 state at a time). That might help it go faster since you should be able to handle those smaller datasets completely in memory instead of paging to disk.

创建一个单独的空表,其中包含auto_increment列。然后一次插入一定数量的记录(例如,一次一个州)。这可能有助于它更快,因为你应该能够完全在内存中处理那些较小的数据集而不是分页到磁盘。

You'll probably get a lot better responses for this if it's on dba.stackexchange.com also.

如果它也在dba.stackexchange.com上,你可能会得到更好的响应。

#3


0  

I believe the hardware is fine but you need to spare your resources a lot better.

我相信硬件很好,但你需要更好地利用你的资源。

Db structure optimization!

db结构优化!

  • Do not use TEXT!
  • 不要使用TEXT!

  • For phonenumbers use bigint unsigned. Any signs or alpha must be parsed and converted.
  • 对于phonenumbers使用bigint unsigned。必须解析和转换任何符号或alpha。

  • For any other alpha-numeric column use eg varchar([32-256]).
  • 对于任何其他字母数字列,请使用例如varchar([32-256])。

  • Zip-code is of course mediumint unsigned.
  • 邮政编码当然是未签名的。

  • Gender should be enum('Male','Female')
  • 性别应该是enum('男性','女性')

  • Sales could be an int unsigned
  • 销售可以是无符号的int

  • State should be enum('Alaska',...)
  • 国家应该是枚举('阿拉斯加',......)

  • Country should be enum('Albania',...)
  • 国家应该是枚举('阿尔巴尼亚',...)

When building a large index the fastest way is to create a new table and do INSERT INTO ... SELECT FROM ... rather then ALTER TABLE ....

构建大型索引时,最快的方法是创建一个新表并执行INSERT INTO ... SELECT FROM ...而不是ALTER TABLE ....

Changing the State and Country fields to enum will drastically reduce you indexes size.

将State和Country字段更改为枚举将大大减少索引大小。

#1


8  

Some answers:

  • 20 million rows is well within the capability of MySQL. I work on a database that has over 500 million rows in one of its tables. It can take hours to restructure a table, but ordinary queries aren't a problem as long as they're assisted by an index.

    2000万行完全在MySQL的能力范围内。我在一个表中工作的数据库超过5亿行。重组表可能需要数小时,但普通查询只要它们由索引辅助就不是问题。

  • Your laptop is pretty out of date and underpowered to use as a high-scale database server. It's going to take a long time to do a table restructure. The low amount of memory and typically slow laptop disk is probably constraining you. You're probably using default settings for MySQL too, which are designed to work on very old computers.

    您的笔记本电脑已经过时,并且不能用作高级数据库服务器。这需要很长时间才能进行表格重组。低内存量和通常较慢的笔记本电脑磁盘可能会限制您。您可能也使用MySQL的默认设置,这些设置适用于非常旧的计算机。

  • I wouldn't recommend using TEXT data type for every column. There's no reason you need TEXT for most of those columns.

    我不建议为每列使用TEXT数据类型。大多数这些列都没有理由需要TEXT。

  • Don't create an index on every column, especially if you insist on using TEXT data types. You can't even index a TEXT column unless you define a prefix index. In general, choose indexes to support specific queries.

    不要在每个列上创建索引,特别是如果您坚持使用TEXT数据类型。除非定义前缀索引,否则甚至无法索引TEXT列。通常,选择索引以支持特定查询。

You probably have many other questions based on the above, but there's too much to cover in a single * post. You might want to take training or read a book if you're going to work with databases.
I recommend High Performance MySQL, 2nd Edition.

基于上述内容,您可能还有许多其他问题,但在单个*帖子中有太多内容无法涵盖。如果您要使用数据库,您可能需要接受培训或阅读书籍。我推荐高性能MySQL,第2版。


Re your followup questions:

重新提出你的后续问题:

For MySQL tuning, here's a good place to start: http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

对于MySQL调优,这里是一个很好的起点:http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

Many ALTER TABLE operations cause a table restructure, which means basically lock the table, make a copy of the whole table with the changes applied, and then rename the new and old tables and drop the old table. If the table is very large, this can take a long time.

许多ALTER TABLE操作会导致表重组,这意味着基本上锁定表,使用所应用的更改生成整个表的副本,然后重命名新旧表并删除旧表。如果表非常大,这可能需要很长时间。

A TEXT data type can store up to 64KB, which is overkill for a phone number or a state. I would use CHAR(10) for a typical US phone number. I would use CHAR(2) for a US state. In general, use the most compact and thrifty data type that supports the range of data you need in a given column.

TEXT数据类型最多可以存储64KB,这对于电话号码或状态来说是过度的。我会使用CHAR(10)作为典型的美国电话号码。我会将CHAR(2)用于美国州。通常,使用最紧凑和节俭的数据类型,它支持给定列中所需的数据范围。

#2


2  

It's going to take a long time because you've only got 2GB RAM and 6GB of data/indexes and it's going to force a ton of swapping in/out between RAM and disk. There's not much you can do about that, though.

这需要很长时间,因为你只有2GB内存和6GB数据/索引,这将迫使RAM和磁盘之间进行大量的交换。但是,你无能为力。

You could try running this in batches.

您可以尝试批量运行它。

Create a separate empty table with the auto_increment column included in it. Then insert your records a certain amount at a time (say, 1 state at a time). That might help it go faster since you should be able to handle those smaller datasets completely in memory instead of paging to disk.

创建一个单独的空表,其中包含auto_increment列。然后一次插入一定数量的记录(例如,一次一个州)。这可能有助于它更快,因为你应该能够完全在内存中处理那些较小的数据集而不是分页到磁盘。

You'll probably get a lot better responses for this if it's on dba.stackexchange.com also.

如果它也在dba.stackexchange.com上,你可能会得到更好的响应。

#3


0  

I believe the hardware is fine but you need to spare your resources a lot better.

我相信硬件很好,但你需要更好地利用你的资源。

Db structure optimization!

db结构优化!

  • Do not use TEXT!
  • 不要使用TEXT!

  • For phonenumbers use bigint unsigned. Any signs or alpha must be parsed and converted.
  • 对于phonenumbers使用bigint unsigned。必须解析和转换任何符号或alpha。

  • For any other alpha-numeric column use eg varchar([32-256]).
  • 对于任何其他字母数字列,请使用例如varchar([32-256])。

  • Zip-code is of course mediumint unsigned.
  • 邮政编码当然是未签名的。

  • Gender should be enum('Male','Female')
  • 性别应该是enum('男性','女性')

  • Sales could be an int unsigned
  • 销售可以是无符号的int

  • State should be enum('Alaska',...)
  • 国家应该是枚举('阿拉斯加',......)

  • Country should be enum('Albania',...)
  • 国家应该是枚举('阿尔巴尼亚',...)

When building a large index the fastest way is to create a new table and do INSERT INTO ... SELECT FROM ... rather then ALTER TABLE ....

构建大型索引时,最快的方法是创建一个新表并执行INSERT INTO ... SELECT FROM ...而不是ALTER TABLE ....

Changing the State and Country fields to enum will drastically reduce you indexes size.

将State和Country字段更改为枚举将大大减少索引大小。