为子字符串查询优化Mysql表索引

时间:2021-11-06 22:16:19

I have a MySQL indexing question for you guys.

我有一个MySQL索引的问题。

I've got a very large table (~100Million Records) in MySQL that contains information about files. Most of the Queries I do on it involve substring operations on the file path column.

我在MySQL中有一个非常大的表(大约1亿个记录),其中包含关于文件的信息。我在它上做的大多数查询都涉及文件路径列上的子字符串操作。

Here's the table ddl:

这是表的ddl:

CREATE TABLE `filesystem_data`.`$tablename` (
                `file_id` INT( 14 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
                `file_name` VARCHAR( 256 ) NOT NULL ,
                `file_share_name` VARCHAR ( 100 ) NOT NULL,
                `file_path` VARCHAR( 900 ) NOT NULL ,
                `file_size` BIGINT( 14 ) NOT NULL ,
                `file_tier` TINYINT(1) UNSIGNED NULL, 
                `file_last_access` DATETIME NOT NULL ,
                `file_last_change` DATETIME NOT NULL ,
                `file_creation` DATETIME NOT NULL ,
                `file_extension` VARCHAR( 50 ) NULL ,
                INDEX ( `file_path`, `file_share_name` ) 
                ) ENGINE = MYISAM 
             };

So for example ill have a row with a file_path like:

例如,我有一个带有file_path的行:

'\\Server100\share2\Home\Zenshai\My Documents\'

And I'll extract the User's name (Zenshai in this example) with something like

我将用类似的方法提取用户名(本例中为Zenshai)

SELECT substring_index(substring_index(fp.file_path,'\\',6),'\\',-1) as Username
FROM (SELECT '\\\\Server100\\share2\\Home\\Zenshai\\My Documents\\' as file_path) fp

It gets a bit ugly, but that's not really my concern right now.

它变得有点丑陋,但这不是我现在真正关心的。

What I'd like some advice on is what kind of index (if any at all) can help speed up these types of queries on this table. Any other suggestions are welcome too.

我想要一些建议是什么样的索引(如果有的话)可以帮助加快这个表上的这些查询类型。其他建议也欢迎。

Thanks.

谢谢。

PS. Although the table gets very large there is enough space for indexes.

虽然表变得很大,但是有足够的空间来存放索引。

2 个解决方案

#1


3  

You cannot use indices with your current table design.

不能在当前表设计中使用索引。

You may add a column called USERNAME, fill it in the INSERT/UPDATE trigger with the expression you use in SELECT, and search on this column.

您可以添加一个名为USERNAME的列,在INSERT/UPDATE触发器中填充您在SELECT中使用的表达式,并在该列上进行搜索。

P. S. Just curious, you really have 100 mln+ files on your server?

p。s。很好奇,你的服务器上真的有100个mln+文件吗?

#2


2  

I'd create a tiny (columns, not record count) subtable that would have the file path broken out and stored like so:

我将创建一个小的(列,而不是记录计数)子表,将文件路径分解并存储为:

FK_TO_PARENT    PATH_PART
1               Server100
1               share2
1               Home
1               Zenshai
1               My Documents

And then just index PATH_PART. Of course if the parent table is 100 Million plus, then this would be going into the billions of records.

然后索引PATH_PART。当然,如果父表是1亿加,那么这个就会有几十亿条记录。

#1


3  

You cannot use indices with your current table design.

不能在当前表设计中使用索引。

You may add a column called USERNAME, fill it in the INSERT/UPDATE trigger with the expression you use in SELECT, and search on this column.

您可以添加一个名为USERNAME的列,在INSERT/UPDATE触发器中填充您在SELECT中使用的表达式,并在该列上进行搜索。

P. S. Just curious, you really have 100 mln+ files on your server?

p。s。很好奇,你的服务器上真的有100个mln+文件吗?

#2


2  

I'd create a tiny (columns, not record count) subtable that would have the file path broken out and stored like so:

我将创建一个小的(列,而不是记录计数)子表,将文件路径分解并存储为:

FK_TO_PARENT    PATH_PART
1               Server100
1               share2
1               Home
1               Zenshai
1               My Documents

And then just index PATH_PART. Of course if the parent table is 100 Million plus, then this would be going into the billions of records.

然后索引PATH_PART。当然,如果父表是1亿加,那么这个就会有几十亿条记录。