MySQL Query：当另一列中的值与特定条件匹配时，返回一列中具有特定值的所有行

This may be a little difficult to answer given that I'm still learning to write queries and I'm not able to view the database at the moment, but I'll give it a shot.

这可能有点难以回答,因为我仍在学习编写查询,而目前我无法查看数据库,但我会试一试。

The database I'm trying to acquire information from contains a large table (TransactionLineItems) that essentially functions as a store transaction log. This table currently contains about 5 million rows and several columns describing products which are included in each transaction (TLI_ReceiptAlias, TLI_ScanCode, TLI_Quantity and TLI_UnitPrice). This table has a foreign key which is paired with a primary key in another table (Transactions), and this table contains transaction numbers (TRN_ReceiptNumber). When I join these two tables, the query returns one row for every item we've ever sold, and each row has a receipt number. 16 rows might have the same receipt number, meaning that all of these items were sold in a single transaction. Below that might be 12 more rows, each sharing another receipt number. All transactions are broken down into multiple rows like this.

我正在尝试从中获取信息的数据库包含一个大型表(TransactionLineItems),它基本上用作存储事务日志。该表当前包含大约500万行和几列描述每个事务中包含的产品(TLI_ReceiptAlias,TLI_ScanCode,TLI_Quantity和TLI_UnitPrice)。该表有一个外键,它与另一个表(Transactions)中的主键配对,该表包含事务号(TRN_ReceiptNumber)。当我加入这两个表时,查询会为我们销售的每个项目返回一行,每行都有一个收据编号。 16行可能具有相同的收据编号,这意味着所有这些项目都在一次交易中出售。下面可能还有12行,每行共享另一个收据号。所有事务都分解为多行,如下所示。

I'm attempting to build a query which returns all rows sharing a single receipt number where at least one row with that receipt number meets certain criteria in another column. For example, three separate types of gift cards all have values in the TLI_ScanCode column that begin with "740000." I want the query to return rows with values beginning with these six digits in the TLI_ScanCode column, but I would also like to return all rows which share a receipt number with any of the rows which meet the given scan code criteria. Essentially, I need the query to return all rows for every receipt number which is also paired in at least one row with a gift card-related scan code.

我正在尝试构建一个查询,该查询返回共享单个收据编号的所有行,其中至少有一个具有该收据编号的行符合另一列中的某些条件。例如,三种不同类型的礼品卡在TLI_ScanCode列中都具有以“740000”开头的值。我希望查询返回TLI_ScanCode列中以这六个数字开头的值的行,但我还希望返回与任何符合给定扫描代码条件的行共享收据编号的所有行。基本上,我需要查询返回每个收据编号的所有行,这些收据编号也至少在一行中与礼品卡相关的扫描代码配对。

I attempted to use a subquery to return a column of all receipt numbers paired with gift card scan codes, using "WHERE A.TRN_ReceiptAlias IN (subquery..." to return only those rows with a receipt number which matched one of the receipt numbers returned by the subquery. This appeared to run without issue for five minutes before the server ground to a halt for another twenty while it processed the query. The query appeared to complete successfully, but given that I was working with IT to restore normal store operations during this time I failed to obtain the results of the query (apart from the associated shame and embarrassment).

我尝试使用子查询返回与礼品卡扫描代码配对的所有收据编号的列,使用“WHERE A.TRN_ReceiptAlias IN(子查询...”仅返回那些收据编号与其中一个收据编号匹配的行子查询返回。这似乎运行五分钟没有问题,然后服务器在处理查询时停止另外二十分钟。查询似乎成功完成,但鉴于我正在与IT合作恢复正常的存储操作在此期间,我未能获得查询结果(除了相关的羞耻和尴尬)。

I'd like to know if there is a way to write a query to obtain this information without causing the server to hang. I'm assuming that either: a) it wasn't very smart to use a subquery in this manner on such a large table, or b) I don't know enough about SQL to obtain the information I need. I'm assuming the answer is both A and B, but I'd very much like to learn how to do this the right way. Any help would be greatly appreciated. Thanks!

我想知道是否有办法编写查询来获取此信息而不会导致服务器挂起。我假设要么:a)在这么大的表上以这种方式使用子查询不是很聪明,或者b)我不太了解SQL以获取我需要的信息。我假设答案是A和B,但我非常想学习如何以正确的方式做到这一点。任何帮助将不胜感激。谢谢!

2 个解决方案

#1

SELECT *
  FROM a as a1
  JOIN b
    ON b.id = a.id
  JOIN a as a2
    ON a2.id = b.id
 WHERE b.some_criteria = 'something';

Include an index on (b.id,b.some_criteria)

在(b.id,b.some_criteria)上包含索引

#2

You aren't the first person, nor will you be the last to bring down your system with an inefficient query.

您不是第一个人,也不会是最后一个使用低效查询来降低系统的人。

The most important lesson is that "Decision Support" and "Analytics" really don't co-exist with a transaction system. You really want to pull the data into a datamart or datawarehouse or some other database that isn't your transaction database, so that you don't take the business offline.

最重要的一课是“决策支持”和“分析”实际上不与交易系统共存。您确实希望将数据提取到数据集市或数据仓库或其他非交易数据库的数据库中,这样您就不会使业务脱机。

In terms of understanding why your initial query was so inefficient, you want to familiarize yourself with the EXPLAIN EXTENDED syntax that returns you plan information that should help you debug your query and work on making it perform acceptably. If you update your question with the actual explain plan output for it, that would be helpful in determining what the issue is.

在理解为什么初始查询效率太低的情况下,您需要熟悉EXPLAIN EXTENDED语法,该语法会返回计划信息,这些信息可以帮助您调试查询并使其具有可接受的性能。如果您使用实际的解释计划输出更新您的问题,那将有助于确定问题所在。

Just from the outline you provided, it does sound like a self join would make sense rather than the subquery.

就像你提供的大纲一样,它听起来像是自我联接而不是子查询。

#1

SELECT *
  FROM a as a1
  JOIN b
    ON b.id = a.id
  JOIN a as a2
    ON a2.id = b.id
 WHERE b.some_criteria = 'something';