One SQL query, or many queries in a loop?

Time: 2023-01-26 23:29:16

I need to pull several rows from a table and process them in two ways:

  • aggregated on a key
  • row-by-row, sorted by the same key

The table looks roughly like this:

table (
   key,
   string_data,
   numeric_data
)

So I'm looking at two approaches for the function I'm writing.

The first would pull the aggregate data with one query, and then query again inside a loop for each set of row-by-row data (the following is PHP-like pseudocode):

$rows = query(
        "SELECT key,SUM(numeric_data)
         FROM table
         GROUP BY key"
    );

foreach ($rows as $row) {
    <process aggregate data in $row>

    $key = $row['key'];
    $row_by_row_data = handle_individual_rows($key);
}

function handle_individual_rows($key)
{
    $rows = query(
            "SELECT string_data
             FROM table WHERE key=?",
            $key
        );

    <process $rows one row at a time>

    return $processed_data;
}

Or, I could do one big query and let the code do all the work:

$rows = query(
    "SELECT key, string_data, numeric_data
     FROM table"
);

foreach ($rows as $row) {
    <process rows individually and calculate aggregates as I go>
}

Performance is not a practical concern in this application; I'm just looking to write sensible and maintainable code.

I like the first option because it's more modular, and I like the second option because it seems structurally simple. Is one option better than the other, or is it really just a matter of style?

8 solutions

#1


One SQL query, for sure.

This will

  • Save you lots of round trips to the database
  • Allow the database to use more efficient GROUP BY methods

Since your aggregates can be computed equally well by the database, it will also be better for maintainability: you have all your result set logic in one place.

Here is an example of a query that returns every row and calculates a SUM:

SELECT  key, string_data, numeric_data,
        SUM(numeric_data) OVER (PARTITION BY key) AS key_sum
FROM    table

Note that the database will most probably use parallel access to calculate the SUMs for different keys, which is hard to implement in PHP.

The same query for MySQL, written as a correlated subquery for versions that lack window functions (before 8.0):

SELECT  key, string_data, numeric_data,
        (
        SELECT  SUM(numeric_data)
        FROM    table ti
        WHERE   ti.key = to.key
        ) AS key_sum
FROM    table to
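
For reference, the correlated-subquery form can be tried end to end. The following is a minimal sketch in Python with the standard sqlite3 module rather than PHP and MySQL; the table name items, the column name item_key (standing in for key, which is a reserved word in several databases), the function name, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_key TEXT, string_data TEXT, numeric_data INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", "x", 1), ("a", "y", 2), ("b", "z", 5)],
)


def rows_with_key_sums(conn):
    """Return every row together with the SUM for that row's key."""
    sql = """
        SELECT o.item_key, o.string_data, o.numeric_data,
               (SELECT SUM(i.numeric_data)
                FROM   items i
                WHERE  i.item_key = o.item_key) AS key_sum
        FROM   items o
        ORDER  BY o.item_key, o.string_data
    """
    return conn.execute(sql).fetchall()


result = rows_with_key_sums(conn)
for row in result:
    print(row)
```

Each returned row carries both its own values and the per-key total, so the calling code needs only one round trip and no aggregation logic of its own.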

#2


If performance isn't a concern, I'd go with the second. It seems the tiniest bit friendlier.

If performance were a concern, my answer would be "don't think, profile". :)

#3


The second option is by far the clearer, more sensible, and more maintainable one. You're saying the same thing with less code, which is usually better.

And I know you said performance is not a concern, but why fetch data more than you have to?

#4


I can't be certain from the example here, but I'd like to know whether you could do the aggregation and the other processing right in the SQL query itself. In that case, you'd have to evaluate "more maintainable" in terms of how comfortable you are expressing that processing in SQL versus PHP.

Is there something about the additional processing you need to do on each row that would prevent you from expressing everything in the SQL query itself?

#5


I don't think you'll find many situations at all where doing a query-per-iteration of a loop is the better choice. In fact, I'd say it's probably a good rule of thumb to never do that.

In other words, the fewer round trips to the database, the better.

Depending on your data and actual tables, you might be able to let SQL do the aggregation work and select all the rows you need with one query.

#6


One SQL query is probably the better idea. It saves you from having to rewrite relational operations in your own code.

#7


I think you've somehow answered your own question, because you say you have two different kinds of processing: one aggregation and one row by row.

  • If you want to keep everything readable and maintainable, mixing both in a single query doesn't sound right: the query would serve two different needs, so it won't be very readable.

  • Even if performance is not an issue, it's faster to do the aggregation on the DB server than in code.

  • With only one query, the code that handles the result mixes two kinds of processing, handling rows and computing aggregates at the same time, so over time it will tend to get confusing and buggy.

  • The code may evolve over time: for instance, the row-by-row processing can get complex and introduce bugs in the aggregation part, or the other way around.

  • If you need to split these two treatments in the future, it will be harder to disentangle code that, by then, somebody else wrote ages ago.

Performance considerations aside, in terms of maintainability and readability I'd recommend using two queries.

But keep in mind that although performance might not be an issue at the moment, it can become one over time as the database grows; in the long term it's never a negligible factor.

#8


Even if perf is not an issue, your mind is. When a musician practices, every movement is intended to improve the musician's skill. As a developer, you should write every procedure to improve your skill. Iterative loops through data are sloppy and ugly. SQL queries are elegant. Do you want to develop more elegant code or more sloppy code?
