How to pair rows in MySQL?

I'm working on a simple time tracking app.

I've created a table that logs the IN and OUT times of employees.

Here is an example of how my data currently looks:

E_ID | In_Out |      Date_Time
------------------------------------
  3  |   I    | 2012-08-19 15:41:52
  3  |   O    | 2012-08-19 17:30:22
  1  |   I    | 2012-08-19 18:51:11
  3  |   I    | 2012-08-19 18:55:52
  1  |   O    | 2012-08-19 20:41:52
  3  |   O    | 2012-08-19 21:50:30

I'm trying to create a query that pairs each employee's IN and OUT times into one row, like this:

E_ID |       In_Time       |      Out_Time
------------------------------------------------
  3  | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
  3  | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
  1  | 2012-08-19 18:51:11 | 2012-08-19 20:41:52

I hope it's clear what I'm trying to achieve here. Basically, I want to generate a report that has both the in and out times merged into one row.

Any help with this would be greatly appreciated. Thanks in advance.

2 Solutions

#1

There are three basic approaches I can think of.

One approach uses a theta-JOIN, another uses a correlated subquery in the SELECT list, and the third makes use of MySQL user variables.

theta-JOIN

One approach is to use a theta-JOIN. This approach is a generic SQL approach (no MySQL specific syntax), which can work with multiple RDBMS.

N.B. With a large number of rows, this approach can create a very large intermediate result set, which can lead to poor performance.

SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time    
  FROM e `o`
  LEFT
  JOIN e `i` ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time

What this does is match every 'O' row for an employee with every 'I' row that is earlier, and then we use the MAX aggregate to pick out the 'I' record with the closest date time.
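
To try the theta-JOIN without a MySQL server, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in (the query uses no MySQL-specific syntax), the column types are assumptions, and the helper name `pair_rows` is made up for illustration.

```python
import sqlite3

# Sample rows from the question; the table name `e` follows the answer's query.
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

THETA_JOIN = """
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
  FROM e AS o
  LEFT JOIN e AS i
    ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time
"""

def pair_rows(rows):
    """Load rows into an in-memory SQLite table and run the theta-JOIN."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: timestamps stored as ISO-8601 text, which sorts correctly.
    conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
    conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
    result = conn.execute(THETA_JOIN).fetchall()
    conn.close()
    return result

for row in pair_rows(ROWS):
    print(row)
# (3, '2012-08-19 15:41:52', '2012-08-19 17:30:22')
# (1, '2012-08-19 18:51:11', '2012-08-19 20:41:52')
# (3, '2012-08-19 18:55:52', '2012-08-19 21:50:30')
```

The rows come back ordered by out_time, so employee 1 appears between the two visits of employee 3.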

This works for perfectly paired data, but it can produce odd results for imperfect pairs. For example, two consecutive 'O' records with no intermediate 'I' row will both be matched to the same 'I' row.
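
The odd results are easy to reproduce. The sketch below runs the theta-JOIN on a few made-up, deliberately imperfect rows (they are not from the question), again using in-memory SQLite as a stand-in for MySQL. The helper `repeated_in_times` is a hypothetical name for one possible way to flag the affected pairs.

```python
import sqlite3
from collections import Counter

# Hypothetical, deliberately imperfect rows (not from the question).
ROWS = [
    (1, "O", "2012-08-20 08:00:00"),  # 'O' with no earlier 'I'
    (3, "I", "2012-08-20 09:00:00"),
    (3, "O", "2012-08-20 12:00:00"),
    (3, "O", "2012-08-20 13:00:00"),  # second 'O' with no 'I' in between
]

THETA_JOIN = """
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
  FROM e AS o
  LEFT JOIN e AS i
    ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.date_time
"""

def pair_rows(rows):
    """Run the theta-JOIN over the given rows in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
    conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
    result = conn.execute(THETA_JOIN).fetchall()
    conn.close()
    return result

def repeated_in_times(pairs):
    """Return (e_id, in_time) keys that were matched to more than one 'O' row."""
    counts = Counter((e_id, in_time) for e_id, in_time, _ in pairs
                     if in_time is not None)
    return [key for key, n in counts.items() if n > 1]

pairs = pair_rows(ROWS)
for row in pairs:
    print(row)
# (1, None, '2012-08-20 08:00:00')
# (3, '2012-08-20 09:00:00', '2012-08-20 12:00:00')
# (3, '2012-08-20 09:00:00', '2012-08-20 13:00:00')
print(repeated_in_times(pairs))
# [(3, '2012-08-20 09:00:00')]
```

Because of the LEFT JOIN, an 'O' row with no earlier 'I' row still appears, with a NULL in_time.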

correlated subquery in SELECT list

Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but it is sometimes workable, and is occasionally the fastest way to return the specified result set. This approach works best when the outer query returns a limited number of rows.

 SELECT o.e_id
      , (SELECT MAX(i.date_time)
           FROM e `i`
          WHERE i.in_out = 'I'
            AND i.e_id = o.e_id
            AND i.date_time < o.date_time
        ) AS in_time
      , o.date_time AS out_time
   FROM e `o`
  WHERE o.in_out = 'O'
  ORDER BY o.date_time
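
As a quick check, the correlated subquery can be run on the question's sample rows with in-memory SQLite standing in for MySQL. This is a sketch only: the composite index `e_ix` is an assumption added for illustration (the subquery looks up rows by e_id, in_out and date_time, so an index on those columns is a plausible candidate), and it has not been benchmarked.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

CORRELATED = """
SELECT o.e_id
     , (SELECT MAX(i.date_time)
          FROM e AS i
         WHERE i.in_out = 'I'
           AND i.e_id = o.e_id
           AND i.date_time < o.date_time
       ) AS in_time
     , o.date_time AS out_time
  FROM e AS o
 WHERE o.in_out = 'O'
 ORDER BY o.date_time
"""

def pair_rows_correlated(rows):
    """Run the correlated-subquery version on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
    # Hypothetical index, not part of the original answer.
    conn.execute("CREATE INDEX e_ix ON e (e_id, in_out, date_time)")
    conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
    result = conn.execute(CORRELATED).fetchall()
    conn.close()
    return result

for row in pair_rows_correlated(ROWS):
    print(row)
# (3, '2012-08-19 15:41:52', '2012-08-19 17:30:22')
# (1, '2012-08-19 18:51:11', '2012-08-19 20:41:52')
# (3, '2012-08-19 18:55:52', '2012-08-19 21:50:30')
```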

User variables

Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the "missing" analytic functions.)

What this query does is order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an 'O' (out) row, we use the value of date_time from the immediately preceding 'I' row as the in_time.

N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or "derived tables", in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.

SELECT c.e_id
     , CAST(c.in_time AS DATETIME) AS in_time
     , c.out_time
  FROM (
         SELECT IF(@prev_e_id = d.e_id,@in_time,@in_time:=NULL) AS reset_in_time
              , @in_time := IF(d.in_out = 'I',d.date_time,@in_time) AS in_time
              , IF(d.in_out = 'O',d.date_time,NULL) AS out_time
              , @prev_e_id := d.e_id  AS e_id
           FROM (
                  SELECT e_id, date_time, in_out 
                    FROM e
                    JOIN (SELECT @prev_e_id := NULL, @in_time := NULL) f
                   ORDER BY e_id, date_time, in_out 
                 ) d
       ) c
 WHERE c.out_time IS NOT NULL
 ORDER BY c.out_time

This works for the set of data you have, but it needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, that is, when the rows are not perfectly paired (e.g. two 'O' rows with no 'I' row between them, an 'I' row with no subsequent 'O' row, etc.).

SQL Fiddle
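
The row-by-row scan that the user-variable query relies on can also be sketched in plain Python, which may make the logic easier to follow. This is an illustration of the idea, not a translation the original answer provides; the function name `pair_by_scan` is made up, and the two local variables mirror `@prev_e_id` and `@in_time`.

```python
# Sample rows from the question as (e_id, in_out, date_time) tuples.
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

def pair_by_scan(rows):
    """Pair each 'O' row with the latest preceding 'I' row of the same employee."""
    pairs = []
    prev_e_id = None  # plays the role of @prev_e_id
    in_time = None    # plays the role of @in_time
    # Same ordering as the innermost derived table: e_id, date_time, in_out.
    for e_id, in_out, date_time in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        if e_id != prev_e_id:
            in_time = None       # new employee: forget the previous in_time
        if in_out == "I":
            in_time = date_time  # remember the most recent 'I' time
        else:
            pairs.append((e_id, in_time, date_time))  # one output row per 'O'
        prev_e_id = e_id
    # Equivalent of the final ORDER BY c.out_time.
    return sorted(pairs, key=lambda p: p[2])

for row in pair_by_scan(ROWS):
    print(row)
# (3, '2012-08-19 15:41:52', '2012-08-19 17:30:22')
# (1, '2012-08-19 18:51:11', '2012-08-19 20:41:52')
# (3, '2012-08-19 18:55:52', '2012-08-19 21:50:30')
```

As with the SQL, two consecutive 'O' rows would both receive the same in_time, because the remembered value is not cleared after an 'O' row.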

#2

Unfortunately, MySQL (before version 8.0) doesn't have a ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) function like SQL Server does, or this would be incredibly easy.

But, there is a way to do this in MySQL:

set @num := 0, @in_out := '';

select emp_in.id,
  emp_in.in_time,
  emp_out.out_time
from 
(
  select id, in_out, date_time in_time, 
     @num := if(@in_out = in_out, @num + 1, 1) as row_number,
     @in_out := in_out as dummy
  from mytable
  where in_out = 'I'
  order by date_time, id
) emp_in
join
(
  select id, in_out, date_time out_time,
     @num := if(@in_out = in_out, @num + 1, 1) as row_number,
     @in_out := in_out as dummy
  from mytable
  where in_out = 'O'
  order by date_time, id
) emp_out
  on emp_in.id = emp_out.id
  and emp_in.row_number = emp_out.row_number
order by emp_in.id, emp_in.in_time

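Basically, this creates two subqueries, each of which generates a row_number for every record it returns: one subquery is for in_time and the other is for out_time. Note that the counter is not restarted per employee, so the pairing relies on the n-th 'I' row and the n-th 'O' row (by date_time) belonging to the same employee, as they do in the sample data.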

Then you JOIN the two subqueries together on the id and the row_number.

See SQL Fiddle with Demo
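
For comparison, here is a sketch of the window-function version that this answer wishes for. MySQL 8.0 added window functions, so a query of this shape should be usable there; the sketch itself runs on in-memory SQLite (3.25 or later is assumed) purely as a runnable stand-in. It uses the question's names (`e`, `e_id`) rather than `mytable` and `id`, and the alias `rn` and the helper name are made up for illustration.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

ROW_NUMBER_PAIRING = """
WITH numbered AS (
  -- Number the 'I' rows and the 'O' rows separately for each employee.
  SELECT e_id, in_out, date_time,
         ROW_NUMBER() OVER (PARTITION BY e_id, in_out
                            ORDER BY date_time) AS rn
    FROM e
)
SELECT i.e_id, i.date_time AS in_time, o.date_time AS out_time
  FROM numbered AS i
  JOIN numbered AS o
    ON o.e_id = i.e_id AND o.rn = i.rn
 WHERE i.in_out = 'I' AND o.in_out = 'O'
 ORDER BY i.e_id, i.date_time
"""

def pair_by_row_number(rows):
    """Pair the n-th 'I' row of each employee with that employee's n-th 'O' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
    conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
    result = conn.execute(ROW_NUMBER_PAIRING).fetchall()
    conn.close()
    return result

for row in pair_by_row_number(ROWS):
    print(row)
# (1, '2012-08-19 18:51:11', '2012-08-19 20:41:52')
# (3, '2012-08-19 15:41:52', '2012-08-19 17:30:22')
# (3, '2012-08-19 18:55:52', '2012-08-19 21:50:30')
```

Partitioning by e_id restarts the numbering for each employee, so the pairing does not depend on how different employees' rows interleave.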
