Group timestamps by the interval between them, then calculate from the groups (MySQL)

Time: 2021-11-11 21:32:02

To put this question into context, I'm trying to calculate "time in app" based on an event log.

Assume the following table:

user_id   event_time
2         2012-05-09 07:03:38
3         2012-05-09 07:03:42
4         2012-05-09 07:03:43
2         2012-05-09 07:03:44
2         2012-05-09 07:03:45
4         2012-05-09 07:03:52
2         2012-05-09 07:06:30

I'd like to get the difference between the highest and lowest event_time within each set of timestamps that are within 2 minutes of each other (grouped by user). If a timestamp falls more than 2 minutes outside a set, it should be treated as part of another set.

Desired output:

user_id  seconds_interval
2        7     (because 07:03:45 - 07:03:38 is 7 seconds)
3        0     (because 07:03:42 is the only event)
4        9     (because 07:03:52 - 07:03:43 is 9 seconds)
2        0     (because 07:06:30 is outside 2 min interval of 1st user_id=2 set)

This is what I've tried, although I can't group on seconds_interval (even if I could, I'm not sure this is the right direction):

SELECT (max(tr.event_time)-min(tr.event_time)) as seconds_interval
FROM some_table tr
INNER JOIN TrackingRaw tr2 ON (tr.event_time BETWEEN 
   tr2.event_time - INTERVAL 2 MINUTE AND tr2.event_time + INTERVAL 2 MINUTE) 
GROUP BY seconds_interval

1 Answer

I don't think there's a very straightforward way of querying your existing table to produce the data you want. However, you could maintain a second table of user sessions (of course this has the disadvantage that if you later want a report that uses a different session timeout period, you will need to repopulate the table from scratch):

CREATE TABLE Sessions (
  user_id INT,
  session_start TIMESTAMP,
  session_end   TIMESTAMP,
  PRIMARY KEY (user_id, session_start),
  FOREIGN KEY (user_id, session_start) REFERENCES TrackingRaw(user_id, event_time),
  FOREIGN KEY (user_id, session_end  ) REFERENCES TrackingRaw(user_id, event_time)
);

You can automatically populate/update such a table with a trigger that uses INSERT ... SELECT ... ON DUPLICATE KEY UPDATE:

CREATE TRIGGER after_insert_TrackingRaw AFTER INSERT ON TrackingRaw FOR EACH ROW
  INSERT INTO Sessions (user_id, session_start, session_end)
    SELECT NEW.user_id,
           IFNULL(MAX(session_start), NEW.event_time),
           NEW.event_time
    FROM   Sessions
    WHERE  user_id = NEW.user_id
       AND session_end >= NEW.event_time - INTERVAL 2 MINUTE
  ON DUPLICATE KEY UPDATE
    session_start = session_start,
    session_end   = NEW.event_time;

Then, to obtain your desired query results:

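-- TIMESTAMPDIFF returns elapsed seconds; plain subtraction would compare YYYYMMDDHHMMSS numbers
SELECT user_id, TIMESTAMPDIFF(SECOND, session_start, session_end) AS seconds_interval FROM Sessions;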

See it on sqlfiddle.
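
To make the trigger's rule easier to follow, here is a minimal Python sketch of the same idea, run outside the database on the sample rows from the question. It is only an illustration, not part of the MySQL solution: the names build_sessions, seconds_intervals and timeout_seconds are made up for this example, and it assumes events are processed in event_time order (which is what happens when rows are inserted as they occur).

```python
from datetime import datetime, timedelta

# Sample rows from the question: (user_id, event_time).
EVENTS = [
    (2, "2012-05-09 07:03:38"),
    (3, "2012-05-09 07:03:42"),
    (4, "2012-05-09 07:03:43"),
    (2, "2012-05-09 07:03:44"),
    (2, "2012-05-09 07:03:45"),
    (4, "2012-05-09 07:03:52"),
    (2, "2012-05-09 07:06:30"),
]


def build_sessions(events, timeout_seconds=120):
    """Return [user_id, session_start, session_end] lists in creation order."""
    timeout = timedelta(seconds=timeout_seconds)
    sessions = []  # every session, in the order it was opened
    latest = {}    # user_id -> that user's most recent session
    parsed = [(u, datetime.strptime(t, "%Y-%m-%d %H:%M:%S")) for u, t in events]
    for user_id, event_time in sorted(parsed, key=lambda row: row[1]):
        current = latest.get(user_id)
        # Same test as the trigger: session_end >= event_time - timeout.
        if current is not None and current[2] >= event_time - timeout:
            current[2] = event_time  # extend the open session
        else:
            current = [user_id, event_time, event_time]  # open a new session
            sessions.append(current)
            latest[user_id] = current
    return sessions


def seconds_intervals(sessions):
    """Return (user_id, session length in whole seconds) per session."""
    return [(u, int((end - start).total_seconds())) for u, start, end in sessions]


if __name__ == "__main__":
    for user_id, seconds in seconds_intervals(build_sessions(EVENTS)):
        print(user_id, seconds)
```

With the default 120 second timeout this yields one row per session, and the last event for user 2 opens a session of its own because it arrives more than 2 minutes after that user's previous event.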

UPDATE

After further reflection, you could of course build such a Sessions table within a stored procedure:

CREATE PROCEDURE getSessions(IN secs INT) MODIFIES SQL DATA BEGIN
  -- Local variables that FETCH reads each row into
  -- (declared before the cursor and handler).
  DECLARE v_user  INT;
  DECLARE v_time  TIMESTAMP;
  DECLARE v_start TIMESTAMP;
  DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
  DECLARE cur CURSOR FOR
    SELECT user_id, event_time FROM TrackingRaw ORDER BY event_time ASC;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;

  -- Foreign keys are omitted: they are not supported on temporary tables.
  DROP   TEMPORARY TABLE IF EXISTS Sessions;
  CREATE TEMPORARY TABLE Sessions (
    user_id INT,
    session_start TIMESTAMP,
    session_end   TIMESTAMP,
    PRIMARY KEY (user_id, session_start)
  );

  OPEN cur;
  the_loop: LOOP
    FETCH cur INTO v_user, v_time;
    IF no_more_rows THEN
      CLOSE cur;
      LEAVE the_loop;
    END IF;

    -- A temporary table cannot be referenced twice in one statement,
    -- so find the open session first, then insert or extend it.
    SELECT IFNULL(MAX(session_start), v_time) INTO v_start
    FROM   Sessions
    WHERE  user_id = v_user AND session_end >= v_time - INTERVAL secs SECOND;

    INSERT INTO Sessions (user_id, session_start, session_end)
    VALUES (v_user, v_start, v_time)
    ON DUPLICATE KEY UPDATE
      -- Self-assignment keeps an auto-updating TIMESTAMP column unchanged.
      session_start = session_start, session_end = v_time;
  END LOOP the_loop;

  SELECT user_id,
         TIMESTAMPDIFF(SECOND, session_start, session_end) AS seconds_interval
  FROM   Sessions;
  DROP TEMPORARY TABLE Sessions;
END;;

And then to obtain your output:

CALL getSessions(120); -- for a 2 minute (120 second) timeout
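
A note on computing seconds_interval: when two TIMESTAMP values are subtracted with a plain minus sign, MySQL evaluates them in numeric context, as YYYYMMDDHHMMSS numbers. The result therefore only equals elapsed seconds while both values fall within the same minute, which happens to be true for every session in the sample data. The Python sketch below just mimics that arithmetic to show the difference; the function names are invented for the illustration, and in MySQL itself TIMESTAMPDIFF(SECOND, start, end) gives the elapsed seconds.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"


def numeric_form(timestamp):
    """Mimic a timestamp read in numeric context: YYYYMMDDHHMMSS as an int."""
    return int(datetime.strptime(timestamp, FORMAT).strftime("%Y%m%d%H%M%S"))


def naive_difference(start, end):
    """What plain subtraction of the two numeric forms produces."""
    return numeric_form(end) - numeric_form(start)


def elapsed_seconds(start, end):
    """The real elapsed time in whole seconds."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return int(delta.total_seconds())


if __name__ == "__main__":
    pairs = [
        ("2012-05-09 07:03:38", "2012-05-09 07:03:45"),  # same minute
        ("2012-05-09 07:03:55", "2012-05-09 07:04:05"),  # crosses a minute
    ]
    for start, end in pairs:
        print(start, end, naive_difference(start, end), elapsed_seconds(start, end))
```

The first pair gives the same answer both ways; the second is 10 seconds apart but the numeric subtraction reports 50.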
