Keeping element data when extracting sessions

Date: 2021-09-11 15:01:16

As in the top Wikipedia sessions example, I have the following test data:

import json

# Each event is a JSON string with a timestamp (seconds), a username and an action.
EDITS = [
    json.dumps({'timestamp': 0, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 1, 'username': 'user1', 'action': 'b'}),
    json.dumps({'timestamp': 20, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 132, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 500, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3601, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3602, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 8004, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 9320, 'username': 'user1', 'action': 'b'}),
]

I would like to split the dataset into sessions per username and then, for each user session, count the user actions by type. For the dataset above and a maximum gap of one hour (3600 seconds), I want to get the following result:

EXPECTED = [
    'user1 : [0.0, 3620.0), a: 2, b: 1',
    'user2 : [132.0, 7202.0), a: 2, b: 2',
    'user2 : [8004.0, 11604.0), a: 1, b: 0',
    'user1 : [9320.0, 12920.0), a: 0, b: 1',
]
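
To make the expected result concrete, here is a minimal plain-Python sketch of the sessionization described above. It does not use Beam. The function name `sessionize` and the output formatting are illustrative assumptions, and it assumes half-open windows, so a gap of exactly 3600 seconds starts a new session.

```python
import json
from collections import Counter, defaultdict

GAP = 3600  # maximum gap between events of one session, in seconds

EDITS = [
    json.dumps({'timestamp': 0, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 1, 'username': 'user1', 'action': 'b'}),
    json.dumps({'timestamp': 20, 'username': 'user1', 'action': 'a'}),
    json.dumps({'timestamp': 132, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 500, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3601, 'username': 'user2', 'action': 'b'}),
    json.dumps({'timestamp': 3602, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 8004, 'username': 'user2', 'action': 'a'}),
    json.dumps({'timestamp': 9320, 'username': 'user1', 'action': 'b'}),
]


def sessionize(edits, gap=GAP):
    """Group JSON events into per-user sessions and count actions per session."""
    by_user = defaultdict(list)
    for raw in edits:
        event = json.loads(raw)
        by_user[event['username']].append(event)

    sessions = []  # tuples of (user, start, end, action counts)
    for user, events in by_user.items():
        events.sort(key=lambda e: e['timestamp'])
        start = last = None
        counts = Counter()
        for event in events:
            ts = float(event['timestamp'])
            # Close the current session when the gap to the previous event is too large.
            if last is not None and ts - last >= gap:
                sessions.append((user, start, last + gap, counts))
                start, counts = None, Counter()
            if start is None:
                start = ts
            last = ts
            counts[event['action']] += 1
        sessions.append((user, start, last + gap, counts))

    # Order sessions by start time and render them as strings.
    sessions.sort(key=lambda s: s[1])
    return ['%s : [%s, %s), a: %d, b: %d' % (u, s, e, c['a'], c['b'])
            for u, s, e, c in sessions]


for line in sessionize(EDITS):
    print(line)
```

Each session ends 3600 seconds after its last event, which is why the first user1 session spans [0.0, 3620.0) although its last event is at timestamp 20.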

Unlike the Wikipedia sessions example, I need to keep the complete element data, not only the key, so that I can use it in my custom combiner function.

1 Answer

#1

You should be able to write a CombineFn that counts the number of actions of each type, using a dictionary of counts as the accumulator. Then apply session windows to a collection keyed by user ID and combine per key with that combiner.

See the section on CombineFns in the Beam programming guide for ideas on how to write one.
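
As a rough illustration of the dictionary-accumulator idea, the sketch below implements the four methods a CombineFn is expected to provide (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`). It is written as a plain class so it runs without Beam installed. The class name `ActionCountFn` is hypothetical, and in a real pipeline it would presumably subclass `apache_beam.CombineFn`. The pipeline wiring in the trailing comment is an unexecuted assumption about how the pieces fit together, not tested code.

```python
class ActionCountFn:
    """Counts actions by type, using a dict of counts as the accumulator.

    Assumption: in a Beam pipeline this class would subclass
    apache_beam.CombineFn; it is a plain class here so it runs standalone.
    """

    def create_accumulator(self):
        # Start with no counts for any action type.
        return {}

    def add_input(self, accumulator, element):
        # `element` is the full event dict, so all of its fields stay available.
        action = element['action']
        accumulator[action] = accumulator.get(action, 0) + 1
        return accumulator

    def merge_accumulators(self, accumulators):
        # Sum the partial counts computed on different bundles.
        merged = {}
        for acc in accumulators:
            for action, count in acc.items():
                merged[action] = merged.get(action, 0) + count
        return merged

    def extract_output(self, accumulator):
        return accumulator


# Local check of the combiner logic on the events of one session.
fn = ActionCountFn()
left = fn.add_input(fn.create_accumulator(), {'username': 'user1', 'action': 'a'})
left = fn.add_input(left, {'username': 'user1', 'action': 'b'})
right = fn.add_input(fn.create_accumulator(), {'username': 'user1', 'action': 'a'})
print(fn.extract_output(fn.merge_accumulators([left, right])))

# Assumed wiring in a Beam pipeline (not executed here), with `events` being a
# PCollection of parsed event dicts:
#   events
#   | beam.Map(lambda e: beam.window.TimestampedValue((e['username'], e), e['timestamp']))
#   | beam.WindowInto(beam.window.Sessions(3600))
#   | beam.CombinePerKey(ActionCountFn())
```

Keying each element as `(username, full_event)` rather than `(username, 1)` is what keeps the complete element data available to the combiner.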
