I have a data model question for a GPS tracking app. While someone uses our app, it saves latitude, longitude, current speed, timestamp and burned_calories every 5 seconds. When a workout is completed, the average speed, total time/distance and burned calories of the workout are stored in a database. So far so good.
What we also want is to store the samples that are saved every 5 seconds, so we can use them later, for example to plot graphs/charts of a workout.
How should we store this amount of data in a database? A single workout contains 720 rows if someone runs for an hour. Perhaps a serialised/gz-compressed data array in a single row? I'm aware, though, that this is considered bad practice.
Would a relational one-to-many/many-to-many model be unworkable? I know MySQL can easily handle large amounts of data, but we are talking about 720 rows * 2 workouts a week * 7000 users = over 10 million rows a week. (Of course we could store a sample only every 10 seconds to halve the number of rows, or every 20 seconds, etc., but it would still be a large amount of data over time, and the accuracy of the graphs would decrease.)
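To make the scale concrete, here is the arithmetic as a small Python sketch. The constant and function names are made up for illustration, and a one-hour workout is assumed throughout.

```python
WORKOUTS_PER_WEEK = 2   # per user, as estimated above
USERS = 7000


def rows_per_week(interval_s: int, workout_s: int = 3600) -> int:
    """Rows written per week if every sample becomes one row."""
    samples_per_workout = workout_s // interval_s
    return samples_per_workout * WORKOUTS_PER_WEEK * USERS


for interval in (5, 10, 20):
    print(f"every {interval:>2} s: {rows_per_week(interval):>10,} rows/week")
```

Halving the sampling rate halves the row count, but the growth stays linear in users and weeks either way.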
How would you do this? Thanks in advance for your input!
1 Answer
Just some ideas:
- Quantize your lat/lon data. For technical reasons the data is most likely quantized already, so if you can detect that quantization, you can use it. The idea is to turn double-precision numbers into reasonably small integers. In the worst case you quantize to the full precision that doubles provide, which means using 64-bit integers, but I very much doubt your data comes even close to that resolution. Perhaps a simple grid with an edge length of about one meter is enough for you?
- Compute differences between consecutive samples. Most coordinates are fairly large in absolute value but very close together (unless your members run halfway around the world…), so the differences are rather small numbers. Furthermore, as long as people run at constant speed in a constant direction, you will quite often see the same differences. The coarser your spatial grid in the first step, the more likely you are to get exactly the same differences here.
- Compute a Huffman code for these differences. You might try encoding lat and lon movement separately, or computing a single code with 2D displacement vectors at its leaves. Try both and compare the results.
- Store the result in a BLOB, together with the dictionary needed to decode your Huffman code and the initial position, so you can convert the data back to absolute coordinates.
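A minimal Python sketch of the first three steps, to show how they fit together. It assumes a fixed grid of 1e-5 degrees (roughly one meter of latitude; the same step in longitude covers less ground away from the equator) and a single code over 2D displacement vectors. All function names and the `GRID` constant are hypothetical, and the bit string is kept as text for readability rather than packed into bytes.

```python
import heapq
from collections import Counter
from itertools import count

GRID = 1e-5  # assumed grid step in degrees, about 1 m of latitude


def quantize(points, grid=GRID):
    """Snap (lat, lon) floats to integer grid cells."""
    return [(round(lat / grid), round(lon / grid)) for lat, lon in points]


def deltas(cells):
    """Differences between consecutive cells (the first cell is stored separately)."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(cells, cells[1:])]


def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()                       # unique counter so dicts are never compared
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in left.items()}
        merged.update({s: "1" + b for s, b in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(points):
    """Return (start cell, code table, bit string) for a non-empty track."""
    cells = quantize(points)
    diffs = deltas(cells)
    code = huffman_code(diffs) if diffs else {}
    return cells[0], code, "".join(code[d] for d in diffs)


def decode(start, code, bits):
    """Rebuild the list of grid cells from the start cell and the bit string."""
    inverse = {b: s for s, b in code.items()}
    cells, current, buf = [start], start, ""
    for bit in bits:
        buf += bit
        if buf in inverse:                   # prefix code: the first match is the symbol
            dlat, dlon = inverse[buf]
            current = (current[0] + dlat, current[1] + dlon)
            cells.append(current)
            buf = ""
    return cells


# Usage: a made-up track heading steadily north-east, one sample per step.
track = [(52.0 + i * 3e-5, 4.0 + i * 2e-5) for i in range(720)]
start, code, bits = encode(track)
assert decode(start, code, bits) == quantize(track)
```

Decoding gives back grid cells, not the original floats; multiplying by the grid step recovers coordinates to within the chosen resolution.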
The result should be a fairly small blob for each workout, which you can retrieve and decompress as a whole. Retrieving individual parts from the database is not possible, but it sounds like you won't need that.
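As a rough illustration of the storage step, the sketch below packs a start cell, a code table and a bit string into one bytes object and round-trips it through a BLOB column. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `workout_track`, the helper names and the JSON header layout are all assumptions made for the example; a real format would likely use a more compact binary table.

```python
import json
import sqlite3
import struct


def pack_track(start, code, bits):
    """Serialise start cell, code table and bit string into one bytes object."""
    header = json.dumps({
        "start": list(start),
        "code": [[dlat, dlon, b] for (dlat, dlon), b in code.items()],
        "nbits": len(bits),                  # needed to strip the padding again
    }).encode("utf-8")
    padded = bits + "0" * (-len(bits) % 8)   # pad to a whole number of bytes
    payload = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return struct.pack(">I", len(header)) + header + payload


def unpack_track(blob):
    """Inverse of pack_track."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + header_len])
    payload = blob[4 + header_len:]
    bits = "".join(f"{byte:08b}" for byte in payload)[:header["nbits"]]
    code = {(dlat, dlon): b for dlat, dlon, b in header["code"]}
    return tuple(header["start"]), code, bits


def save_track(conn, workout_id, start, code, bits):
    conn.execute(
        "INSERT INTO workout_track (workout_id, track) VALUES (?, ?)",
        (workout_id, pack_track(start, code, bits)),
    )


def load_track(conn, workout_id):
    row = conn.execute(
        "SELECT track FROM workout_track WHERE workout_id = ?", (workout_id,)
    ).fetchone()
    return unpack_track(row[0])


# Usage with made-up values: one row per workout, read back in one piece.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workout_track (workout_id INTEGER PRIMARY KEY, track BLOB NOT NULL)"
)
example = ((5200000, 400000), {(3, 2): "0", (3, 3): "10", (2, 2): "11"}, "0010011")
save_track(conn, 1, *example)
assert load_track(conn, 1) == example
```

This turns the 720 rows of an hour-long workout into a single row, at the cost that the samples can no longer be queried individually in SQL.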
The benefit of Huffman coding over gzip is that you won't have to artificially introduce an intermediate byte stream. Directly encoding the actual differences you encounter, with their individual properties, should work much better.
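Whether this holds for your data is easy to measure. The sketch below computes how many bits a Huffman code would spend on a list of deltas and compares that with running `zlib` (the DEFLATE implementation behind gzip) over the same deltas serialised as 16-bit integers. The function names and the 16-bit serialisation are assumptions for the example. The comparison is not perfectly fair: the Huffman figure leaves out the code table, and the zlib figure includes its own small header.

```python
import heapq
import struct
import zlib
from collections import Counter


def huffman_bits(symbols):
    """Total bits a Huffman code would spend on `symbols` (code table excluded)."""
    heap = list(Counter(symbols).values())
    if len(heap) == 1:
        return heap[0]            # a lone symbol still costs 1 bit per occurrence
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged           # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, merged)
    return total


def zlib_bits(diffs):
    """Bits used by zlib on the deltas serialised as big-endian 16-bit integers."""
    raw = b"".join(struct.pack(">hh", dlat, dlon) for dlat, dlon in diffs)
    return 8 * len(zlib.compress(raw, 9))


# Usage: a made-up, mildly varying sequence of displacement vectors.
diffs = [(3, 2), (3, 2), (3, 3), (2, 2), (3, 2), (4, 2)] * 120
print("huffman:", huffman_bits(diffs), "bits")
print("zlib:   ", zlib_bits(diffs), "bits")
```

Neither side is guaranteed to win. A Huffman code cannot go below one bit per sample, so on very repetitive tracks the run-matching in DEFLATE may come out ahead, while on noisier deltas coding the symbols directly tends to do better. Testing on a few real workouts should settle it.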