Database data structure using powers of 2

Time: 2022-12-02 21:01:05

I'm designing a data structure and wanted to know if I am missing anything doing it this way.

Let's say I have a column DAY of type int.

1  : Monday
2  : Tuesday
4  : Wednesday
8  : Thursday
16 : Friday
32 : Saturday
64 : Sunday

If I wanted to store Monday and Friday, I would put 17 (1 + 16) into the DAY column. If I wanted to store Tuesday and Wednesday, I would enter 6 (2 + 4), and so on.
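
As a minimal sketch of the encoding described above, the day values can be combined with bitwise OR and decoded again with bitwise AND. The names `DAY_BITS`, `encode`, and `decode` are made up for illustration and are not part of the original question.

```python
# Bit value assigned to each day, exactly as listed in the question.
DAY_BITS = {
    "Monday": 1,
    "Tuesday": 2,
    "Wednesday": 4,
    "Thursday": 8,
    "Friday": 16,
    "Saturday": 32,
    "Sunday": 64,
}


def encode(days):
    """Combine day names into a single integer by OR-ing their bits."""
    value = 0
    for day in days:
        value |= DAY_BITS[day]
    return value


def decode(value):
    """Return the day names whose bit is set in value, in weekday order."""
    return [day for day, bit in DAY_BITS.items() if value & bit]


print(encode(["Monday", "Friday"]))      # 17
print(encode(["Tuesday", "Wednesday"]))  # 6
print(decode(17))                        # ['Monday', 'Friday']
```

Because each day owns exactly one bit, OR-ing the values gives the same result as adding them, as long as no day is listed twice.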

Is this a valid way of storing data? What would a query look like if I wanted to select records that contain Saturday plus any other combination of days, or Saturday but not Wednesday? Is this possible? Will it be fast?

What is this concept called?

3 answers

#1


3  

Some people may tell you that this is a code 'smell' because it represents denormalisation, but I think this is a perfectly valid use of a bit-mask field:

-- Contains Saturday and any other combination of days
SELECT * FROM Table
WHERE (DayBitColumn & 32) = 32

-- Contains Saturday and any other combination of days, except Wednesday
SELECT * FROM Table
WHERE (DayBitColumn & 32) = 32 AND (DayBitColumn & 4) = 0

EDIT: as pointed out by @Andriy M, this can be written more succinctly as:

SELECT * FROM Table
WHERE (DayBitColumn & 36) = 32

['&' is bitwise AND]
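
To check that the single-mask form returns the same rows as the two separate tests, here is a small sketch that runs the three queries against an in-memory SQLite database from Python. SQLite stands in for the asker's database purely as an assumption for the demo, and the table name `events` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, DayBitColumn INTEGER NOT NULL)"
)
# Sample rows: 32 = Sat, 36 = Sat + Wed, 17 = Mon + Fri, 96 = Sat + Sun, 4 = Wed.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 32), (2, 36), (3, 17), (4, 96), (5, 4)],
)


def ids(where_clause):
    """Return the ids of the rows matching where_clause, in id order."""
    sql = f"SELECT id FROM events WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


# Saturday with any other combination of days.
with_saturday = ids("(DayBitColumn & 32) = 32")

# Saturday but not Wednesday, written as two separate bit tests.
two_tests = ids("(DayBitColumn & 32) = 32 AND (DayBitColumn & 4) = 0")

# The same condition with one mask: 36 = 32 | 4 keeps only the Saturday and
# Wednesday bits, and the result must be exactly 32 (Saturday on, Wednesday off).
one_mask = ids("(DayBitColumn & 36) = 32")

print(with_saturday)  # [1, 2, 4]
print(two_tests)      # [1, 4]
print(one_mask)       # [1, 4]
```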

#2


2  

The crux of the question, to me, is:

1) Is this possible?
2) Will it be fast?

1) Yes, it is possible.
2) Yes and no: it depends on your data distribution.

If you store them as separate bit columns instead, SQL Server still packs them into a single byte internally (up to eight bit columns share one byte), so you get the same storage savings without having to do the bit masking by hand. Why duplicate the effort?
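
A rough sketch of the column-per-day layout, to show how the query reads without any masking. This uses SQLite from Python, which has no `bit` type, so `INTEGER` columns restricted to 0 and 1 stand in for SQL Server bit columns; that substitution, the table name `EventDays`, and the sample rows are all assumptions, and only two of the seven day columns are shown to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EventDays (
        EventID   INTEGER PRIMARY KEY,
        Wednesday INTEGER NOT NULL CHECK (Wednesday IN (0, 1)),
        Saturday  INTEGER NOT NULL CHECK (Saturday IN (0, 1))
    )
    """
)
conn.executemany(
    "INSERT INTO EventDays VALUES (?, ?, ?)",
    [(1, 0, 1), (2, 1, 1), (3, 1, 0)],
)

# Saturday but not Wednesday: plain column comparisons, no bit arithmetic.
saturday_not_wednesday = [
    row[0]
    for row in conn.execute(
        "SELECT EventID FROM EventDays "
        "WHERE Saturday = 1 AND Wednesday = 0 ORDER BY EventID"
    )
]
print(saturday_not_wednesday)  # [1]
```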

Whether you store them separately or as a single field, indexing won't help.

  • as individual fields: bit columns have terrible selectivity, since each one has only two possible key values
  • as a single field: you cannot index a single bit within a field, and even if you could, you would run into the selectivity problem above

If, however, you normalize it and store it in a secondary table, say Event_Day, something like

EventID | Day
1         2
1         4

By storing only the days on which an event occurs, you have in effect built a materialized index. Of course, you have to balance that benefit against having to PIVOT the data every time you want to produce a nice weekly schedule.
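
Here is a minimal sketch of that normalized layout, again using an in-memory SQLite database from Python as a stand-in. The index name `ix_event_day_day`, the sample rows beyond event 1, and the use of `SUM` to rebuild the bitmask are assumptions for illustration; `SUM` is only equivalent to OR here because the primary key prevents the same day from appearing twice for one event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Event_Day (
        EventID INTEGER NOT NULL,
        Day     INTEGER NOT NULL,      -- same bit values as before: 1, 2, 4, ... 64
        PRIMARY KEY (EventID, Day)
    );
    -- Lets the database look rows up by day instead of scanning every event.
    CREATE INDEX ix_event_day_day ON Event_Day (Day, EventID);
    """
)
conn.executemany(
    "INSERT INTO Event_Day VALUES (?, ?)",
    [(1, 2), (1, 4), (2, 32), (3, 32), (3, 4), (4, 1), (4, 32)],
)

# Saturday (32) but not Wednesday (4), expressed as row lookups.
saturday_not_wednesday = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.EventID
        FROM Event_Day AS s
        WHERE s.Day = 32
          AND NOT EXISTS (
              SELECT 1 FROM Event_Day AS w
              WHERE w.EventID = s.EventID AND w.Day = 4
          )
        ORDER BY s.EventID
        """
    )
]
print(saturday_not_wednesday)  # [2, 4]

# Collapsing the rows back into one bitmask per event, the cost side of the trade-off.
masks = dict(conn.execute("SELECT EventID, SUM(Day) FROM Event_Day GROUP BY EventID"))
print(masks)  # {1: 6, 2: 32, 3: 36, 4: 33}
```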

#3


1  

1) Is it possible? Yes. I make use of this on my current project database which involves reconciling checks. If an item should be excluded, I mark it in the skip column. Because there are many reasons to skip something, and I want to know why it was skipped, I set the flag with bitwise operators.

2) Is it fast? In limited cases. WHERE skip = 0? Fast. WHERE skip & 4 = 4... well, a table scan is in my future, because every row's value has to be read and masked to answer the query.

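Fast to insert and fast for selecting numeric ranges, but dog slow if you want to know everything that has the Monday flag set. Speedy if you want to know everything that has the Sunday flag set and know to query it as >= 64, which works only because Sunday is the highest bit, so every value with that bit set is at least 64.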
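
The difference between the Sunday and Monday cases can be seen by enumerating every possible seven-bit value. This is only an illustrative sketch in Python; the variable names are made up, and the last check assumes a hypothetical eighth flag with value 128 to show where the range shortcut stops working.

```python
# Every value a seven-flag column can hold: 0 through 127.
values = range(128)

# Sunday is the highest bit, so "bit 64 is set" and "value >= 64" pick the same rows,
# which is what allows an ordinary range condition to be used.
sunday_by_bit = [v for v in values if v & 64]
sunday_by_range = [v for v in values if v >= 64]
print(sunday_by_bit == sunday_by_range)  # True

# Monday is the lowest bit, so matching values are every odd number, scattered
# across the whole range with no single contiguous interval covering them.
monday_by_bit = [v for v in values if v & 1]
print(monday_by_bit[:5])  # [1, 3, 5, 7, 9]

# With a hypothetical eighth flag (128), the shortcut breaks:
# 128 is >= 64 but does not have the Sunday bit set.
breaks_with_eighth_flag = (128 >= 64) and (128 & 64) == 0
print(breaks_with_eighth_flag)  # True
```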

Mike Wheat's answer has the proper queries for your other questions, so I won't duplicate them. Note again that they will need a table scan and will not be speedy. If you do store the days as individual columns, and you index each of those columns, you will consume a lot of space on indexes. You will see limited benefit from them unless they are covering indexes: Saturday plus anything but Wednesday would still have to scan either all of the Saturday rows or all of the Wednesday rows in a day-per-column layout. Depending on how the data is scattered, table scanning everything in that scenario may end up being faster than seeking.
