Data storage design for a large amount of heterogeneous data

Time: 2021-07-25 16:13:48

Here is something I've wondered about for quite some time and have not yet seen a really good solution for. It's a problem I imagine many games have, and one I can't easily think of how to solve well. Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details: just make them up (and explain what you made up).

OK, so many games have the concept of (inventory) items, and often there are hundreds of different kinds of items, with widely varying data structures. Some items are very simple ("a rock"), while others can have insane complexity or a lot of data behind them ("a book", "a programmed computer chip", "a container with more items"), and so on.

Now, programming something like that is easy: just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how many or what kind of private fields any type of item has.

But when it comes to database serialization (binary serialization is of course no problem), you face a dilemma: how would you represent that in, say, a typical SQL database?

Some attempts at a solution that I have seen, none of which I find satisfying:

  1. Binary serialization of the items; the database just holds an ID and a blob.

    • Pros: Takes like 10 seconds to implement.
    • Cons: Basically sacrifices every database feature; hard to maintain, nearly impossible to refactor.
  2. A table per item type.

    • Pros: Clean, flexible.
    • Cons: A wide variety of items means hundreds of tables, and every search for an item has to query them all, since SQL doesn't have the concept of a table/type 'reference'.
  3. One table with a lot of fields that aren't used by every item.

    • Pros: Takes like 10 seconds to implement, still searchable.
    • Cons: Wastes space and performance; it is confusing to tell from the database which fields are in use.
  4. A few tables with a few 'base profiles' for storage, where similar items get thrown together and use the same fields for different data.

    • Pros: I've got nothing.
    • Cons: Wastes space and performance; it is confusing to tell from the database which fields are in use.

What ideas do you have? Have you seen another design that works better or worse?

6 solutions

#1


4  

It depends on whether you need to sort, filter, count, or analyze those attributes.

If you use EAV (entity-attribute-value), then you will screw yourself nicely. Try doing reports on an EAV schema.

The best option is to use Table Inheritance:

PRODUCT
id pk
type
att1

PRODUCT_X
id pk fk PRODUCT
att2
att3

PRODUCT_Y
id pk fk PRODUCT
att4
att5

For attributes that you don't need to search, sort, or analyze, use a blob or XML.
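
As a rough illustration of the table-inheritance layout above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (item, book, container, author, pages, capacity) are made up for the example and simply stand in for PRODUCT, PRODUCT_X and PRODUCT_Y; they are assumptions, not part of the answer.

```python
import sqlite3

# Hypothetical names: item / book / container stand in for
# PRODUCT / PRODUCT_X / PRODUCT_Y from the layout above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE item (
    id   INTEGER PRIMARY KEY,
    type TEXT NOT NULL,          -- discriminator: which subtype table holds the rest
    name TEXT NOT NULL           -- attribute shared by every item
);
CREATE TABLE book (
    id     INTEGER PRIMARY KEY REFERENCES item(id),
    author TEXT NOT NULL,
    pages  INTEGER NOT NULL
);
CREATE TABLE container (
    id       INTEGER PRIMARY KEY REFERENCES item(id),
    capacity INTEGER NOT NULL
);
""")

def add_rock(name):
    # A simple item only needs the base row.
    with conn:
        cur = conn.execute("INSERT INTO item (type, name) VALUES ('rock', ?)", (name,))
    return cur.lastrowid

def add_book(name, author, pages):
    # A specialized item needs two inserts: the base row, then the subtype row.
    with conn:
        cur = conn.execute("INSERT INTO item (type, name) VALUES ('book', ?)", (name,))
        conn.execute("INSERT INTO book (id, author, pages) VALUES (?, ?, ?)",
                     (cur.lastrowid, author, pages))
    return cur.lastrowid

add_rock("Rock")
add_book("Spellbook", "Merlin", 320)
add_book("Pamphlet", "Anon", 12)

# Searching shared attributes touches a single table ...
all_names = [row[0] for row in conn.execute("SELECT name FROM item ORDER BY id")]

# ... and subtype attributes are reached with one join.
long_books = [row[0] for row in conn.execute(
    "SELECT i.name FROM item i JOIN book b ON b.id = i.id WHERE b.pages > 100")]
```

The base table answers "find any item" queries on its own, which addresses the question's complaint about having to query every per-type table.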

#2


2  

I have two alternatives for you:

  1. One table for the base type and supplemental tables for each “class” of specialized types.

    In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc., you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.

    PROS: You can use column-based features of your database like custom domains, checks, and XML processing; you can have simpler triggers on certain types; your queries differ exactly at the point of diverging concerns.

    CONS: You need two inserts for many objects.

  2. Use a “kind” enum field and a JSONB-like field for the special type data.

    This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern. Other databases have a similar complex field type. In this strategy you roll your own mini schema that you stash in the JSONB field. The kind field declares what you expect to find in that JSONB field.

    PROS: You can extract special type data in your queries; you can add check constraints and have a simple schema to deal with; you can benefit from indexing even though your data is heterogeneous; your queries and inserts are simple.

    CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
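
The second alternative, including its "roll your own validation" downside, might look roughly like the sketch below. It uses sqlite3 with a plain TEXT column holding JSON as a stand-in for a Postgres JSONB column, so it does not show querying or indexing inside the payload. The kinds, payload keys and the SCHEMAS table are hypothetical.

```python
import json
import sqlite3

# Hypothetical mini schema: the payload keys (and Python types) expected per kind.
SCHEMAS = {
    "rock": {},
    "book": {"author": str, "pages": int},
    "container": {"capacity": int},
}

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE item (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL CHECK (kind IN ('rock', 'book', 'container')),
    payload TEXT NOT NULL          -- JSON text; would be a JSONB column in Postgres
)""")

def validate(kind, payload):
    """Hand-rolled validation: the payload must match the mini schema for its kind."""
    schema = SCHEMAS[kind]
    if set(payload) != set(schema):
        raise ValueError(f"{kind}: expected keys {sorted(schema)}, got {sorted(payload)}")
    for key, expected in schema.items():
        if not isinstance(payload[key], expected):
            raise ValueError(f"{kind}.{key}: expected {expected.__name__}")

def add_item(kind, payload):
    validate(kind, payload)
    with conn:
        cur = conn.execute("INSERT INTO item (kind, payload) VALUES (?, ?)",
                           (kind, json.dumps(payload)))
    return cur.lastrowid

def load_item(item_id):
    # The kind column tells the caller what to expect inside the payload.
    kind, payload = conn.execute(
        "SELECT kind, payload FROM item WHERE id = ?", (item_id,)).fetchone()
    return kind, json.loads(payload)

book_id = add_item("book", {"author": "Merlin", "pages": 320})
loaded = load_item(book_id)
```

Every item is one row and one insert, but nothing in the database itself stops a malformed payload, which is why validate runs before every write.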

#3


1  

Yes, it is a pain to design database formats like this. I'm designing a notification system and ran into the same problem. My notification system is, however, less complex than yours: the data it holds is at most IDs and usernames. My current solution is a mix of 1 and 3: I serialize the data that differs between notifications, and use a column for the 2 usernames (some may have 2 or 1). I shy away from method 2 because I hate that design, but it's probably just me.

However, if you can afford it, I would suggest thinking outside the realm of RDBMS. It sounds like a non-relational store (especially a key/value store) may be a better fit for this data, especially if the items differ a lot from one another.

#4


1  

I'm sure this has been asked here a million times before, but in addition to the options you discussed in your question, you can look at an EAV schema, which is very flexible but has its own set of cons.
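
For readers who have not met EAV (entity-attribute-value), here is a minimal sketch of both the flexibility and one of the cons, using sqlite3. The table names, attribute names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_value (
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    attribute TEXT NOT NULL,
    value     TEXT NOT NULL,       -- every value is stored as text
    PRIMARY KEY (entity_id, attribute)
);
INSERT INTO entity VALUES (1, 'Spellbook'), (2, 'Chest'), (3, 'Pamphlet');
INSERT INTO attribute_value VALUES
    (1, 'author', 'Merlin'), (1, 'pages', '320'),
    (2, 'capacity', '20'),
    (3, 'author', 'Merlin'), (3, 'pages', '12');
""")

# Flexible: a new kind of attribute needs no schema change, only new rows.
# Costly: filtering on two attributes needs one self-join per attribute,
# plus a cast, because the value column has no real type.
rows = conn.execute("""
    SELECT e.name
    FROM entity e
    JOIN attribute_value a ON a.entity_id = e.id AND a.attribute = 'author'
    JOIN attribute_value p ON p.entity_id = e.id AND p.attribute = 'pages'
    WHERE a.value = 'Merlin' AND CAST(p.value AS INTEGER) > 100
""").fetchall()
names = [row[0] for row in rows]
```

Each extra attribute in a filter or report adds another join, so queries grow with the number of attributes involved.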

Another alternative is database systems that are not relational. There are object databases as well as various key/value stores and document databases.

Typically all these things break down to some extent when you need to query against the flexible attributes. This is kind of an intrinsic problem, however. Conceptually, what does it really mean to accurately query things that are unstructured?

#5


1  

First of all, do you actually need the concurrency, scalability and ACID transactions of a real database? Unless you are building an MMO, your game structures will likely fit in memory anyway, so you can search and otherwise manipulate them there directly. In a scenario like this, the "database" is just a store for serialized objects, and you can replace it with the file system.


If you conclude that you do need a database, then the key is figuring out what "atomicity" means from the perspective of data management.

For example, if a game item has a bunch of attributes, but none of these attributes are manipulated individually at the database level (even though they could well be at the application level), then the item can be considered "atomic" from the data management perspective. On the other hand, if the item needs to be searched on some of these attributes, then you'll need a good way to index them in the database, which typically means they'll have to be separate fields.

Once you have identified the attributes that should be "visible" versus those that should be "invisible" from the database perspective, serialize the latter to BLOBs (or whatever), then forget about them and concentrate on structuring the former.
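
A minimal sketch of that split, assuming a hypothetical Item whose name and weight are "visible" (real, indexable columns) and whose remaining state is "invisible" (pickled into a BLOB). The class, the column names and the choice of pickle are illustrative assumptions; any serializer would do, and pickle should only be used on data you trust.

```python
import pickle
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Item:
    name: str                                   # "visible": stored as a column
    weight: int                                 # "visible": stored and indexed
    state: dict = field(default_factory=dict)   # "invisible": opaque to the database

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    weight INTEGER NOT NULL,
    state  BLOB NOT NULL           -- serialized, never queried
);
CREATE INDEX item_weight ON item (weight);
""")

def save(item):
    with conn:
        cur = conn.execute(
            "INSERT INTO item (name, weight, state) VALUES (?, ?, ?)",
            (item.name, item.weight, pickle.dumps(item.state)))
    return cur.lastrowid

def find_lighter_than(limit):
    # The search uses only visible columns; the blob is decoded afterwards.
    rows = conn.execute(
        "SELECT name, weight, state FROM item WHERE weight < ? ORDER BY weight",
        (limit,))
    return [Item(name, weight, pickle.loads(state)) for name, weight, state in rows]

save(Item("Rock", 5))
save(Item("Chip", 1, {"program": ["LOAD", "ADD"]}))
save(Item("Anvil", 200))
found = find_lighter_than(10)
```

Promoting an attribute from invisible to visible later means adding a column and backfilling it from the blobs, so it pays to decide early which attributes need to be searchable.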

That's where the fun starts, and you'll probably need an "all of the above" strategy for reasonable results.

BTW, some databases support "deep" indexes that can reach into heterogeneous data structures. For example, take a look at Oracle's XMLIndex, though I doubt you'll use Oracle for a game.

#6


1  

You seem to be trying to solve this for a gaming context, so maybe you could consider a component-based approach. I have to say that I personally haven't tried this yet, but I've been looking into it for a while and it seems to me something similar could be applied.

The idea would be that all the entities in your game would basically be a bag of components. These components can be, for example, Position, Energy or, for your inventory case, Collectable. Then, for this Collectable component you can add custom fields such as category, numItems, etc.

When you're going to render the inventory, you can simply query your entity system for items that have the Collectable component.
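
The bag-of-components idea and the Collectable query can be sketched in memory as follows. This is only an illustration under assumptions: the Entity class, the with_component helper and the sample world are hypothetical, and numItems is spelled num_items to follow Python naming.

```python
from dataclasses import dataclass, field

@dataclass
class Position:
    x: float
    y: float

@dataclass
class Collectable:
    category: str
    num_items: int = 1

@dataclass
class Entity:
    name: str
    # The "bag of components", keyed by component type.
    components: dict = field(default_factory=dict)

    def add(self, component):
        self.components[type(component)] = component
        return self  # returning self allows chained add() calls

    def get(self, component_type):
        return self.components.get(component_type)

def with_component(entities, component_type):
    """Return the entities that carry the given component type."""
    return [e for e in entities if component_type in e.components]

world = [
    Entity("rock").add(Position(1.0, 2.0)).add(Collectable("mineral", 3)),
    Entity("tree").add(Position(4.0, 0.0)),
    Entity("book").add(Collectable("literature")),
]

# Rendering the inventory only needs the entities that are Collectable.
inventory = with_component(world, Collectable)
```

An entity's type is implied by which components it carries, so a new kind of item is a new combination of components rather than a new class.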

How can you save this into a DB? You can define the components independently, each in its own table, and then for the entities (each in their own table as well) add a "Components" column holding an array of IDs that reference these components. These IDs would effectively act like foreign keys; I'm aware that this is not exactly how you would model things in a relational database, but you get the idea.

Then, when you load the entities and their components at runtime, you can set the corresponding flag in each entity's bag of components as each component is loaded, so that you know which components the entity has; they then become queryable.

Here's an interesting read about component-based entity systems.
