For a customer, we were developing a large application that was open to all users, meaning that all users could see each other's data.
Now the customer suddenly says that they want only users belonging to the same organization to be able to view each other's data.
So we came up with this data model:
So now the question is: How is it best to separate the data?
This is the only alternative I see:
- SQL JOIN on ALL relevant tables (all tables that hold data should now always join on Organization). Every query would need an extra join to Organization, and where the relationship doesn't exist yet, we need to create a new foreign key.
But I feel an extra join (we have around 20 tables that need the extra join) is quite costly.
I hope there are some other best practices or solutions we can consider.
PS: This is a Web application developed using Java/JSF/Seam (but I don't know if that is relevant)
UPDATE
I want to clarify something. My concern is not security but performance. We have added a foreign key to Organization to all relevant tables that have shared data, and we are using the logged-in user's organization to filter the data.
All I want to know is whether this is a good architectural solution (inner join) or whether we should do something else (e.g. load all shared data and filter in memory instead of using an SQL join).
3 solutions
#1
4
You really have to understand the difference between the persistency layer and the application layer.
It doesn't matter how you define your database tables, as anyone with database access will have access to all the users' data. What does matter is how you define the behavior in your application.
Changing the database design should only be done for performance reasons, not for security - which should be handled in the application.
#2
1
I would reckon that the best pattern is to expose the user details only through the web application, so at that point it's a case of restricting the data exposed to each user. This will allow you to build the required security into the application.
Alternatively if you are allowing direct database access then you will need to create a login/user (depends on database used) for each organization or user and then restrict the access of these login/user entities to parameterized stored procedures rather than the base tables. This will push security back onto the database, which is riskier but still do-able.
As to meta changes to support the organization column, parameterizing the stored procedures will be fairly trivial:
-- User is a reserved word in T-SQL, so the table name is bracketed here
select @organizationId = organizationId from [User] where [User].id = @currentUserId
select * from [User] where organizationId = @organizationId
(Depending on the SQL flavour, you will need to quote some identifiers, e.g. `User` or [User].)
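The two-step lookup above can also be issued from application code with bound parameters. The following is only a minimal sketch, not the poster's implementation: it uses Python's built-in sqlite3 module so that it runs anywhere, and the table name app_user, its columns, and the helper rows_for_users_org are all hypothetical.

```python
import sqlite3


def rows_for_users_org(conn, current_user_id):
    """Return (id, name) for every user sharing an organization with the caller."""
    # Step 1: look up the caller's organization (mirrors the first select).
    row = conn.execute(
        "SELECT organization_id FROM app_user WHERE id = ?",
        (current_user_id,),
    ).fetchone()
    if row is None:
        return []  # unknown user: expose nothing
    # Step 2: filter by that organization (mirrors the second select).
    return conn.execute(
        "SELECT id, name FROM app_user WHERE organization_id = ? ORDER BY id",
        (row[0],),
    ).fetchall()


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    name TEXT,
    organization_id INTEGER NOT NULL
);
INSERT INTO app_user VALUES (1, 'ann', 10), (2, 'bob', 10), (3, 'cy', 20);
""")

print(rows_for_users_org(conn, 1))
```

Binding the organization id as a parameter keeps the filter out of string concatenation, whichever layer ends up owning the check.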
#3
0
I see no reason that Organization has to be 'joined' at all.
If your 'data' tables all have OrganizationID columns, then you can look up the organizationID from the user and then add it as a condition to the query.
Example:
select @OrganizationId = organizationId from [User] where [User].id = @currentUserId
select * from datatable a .... where .... AND a.organizationID = @organizationID
See: no join.
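To make the "no join" point concrete, here is a small sketch comparing the joined query with the plain filter. It assumes SQLite (via Python's sqlite3) and a hypothetical two-table schema; the names organization, datatable, and payload are made up for the example.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE datatable (
    id INTEGER PRIMARY KEY,
    payload TEXT,
    organization_id INTEGER NOT NULL REFERENCES organization(id)
);
INSERT INTO organization VALUES (10, 'acme'), (20, 'globex');
INSERT INTO datatable VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 10);
""")

org_id = 10  # in the real application this comes from the logged-in user

# Variant 1: join to organization, then filter on it.
with_join = conn.execute(
    "SELECT d.id, d.payload FROM datatable d "
    "JOIN organization o ON o.id = d.organization_id "
    "WHERE o.id = ? ORDER BY d.id",
    (org_id,),
).fetchall()

# Variant 2: filter on the foreign key column directly, with no join.
without_join = conn.execute(
    "SELECT id, payload FROM datatable "
    "WHERE organization_id = ? ORDER BY id",
    (org_id,),
).fetchall()

print(with_join == without_join, without_join)
```

Because organization_id is a NOT NULL foreign key here, both variants return the same rows, so the join adds nothing unless columns from the organization table itself are needed.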
With respect to performance, there are different types of joins, and SQL Server lets you hint at the join type. In some cases a merge join is best, whereas in a scenario like this one a loop join would be best. I'm not sure whether these choices are available in MySQL.
With respect to all of your tables needing a join or a condition (see above), there is a logical answer and an implementation answer. The implementation answer depends on your indexing, and it assumes you have indexes on your join and condition columns. If adding that condition is what limits the dataset the most, then you will benefit. But if the join with another table that has already been filtered does a better job of reducing rows, then the condition will be worthless (or, in the worst case, it will cause the wrong index to be used).
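One way to check whether the organization condition actually uses an index is to look at the query plan before and after creating one. The sketch below does this with SQLite's EXPLAIN QUERY PLAN, which is an assumption made for portability: SQL Server and MySQL have their own plan tools and their own output formats. The table, the index name idx_data_org, and the helper plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datatable ("
    "id INTEGER PRIMARY KEY, payload TEXT, organization_id INTEGER NOT NULL)"
)


def plan(conn, sql, params=()):
    """Return the human-readable detail strings of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


query = "SELECT * FROM datatable WHERE organization_id = ?"

# Without an index the filter has to scan the whole table.
plan_before = plan(conn, query, (10,))

# With an index on the filter column the planner can seek by organization.
conn.execute("CREATE INDEX idx_data_org ON datatable (organization_id)")
plan_after = plan(conn, query, (10,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so the thing to look for is whether the index name appears in the second plan.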
Logically, only data that isn't fully dependent on a table already filtered by organizationID needs that extra condition. If you have a car table and a carparts table, then you only have to filter the car table. The exception is when some queries read the parts table without joining to the car table, in which case you will need organizationID on the parts table too.
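The car and carparts case might look like the following sketch. It assumes SQLite through Python's sqlite3, and the column names and the helper parts_for_org are hypothetical; only the car table carries organization_id.

```python
import sqlite3

# Hypothetical schema: carparts has no organization column of its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car (
    id INTEGER PRIMARY KEY,
    model TEXT,
    organization_id INTEGER NOT NULL
);
CREATE TABLE carparts (
    id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car(id),
    name TEXT
);
INSERT INTO car VALUES (1, 'sedan', 10), (2, 'truck', 20);
INSERT INTO carparts VALUES (1, 1, 'wheel'), (2, 1, 'door'), (3, 2, 'axle');
""")


def parts_for_org(conn, org_id):
    """Return part names for one organization, filtering only on car."""
    # carparts is restricted indirectly: a part is visible only through
    # its parent car, and the car row is what carries organization_id.
    return conn.execute(
        "SELECT p.name FROM carparts p "
        "JOIN car c ON c.id = p.car_id "
        "WHERE c.organization_id = ? ORDER BY p.id",
        (org_id,),
    ).fetchall()


print(parts_for_org(conn, 10))
```

A query that reads carparts on its own, without going through car, has nothing to filter on in this schema, which is the case where the extra column on the parts table becomes necessary.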