Overview
I'm working on an Emergency Services reporting and mapping application for California (kind of weird, considering the fires there right now...). We need to map demographic and emergency data for an internal govt unit.
What we have are all the streets, cities and neighborhoods in California. Each neighborhood also has its relevant shapefile (the lat/long points that define its boundaries). This was given to us by the US Census Bureau website (all public domain stuff).
Problem
I'm not sure how to best design the DB tables. We haven't been told what type of DB we need to use .. so we're open to suggestions if that helps. We have experience with MS SQL 2005 and 2008 (and the spatial stuff in '08).
We can have the following legit data.
- Street, City, State
- City, State
- Neighborhood, State
- State
The reason why State is a legit location is because we're told this might be sold to other states, so we need to plan for that now.
So, originally, I thought of this...
- LocationId INTEGER PK Identity
- Street NVARCHAR(100)
- Neighbourhood NVARCHAR(100)
- City NVARCHAR(100)
- State NVARCHAR(100)
- Latitude VARCHAR(15)
- Longitude VARCHAR(15)
- Shapefile
None of those are nullable, btw. But after a short while, I thought that it was a waste to have so many 'California' or 'San Diego' strings in the fields. So I changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.
So .. that all works fine, except when I try and do some SQL statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries :(
What about having the main table and the sub-lookup tables (eg. Neighbourhoods, Cities and States) linked via ID's, and then placing all this in a view? Remember, NeighborhoodID and CityID would be NULLABLE.. ???
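To make the view idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names, the view name LocationView, the nullable Street column and the sample rows are all assumptions for illustration, not part of the real data set. The view contains the outer joins once, so reports can select from it without repeating them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE States        (StateID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Cities        (CityID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Neighborhoods (NeighborhoodID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Main table: State is always present, the other levels are optional.
CREATE TABLE Locations (
    LocationID     INTEGER PRIMARY KEY,
    Street         TEXT,                                               -- assumed nullable
    NeighborhoodID INTEGER REFERENCES Neighborhoods(NeighborhoodID),   -- nullable FK
    CityID         INTEGER REFERENCES Cities(CityID),                  -- nullable FK
    StateID        INTEGER NOT NULL REFERENCES States(StateID)
);

-- The outer joins are written once, inside the view.
CREATE VIEW LocationView AS
SELECT l.LocationID, l.Street,
       n.Name AS Neighborhood, c.Name AS City, s.Name AS State
FROM Locations l
JOIN States s             ON s.StateID = l.StateID
LEFT JOIN Cities c        ON c.CityID = l.CityID
LEFT JOIN Neighborhoods n ON n.NeighborhoodID = l.NeighborhoodID;

-- Made-up sample rows covering the four legit shapes from the question.
INSERT INTO States VALUES (1, 'California');
INSERT INTO Cities VALUES (1, 'San Diego');
INSERT INTO Neighborhoods VALUES (1, 'La Jolla');
INSERT INTO Locations VALUES (1, '1 Example Street', NULL, 1, 1);   -- Street, City, State
INSERT INTO Locations VALUES (2, NULL, NULL, 1, 1);                 -- City, State
INSERT INTO Locations VALUES (3, NULL, 1, NULL, 1);                 -- Neighborhood, State
INSERT INTO Locations VALUES (4, NULL, NULL, NULL, 1);              -- State
""")

rows = conn.execute(
    "SELECT Street, Neighborhood, City, State FROM LocationView ORDER BY LocationID"
).fetchall()
for row in rows:
    print(row)
```

Missing levels simply come back as NULL (None in Python), so callers of the view never write an outer join themselves.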
I just want to see people's thoughts on this and the reasons behind their suggestions, please. I'm really worried and confused but am eager to learn.
Please help!
edit 1: I need to stick to an RDBMS Database.
edit 2: I'm thinking about going with either a single table (de-normalized) with constraints to keep the combination of the fields unique, OR multiple tables with nullable FK's on the main table (eg. Locations (main table), Neighborhoods, Cities, States ... a normalized db schema).
edit 3: Added City to the sample, second list.
edit 4: Added view question.
5 Solutions
#1
4
Taking the example:
- Street, City, State
- City, State
- Neighborhood, State
- State
First, go back to basic principles: all of the above are distinct geospatial entities, so your address is composed of a name and one or many geospatial specifiers. This tells us that we really should be storing them in a single table. The key here is to think of the data more abstractly.
So your address table needs a one-to-many relationship to another table, called address_entities, which is as follows:
- int ID
- varchar() name
- varchar() type
- int parentID
- geography position.
This means that you will obviously need a table to link the address to the address entity table above. Now, each geospatial entity is inherently hierarchical, so the table references itself through parentID. That makes the SQL harder, and personally I try to avoid self-referencing tables, but there are times when they are a good solution and this is one of them.
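As a rough sketch of what the self-referencing table might look like in practice, the following uses Python's built-in sqlite3 module. The column names follow the list above (lowercased), the geography column is left out because SQLite has no spatial type, and the helper full_path and the sample rows are hypothetical. A recursive common table expression walks from an entity up through its parents; SQL Server supports the same construct, written without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address_entities (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    type      TEXT NOT NULL,
    parent_id INTEGER REFERENCES address_entities(id)  -- NULL for a top-level entity
    -- the 'geography position' column is omitted in this sketch
);
-- Made-up sample hierarchy: state -> city -> street.
INSERT INTO address_entities VALUES (1, 'California', 'state', NULL);
INSERT INTO address_entities VALUES (2, 'San Diego', 'city', 1);
INSERT INTO address_entities VALUES (3, 'Example Street', 'street', 2);
""")


def full_path(conn, entity_id):
    """Return the entity's name followed by the names of all its ancestors."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0
            FROM address_entities
            WHERE id = ?
            UNION ALL
            SELECT e.id, e.name, e.parent_id, chain.depth + 1
            FROM address_entities e
            JOIN chain ON e.id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (entity_id,),
    ).fetchall()
    return [name for (name,) in rows]


path = full_path(conn, 3)
print(", ".join(path))
```

The same query works for any depth of hierarchy, which is the payoff for the extra SQL complexity: a state-only location and a street-level location are resolved by one code path.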
The benefits are huge: even though it makes the code harder, it is worth it in the long run.
Also, even when it isn't an immediate requirement, think globally. Not all addresses in the world have a street or a state. For example, in France a valid address could be:
- la Maison des Fou
- 24500 Eymet
So, bear that in mind when designing schemas.
#2
2
As @Oddthinking noted in a comment, your problems started at:
So I changed the table to be more normalised by making the Neighborhood, City and State fields a foreign key to their own new table (eg. lookups) .. and those two fields are now NULLABLE.
So .. that all works fine. except when I try and do some SQL statements on them. Because of the NULLABLE FK's, it's a nightmare to make all these outer join queries.
This reminds me of the "Doctor, doctor, it hurts when I hit myself like this" joke.
Why exactly did you make the foreign key fields nullable? They were mandatory before, so you should keep them as mandatory, precisely to avoid the nightmares of outer join queries.
Your explanation (question) is somewhat confusing in that you list three fields (Neighborhood, City and State) and then say "those two fields are now nullable". Which two are? And why? And what is in the lookup table? Or is there more than one lookup table?
There might be an argument for some sort of NeighbourhoodID number which is a foreign key to a Neighbourhood table, which defines the City and State as well as the Neighbourhood name. You might then decide that there is a closed list of cities and the cities have an ID number too, and that number determines the state too.
You are probably as well off using a two-letter state code as creating a (probably 4-byte) state ID number. However, do not forget that a check constraint that ensures that the state code is one of the 50 or so valid state codes is harder to write than a foreign key that references a table of states. Since neither states nor cities change very often, I'd probably use the table of states with a foreign key - but the key column would be the state code.
That means you might have a table of Neighbourhoods with columns NeighbourhoodID, Name, CityID; a table of Cities with columns CityID, Name, State; and a table of States with columns State and Name. You can add other columns as you see fit. And your primary table would contain a NeighbourhoodID column that is a foreign key to the Neighbourhoods table.
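The three tables just described can be sketched as follows, using Python's built-in sqlite3 module in place of SQL Server. The Locations table, its Street column and the sample rows are assumptions for illustration. Every foreign key is NOT NULL, so plain inner joins are enough, and the States table doubles as the validity check on the two-letter code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE States (
    State TEXT PRIMARY KEY,            -- two-letter code is the key
    Name  TEXT NOT NULL
);
CREATE TABLE Cities (
    CityID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL,
    State  TEXT NOT NULL REFERENCES States(State)
);
CREATE TABLE Neighbourhoods (
    NeighbourhoodID INTEGER PRIMARY KEY,
    Name            TEXT NOT NULL,
    CityID          INTEGER NOT NULL REFERENCES Cities(CityID)
);
CREATE TABLE Locations (               -- hypothetical primary table
    LocationID      INTEGER PRIMARY KEY,
    Street          TEXT NOT NULL,
    NeighbourhoodID INTEGER NOT NULL REFERENCES Neighbourhoods(NeighbourhoodID)
);
-- Made-up sample rows.
INSERT INTO States VALUES ('CA', 'California');
INSERT INTO Cities VALUES (1, 'San Diego', 'CA');
INSERT INTO Neighbourhoods VALUES (1, 'La Jolla', 1);
INSERT INTO Locations VALUES (1, '1 Example Street', 1);
""")

# Mandatory keys mean inner joins all the way up: no outer joins needed.
row = conn.execute("""
    SELECT l.Street, n.Name, c.Name, s.Name
    FROM Locations l
    JOIN Neighbourhoods n ON n.NeighbourhoodID = l.NeighbourhoodID
    JOIN Cities c         ON c.CityID = n.CityID
    JOIN States s         ON s.State = c.State
""").fetchone()
print(row)

# The foreign key rejects a state code that is not in the States table.
try:
    conn.execute("INSERT INTO Cities (Name, State) VALUES ('Nowhere', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invalid state code rejected:", rejected)
```

Note that this layout only covers locations that resolve down to a neighbourhood; it does not by itself represent a state-only or city-only location.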
#3
1
This is a nice place to start. A whole #$(#$-load of database schemas to check out:
#4
1
This is a problem I've had to deal with, and RDBMSs aren't the best at storing hierarchical data. You might want to look at using an object database, since these have to deal with nested objects and are optimized for the problem.
If you need to use an RDBMS, you may have to stick with a de-normalized schema. Having separate tables to maintain your cities, streets etc. may still be handy for tracking changes, though. If a city or street needs to be renamed, you can update the master record in the respective table and schedule a job to update a text copy of the string in your 'main' table. This will prevent you from having to run updates on tens or hundreds of thousands of rows during prime time, but still lets you store the most up-to-date data in the db. Of course, this makes the data duplication situation even worse, but it's the price to pay for performance.
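A minimal sketch of that rename-then-sync pattern, using Python's built-in sqlite3 module. The table layout, the CityName copy column, and the function names rename_city and sync_city_names are hypothetical; in a real deployment the second function would be run by whatever job scheduler the database offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cities (CityID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Locations (
    LocationID INTEGER PRIMARY KEY,
    Street     TEXT NOT NULL,
    CityID     INTEGER NOT NULL REFERENCES Cities(CityID),
    CityName   TEXT NOT NULL           -- de-normalized text copy of Cities.Name
);
-- Made-up rows with a misspelled city name that needs correcting.
INSERT INTO Cities VALUES (1, 'San Deigo');
INSERT INTO Locations VALUES (1, '1 Example Street', 1, 'San Deigo');
INSERT INTO Locations VALUES (2, '2 Example Street', 1, 'San Deigo');
""")


def rename_city(conn, city_id, new_name):
    """Prime-time change: touches only the single master row."""
    conn.execute("UPDATE Cities SET Name = ? WHERE CityID = ?", (new_name, city_id))
    conn.commit()


def sync_city_names(conn):
    """Off-peak job: refresh stale text copies, return how many rows changed."""
    cur = conn.execute("""
        UPDATE Locations
        SET CityName = (SELECT Name FROM Cities WHERE Cities.CityID = Locations.CityID)
        WHERE CityName <> (SELECT Name FROM Cities WHERE Cities.CityID = Locations.CityID)
    """)
    conn.commit()
    return cur.rowcount


rename_city(conn, 1, "San Diego")
stale = conn.execute(
    "SELECT COUNT(*) FROM Locations WHERE CityName = 'San Deigo'"
).fetchone()[0]
updated = sync_city_names(conn)
print("stale rows before sync:", stale, "rows updated by sync:", updated)
```

Between the rename and the sync the main table shows the old name, which is the trade-off this approach accepts.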
#5
0
Is this an OLTP system and a reporting system, or only a reporting system? If it's only a reporting system, you can denormalize the data in a data warehouse fashion (with or without snowflake dimensions for the hierarchies of geographic jurisdictions) and you'll find the reporting to be easier.
I would start from the results and work back, because it sounds to me like you are getting fed the data and you are trying to bring it into a database to support the reporting and mapping. In this case, the database schema being a traditional normalized system is not important because redundancy in the data is not something that will cause maintenance problems for users, etc.
If this seems appropriate, you may want to look into the Kimball books.
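For a feel of what the data warehouse style could look like, here is a small sketch using Python's built-in sqlite3 module. The names DimLocation and FactIncident, their columns and the sample rows are assumptions made up for illustration; the point is only that a flat location dimension carries the place names as plain text, so a report needs a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat (non-snowflaked) location dimension: names are repeated as text.
CREATE TABLE DimLocation (
    LocationKey  INTEGER PRIMARY KEY,
    Street       TEXT,
    Neighborhood TEXT,
    City         TEXT,
    State        TEXT NOT NULL
);
-- Hypothetical fact table of emergency incidents.
CREATE TABLE FactIncident (
    IncidentID   INTEGER PRIMARY KEY,
    LocationKey  INTEGER NOT NULL REFERENCES DimLocation(LocationKey),
    IncidentType TEXT NOT NULL
);
-- Made-up sample data.
INSERT INTO DimLocation VALUES (1, '1 Example Street', NULL, 'San Diego', 'California');
INSERT INTO DimLocation VALUES (2, NULL, NULL, 'Sacramento', 'California');
INSERT INTO FactIncident VALUES (1, 1, 'fire');
INSERT INTO FactIncident VALUES (2, 1, 'medical');
INSERT INTO FactIncident VALUES (3, 2, 'fire');
""")

# One join, then group by whichever geographic level the report needs.
report = conn.execute("""
    SELECT d.State, d.City, COUNT(*) AS Incidents
    FROM FactIncident f
    JOIN DimLocation d ON d.LocationKey = f.LocationKey
    GROUP BY d.State, d.City
    ORDER BY d.State, d.City
""").fetchall()
for line in report:
    print(line)
```

The redundancy in DimLocation is deliberate here: the table is loaded from the source data rather than edited by users, so the usual update anomalies are less of a concern.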