在关系数据库中建模地理位置

时间:2022-08-20 16:54:12

I am designing a contact management system and have come across an interesting issue regarding modeling geographic locations in a consistent way. I would like to be able to record locations associated with a particular person (mailing address(es) for work, school, home, etc.) My thought is to create a table of locales such as the following:

我正在设计一个联系人管理系统,并且遇到了一个有趣的问题,即以一致的方式建模地理位置。我希望能够记录与特定人员相关的位置(工作,学校,家庭等的邮寄地址)。我的想法是创建一个区域表,如下所示:

Locales (ID, LocationName, ParentID) where autonomous locations (such as countries, e.g. USA) are parents of themselves. This way I can have an arbitrarily deep nesting of 'political units' (COUNTRY > STATE > CITY or COUNTRY > STATE > CITY > UNIVERSITY). Some queries will necessarily involve recursion.

区域(ID,LocationName,ParentID),其中自治位置(例如国家,例如美国)是其自身的父母。通过这种方式,我可以任意深度嵌套“政治单位”(国家>州>城市或国家>州>城市>大学)。一些查询必然涉及递归。

I would appreciate any other recommendations or perhaps advice regarding predictable issues that I am likely to encounter with such a scheme.

对于我可能会遇到的可预测问题的任何其他建议或建议,我将不胜感激。

8 个解决方案

#1


5  

You might want to have a look at Freebase.com as a site that's had some open discussion about what a "location" means and what it means when a location is included in another. These sorts of questions can generate a lot of discussion.

您可能希望将Freebase.com视为一个网站,该网站对“位置”的含义以及位置包含在另一个位置时的含义进行了一些公开讨论。这些问题可以产生很多讨论。

For example, there is the obvious "geographic nesting", but there are less obvious logical nestings. For example, in a strictly geographic sense, Vatican City is nested within Italy. But it's not nested politically. Similarly, if your user is located in a research center that belongs to a university, but isn't located on the University's property, do you model that relationship or not?

例如,有明显的“地理嵌套”,但逻辑嵌套不太明显。例如,在严格的地理意义上,梵蒂冈城嵌套在意大利境内。但它并没有在政治上嵌套。同样,如果您的用户位于属于大学的研究中心,但不在大学的财产中,那么您是否建立了这种关系的模型?

#2


5  

Sounds like a good approach to me. The one thing that I'm not clear on when reading you post is what "parents of themselves" means - if this is to indicate that the locale does not have a parent, you're better off using null than the ID of itself.

听起来对我很好。在阅读帖子时我不清楚的一件事是“父母自己”的意思 - 如果这表明语言环境没有父母,那么你最好使用null而不是自己的ID。

#3


4  

I think you might be overthinking this. There's a reason most systems just store addresses and maybe a table of countries. Here are some things to look out for:

我想你可能会过度思考这一点。大多数系统只存储地址和国家表格是有原因的。以下是一些需要注意的事项:

  1. Would an address in the Bronx include the borough as a level in the hierarchy? Would an address in an unincorporated area eliminate the "city" level of the hierarchy? How do you model an address within a university vs an address that's not within one? You'll end up with a ragged hierarchy which will force you to traverse the tree every time you need to display an address in your application. If you have an "address book" page the performance hit could be significant.

    布朗克斯的地址是否会将自治市镇作为等级中的一个级别?非法人区域的地址是否会消除层级的“城市”级别?你如何模拟大学内的地址与不在一个地址内的地址?每次需要在应用程序中显示地址时,最终会出现一个粗糙的层次结构,这会强制您遍历树。如果您有一个“地址簿”页面,性能影响可能很大。

  2. I'm not sure that you even have just one hierarchy. Brown University has facilities in Providence, RI and Bristol, RI. The only clean solution would be to have a double hierarchy with two campuses that each belong to their respective cities in one hierarchy but that both belong to Brown University on the other hierarchy. (A university is fundamentally unlike a political region. You shouldn't really mix them.)

    我不确定你是否只有一个层次结构。布朗大学在罗德岛普罗维登斯和罗德岛布里斯托尔设有工厂。唯一干净的解决方案是拥有一个具有两个校园的双层次结构,每个校区属于一个层次结构中各自的城市,但两者都属于另一个层次结构中的布朗大学。 (一所大学基本上不同于一个政治区域。你不应该真的混在一起。)

  3. What about zip codes? Some zip codes encompass multiple towns, other times a city is broken into multiple zip codes. And (rarely) some zip codes even cross state lines. (According to Wikipedia, at least...)

    邮政编码怎么样?有些邮政编码包含多个城镇,有时城市被分为多个邮政编码。并且(很少)一些邮政编码甚至跨州线。 (根据*,至少......)

  4. How will you enter the data? Building out the database by parsing conventionally-formatted addresses can be difficult when you take into account vanity addresses, alternate names for certain streets, different international formats, etc. And I think that entering every address hierarchically would be a PITA.

    你将如何输入数据?当你考虑虚荣地址,某些街道的替代名称,不同的国际格式等时,通过解析传统格式的地址构建数据库可能很困难。我认为分层次地输入每个地址将是PITA。

  5. It sounds like you're trying to model the entire world in your application. Do you really want or need to maintain a table that could conceivable contain every city, state, province, postal code, and country in the world? (Or at least every one where you know somebody?) The only thing I can think of that this scheme would buy you is proximity, but if that's what you want I'd just store state and country separately (and maybe the zip code) and add latitude and longitude data from Google.

    听起来你正试图在你的应用程序中模拟整个世界。您是否真的想要或需要维护一个可以想象包含世界上所有城市,州,省,邮政编码和国家/地区的表格? (或者至少每一个你认识某人的人?)我唯一可以想到的是这个方案会给你带来的是接近,但如果这就是你想要的,我只会分别存储州和国家(也许是邮政编码)并添加来自Google的经度和纬度数据。

Sorry for the extreme pessimism, but I've gone down that road myself. It's logically beautiful and elegant, but it doesn't work so well in practice.

抱歉极度悲观,但我自己走了这条路。它在逻辑上美观而优雅,但在实践中效果不佳。

#4


3  

Here's a suggestion for a pretty flexible schema. An immediate warning: it could be too flexible/complex for what you actually need

这是一个非常灵活的架构的建议。立即警告:它可能过于灵活/复杂,无法满足您的实际需求

Location (LocationID, LocationName) -- Basic building block

位置(LocationID,LocationName) - 基本构建块

LocationGroup (LocationGroupID, LocationGroupName, ParentLocationGroupID) -- This can effective encapsulate multiple hierarchies. You have one root node and then you can create multiple independent branches. E.g. you can split by state first and then create several sub-hierarchies e.g. ZIP/city/xxxx

LocationGroup(LocationGroupID,LocationGroupName,ParentLocationGroupID) - 这可以有效地封装多个层次结构。您有一个根节点,然后您可以创建多个独立分支。例如。您可以先按州拆分,然后创建多个子层次结构,例如ZIP /城市/ XXXX

LocationGroupLocation (LocationID, LocationGroupID) -- Here's how you link Location with one or more hierarchies. E.g. you can link your house to a ZIP, as well as a City... What you need to implement is a constraint that you should not be able to link up a location with any two hierarchies where one of them is a parent of the other (as the relationship is already implicit).

LocationGroupLocation(LocationID,LocationGroupID) - 以下是将Location与一个或多个层次结构相关联的方法。例如。你可以将你的房子链接到一个ZIP,以及一个城市......你需要实现的是一个约束条件,你不应该将一个位置与任何两个层次结构连接起来,其中一个层次结构是另一个层次结构的父节点。 (因为这种关系已经隐含了)。

#5


2  

I would think carefully about this since it may not be a necessary feature. Why not just use a text field and let users type in an address?

我会仔细考虑这个,因为它可能不是必要的功能。为什么不使用文本字段让用户输入地址?

Remember the KISS principle (Keep It Simple, Stupid).

记住KISS原则(保持简单,愚蠢)。

#6


1  

I agree with the other posts that you need to be very careful here about your requirements. Location can become a tricky issue and this is why GIS systems are so complicted.

我同意其他帖子,你需要在这里非常小心你的要求。位置可能成为一个棘手的问题,这就是GIS系统如此复杂的原因。

If you are sure you just need a basic heirarchy structure, I have the following suggestions:

如果你确定你只需要一个基本的层次结构,我有以下建议:

  • I support the previous comment that root level items should not have themselves as the parent. Root level items should have a null value for the parent. Always be careful about putting data into a field that has no meaning (i.e. "special" value to represent no data). This practice is rarely necessarily and way overused in the devleoper community.
  • 我支持以前的评论,根级别的项目不应该将自己作为父级。根级别项目应具有父级的空值。始终要小心将数据放入没有意义的字段(即“特殊”值表示无数据)。在devleoper社区中,这种做法很少被必然和过度使用。

  • Consider XPath / XML. This is Something to consider for bother recording the heirarchy structure, and for processing / parsing the data at retrieval. If you are using MSSQL Server, the XPath expressions in select statements are perfect for tasks such as returning the full location/heirarchy path of a record as the code is simple and the results are fast.
  • 考虑XPath / XML。这是为了记录层次结构以及在检索时处理/解析数据而需要考虑的事项。如果您使用的是MSSQL Server,则select语句中的XPath表达式非常适合于返回记录的完整位置/层次路径等任务,因为代码很简单且结果很快。

#7


1  

For Geographic locations you may wish to resolve an address to a Latitude, Longitude array (perhaps using Google maps etc.) to calculate proximities etc.. For Geopolitical nesting ... I'd go with the KISS response.

对于地理位置,您可能希望将地址解析为纬度,经度阵列(可能使用Google地图等)来计算邻近度等。对于地缘政治嵌套......我会使用KISS响应。

If you really want to model it, perhaps you need the types to be more generic ... Country -> State -> County -> Borough -> Locality -> City -> Suburb -> Street or PO Box -> Number -> -> Appartment etc. -> Institution (University or Employer) -> Division -> Subdivision-1 -> subdivision-n ... Are you sure you can't do KISS?

如果你真的想要建模,也许你需要更通用的类型...国家 - >州 - >县 - >自治市镇 - >地区 - >城市 - >郊区 - >街道或邮政信箱 - >数量 - > - >公寓等 - >机构(大学或雇主) - >分部 - >细分-1 - >细分 - 你确定你不能做KISS吗?

#8


0  

I'm modeling an apps for global users and I have the same problems, but I think that this approach could already be in use in many enterprise. But why this problem don't have an universal solution? Or, has this problem one best solution that can be the start point or anybody in the world need think in a solution for it since beginnig? In IT, we are making the same things any times and in many places, unfortunately. For exemplo, who are not have made more than one user, customer or product's database? And the worst, all enterprise in the world has made it. I think that could have universal solutions for universal problems.

我正在为全球用户建模应用程序并且我遇到了同样的问题,但我认为这种方法已经在许多企业中使用。但为什么这个问题没有通用的解决方案呢?或者,这个问题是一个最好的解决方案,可以作为起点,或者世界上任何人都需要在解决方案中思考,因为beginnig?在IT部门,不幸的是,我们在任何时候和许多地方都在做同样的事情。例如,谁没有生成多个用户,客户或产品的数据库?最糟糕的是,世界上所有的企业都做到了。我认为这可以为普遍问题提供普遍的解决方案。

#1


5  

You might want to have a look at Freebase.com as a site that's had some open discussion about what a "location" means and what it means when a location is included in another. These sorts of questions can generate a lot of discussion.

您可能希望将Freebase.com视为一个网站,该网站对“位置”的含义以及位置包含在另一个位置时的含义进行了一些公开讨论。这些问题可以产生很多讨论。

For example, there is the obvious "geographic nesting", but there are less obvious logical nestings. For example, in a strictly geographic sense, Vatican City is nested within Italy. But it's not nested politically. Similarly, if your user is located in a research center that belongs to a university, but isn't located on the University's property, do you model that relationship or not?

例如,有明显的“地理嵌套”,但逻辑嵌套不太明显。例如,在严格的地理意义上,梵蒂冈城嵌套在意大利境内。但它并没有在政治上嵌套。同样,如果您的用户位于属于大学的研究中心,但不在大学的财产中,那么您是否建立了这种关系的模型?

#2


5  

Sounds like a good approach to me. The one thing that I'm not clear on when reading you post is what "parents of themselves" means - if this is to indicate that the locale does not have a parent, you're better off using null than the ID of itself.

听起来对我很好。在阅读帖子时我不清楚的一件事是“父母自己”的意思 - 如果这表明语言环境没有父母,那么你最好使用null而不是自己的ID。

#3


4  

I think you might be overthinking this. There's a reason most systems just store addresses and maybe a table of countries. Here are some things to look out for:

我想你可能会过度思考这一点。大多数系统只存储地址和国家表格是有原因的。以下是一些需要注意的事项:

  1. Would an address in the Bronx include the borough as a level in the hierarchy? Would an address in an unincorporated area eliminate the "city" level of the hierarchy? How do you model an address within a university vs an address that's not within one? You'll end up with a ragged hierarchy which will force you to traverse the tree every time you need to display an address in your application. If you have an "address book" page the performance hit could be significant.

    布朗克斯的地址是否会将自治市镇作为等级中的一个级别?非法人区域的地址是否会消除层级的“城市”级别?你如何模拟大学内的地址与不在一个地址内的地址?每次需要在应用程序中显示地址时,最终会出现一个粗糙的层次结构,这会强制您遍历树。如果您有一个“地址簿”页面,性能影响可能很大。

  2. I'm not sure that you even have just one hierarchy. Brown University has facilities in Providence, RI and Bristol, RI. The only clean solution would be to have a double hierarchy with two campuses that each belong to their respective cities in one hierarchy but that both belong to Brown University on the other hierarchy. (A university is fundamentally unlike a political region. You shouldn't really mix them.)

    我不确定你是否只有一个层次结构。布朗大学在罗德岛普罗维登斯和罗德岛布里斯托尔设有工厂。唯一干净的解决方案是拥有一个具有两个校园的双层次结构,每个校区属于一个层次结构中各自的城市,但两者都属于另一个层次结构中的布朗大学。 (一所大学基本上不同于一个政治区域。你不应该真的混在一起。)

  3. What about zip codes? Some zip codes encompass multiple towns, other times a city is broken into multiple zip codes. And (rarely) some zip codes even cross state lines. (According to Wikipedia, at least...)

    邮政编码怎么样?有些邮政编码包含多个城镇,有时城市被分为多个邮政编码。并且(很少)一些邮政编码甚至跨州线。 (根据*,至少......)

  4. How will you enter the data? Building out the database by parsing conventionally-formatted addresses can be difficult when you take into account vanity addresses, alternate names for certain streets, different international formats, etc. And I think that entering every address hierarchically would be a PITA.

    你将如何输入数据?当你考虑虚荣地址,某些街道的替代名称,不同的国际格式等时,通过解析传统格式的地址构建数据库可能很困难。我认为分层次地输入每个地址将是PITA。

  5. It sounds like you're trying to model the entire world in your application. Do you really want or need to maintain a table that could conceivable contain every city, state, province, postal code, and country in the world? (Or at least every one where you know somebody?) The only thing I can think of that this scheme would buy you is proximity, but if that's what you want I'd just store state and country separately (and maybe the zip code) and add latitude and longitude data from Google.

    听起来你正试图在你的应用程序中模拟整个世界。您是否真的想要或需要维护一个可以想象包含世界上所有城市,州,省,邮政编码和国家/地区的表格? (或者至少每一个你认识某人的人?)我唯一可以想到的是这个方案会给你带来的是接近,但如果这就是你想要的,我只会分别存储州和国家(也许是邮政编码)并添加来自Google的经度和纬度数据。

Sorry for the extreme pessimism, but I've gone down that road myself. It's logically beautiful and elegant, but it doesn't work so well in practice.

抱歉极度悲观,但我自己走了这条路。它在逻辑上美观而优雅,但在实践中效果不佳。

#4


3  

Here's a suggestion for a pretty flexible schema. An immediate warning: it could be too flexible/complex for what you actually need

这是一个非常灵活的架构的建议。立即警告:它可能过于灵活/复杂,无法满足您的实际需求

Location (LocationID, LocationName) -- Basic building block

位置(LocationID,LocationName) - 基本构建块

LocationGroup (LocationGroupID, LocationGroupName, ParentLocationGroupID) -- This can effective encapsulate multiple hierarchies. You have one root node and then you can create multiple independent branches. E.g. you can split by state first and then create several sub-hierarchies e.g. ZIP/city/xxxx

LocationGroup(LocationGroupID,LocationGroupName,ParentLocationGroupID) - 这可以有效地封装多个层次结构。您有一个根节点,然后您可以创建多个独立分支。例如。您可以先按州拆分,然后创建多个子层次结构,例如ZIP /城市/ XXXX

LocationGroupLocation (LocationID, LocationGroupID) -- Here's how you link Location with one or more hierarchies. E.g. you can link your house to a ZIP, as well as a City... What you need to implement is a constraint that you should not be able to link up a location with any two hierarchies where one of them is a parent of the other (as the relationship is already implicit).

LocationGroupLocation(LocationID,LocationGroupID) - 以下是将Location与一个或多个层次结构相关联的方法。例如。你可以将你的房子链接到一个ZIP,以及一个城市......你需要实现的是一个约束条件,你不应该将一个位置与任何两个层次结构连接起来,其中一个层次结构是另一个层次结构的父节点。 (因为这种关系已经隐含了)。

#5


2  

I would think carefully about this since it may not be a necessary feature. Why not just use a text field and let users type in an address?

我会仔细考虑这个,因为它可能不是必要的功能。为什么不使用文本字段让用户输入地址?

Remember the KISS principle (Keep It Simple, Stupid).

记住KISS原则(保持简单,愚蠢)。

#6


1  

I agree with the other posts that you need to be very careful here about your requirements. Location can become a tricky issue and this is why GIS systems are so complicted.

我同意其他帖子,你需要在这里非常小心你的要求。位置可能成为一个棘手的问题,这就是GIS系统如此复杂的原因。

If you are sure you just need a basic heirarchy structure, I have the following suggestions:

如果你确定你只需要一个基本的层次结构,我有以下建议:

  • I support the previous comment that root level items should not have themselves as the parent. Root level items should have a null value for the parent. Always be careful about putting data into a field that has no meaning (i.e. "special" value to represent no data). This practice is rarely necessarily and way overused in the devleoper community.
  • 我支持以前的评论,根级别的项目不应该将自己作为父级。根级别项目应具有父级的空值。始终要小心将数据放入没有意义的字段(即“特殊”值表示无数据)。在devleoper社区中,这种做法很少被必然和过度使用。

  • Consider XPath / XML. This is Something to consider for bother recording the heirarchy structure, and for processing / parsing the data at retrieval. If you are using MSSQL Server, the XPath expressions in select statements are perfect for tasks such as returning the full location/heirarchy path of a record as the code is simple and the results are fast.
  • 考虑XPath / XML。这是为了记录层次结构以及在检索时处理/解析数据而需要考虑的事项。如果您使用的是MSSQL Server,则select语句中的XPath表达式非常适合于返回记录的完整位置/层次路径等任务,因为代码很简单且结果很快。

#7


1  

For Geographic locations you may wish to resolve an address to a Latitude, Longitude array (perhaps using Google maps etc.) to calculate proximities etc.. For Geopolitical nesting ... I'd go with the KISS response.

对于地理位置,您可能希望将地址解析为纬度,经度阵列(可能使用Google地图等)来计算邻近度等。对于地缘政治嵌套......我会使用KISS响应。

If you really want to model it, perhaps you need the types to be more generic ... Country -> State -> County -> Borough -> Locality -> City -> Suburb -> Street or PO Box -> Number -> -> Appartment etc. -> Institution (University or Employer) -> Division -> Subdivision-1 -> subdivision-n ... Are you sure you can't do KISS?

如果你真的想要建模,也许你需要更通用的类型...国家 - >州 - >县 - >自治市镇 - >地区 - >城市 - >郊区 - >街道或邮政信箱 - >数量 - > - >公寓等 - >机构(大学或雇主) - >分部 - >细分-1 - >细分 - 你确定你不能做KISS吗?

#8


0  

I'm modeling an apps for global users and I have the same problems, but I think that this approach could already be in use in many enterprise. But why this problem don't have an universal solution? Or, has this problem one best solution that can be the start point or anybody in the world need think in a solution for it since beginnig? In IT, we are making the same things any times and in many places, unfortunately. For exemplo, who are not have made more than one user, customer or product's database? And the worst, all enterprise in the world has made it. I think that could have universal solutions for universal problems.

我正在为全球用户建模应用程序并且我遇到了同样的问题,但我认为这种方法已经在许多企业中使用。但为什么这个问题没有通用的解决方案呢?或者,这个问题是一个最好的解决方案,可以作为起点,或者世界上任何人都需要在解决方案中思考,因为beginnig?在IT部门,不幸的是,我们在任何时候和许多地方都在做同样的事情。例如,谁没有生成多个用户,客户或产品的数据库?最糟糕的是,世界上所有的企业都做到了。我认为这可以为普遍问题提供普遍的解决方案。