I am wondering if there is some sort of "standard" for storing US addresses in a database? It seems this is a common task, and there should be some sort of a standard.
What I am looking for is a specific schema of how the database tables should work and interact, already in third normal form, including data types (MySQL). A good UML document would work.
Maybe I'm just being lazy, but this is a very common task, and I am sure someone has published an efficient way to do this somewhere. I just don't know where to look and Google isn't helping. Please point me to the resource. Thanks.
EDIT
Although this is more of a general question, I would like to clarify my specific needs.
Addresses will be used to specify road addresses of locations of events. These addresses will need to be in a format that can be best broken down and searched, and also used by any third-party applications I may end up linking my data source to.
ALSO. Data will be geo-coded (long, lat) on entry and stored separately, so it must fit the (yet undecided) protocol of whatever geocoder / application / library does that.
6 answers
#1
12
http://www.upu.int has the format standards for international addresses. Publication 28 at http://usps.com has the U.S. format standards.
The USPS wants the following unpunctuated address components concatenated on a single line:
* house number
* predirectional (N, SE, etc)
* street
* suffix (AVE, BLVD, etc)
* postdirectional (SW, E, etc)
* unit (APT, STE, etc)
* apartment/suite number
E.g., 102 N MAIN ST SE APT B.
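As a rough illustration of that ordering, here is a minimal Python sketch that joins the components into one unpunctuated, uppercase line. The `DeliveryLine` class and its field names are made up for this example and do not come from Pub 28. Stripping all punctuation is also a simplification: it would mangle fractional house numbers such as 123 1/2.

```python
from dataclasses import dataclass


@dataclass
class DeliveryLine:
    """Hypothetical container for the components listed above."""
    house_number: str
    street: str
    predirectional: str = ""
    suffix: str = ""
    postdirectional: str = ""
    unit: str = ""
    unit_number: str = ""

    def to_line(self) -> str:
        # Components in the order the answer lists them.
        parts = [
            self.house_number,
            self.predirectional,
            self.street,
            self.suffix,
            self.postdirectional,
            self.unit,
            self.unit_number,
        ]
        cleaned = []
        for part in parts:
            # Keep only letters, digits, and spaces (assumption: no punctuation at all).
            token = "".join(ch for ch in part if ch.isalnum() or ch == " ").strip()
            if token:  # skip components that were left empty
                cleaned.append(token.upper())
        return " ".join(cleaned)


line = DeliveryLine(
    "102", "Main",
    predirectional="N", suffix="St.", postdirectional="SE",
    unit="Apt", unit_number="B",
)
print(line.to_line())  # 102 N MAIN ST SE APT B
```

This only handles the output direction. Going the other way (parsing a free-form line into these fields) is the hard part, which is where CASS software comes in.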
If you keep the entire address line as a single field in your database, input and editing are easy, but searches can be more difficult. For example, in SOUTH EAST LANE, is the street name EAST (as in S EAST LN) or is it LANE (as in SE LANE ST)?
If you keep the address parsed into separate fields, searches for components like street name or apartments become easier, but you have to append everything together for output, you need CASS software to parse correctly, and PO boxes, rural route addresses, and APO/FPO addresses have special parsings.
A physical location with multiple addresses is one of three things. It may be a multiunit building, in which case the letters/numbers after unit designators like APT and STE distinguish the addresses. It may be a Commercial Mail Receiving Agency (e.g., a UPS store), in which case a maildrop/private mailbox number is appended (like 100 MAIN ST STE B PMB 102). Or it may be a business with one USPS delivery point where mail is routed internally after USPS delivery, which usually requires a separate mailstop field that the company might need but the USPS won't want on the address line.
A contact with more than one physical address is usually a business or person with a street address and a PO box. Note that it's common for each address to have a different ZIP code.
It's quite typical that one business transaction might have a shipping address and a billing address (again, with different ZIP codes). The information I keep for EACH address is:
* name prefix (DR, MS, etc)
* first name and initial
* last name
* name suffix (III, PHD, etc)
* mail stop
* company name
* address (one line only per Pub 28 for USA)
* city
* state/province
* ZIP/postal code
* country
I typically print the mail stop between the person's name and the company, because a label reads from most specific to least specific: the country contains the state/ZIP, which contains the city, which contains the address, which contains the company, which contains the mail stop, which contains the person. I use CASS software to validate and standardize addresses when they are entered or edited.
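To make that print order concrete, here is a small Python sketch of one address record and the label lines built from it. The `MailingAddress` class, the `label_lines` helper, and the sample data are all invented for illustration. Treat it as one possible layout under the ordering described above, not a USPS-approved formatter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MailingAddress:
    """Hypothetical record holding the fields listed above."""
    first_name: str  # first name and initial
    last_name: str
    address: str     # single delivery line
    city: str
    state: str
    postal_code: str
    country: str = "USA"
    name_prefix: str = ""
    name_suffix: str = ""
    mail_stop: str = ""
    company: str = ""


def label_lines(a: MailingAddress) -> List[str]:
    # Most specific first: person, mail stop, company, address, city/state/ZIP, country.
    name = " ".join(
        p for p in (a.name_prefix, a.first_name, a.last_name, a.name_suffix) if p
    )
    lines = [
        name,
        a.mail_stop,
        a.company,
        a.address,
        f"{a.city} {a.state} {a.postal_code}",
        a.country,
    ]
    # Drop lines for optional fields that were left empty.
    return [line for line in lines if line]


# Made-up sample data.
addr = MailingAddress(
    "JANE", "DOE", "100 MAIN ST STE B", "SPRINGFIELD", "IL", "62701",
    name_prefix="DR", mail_stop="MS 12", company="ACME CORP",
)
print("\n".join(label_lines(addr)))
```

Keeping the mail stop in its own field means it can be printed for internal routing without ever ending up on the delivery line.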
#2
3
First, as a person who spends most of their professional day working with addresses, I can tell you that they are hard to manage from a data perspective.
If you ask 5 people what address they live at, you will find that you get 5 different answers. While you and I can tell that 123 Main Street Apt 1 and Apt 1 123 Main Street are the same address, the database program will have a challenge.
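To show why this is a challenge, here is a deliberately naive Python sketch of a match key: uppercase everything, drop punctuation, apply a tiny abbreviation table, and sort the tokens. The function name and the abbreviation table are assumptions for this example, and this is in no way what CASS software does. It matches the two spellings above, but it also produces an obvious false positive.

```python
import re
from typing import Tuple

# Tiny, hypothetical abbreviation table; real standardization tables are far larger.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "APARTMENT": "APT", "SUITE": "STE"}


def naive_match_key(address: str) -> Tuple[str, ...]:
    # Replace punctuation with spaces, uppercase, and split into tokens.
    tokens = re.sub(r"[^A-Za-z0-9 ]", " ", address).upper().split()
    # Abbreviate known words, then sort so that token order no longer matters.
    return tuple(sorted(ABBREVIATIONS.get(t, t) for t in tokens))


same = naive_match_key("123 Main Street Apt 1") == naive_match_key("Apt 1, 123 Main Street")
print(same)  # True: the reordered address matches

wrong = naive_match_key("123 Main Street Apt 1") == naive_match_key("1 Main Street Apt 123")
print(wrong)  # True as well: a different address collides with the same key
```

Ignoring token order is what lets the first pair match and is exactly what makes the second pair collide, which is why a proper parser is needed rather than string tricks.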
If you are using United States-centric addresses, CASS-certified software from almost any vendor will standardize your addresses reasonably well. I would recommend a simple format as follows:
- Address 1
- Address 2
- Address 3
- City
- State
- Zip
- Zip+4 (I would carry this so lookups are easier when checking for duplicates)
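A minimal sketch of that format as a table, written in Python with the standard `sqlite3` module so it runs without a server. The table name, column names, constraints, and sample rows are all assumptions for illustration. In MySQL you would presumably pick types like `CHAR(2)`, `CHAR(5)`, and `VARCHAR`, but those choices are not part of this answer. ZIP codes are stored as text so that leading zeros survive.

```python
import sqlite3

# Hypothetical schema for the seven fields listed above.
SCHEMA = """
CREATE TABLE address (
    id        INTEGER PRIMARY KEY,
    address1  TEXT NOT NULL,
    address2  TEXT NOT NULL DEFAULT '',
    address3  TEXT NOT NULL DEFAULT '',
    city      TEXT NOT NULL,
    state     TEXT NOT NULL CHECK (length(state) = 2),
    zip       TEXT NOT NULL CHECK (length(zip) = 5),
    zip4      TEXT NOT NULL DEFAULT '' CHECK (length(zip4) IN (0, 4))
);
CREATE INDEX address_zip_idx ON address (zip, zip4);
"""


def find_possible_duplicates(conn, zip_code, zip4):
    # ZIP + ZIP+4 narrows the candidates; a human or a matcher still has to confirm them.
    rows = conn.execute(
        "SELECT id, address1 FROM address WHERE zip = ? AND zip4 = ? ORDER BY id",
        (zip_code, zip4),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO address (address1, city, state, zip, zip4) VALUES (?, ?, ?, ?, ?)",
    [
        # Made-up sample rows.
        ("123 MAIN ST APT 1", "SPRINGFIELD", "IL", "62701", "1234"),
        ("123 MAIN STREET #1", "SPRINGFIELD", "IL", "62701", "1234"),
        ("500 OAK AVE", "SPRINGFIELD", "IL", "62701", "5678"),
    ],
)
print(find_possible_duplicates(conn, "62701", "1234"))
```

The index on `(zip, zip4)` is what makes the duplicate lookup cheap: two differently typed versions of the same address land in the same small bucket.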
However, if you want a universal address I would look at the ADIS standard from IdeaAlliance. This standard can be used to break down (parse) addresses from almost any country into the relevant parts. Then they can be put back together using templates/components based on the Universal Postal Union standards (UPU S42 Standard on International Postal Address Components and Templates).
The big plus of this format is that addresses that don't exist in a postal database like CASS can be entered and stored as separate parts.
#3
2
Very similar questions have been asked before.
Addresses are messy - at best.
It partly depends on what you want to do with the addresses. If you're going to use them to mail things to people, then you simply need to record the image that will appear on the address label in a convenient form. If you're going to analyze the address, you have to work a lot harder.
Remember that the first time you have to deal with someone outside the US, all previous rules go astray. You may be strictly US-only, but beware.
#4
1
I looked into this a while ago, but for international addresses. I didn't find much in the way of a consensus. However, for the US, I found the succinctly named United States Thoroughfare, Landmark, and Postal Address Data Standard (Draft):
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/street-address/index_html
I don't think that they actually provide any specific database schema ideas, but it might be a good starting point.
#5
1
First, the "best" means of storing an address depends greatly on how it will be used. Is it just for reference, or for searches on, say, city? Do you plan on addressing envelopes? Are you going to integrate with a shipping system like FedEx or UPS? Will you store non-US addresses? Once you get into the realm of integrating with something that ships, you should start looking at CASS (the USPS Coding Accuracy Support System), the USPS program for certifying software that standardizes and validates addresses. There are CASS-certified applications out there that will store and verify addresses. Thus, the second best practice would be to avoid reinventing the wheel and see if there is a system out there that will solve your problem, especially if you are going to go international. You want to leverage the fact that someone else has worked out all the details of how to properly and efficiently store addresses for many countries around the world, instead of having to do that investigation yourself.
#6
1
I've had to try to do this before and I'd found this document that gives you some pointers. I ended up shelving my schema since my application does have to deal with international addresses.