How can I find similar address records?

Time: 2021-01-16 19:14:33

The workflow is like this:

  1. I receive a scan of a coupon with customer data on it (first name, last name, zip code, city, plus miscellaneous information).
  2. Before I create a new customer, I have to search the database to check whether the customer already exists.

Now my question: What's the best way to find an existing customer, when there is no unique ID available?

PS: I do have a unique ID in the database, just not on the coupons we receive ;)

5 solutions

#1


3  

We use the Levenshtein distance algorithm to check users for duplicates. However, we have quite strict rules for entering the data itself, so we only have to check for typos, case differences, and the like.
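
As a rough illustration of that approach, here is a minimal Python sketch of a Levenshtein check. The function names, the normalization step, and the `max_distance` cutoff of 2 are assumptions made for this example, not the poster's actual rules.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalize(value: str) -> str:
    """Fold case and collapse whitespace so only real typos count."""
    return " ".join(value.casefold().split())


def is_probable_duplicate(name_a: str, name_b: str, max_distance: int = 2) -> bool:
    # max_distance is a hypothetical cutoff; tune it against real data.
    return levenshtein(normalize(name_a), normalize(name_b)) <= max_distance
```

Normalizing first means case differences cost nothing, so the distance budget is spent only on genuine typing errors.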

#2


2  

See this previous question: Parse usable Street Address, City, State, Zip from a string.

Soundex would help you if you require similar matches.
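
For readers who have not met Soundex, the following is a minimal Python sketch of the classic American Soundex coding. It assumes plain ASCII names (other characters are simply dropped), and the function name is chosen for this example only. Many databases ship their own `SOUNDEX` function, which may differ in edge cases.

```python
# Build the letter-to-digit table used by American Soundex.
_CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or '' if name has no ASCII letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    last_code = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code:
            if code != last_code:  # adjacent letters with the same code count once
                result += code
            last_code = code
        elif c not in "HW":
            last_code = ""  # a vowel separates codes; H and W do not
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

Names that sound alike, such as Meyer and Meier, end up with the same code, so comparing codes catches spelling variants that an exact match would miss.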

#3


2  

If you really want to do this the right way, the easy way, and the complete way, you'll buy Netrics.

http://www.netrics.com/

We bought it and wrapped an application around it that lets our employees match anything they want. They can configure confidence intervals for each column and build thesauri that map Robert to Bob and John to Jack. It's amazing, and it is used by some of the larger institutions in the country for scrubbing various lists.
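
The general idea of a name thesaurus combined with per-column thresholds can be sketched in a few lines of Python. This is not the Netrics API: the nickname table, the threshold values, and all function names below are hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever similarity measure the product uses.

```python
from difflib import SequenceMatcher

# Hypothetical thesaurus: nickname -> canonical first name.
NICKNAMES = {"bob": "robert", "rob": "robert", "jack": "john", "johnny": "john"}

# Hypothetical minimum similarity per column (0.0 to 1.0).
THRESHOLDS = {"first_name": 0.8, "last_name": 0.8, "zip": 1.0, "city": 0.8}


def canonical(column: str, value: str) -> str:
    """Fold case and whitespace; map first-name nicknames to one canonical form."""
    value = " ".join(value.casefold().split())
    if column == "first_name":
        return NICKNAMES.get(value, value)
    return value


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def matches(coupon: dict, customer: dict) -> bool:
    """True only if every configured column clears its own threshold."""
    return all(
        similarity(canonical(col, coupon[col]), canonical(col, customer[col])) >= minimum
        for col, minimum in THRESHOLDS.items()
    )
```

Keeping the thresholds in a plain dictionary means a column such as the zip code can demand an exact match while names and cities tolerate small typos.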

#4


0  

If you have SQL Server 2005, you can bring your data in through SSIS and use a fuzzy lookup to check for sameness.

#5


-3  

You query the database for all customers that match the given data, e.g.

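-- Lookup on all coupon fields. LIKE without wildcards behaves much like
-- equality; case sensitivity depends on the database and its collation.
SELECT ID FROM tbl_customers WHERE
   first_name LIKE 'John'
   AND last_name LIKE 'Doe'
   AND zip_code = 12345
   AND city LIKE 'Ducktown';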

If the number of rows returned is 0, create a new entry in the database. If it is 1, the query gives you the ID. If it is greater than 1, you may have several customers with the same name living in the same area, and you need to find a way to deal with that situation. But that would justify a new question here ;-)
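
The three cases can be sketched in Python as follows. SQLite is used here only as a stand-in because the answer does not name a database; the function name, the status strings, and the column list in the `INSERT` are assumptions for this example.

```python
import sqlite3


def find_or_create_customer(conn, first_name, last_name, zip_code, city):
    """Return (customer_id, status); status is 'created', 'found' or 'ambiguous'."""
    rows = conn.execute(
        "SELECT ID FROM tbl_customers"
        " WHERE first_name LIKE ? AND last_name LIKE ?"
        " AND zip_code = ? AND city LIKE ?",
        (first_name, last_name, zip_code, city),
    ).fetchall()
    if not rows:
        # 0 rows: no such customer yet, so create one.
        cursor = conn.execute(
            "INSERT INTO tbl_customers (first_name, last_name, zip_code, city)"
            " VALUES (?, ?, ?, ?)",
            (first_name, last_name, zip_code, city),
        )
        conn.commit()
        return cursor.lastrowid, "created"
    if len(rows) == 1:
        # Exactly 1 row: the query gives us the existing ID.
        return rows[0][0], "found"
    # More than 1 row: several candidates, leave the decision to a human.
    return None, "ambiguous"
```

Using bound parameters instead of string concatenation keeps names such as O'Brien from breaking the query.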

p.s.: If you have no unique ID at all, redesign your database.
