作者:disappearedgod
时间:2014-6-18
题目
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
解法
借用这道题来讨论一下编辑距离这个话题。
思路
微软实习在2014年春天第一次网上提交代码时候,曾经出过一道跟编辑距离有关的题目。
拿到这道题之前,我们的思路是:
- 了解什么是编辑距离
- 编辑距离建模
- 因为是DP系列题目,所以跳过一些判断直接想能否用DP解
根据这个步骤,我们首先先来看看什么是编辑距离
编辑距离
我们首先想到的是算法导论上关于编辑距离的定义,但是在此之前我们先看看wikipedia上面的描述:
Given two strings a and b on an alphabet Σ (e.g. the set of ASCII characters, the set of bytes [0..255], etc.), the edit distance d(a, b) is the minimum-weight series of edit operations that transforms a into b. One of the simplest sets of edit operations is that defined by Levenshtein in 1966:
- Insertion of a single symbol. If a = uv, then inserting the symbol x produces uxv. This can also be denoted ε→x, using ε to denote the empty string.
- Deletion of a single symbol changes uxv to uv (x→ε).
- Substitution of a single symbol x for a symbol y ≠ x changes uxv to uyv (x→y).
由于这个是一道跟字符串有关的题目,按照LeetCode传统的思路就是按字符比较。
我们再看看wikipedia上面的一个例子(是关于了是否有substitution的例子)
The Levenshtein distance between "kitten" and "sitting" is 3. The minimal edit script that transforms the former into the latter is:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end).
LCS distance(insertions and deletions only) gives a different distance and minimal edit script:
- delete k at 0
- insert s at 0
- delete e at 4
- insert i at 4
- insert g at 6
for a total cost/distance of 5 operations.
然后我们考虑一下传统DP的解法(算导中的一节,为什么做DP就不详细说)。
DP解法
DP操作建模
对于编辑距离的三个操作来说:
插入举例:X:=ex VS Y:=exp
前面的两个ex都是相同,则编辑距离不变为0,X没有第三个字符,这里如果我们插入了一个字符即可。
删除举例: X:= exp VS Y:= ex
我故意用了这个例子,也就是说插入和删除是互逆的,就是看我们以谁为对象来看待这个问题。
替代举例: X:= sitten VS Y:=sitting
这里例子的第五个字符需要替代,为什么要替代呢? 因为第四个字符和第六个字符相同(这里就需要用DP,也就是说,需要一个空间来记录之前的比较结果和之后的比较结果)。
DP实现:
考虑了几个例子之后,我们再想一下这三个步骤是同时判断的,还是有先后顺序来判断的?
我们再看看wikipedia里面的基本的算法:
Basic algorithm
Main article: Wagner–Fischer algorithm
Using Levenshtein's original operations, the edit distance between and is given by , defined by the recurrence
This algorithm can be generalized to handle transpositions by adding an additional term in the recursive clause's minimization
Main article: Wagner–Fischer algorithm
Using Levenshtein's original operations, the edit distance between and is given by , defined by the recurrence
This algorithm can be generalized to handle transpositions by adding an additional term in the recursive clause's minimization
public class Solution { public int minDistance(String word1, String word2) { int len1 = word1.length(); int len2 = word2.length(); int[][] dp = new int[len1+1][len2+1]; if(len1 == 0) return len2; if(len2 == 0) return len1; for(int i = 0; i < len1+1; i++) dp[i][0] = i; for(int i = 0 ; i < len2+1; i++) dp[0][i] = i; for(int i = 1; i < len1+1; i++){ for(int j = 1 ; j <len2+1;j++){ //dp[i][j] VS dp[i][j-1]+1(cost) for delete VS dp[i-1][j] +1(cost) for insert VS dp[i-1][j-1] + cost for substitution int cost = word1.charAt(i-1)==word2.charAt(j-1) ? 0 : 1; dp[i][j] = Math.min(dp[i-1][j-1]+cost,Math.min(dp[i][j-1]+1,dp[i-1][j]+1)); } } return dp[len1][len2]; } }
博客参考:
博客查阅:
[leetcode]Edit Distance
代码优化:
进阶学习: