minimum deletion

🏠

This is a dynamic programming question, very similar to levenshtein distance. Given two strings, e.g frog and dog, determine how many characters need to be deleted for the words to match.

 1 from functools import lru_cache
 2 
 3 
 4 @lru_cache(None)
 5 def dist(a, b):
 6     if a == "" or b == "":
 7         return len(b) or len(a)
 8     if a[-1] == b[-1]:
 9         return dist(a[:-1], b[:-1])
10     return min(dist(a[:-1], b), dist(a, b[:-1])) + 1
11 
12 
13 print(dist("dog", "frog"))
 1 """
 2 DP, with deletions until min distance is found
 3 Time: O(M*N) 
 4 Space: O(mn)
 5 O(1)
 6 
 7    '' H E A T
 8 ''  0 1 2 3 4
 9 H   1 0 1 0 0
10 I   2 0 0 0 0 
11 T   3 0 0 0 
12 
13 """
14 
15 def deletion_distance(a, b):
16   def dp(a, b):
17     if (a, b) in mem:
18       return mem[(a, b)]
19     if a == '' or b == '':
20       mem[(a, b)] = len(a) or len(b)
21       return mem[(a, b)]
22     if a[-1] == b[-1]:
23       mem[(a, b)] = dp(a[:-1], b[:-1])
24       return mem[(a, b)]
25     mem[(a, b)] = min(dp(a, b[:-1]), dp(a[:-1], b)) + 1
26     return mem[(a, b)]
27   mem = {}  
28   return dp(a, b)
29 
30 
31 
32 
33 """
34 Deletion Distance
35 
36 The deletion distance of two strings is the minimum number of characters you need to delete in the two strings in order to get the same string. For instance, the deletion distance between "heat" and "hit" is 3:
37 By deleting 'e' and 'a' in "heat", and 'i' in "hit", we get the string "ht" in both cases.
38 We cannot get the same string from both strings by deleting 2 letters or fewer.
39 
40 Given the strings str1 and str2, write an efficient function deletionDistance that returns the deletion distance between them. Explain how your function works, and analyze its time and space complexities.
41 
42 STR1 = HEAT
43 STR2 = HIT
44 ANS - 3
45 
46    '' H E A T
47 ''  0 1 2 3 4
48 H   1 0 1 2 3
49 I   2 1 2 3 4
50 T   3 2 3 4 3
51 
52 base case:
53 empty string - don't delete - 0
54 
55 if chars match:
56 dp[i][j] = dp[i-1][j-1]
57 
58 if chars mismatch:
59 dp[i][j] = min(dp[i][j-1], dp[i-1][j]) + 1
60 """
61 
62 
63 def deletion_distance(str1, str2):
64     dp = [[0 for _ in range(len(str2)+1)] for _ in range(len(str1)+1)]
65 
66     for i in range(len(str1)+1):
67         dp[i][0] = i
68 
69     for j in range(len(str2)+1):
70         dp[0][j] = j
71 
72     for i in range(1, len(str1)+1):
73         for j in range(1, len(str2)+1):
74             if (str1[i-1] == str2[j-1]):
75                 dp[i][j] = dp[i-1][j-1]
76             else:
77                 dp[i][j] = min(dp[i][j-1], dp[i-1][j]) + 1
78 
79     return dp[len(str1)][len(str2)]

See also:

levenshtein distance