I have a complex problem of in-place editing of an array at hand. I have an array in which some of the elements are sub-strings of other elements.I want to delete all the sub-strings and keep the supersets/strings only. i.e. Array => ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
After operation I should have a sanitized array => ['1 1 1 2', '1 2 3 1']
我有一个复杂的问题,即手头编辑一个数组。我有一个数组,其中一些元素是其他元素的子字符串。我想删除所有子字符串并仅保留超集/字符串。即Array => ['1','1 1','1 1 1','1 1 1 2','1 2 3 1','1 2','2 3']术后我应该有一个sanitized array => ['1 1 1 2','1 2 3 1']
Is there an efficient algorithm to achieve the same ?
是否有一种有效的算法来实现相同的目标?
3 个解决方案
#1
3
This approach uses some array math to remove itself from the array and then checks to see if it shows up as a substring. I have no idea how performant this is.
这种方法使用一些数组数学从数组中删除自己,然后检查它是否显示为子字符串。我不知道这是多么高效。
a = ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
a.uniq.delete_if { |i| (a-[i]).any? {|j| j.include? i } }
I changed to using a delete_if as it will improve performance as you are shortening your array anytime a substring is found making subsequent checks slightly faster.
我改为使用delete_if,因为它可以提高性能,因为您可以在找到子字符串时缩短数组,从而使后续检查更快一些。
UPDATE: Cary Swoveland found an issue when the array contains duplicates. I have added a uniq to dedupe the array first although it's not entirely clear what should happen if an element is repeated, should both be removed since they are substrings of each other? I have solved this problem on the assumption that duplicates result in only one item showing in the output, but this could be wrong.
更新:当数组包含重复项时,Cary Swoveland发现了一个问题。我已经添加了一个uniq来重复数组,尽管不完全清楚如果一个元素重复会发生什么,两者都应该被删除,因为它们是彼此的子串?我已经解决了这个问题,假设重复导致输出中只显示一个项目,但这可能是错误的。
#2
2
It uses less memory, performs less computations. This one will delete substrings both ways, looping will be less. Brought
它使用更少的内存,执行更少的计算。这个将以两种方式删除子串,循环将更少。带
user system total real
First 0.000000 0.000000 0.000000 ( 0.000076)
Second 0.010000 0.000000 0.010000 ( 0.000037)
Third 0.000000 0.000000 0.000000 ( 0.000019)
Above mentioned is the benchmark results for the 2 algos mentioned above(First and Second) and this one(Third).
上面提到的是上面提到的2个算法(第一个和第二个)和第一个(第三个)的基准结果。
array = ['1 1 1', '1', '1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3', '1 2 3', '1 1 1']
i1 = 0
arr_len = array.length
last_index = arr_len - 1
while i1 <= last_index
w1 = array[i1]
i2 = i1 + 1
while i2 <= last_index
w2 = array[i2]
# If w2 is a subset of w1
if w1.include? w2
# Delete from index i2
array.delete_at(i2)
# Decrement the array_length as one element is deleted
arr_len -= 1
# Decrement last index, as one element is deleted
last_index -= 1
next
end
# If w1 comes out to be a subset of w2
if w2.include? w1
# Delete the value from that index
array.delete_at(i1)
# Decrement the array_length as one element is deleted
arr_len -= 1
# Decrement last index, as one element is deleted
last_index -= 1
# Reset value of w1 as it is deleted in this operation
w1 = array[i1]
# Reset index of 2nd loop to start matching again
i2 = i1 + 1
# Move next from here only
next
end
i2 += 1
end
i1 += 1
end
#3
1
Here's a way that removes substrings as they are found.
这是一种在找到子字符串时删除子字符串的方法。
a = ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
b = a.dup
b.size.times do
first, *rest = b
(rest.any? { |t| t.include? first }) ? b.shift : b.rotate!
end
b #=> ["1 1 1 2", "1 2 3 1"]
To see what's happening, insert
要查看发生了什么,请插入
puts "first=\"#{first}\n, rest=#{rest}"
after first,*rest = b
. That prints the following (before I reformatted).
首先,* rest = b。这打印出以下内容(在我重新格式化之前)。
first="1", rest=["1 1", "1 1 1", "1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1", rest=["1 1 1", "1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1 1", rest=["1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1 1 2", rest=["1 2 3 1", "1 2", "2 3"]
first="1 2 3 1", rest=["1 2", "2 3", "1 1 1 2"]
first="1 2", rest=["2 3", "1 1 1 2", "1 2 3 1"]
first="2 3", rest=["1 1 1 2", "1 2 3 1"]
#1
3
This approach uses some array math to remove itself from the array and then checks to see if it shows up as a substring. I have no idea how performant this is.
这种方法使用一些数组数学从数组中删除自己,然后检查它是否显示为子字符串。我不知道这是多么高效。
a = ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
a.uniq.delete_if { |i| (a-[i]).any? {|j| j.include? i } }
I changed to using a delete_if as it will improve performance as you are shortening your array anytime a substring is found making subsequent checks slightly faster.
我改为使用delete_if,因为它可以提高性能,因为您可以在找到子字符串时缩短数组,从而使后续检查更快一些。
UPDATE: Cary Swoveland found an issue when the array contains duplicates. I have added a uniq to dedupe the array first although it's not entirely clear what should happen if an element is repeated, should both be removed since they are substrings of each other? I have solved this problem on the assumption that duplicates result in only one item showing in the output, but this could be wrong.
更新:当数组包含重复项时,Cary Swoveland发现了一个问题。我已经添加了一个uniq来重复数组,尽管不完全清楚如果一个元素重复会发生什么,两者都应该被删除,因为它们是彼此的子串?我已经解决了这个问题,假设重复导致输出中只显示一个项目,但这可能是错误的。
#2
2
It uses less memory, performs less computations. This one will delete substrings both ways, looping will be less. Brought
它使用更少的内存,执行更少的计算。这个将以两种方式删除子串,循环将更少。带
user system total real
First 0.000000 0.000000 0.000000 ( 0.000076)
Second 0.010000 0.000000 0.010000 ( 0.000037)
Third 0.000000 0.000000 0.000000 ( 0.000019)
Above mentioned is the benchmark results for the 2 algos mentioned above(First and Second) and this one(Third).
上面提到的是上面提到的2个算法(第一个和第二个)和第一个(第三个)的基准结果。
array = ['1 1 1', '1', '1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3', '1 2 3', '1 1 1']
i1 = 0
arr_len = array.length
last_index = arr_len - 1
while i1 <= last_index
w1 = array[i1]
i2 = i1 + 1
while i2 <= last_index
w2 = array[i2]
# If w2 is a subset of w1
if w1.include? w2
# Delete from index i2
array.delete_at(i2)
# Decrement the array_length as one element is deleted
arr_len -= 1
# Decrement last index, as one element is deleted
last_index -= 1
next
end
# If w1 comes out to be a subset of w2
if w2.include? w1
# Delete the value from that index
array.delete_at(i1)
# Decrement the array_length as one element is deleted
arr_len -= 1
# Decrement last index, as one element is deleted
last_index -= 1
# Reset value of w1 as it is deleted in this operation
w1 = array[i1]
# Reset index of 2nd loop to start matching again
i2 = i1 + 1
# Move next from here only
next
end
i2 += 1
end
i1 += 1
end
#3
1
Here's a way that removes substrings as they are found.
这是一种在找到子字符串时删除子字符串的方法。
a = ['1', '1 1', '1 1 1', '1 1 1 2', '1 2 3 1', '1 2', '2 3']
b = a.dup
b.size.times do
first, *rest = b
(rest.any? { |t| t.include? first }) ? b.shift : b.rotate!
end
b #=> ["1 1 1 2", "1 2 3 1"]
To see what's happening, insert
要查看发生了什么,请插入
puts "first=\"#{first}\n, rest=#{rest}"
after first,*rest = b
. That prints the following (before I reformatted).
首先,* rest = b。这打印出以下内容(在我重新格式化之前)。
first="1", rest=["1 1", "1 1 1", "1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1", rest=["1 1 1", "1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1 1", rest=["1 1 1 2", "1 2 3 1", "1 2", "2 3"]
first="1 1 1 2", rest=["1 2 3 1", "1 2", "2 3"]
first="1 2 3 1", rest=["1 2", "2 3", "1 1 1 2"]
first="1 2", rest=["2 3", "1 1 1 2", "1 2 3 1"]
first="2 3", rest=["1 1 1 2", "1 2 3 1"]