Regarding the question in the title: I already have a solution, but my approach seems to waste resources by creating an extra List object.
So my question is: is there a more efficient approach?
In this case, I want to remove the extra space " " and extra "a" entries from a Vector.
My vector contains:
{"a", "rainy", " ", "day", "with", " ", "a", "cold", "wind", "day", "a"}
Here is my code:
List<String> lt = new ArrayList<String>();
lt.add("a");
lt.add(" ");
vec1.removeAll(lt);
As you can see, the Vector contains extra spaces. That happens because I use a Vector to read and split the words from a Word document, and the document sometimes contains extra spaces caused by human error.
1 solution
Your current approach does suffer from the problem that deleting an element from a Vector is an O(N) operation, and you are potentially doing this M times (5 in your example).
Assuming that you have multiple "stop words" and that you can change the data structures, here's a version that should (in theory) be more efficient:
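import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Copies every element that is not a stop word into a new list.
public List<String> removeStopWords(
        List<String> input, HashSet<String> stopWords) {
    List<String> output = new ArrayList<String>(input.size());
    for (String elem : input) {
        if (!stopWords.contains(elem)) {
            output.add(elem);
        }
    }
    return output;
}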
// This could be saved somewhere, assuming that you are always filtering
// out the same stopwords.
HashSet<String> stopWords = new HashSet<String>();
stopWords.add(" ");
stopWords.add("a");
... // and more
List<String> newList = removeStopWords(list, stopWords);
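For reference, the pieces above can be assembled into one compilable sketch. The class name StopWordFilter and the main method are only illustrative, and the parameter is widened to Set<String> here as an assumption; the input is the vector from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper class, used only so the snippet compiles on its own.
public class StopWordFilter {

    // Copies every element that is not a stop word into a new list.
    // Each contains() call on a HashSet is expected O(1), so this is one O(N) pass.
    public static List<String> removeStopWords(List<String> input, Set<String> stopWords) {
        List<String> output = new ArrayList<String>(input.size());
        for (String elem : input) {
            if (!stopWords.contains(elem)) {
                output.add(elem);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // The words from the question.
        List<String> words = Arrays.asList(
                "a", "rainy", " ", "day", "with", " ", "a", "cold", "wind", "day", "a");
        Set<String> stopWords = new HashSet<String>(Arrays.asList(" ", "a"));
        System.out.println(removeStopWords(words, stopWords));
    }
}
```

With the question's data, this prints [rainy, day, with, cold, wind, day].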
Points of note:
- The above creates a new list. If you have to reuse the existing list, clear it and then addAll the new list elements. (This is another O(N-M) step, so don't do it if you don't have to.)
- If there are multiple stop words, then using a HashSet will be more efficient, e.g. if done as above. I'm not sure exactly where the break-even point is (versus using a List), but I suspect it is between 2 and 3 stop words.
- The above creates a new list, but it only copies N - M elements. By contrast, the removeAll algorithm when applied to a Vector could copy O(NM) elements.
- Don't use a Vector unless you need a thread-safe data structure. An ArrayList has a similar internal data structure, and doesn't incur synchronization overheads on each call.
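If the existing list really does have to be reused, the clear-then-addAll step from the first point might look like the sketch below. The class and method names (InPlaceStopWordFilter, removeStopWordsInPlace) are made up for illustration, and a Vector is used only because the question starts from one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Hypothetical helper class, not part of the original answer.
public class InPlaceStopWordFilter {

    // Filters into a temporary list, then writes the survivors back into
    // the caller's list so existing references to that list stay valid.
    public static void removeStopWordsInPlace(List<String> words, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>(words.size());
        for (String word : words) {
            if (!stopWords.contains(word)) {
                kept.add(word);
            }
        }
        words.clear();       // drop the old contents
        words.addAll(kept);  // the extra O(N-M) copy mentioned above
    }

    public static void main(String[] args) {
        Vector<String> vec1 = new Vector<String>(Arrays.asList(
                "a", "rainy", " ", "day", "with", " ", "a", "cold", "wind", "day", "a"));
        Set<String> stopWords = new HashSet<String>(Arrays.asList(" ", "a"));
        removeStopWordsInPlace(vec1, stopWords);
        System.out.println(vec1);
    }
}
```

The temporary list costs one extra allocation, which is the trade-off for keeping the same Vector instance; if nothing else holds a reference to the original, returning the new list avoids the second copy.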