I am trying to retrieve the list of French stopwords from Lucene 4.0. The only available method is FrenchAnalyzer.getDefaultStopSet(),
which returns a CharArraySet. I need to convert this into a Set<String>.
My quick and dirty working code looks like:
Set<String> stopWords = new HashSet<String>();
for (String stopWord : FrenchAnalyzer.getDefaultStopSet().toString().split(", ")) {
    stopWords.add(stopWord);
}
It returns:
[eues, serais, fûtes, serait, eussions, est, étant, pour, avez, on, avions, ceci, serez, avec, moi, ou, eue, mon, son, eussiez, aurez, notre, nos, avais, avait, soi, une, seraient, eûmes, aurais, aurait, ait, fûmes, du, eusse, étées, serions, des, aurions, [lui, fût, seront, sois, seriez, serons, soit, eût, aie, avons, ces, cet, de, eut, eus, ma, me, eusses, furent, eux, fus, fut, eu, leurs, d, ayez, les, aviez, c, n, auront, l, aurons, m, j, un, fussiez, elle, nous, t, eûtes, tu, s, soyez, ne, sans, en, et, es, y, étée, même, seras, cette, auraient, sommes, te, aux, quels, soyons, êtes, étais, quelles, était, étés, celà , leur, aies, ta, serai, fusse, fussions, auras, fussent, votre, se, auriez, aurai, le, étiez, sa, ce, tes, été, ses, toi, vous, la], sera, aient, par, étions, ici, pas, sur, avaient, ayant, ont, mes, quelle, étaient, ton, que, qu, eurent, vos, qui, fusses, mais, as, dans, il, à, au, je, ai, sont, quel, aura, soient, suis, ayons, ils, eussent]
I tried with an iterator:
Iterator iter = FrenchAnalyzer.getDefaultStopSet().iterator();
while (iter.hasNext()) {
    Object stopWord = iter.next();
    stopWords.add(stopWord.toString());
}
But it returns a set of unreadable values:
[[C@2fb3f8f6, [C@464c4975, [C@6fc5f743, [C@5705b99f, [C@26ee7a14, [C@5a9e29fb, [C@b41b571, [C@47315d34, [C@3e110003, [C@210a6ae2, [C@82a6f16, [C@70cb6009, [C@575fadcf, [C@1342a80d, [C@7f09fd93, [C@58ecb281, [C@1a84da23, [C@165973ea, [C@4ac9131c, [C@6fb000e7, [C@34fbb7cb, [C@603b1d04, [C@630045eb, [C@159b5217, [C@5975d6ab, [C@ac980c9, [C@a94884d, [C@5557c2bd, [C@16ba8602, [C@1b016632, [C@36b8bef7, [C@744a6cbf, [C@1c93d6bc, [C@509df6f1, [C@7d26f75b, [C@80d3d6f, [C@2b76e552, [C@7825d2b2, [C@3c1d332b, [C@38dda25b, [C@6cb8, [C@7f2ad19e, [C@5328f6ee, [C@51b48197, [C@4e17e4ca, [C@2acdb06e, [C@32ef2c60, [C@f01a1e, [C@5e7808b9, [C@2a9df354, [C@2b275d39, [C@b6e39f, [C@46b8c8e6, [C@36ff057f, [C@7290cb03, [C@16bdb503, [C@288051, [C@502cb49d, [C@5a5e179a, [C@50c4fe76, [C@4229ab3e, [C@266bade9, [C@45d64c37, [C@5ef4f44a, [C@29c56c60, [C@6719dc16, [C@35549f94, [C@44b01d43, [C@5ece2187, [C@3f77b3cd, [C@6766afb3, [C@596e1fb1, [C@76f8968f, [C@14d6a05e, [C@6ef137d, [C@2087c268, [C@67d225a7, [C@7b2be1bd, [C@79df8b99, [C@2dec8909, [C@3b835282, [C@4bbd7848, [C@423e5d1, [C@76497934, [C@48ee22f7, [C@4ed1e89e, [C@19e3118a, [C@851052d, [C@54281d4b, [C@1bbb60c3, [C@1f4384c2, [C@21a80a69, [C@2d5253d5, [C@4ce32802, [C@939b78e, [C@79a5f739, [C@1d807ca8, [C@2393385d, [C@79de256f, [C@c0b76fa, [C@2abe0e27, [C@604e280c, [C@2fcac6db, [C@e4ac00c, [C@23d256fa, [C@38b5dac4, [C@2b2d96f2, [C@3a6ac461, [C@6910fe28, [C@488e32e7, [C@2adb1d4, [C@676bd8ea, [C@32bf7190, [C@2c41d05d, [C@509f5011, [C@a39ab89, [C@39e87719, [C@332611a7, [C@1572e449, [C@418c56d, [C@78dc6a77, [C@25fa1bb6, [C@20c1f10e, [C@7ad81784, [C@6513cf0, [C@29e97f9f, [C@9c0ec97, [C@2b5ac3c9, [C@2d342ba4, [C@4a8c1dd9, [C@1d3c468a, [C@3782da3d, [C@25595f51, [C@4d865b28, [C@4c5e176f, [C@15a62c31, [C@1cb8deef, [C@d8d9850, [C@380e28b9, [C@7df17e77, [C@3da99561, [C@2df6df4c, [C@2efb56b1, [C@68e6ff0d, [C@33010058, [C@69945ce, [C@53ebd75b, [C@3d9360e2, [C@351e1e67, [C@2705d88a, [C@2993a66f, [C@e80d1ff, [C@52c05d3b, [C@3a64c34e, [C@6bdab91, [C@4ce2cb55, 
[C@77fddc31, [C@1be1a408, [C@20b9b538, [C@43462851, [C@30ec4a87, [C@4b0ab323, [C@74b23210]]
Javadoc for the class: FrenchAnalyzer
2 answers
#1
3
Try this:
Iterator iter = FrenchAnalyzer.getDefaultStopSet().iterator();
while (iter.hasNext()) {
    // Each element of the stop set is a char[], so cast it before building the String.
    char[] stopWord = (char[]) iter.next();
    stopWords.add(new String(stopWord));
}
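To see why the iterator attempt in the question printed values such as [C@2fb3f8f6, here is a minimal sketch that needs only the JDK. It assumes, as the output in the question suggests, that each element of the stop set is a char[]; the class name CharArrayToStringDemo and the method toText are made up for this illustration and are not part of Lucene.

```java
public class CharArrayToStringDemo {

    // Builds a String from the characters of an element that is a char[].
    public static String toText(Object element) {
        return new String((char[]) element);
    }

    public static void main(String[] args) {
        // Stand-in for one element of the stop set (assumption: elements are char[]).
        Object element = "pour".toCharArray();

        // Arrays inherit Object.toString(): the type tag "[C" followed by '@' and a hash code.
        System.out.println(element.toString());

        // Copying the characters into a String gives the readable word.
        System.out.println(toText(element));
    }
}
```

The first line printed has the same [C@... shape as the output in the question; the second is the word itself.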
#2
2
It looks like that method returns a set whose elements are char[], so the first thing you should do is treat each element as a char[] (by typing the iterator or casting), then build a String from the char array. Using a foreach loop simplifies your code too:
A simple implementation would be:
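// Iterate as Object and cast each element; String has no (char[], String) constructor,
// and no charset is needed because a char[] already holds decoded characters.
for (Object chars : FrenchAnalyzer.getDefaultStopSet()) {
    stopWords.add(new String((char[]) chars));
}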
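If you need this conversion in more than one place, the loop can be wrapped in a small helper. The following is only a sketch under assumptions: StopWordSets and toStringSet are hypothetical names, not Lucene API, and the main method uses a plain list as a stand-in for the stop set so that it runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StopWordSets {

    // Hypothetical helper (not part of Lucene): copies a collection whose
    // elements are char[] or other objects into a Set<String>.
    public static Set<String> toStringSet(Iterable<?> elements) {
        Set<String> result = new LinkedHashSet<String>();
        for (Object element : elements) {
            if (element instanceof char[]) {
                // char[] elements: copy the characters into a new String.
                result.add(new String((char[]) element));
            } else {
                // Anything else (for example a String): fall back to its text form.
                result.add(String.valueOf(element));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for FrenchAnalyzer.getDefaultStopSet(); the real call needs Lucene.
        List<Object> standIn = Arrays.<Object>asList(
                "pour".toCharArray(), "avec".toCharArray(), "dans");
        System.out.println(toStringSet(standIn));
    }
}
```

With Lucene available, the call would presumably be StopWordSets.toStringSet(FrenchAnalyzer.getDefaultStopSet()), since the stop set is itself iterable.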