I have this function that extracts all words from a text:
public static string[] GetSearchWords(string text)
{
    string pattern = @"\S+";
    Regex re = new Regex(pattern);
    MatchCollection matches = re.Matches(text);
    string[] words = new string[matches.Count];
    for (int i = 0; i < matches.Count; i++)
    {
        words[i] = matches[i].Value;
    }
    return words;
}
I want to exclude a list of words from the returned array. The word list looks like this:
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
How can I modify the function above so that it does not return words that are in my list?
2 solutions
#1
5
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
return words.Except(ignoredWords).ToArray();
I think the Except method fits your needs.
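For context, here is a minimal sketch of how that snippet might slot into the function from the question. The class name `SearchWords` and the constant `WordsToExclude` are made up for illustration. One thing to be aware of: `Except` is a set difference, so besides dropping the ignored words it also collapses repeated words in the input to a single occurrence, and with the default comparer the match is case-sensitive.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchWords // hypothetical container class
{
    // Comma-separated exclusion list taken from the question.
    private const string WordsToExclude = "if,you,me,about,more,but,by,can,could,did";

    public static string[] GetSearchWords(string text)
    {
        // Same tokenization as the question: a word is a run of non-whitespace characters.
        var words = Regex.Matches(text, @"\S+")
                         .Cast<Match>()
                         .Select(m => m.Value);

        var ignoredWords = WordsToExclude.Split(',');

        // Set difference: removes ignored words and also removes duplicate words.
        // The default comparer is case-sensitive, so "If" would not be excluded.
        return words.Except(ignoredWords).ToArray();
    }
}
```

If repeated words have to survive in the output, filtering with `Where` is probably the better fit than `Except`.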
#2
2
If you aren't forced to use Regex, you can use a little LINQ:
void Main()
{
    var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');
    string str = "if you read about cooking you can cook";
    var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}

string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    var words = text.Split();
    return words.Where(word => !toExclude.Contains(word)).ToArray();
}
I'm assuming a word is a series of non-whitespace characters.
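A possible variation on the LINQ approach, sketched under a few assumptions that are not in the original answers: the exclusion list is copied into a `HashSet<string>` so each lookup does not rescan the list, matching is made case-insensitive, and empty entries produced by consecutive whitespace are dropped. The class name `SearchWordFilter` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchWordFilter // hypothetical container class
{
    public static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        // Assumption: case-insensitive matching is wanted, so "If" and "if" are both excluded.
        var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

        // An empty separator array means "split on whitespace";
        // RemoveEmptyEntries drops the empty strings left by runs of spaces.
        var words = text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries);

        // Where preserves both the order and any repeated words.
        return words.Where(word => !excluded.Contains(word)).ToArray();
    }
}
```

Called with the exclusion list from the question and the text "If you read  about cooking you can cook and read", this sketch should keep read, cooking, cook, and, read, in that order.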