Extracting keywords from text and excluding words

Time: 2022-09-13 09:49:12

I have this function that extracts all the words from a text:

// Requires: using System.Text.RegularExpressions;
public static string[] GetSearchWords(string text)
{
    // A "word" is any run of non-whitespace characters.
    string pattern = @"\S+";
    Regex re = new Regex(pattern);

    MatchCollection matches = re.Matches(text);
    string[] words = new string[matches.Count];
    for (int i = 0; i < matches.Count; i++)
    {
        words[i] = matches[i].Value;
    }
    return words;
}

I want to exclude a list of words from the returned array. The word list looks like this:

string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";

How can I modify the function above so that it does not return words that are in my list?

2 solutions

#1


5  

string strWordsToExclude = "if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
// At the end of GetSearchWords, in place of "return words;" (requires using System.Linq;):
return words.Except(ignoredWords).ToArray();

I think the Except method fits your needs.
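For reference, here is a minimal sketch of how the Except call might be folded into the original regex-based function. The class name SearchWordsWithExcept and the second parameter wordsToExclude are hypothetical, introduced only for illustration. Note that Except is a set difference, so a word that occurs several times in the text comes back only once, and that the default comparison is case-sensitive. Passing StringComparer.OrdinalIgnoreCase is an assumption about what the asker wants, not something the question states.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchWordsWithExcept
{
    // Hypothetical variant of GetSearchWords: the comma-separated exclusion
    // list is passed in as a second parameter (not part of the original code).
    public static string[] GetSearchWords(string text, string wordsToExclude)
    {
        string[] ignoredWords = wordsToExclude.Split(',');

        return Regex.Matches(text, @"\S+")
            .Cast<Match>()          // works whether or not MatchCollection is generic
            .Select(m => m.Value)
            // Set difference: duplicates in the text are collapsed to one entry.
            // Assumption: case should be ignored; omit the comparer for exact matching.
            .Except(ignoredWords, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

If repeated words must be kept in the result, a Where filter is a better fit than Except.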

#2


2  

If you aren't forced to use Regex, you can use a little LINQ:

void Main()
{
    var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');

    string str = "if you read about cooking you can cook";

    var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}

string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    var words = text.Split();

    return words.Where(word => !toExclude.Contains(word)).ToArray();
}

I'm assuming a word is a series of non-whitespace characters.
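A possible refinement of the same idea, sketched under a few assumptions: calling Contains on an array scans the whole exclusion list for every word, so copying the list into a HashSet once gives constant-time lookups on average. The class name SearchWordsWithHashSet is hypothetical, and the case-insensitive comparer and the removal of empty entries are choices made for this example rather than requirements from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchWordsWithHashSet
{
    public static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        // Assumption: matching should ignore case; drop the comparer for exact matching.
        var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

        return text
            // A null separator array means "split on whitespace"; RemoveEmptyEntries
            // skips the empty strings that consecutive spaces would otherwise produce.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !excluded.Contains(word))
            .ToArray();
    }
}
```

Unlike Except, this keeps repeated words and their original order, which may matter if the result is used for counting or ranking keywords.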
