Selecting the previous and next word in a string

I'm looping through a lot of strings like this one in C#:

“Look, good against remotes is one thing, good against the living, that’s something else.”

In these strings, I have a single selected word, determined by an index from a previous function, like the second "good" in the case above.

“Look, good (<- not this one) against remotes is one thing, good (<- this one) against the living, that’s something else.”

I want to find the words surrounding my selected word. In the case above, thing and against.

“Look, good against remotes is one thing, good against the living, that’s something else.”

I have tried taking the string apart with .Split() and various regular-expression approaches, but I can't find a good way to achieve this. I have access to the word ("good" in the example above) and the index where it is located in the string (41 above).

It would be a huge bonus if it took punctuation such as commas into account, so that in the example above my theoretical function would return only "against", since there is a comma between "thing" and "good".

Is there a simple way to achieve this? Any help appreciated.

7 solutions

#1

Including the "huge bonus":

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;

string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;
string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;

In this case before will be an empty string because of the comma, and after will be "against".

Explanation: To get before, first take the part of the string that ends just before the target word; text.Substring(0, index) does this. Then the regular expression (\w*)\s*$ matches and captures a word (\w*) followed by any amount of whitespace (\s*) at the end of the string ($). The first capture group contains the word we want. If no word can be matched, the regex still matches, but only an empty string or whitespace, so the first capture group is an empty string.

The logic for getting after is pretty much the same, except that text.Substring(index + word.Length) is used to get the rest of the string after the target word. The regex ^\s*(\w*) is similar except that it is anchored to the beginning of the string with ^ and the \s* comes before the \w* since we need to strip off whitespace on the front end of the word.
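
For reuse across many strings, the two regex calls above could be wrapped in a small helper. This is only a sketch: the class name NeighborWords and the method name Find are made up for illustration, and it assumes index really is the start of word inside text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NeighborWords
{
    // Hypothetical helper (not from the answer): returns the words directly
    // before and after the word that starts at 'index'. A value is "" when
    // punctuation or the start/end of the string separates it from the word.
    public static (string Before, string After) Find(string text, string word, int index)
    {
        // Word at the very end of the part that precedes the selected word.
        string before = Regex.Match(text.Substring(0, index), @"(\w*)\s*$").Groups[1].Value;

        // Word at the very start of the part that follows the selected word.
        string after = Regex.Match(text.Substring(index + word.Length), @"^\s*(\w*)").Groups[1].Value;

        return (before, after);
    }
}
```

Calling NeighborWords.Find(text, "good", 41) on the example string gives an empty Before and "against" as After, matching the result described above.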

#2

string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1]
                         .Trim(ignoredSpecialChars);
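string beforeWord = phrase.Substring(0, selectedPosition)
                          .TrimEnd()   // drop the space before the selected word, otherwise Last() returns ""
                          .Split(' ')
                          .Last()      // requires using System.Linq;
                          .Trim(ignoredSpecialChars);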

You can change ignoredSpecialChars array to get rid of the special characters you don't need.

UPDATE:

This will return null if there are any special characters between your word and words that surround it.

string phrase = "Look, good against remotes is one thing, good against the living, that’s something else.";
int selectedPosition = 41;
char[] ignoredSpecialChars = new char[2] { ',', '.' };

string afterWord = phrase.Substring(selectedPosition)
                         .Split(' ')[1];
afterWord = Char.IsLetterOrDigit(afterWord.First()) ?
            afterWord.TrimEnd(ignoredSpecialChars) : 
            null;

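string beforeWord = phrase.Substring(0, selectedPosition)
                          .TrimEnd()   // drop the space before the selected word, otherwise Last() returns ""
                          .Split(' ')
                          .Last();
// The Length check avoids an exception when there is no word before the selected one.
beforeWord = beforeWord.Length > 0 && Char.IsLetterOrDigit(beforeWord.Last()) ?
             beforeWord.TrimStart(ignoredSpecialChars) :
             null;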

#3

I haven't tested it yet, but it should work. You can just look at the substring before and after the word and then search for the first or the last " ". Then you know where the words start and end.

string text = "Look, good against remotes is one thing, good against the living, that’s something else.";
string word = "good";
int index = 41;

// Everything before and after the selected word; Trim() drops the adjacent spaces.
string before = text.Substring(0, index).Trim();
string after = text.Substring(index + word.Length).Trim();

int indexBefore = before.LastIndexOf(" ");   // -1 if there is no space
int indexAfter = after.IndexOf(" ");         // -1 if there is no space

string wordBefore = before.Substring(indexBefore + 1);                        // "thing,"
string wordAfter = indexAfter < 0 ? after : after.Substring(0, indexAfter);   // "against"

EDIT

If you want to ignore punctuation and commas, just remove them from your string first (keeping in mind that this shifts the index of the selected word).
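
Removing punctuation changes character positions, so the stored index has to be adjusted along the way. A minimal sketch of that idea follows; PunctuationStripper and Strip are hypothetical names, and keeping apostrophes (so that "that’s" stays one word) is an assumption made here, not something the answer specifies.

```csharp
using System;
using System.Text;

public static class PunctuationStripper
{
    // Hypothetical helper: returns 'text' without punctuation.
    // 'adjustedIndex' is where the character originally at 'index'
    // ends up in the returned string.
    public static string Strip(string text, int index, out int adjustedIndex)
    {
        var sb = new StringBuilder(text.Length);
        adjustedIndex = index;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // Assumption: apostrophes are kept so contractions stay intact.
            bool isApostrophe = c == '\'' || c == '’';

            if (char.IsPunctuation(c) && !isApostrophe)
            {
                // Each removed character before the selected word
                // moves that word one position to the left.
                if (i < index) adjustedIndex--;
                continue;
            }

            sb.Append(c);
        }

        return sb.ToString();
    }
}
```

With the example string and index 41, the two commas in front of the selected word are dropped, so the adjusted index is 39.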

#4

You can use the regular expression [^’a-zA-Z0-9]+ to split your string into words:

string[] words = Regex.Split(text, @"[^’a-zA-Z0-9]+");

Implementing navigation is up to you. Store the index of the selected word and use it to get the next or previous one:

int index = Array.IndexOf(words, "living");
string next = null, previous = null;

if (index >= 0 && index < words.Length - 1)
    next = words[index + 1]; // that’s

if (index > 0)
    previous = words[index - 1]; // the
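
Array.IndexOf only finds the first occurrence of a word, while the question starts from a character index (41). One possible way to bridge the two is to match the words instead of splitting, because each Match keeps its position in the original string. The sketch below is an illustration only: WordNavigator and Around are hypothetical names, and it does not stop at punctuation, so "thing" is returned as the previous word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordNavigator
{
    // Hypothetical helper: returns the words around the token that starts at
    // character position 'charIndex'. A value is null when there is no such
    // neighbour, or when no token starts at 'charIndex'.
    public static (string Previous, string Next) Around(string text, int charIndex)
    {
        // Same character class as the split pattern above, but not negated:
        // this matches the words themselves rather than the separators.
        Match[] tokens = Regex.Matches(text, @"[’a-zA-Z0-9]+").Cast<Match>().ToArray();

        // Find the token whose start position equals the given character index.
        int i = Array.FindIndex(tokens, m => m.Index == charIndex);
        if (i < 0) return (null, null);

        string previous = i > 0 ? tokens[i - 1].Value : null;
        string next = i < tokens.Length - 1 ? tokens[i + 1].Value : null;
        return (previous, next);
    }
}
```

For the example string, WordNavigator.Around(text, 41) yields "thing" and "against".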

#5

Here is a LINQPad program written in VB:

Sub Main
    dim input as string = "Look, good against remotes is one thing, good against the living, that’s something else."

    dim words as new list(of string)(input.split(" "c))

    dim index = getIndex(words)

    dim retVal = GetSurrounding(words, index, "good", 2)

    retVal.dump()
End Sub

function getIndex(words as list(of string)) as dictionary(of string, list(of integer))

    for i as integer = 0 to words.count- 1
            words(i) = getWord(words(i))
    next

    'words.dump()

    dim index as new dictionary(of string, List(of integer))(StringComparer.InvariantCultureIgnoreCase)
    for j as integer = 0 to words.count- 1
            dim word = words(j)
            if index.containsKey(word) then
                    index(word).add(j)
            else  
                    index.add(word, new list(of integer)({j}))
            end if
    next

    'index.dump()
    return index
end function

function getWord(candidate) as string
    dim pattern as string = "^[\w'’]+"
    dim match = Regex.Match(candidate, pattern)
    if match.success then
            return match.toString()
    else
            return candidate
    end if
end function 

function GetSurrounding(words, index, word, position) as tuple(of string, string)        

    if not index.containsKey(word) then
            return nothing
    end if

    dim indexEntry = index(word)
    if position > indexEntry.count
            'not enough occurrences of the word
            return nothing
    else
            dim left = ""
            dim right = ""
            dim positionInWordList = indexEntry(position -1)
            if PositionInWordList >0
                    left = words(PositionInWordList-1)
            end if
            if PositionInWordList < words.count -1
                    right = words(PositionInWordList +1)
            end if

            return new tuple(of string, string)(left, right)
    end if
end function

#6

Without the regex this can be done recursively with Array.IndexOf.

public class BeforeAndAfterWordFinder
{
    public string Input { get; private set; }
    private string[] words;

    public BeforeAndAfterWordFinder(string input)
    {
        Input = input;
        words = Input.Split(new string[] { ", ", " " }, StringSplitOptions.None);
    }

    public void Run(int occurance, string word)
    {
        int index = -1; // -1 means "nothing found yet"
        OccuranceAfterWord(occurance, word, ref index);
        Print(index);
    }

    private void OccuranceAfterWord(int occurance, string word, ref int lastIndex, int thisOccurance = 0)
    {
        // Resume just after the previous hit; lastIndex starts at -1, so the first search begins at 0.
        lastIndex = Array.IndexOf(words, word, lastIndex + 1);

        if (lastIndex != -1)
        {
            thisOccurance++;
            if (thisOccurance < occurance)
            {
                OccuranceAfterWord(occurance, word, ref lastIndex, thisOccurance);
            }
        }
    }

    private void Print(int index)
    {
        if (index == -1)
        {
            Console.WriteLine("Occurrence not found.");
            return;
        }

        // Guard both ends so the first and last word do not throw.
        string before = index > 0 ? words[index - 1] : "";
        string after = index < words.Length - 1 ? words[index + 1] : "";
        Console.WriteLine("{0} : {1}", before, after);
    }
}

Usage:

  string input = "Look, good against remotes is one thing, good against the living, that’s something else.";
  var F = new BeforeAndAfterWordFinder(input);
  F.Run(2, "good");

#7

Create a copy of the string with punctuation and commas removed (for example with Replace), then search that string for the substring "thing good against", and so on if needed.
