Replace strings using a List<String>

Time: 2022-02-15 16:51:44

I have a List of words I want to ignore, like this one:

public List<String> ignoreList = new List<String>()
        {
            "North",
            "South",
            "East",
            "West"
        };

For a given string, say "14th Avenue North", I want to be able to remove the "North" part, so basically a function that returns "14th Avenue " when called.

I feel like there is something I should be able to do with a mix of LINQ, regex and replace, but I just can't figure it out.

The bigger picture is that I'm trying to write an address-matching algorithm. I want to filter out words like "Street", "North", "Boulevard", etc. before I use the Levenshtein algorithm to evaluate the similarity.
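
Since the goal is to filter first and then compare, here is a minimal sketch of that pipeline. The class and method names (AddressMatcher, Normalize, Levenshtein, Distance) and the extra "Street"/"Boulevard" entries are hypothetical; the filter reuses the whole-word LINQ idea from answer #1, and the distance is the standard two-row dynamic-programming form of Levenshtein.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressMatcher
{
    // Hypothetical ignore list: the question's words plus two street types.
    static readonly List<string> IgnoreList = new List<string>
    {
        "North", "South", "East", "West", "Street", "Boulevard"
    };

    // Drop whole words found in the ignore list (the LINQ approach from answer #1).
    public static string Normalize(string address) =>
        string.Join(" ", address.Split().Where(w => !IgnoreList.Contains(w)));

    // Standard Levenshtein distance, keeping only two rows of the DP table.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;   // distance from ""

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(
                    curr[j - 1] + 1,        // insertion
                    prev[j] + 1),           // deletion
                    prev[j - 1] + cost);    // substitution or match
            }
            (prev, curr) = (curr, prev);    // reuse the arrays for the next row
        }
        return prev[b.Length];
    }

    // Filter both addresses first, then compare what is left.
    public static int Distance(string x, string y) =>
        Levenshtein(Normalize(x), Normalize(y));
}
```

Filtering alone does not make abbreviations equivalent: "123 Main Street" and "123 Main St" normalize to "123 Main" and "123 Main St", a distance of 3, so forms like "St" would need their own entries.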

11 solutions

#1


12  

How about this:

string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)));

or, for .NET 3.5, where string.Join has no IEnumerable<string> overload:

string.Join(" ", text.Split().Where(w => !ignoreList.Contains(w)).ToArray());

Note that this method splits the string into individual words, so it only removes whole words. That way it works properly with addresses like "Northampton Way #123", which a plain string.Replace would turn into "ampton Way #123".
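
The version above is case-sensitive, and Split() with no arguments yields empty entries for repeated spaces, which string.Join then carries through. A minimal variation, assuming you also want "north" and "NORTH" ignored, keeps the words in a HashSet<string> with StringComparer.OrdinalIgnoreCase and drops empty tokens. WordFilter and RemoveIgnored are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Case-insensitive set: O(1) lookups, "North" == "north" == "NORTH".
    static readonly HashSet<string> Ignore = new HashSet<string>(
        new[] { "North", "South", "East", "West" },
        StringComparer.OrdinalIgnoreCase);

    public static string RemoveIgnored(string text) =>
        string.Join(" ", text
            // A null separator splits on any whitespace; empty tokens are dropped.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !Ignore.Contains(w)));
}
```

Because only whole tokens are compared, "Northampton Way #123" still comes back unchanged.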

#2


6  

// No word boundaries here, so "Northampton" would also lose its "North"
Regex r = new Regex(string.Join("|", ignoreList.Select(w => Regex.Escape(w)).ToArray()));
string s = "14th Avenue North";
s = r.Replace(s, string.Empty);
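
Because this pattern has no word boundaries, it also removes ignored words that appear inside longer words. The sketch below builds the same escaped alternation once (a static readonly Regex can be reused for every address) and exposes that effect; RegexFilter and Strip are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexFilter
{
    static readonly List<string> IgnoreList = new List<string> { "North", "South", "East", "West" };

    // Same pattern as the answer: "North|South|East|West", each entry escaped.
    // Built once and reused; RegexOptions.Compiled trades startup time for speed.
    static readonly Regex Pattern = new Regex(
        string.Join("|", IgnoreList.Select(Regex.Escape)),
        RegexOptions.Compiled);

    public static string Strip(string text) => Pattern.Replace(text, string.Empty);
}
```

Strip("14th Avenue North") returns "14th Avenue ", but Strip("Northampton Way") returns "ampton Way"; wrapping the alternation in \b...\b, as answer #5 does, avoids that.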

#3


2  

Something like this should work:

string FilterAllValuesFromIgnoreList(string someStringToFilter)
{
  return ignoreList.Aggregate(someStringToFilter, (str, filter)=>str.Replace(filter, ""));
}
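
One detail with chaining Replace calls through Aggregate: each call works on the output of the previous one, so the order of the ignore list matters when one entry contains another. The sketch below assumes a hypothetical "Northwest" entry and sorts the list longest-first so the longer word is removed before its prefix can split it. AggregateFilter and Filter are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AggregateFilter
{
    // Longest entries first, so "Northwest" is replaced before "North"
    // can leave a stray "west" behind.
    public static string Filter(string input, IEnumerable<string> ignoreList) =>
        ignoreList
            .OrderByDescending(w => w.Length)
            .Aggregate(input, (str, filter) => str.Replace(filter, ""));
}
```

With the list { "North", "Northwest" }, an unsorted Aggregate turns "1 Northwest Dr" into "1 west Dr", while Filter returns "1  Dr" (the double space is left for a later whitespace cleanup).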

#4


2  

What's wrong with a simple foreach loop?

string street = "14th Avenue North";
foreach (string word in ignoreList)
{
    street = street.Replace(word, string.Empty);
}
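
Replace leaves the spaces around a removed word in place, giving results like "14th Avenue " or "123  5th St". Stray spaces count as edits in a Levenshtein comparison, so a reasonable follow-up is to collapse runs of whitespace after the loop. A minimal sketch, with LoopFilter and Clean as hypothetical names:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LoopFilter
{
    public static string Clean(string street, List<string> ignoreList)
    {
        // The plain loop from this answer.
        foreach (string word in ignoreList)
        {
            street = street.Replace(word, string.Empty);
        }
        // Collapse leftover runs of whitespace to one space and trim the ends.
        return Regex.Replace(street, @"\s+", " ").Trim();
    }
}
```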

#5


2  

If you know that the list of words contains only characters that do not need escaping inside a regular expression, then you can do this:

string s = "14th Avenue North";
Regex regex = new Regex(string.Format(@"\b({0})\b",
                        string.Join("|", ignoreList.ToArray())));
s = regex.Replace(s, "");

Result:

14th Avenue 

If there are special characters, you will need to fix two things:

  • Use Regex.Escape on each element of ignore list.
  • The word boundary \b does not match between a whitespace character and a symbol (or vice versa), because neither one is a word character. You may need to check for whitespace (or other separating characters such as punctuation) using lookaround assertions instead.

Here's how to fix these two problems:

Regex regex = new Regex(string.Format(@"(?<= |^)({0})(?= |$)",
    string.Join("|", ignoreList.Select(x => Regex.Escape(x)).ToArray())));
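
To see why both fixes matter, the sketch below builds the two patterns from a hypothetical ignore list containing an entry with a regex metacharacter, "St.", and applies each to "5th St. North". BoundaryFilter and its field names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryFilter
{
    // Hypothetical list; "St." contains the metacharacter '.'.
    static readonly List<string> IgnoreList = new List<string> { "North", "St." };

    // Escaped alternation: "North|St\."
    static readonly string Alternation =
        string.Join("|", IgnoreList.Select(x => Regex.Escape(x)).ToArray());

    // First version from the answer (with escaping added): \b word boundaries.
    public static readonly Regex WordBoundary =
        new Regex(string.Format(@"\b({0})\b", Alternation));

    // Fixed version from the answer: a space or start/end on either side.
    public static readonly Regex Lookaround =
        new Regex(string.Format(@"(?<= |^)({0})(?= |$)", Alternation));
}
```

With \b, "St." survives because there is no word boundary between "." and the following space, so WordBoundary.Replace("5th St. North", "") gives "5th St. ". The lookaround version removes both entries, giving "5th  ", and still leaves "Northampton Way" untouched.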

#6


1  

If it's a short string, as in your example, you can just loop through the ignore list and replace one word at a time. If you want to get fancy, you can use the LINQ Aggregate method to do it:

address = ignoreList.Aggregate(address, (a, s) => a.Replace(s, String.Empty));

If it's a large string, that would be slow, because every Replace call scans the whole string and builds a new copy. Instead, you can replace all the strings in a single pass through the string, which is much faster. I made a method for that in this answer.
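
The method referred to above is not included on this page. As a rough illustration of the single-pass idea only (not that answer's code), the sketch below walks the string once, copies whitespace through, and drops any whitespace-delimited word found in a HashSet<string>. SinglePassFilter and Filter are hypothetical names, and it matches whole words rather than arbitrary substrings.

```csharp
using System.Collections.Generic;
using System.Text;

public static class SinglePassFilter
{
    // One left-to-right scan: whitespace is copied as-is, each word is looked
    // up once in the set and appended only if it is not ignored.
    public static string Filter(string text, HashSet<string> ignore)
    {
        var sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            if (char.IsWhiteSpace(text[i]))
            {
                sb.Append(text[i++]);
                continue;
            }
            int start = i;
            while (i < text.Length && !char.IsWhiteSpace(text[i])) i++;
            string word = text.Substring(start, i - start);
            if (!ignore.Contains(word)) sb.Append(word);
        }
        return sb.ToString();
    }
}
```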

#7


1  

LINQ makes this easy and readable. It does require normalized data, though, particularly because the comparison is case-sensitive.

List<string> ignoreList = new List<string>()
{
    "North",
    "South",
    "East",
    "West"
};

string s = "123 West 5th St"
        .Split(' ')  // Separate the words into an array
        .Except(ignoreList) // Remove ignored keywords (Except also drops duplicate words)
        .Aggregate((s1, s2) => s1 + " " + s2); // Reconstruct the string
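
Two side effects of this version are worth knowing: Except is a set operation, so it also removes repeated words from the address, and Aggregate without a seed throws InvalidOperationException when every word was ignored (for example, the input "North"). A sketch comparing it with a Where filter plus string.Join; ExceptVsWhere and its methods are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptVsWhere
{
    static readonly List<string> IgnoreList = new List<string> { "North", "South", "East", "West" };

    // Except is a set difference: ignored words go, and so do repeated words.
    public static string WithExcept(string s) =>
        string.Join(" ", s.Split(' ').Except(IgnoreList));

    // Where keeps every non-ignored word, repeats included; string.Join
    // returns "" for an empty sequence instead of throwing.
    public static string WithWhere(string s) =>
        string.Join(" ", s.Split(' ').Where(w => !IgnoreList.Contains(w)));
}
```

For "100 Main St Apt 100", WithExcept returns "100 Main St Apt" while WithWhere keeps the trailing "100"; WithWhere("North") returns an empty string.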

#8


0  

Why not just keep it simple?

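public string Trim(string text)
{
   var rv = text.Trim();
   foreach (var ignore in ignoreList)
   {
      // Only strip an ignored word that ends the string, not every occurrence
      if (rv.EndsWith(ignore, StringComparison.Ordinal))
      {
         rv = rv.Substring(0, rv.Length - ignore.Length).TrimEnd();
      }
   }
   return rv;
}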

#9


0  

You can do this with an expression if you like, but it's easier to turn it around than to use Aggregate. I would do something like this:

string s = "14th Avenue North";
ignoreList.ForEach(i => s = s.Replace(i, ""));
//result is "14th Avenue "

#10


0  

public string Trim(string text)
{
   var rv = text;
   foreach (var ignore in ignoreList)
      rv = rv.Replace(ignore, "");
   return rv;
}

Updated for Gabe, to match whole words only:

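public string Trim(string text)
{
   var kept = new List<string>();
   var words = text.Split(' ');
   foreach (var word in words)
   {
      var present = false;
      foreach (var ignore in ignoreList)
         if (word == ignore)
            present = true;
      if (!present)
         kept.Add(word);   // keep words that are not in the ignore list
   }
   // Rejoin with spaces; appending words directly would run them together
   return string.Join(" ", kept.ToArray());
}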

#11


0  

If you have a list, I think you're going to have to touch all the items. You could create one large RegEx containing all your ignore keywords and replace the matches with String.Empty.

Here's a start:

(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)

If you have a single RegEx for the ignore words, you can do a single replace for each phrase you want to pass to the algorithm.
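
A sketch of applying that pattern in C#. DirectionFilter and its methods are hypothetical, and RegexOptions.IgnoreCase is an added assumption so that "Northwestern" is caught despite its lowercase "w". Because the pattern consumes the whitespace on both sides of a match, replacing with String.Empty glues the neighbouring words together, so the second method replaces with a single space and trims instead.

```csharp
using System.Text.RegularExpressions;

public static class DirectionFilter
{
    // The answer's pattern, built once; IgnoreCase is an added assumption.
    static readonly Regex Directions = new Regex(
        @"(^|\s+)(North|South|East|West){1,2}(ern)?(\s+|$)",
        RegexOptions.IgnoreCase);

    // Replacing with "" also deletes the surrounding whitespace.
    public static string StripEmpty(string s) => Directions.Replace(s, string.Empty);

    // Put one space back in place of each match, then trim the ends.
    public static string StripSpace(string s) => Directions.Replace(s, " ").Trim();
}
```

For example, StripEmpty("123 West 5th St") gives "1235th St", whereas StripSpace gives "123 5th St"; StripSpace("Northwestern Ave") gives "Ave".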