How to check if a string is a proper subset of another string

Time: 2021-11-04 21:20:05

I want to check whether a string is strictly a subset of another string. To this end I use boost::contains and compare the sizes of the strings as follows:

#include <boost/algorithm/string.hpp>
#include <iostream>
#include <string>

using namespace std;
using namespace boost::algorithm;

int main()
{
  string str1 = "abc news";
  string str2 = "abc";
  // trim both strings using boost
  trim(str1);
  trim(str2);
  // if str2 is contained in str1 and is shorter than str1, it is strictly contained in str1
  if(contains(str1,str2) && (str2.size() < str1.size()))
  {
    cout << "contains" << endl;
  }
  return 0;
}

Is there a better way to solve this problem, one that avoids the extra comparison of the string sizes?


Example

  • ABC is a proper subset of ABC NEWS

  • ABC is not a proper subset of ABC


4 solutions

#1


You can just use == or != to compare the strings:

if(contains(str1, str2) && (str1 != str2))
    ...

If one string contains the other and the two are not equal, you have a proper subset.

Whether this is better than your method is for you to decide. It is less typing and very clear (IMO), but probably a little slower if both strings are long and equal, or if both start with the same long sequence.

Note: If you really care about performance, you might want to try the Boyer-Moore search and the Boyer-Moore-Horspool search. They can be much faster than a naive string search (which is apparently what the string search in libstdc++ uses), particularly for long patterns. I do not know whether boost::contains uses them.

#2


I would use the following:

bool is_substr_of(const std::string& sub, const std::string& s) {
  return sub.size() < s.size() && s.find(sub) != s.npos;
}

This uses the standard library only, and does the size check first, which is cheaper than s.find(sub) != s.npos. The size check is sufficient because a substring with the same length as the containing string must be equal to it.

#3


About comparison operations

TL;DR: Be sure about the format of what you're comparing.

Be wary of how you define "strictly".

For example, you did not point out these issues in your question, but suppose I submit, let's say:

 "ABC       " // i.e. trailing whitespace
 "ABC\n"

What is your take on it? Do you accept it or not? If you don't, you'll have to either trim or clean your input before comparing. This is just a general note on comparison operations.

Anyway, as Baum pointed out, you can either check the equality of your strings using ==, or you can compare their lengths with either size() or length(). Given that you have already checked for a substring, comparing lengths is the more efficient option, since it does not need to look at the characters again.

#4


Another approach, using only the standard library:

#include <algorithm>
#include <string>
#include <iostream>

using namespace std;

int main()
{
  string str1 = "abc news";
  string str2 = "abc";
  if (str2 != str1
    && search(begin(str1), end(str1), 
              begin(str2), end(str2)) != end(str1))
  {
    cout <<"contains" << endl;
  }
  return 0;
}
