在两个字符串中查找常见字符

时间:2022-05-05 21:37:59

I am coding for the problem in which we got to count the number of common characters in two strings. Main part of the count goes like this

我正在编码我们需要计算两个字符串中常见字符数的问题。计数的主要部分是这样的

for(i=0; i < strlen(s1); i++) {
    for(j = 0; j < strlen(s2); j++) {
        if(s1[i] == s2[j]) {
            count++;
            s2[j] = '*';
            break;
        }
    }
}

This goes with an O(n^2) logic. However I could not think of a better solution than this. Can anyone help me in coding with an O(n) logic.

这与O(n ^ 2)逻辑一致。但是,我想不出比这更好的解决方案。任何人都可以用O(n)逻辑帮助我编码。

10 个解决方案

#1


8  

This is very simple. Take two int arrays freq1 and freq2. Initialize all its elements to 0. Then read your strings and store the frequencies of the characters to these arrays. After that compare the arrays freq1 and freq2 to find the common characters.

这很简单。取两个int数组freq1和freq2。将其所有元素初始化为0.然后读取字符串并将字符的频率存储到这些数组中。之后比较数组freq1和freq2以找到常见字符。

#2


3  

Your current code is O(n^3) because of the O(n) strlens and produces incorrect results, for example on "aa", "aa" (which your code will return 4).

您的当前代码为O(n ^ 3),因为O(n)条带并产生不正确的结果,例如“aa”,“aa”(您的代码将返回4)。

This code counts letters in common (each letter being counted at most once) in O(n).

此代码在O(n)中计算共同的字母(每个字母最多计算一次)。

int common(const char *a, const char *b) {
    int table[256] = {0};
    int result = 0;
    for (; *a; a++)table[*a]++;
    for (; *b; b++)result += (table[*b]-- > 0);
    return result;
}

Depending on how you define "letters in common", you may have different logic. Here's some testcases for the definition I'm using (which is size of the multiset intersection).

根据您如何定义“共同字母”,您可能有不同的逻辑。这是我正在使用的定义的一些测试用例(这是multiset交集的大小)。

int main(int argc, char *argv[]) {
    struct { const char *a, *b; int want; } cases[] = {
        {"a", "a", 1},
        {"a", "b", 0},
        {"a", "aa", 1},
        {"aa", "a", 1},
        {"ccc", "cccc", 3},
        {"aaa", "aaa", 3},
        {"abc", "cba", 3},
        {"aasa", "asad", 3},
    };
    int fail = 0;
    for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
        int got = common(cases[i].a, cases[i].b);
        if (got != cases[i].want) {
            fail = 1;
            printf("common(%s, %s) = %d, want %d\n",
                   cases[i].a, cases[i].b, got, cases[i].want);
        }
    }
    return fail;
}

#3


3  

It can be done in O(n) time with constant space.

它可以在O(n)时间内以恒定的空间完成。

The pseudo code goes like this :

伪代码如下:

int map1[26], map2[26];
int common_chars = 0;

for c1 in string1:
    map1[c1]++;

for c2 in string2:
    map2[c2]++;

for i in 1 to 26:
    common_chars += min(map1[i], map2[i]);

#4


1  

You can do it with 2n:

你可以用2n做到这一点:

int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;

for( i=0; i<len1; i++ )
{
    allChars[ (unsigned char) s1[i] ] = 1;
}

for( i=0; i<len2; i++ )
{
    if( allChars[ (unsigned char) s1[i] ] == 1 )
    {
        allChars[ (unsigned char) s2[i] ] = 2;
    }
}

for( i=0; i<256; i++ )
{
    if( allChars[i] == 2 )
    {
        cout << allChars[i] << endl;
        count++;
    }
}

#5


1  

Following code traverses each sting only once. So the complexity is O(n). One of the assumptions is that the upper and lower cases are considered same.

以下代码仅遍历每个sting一次。所以复杂性是O(n)。其中一个假设是上下案例被认为是相同的。

 #include<stdio.h>

int main() {
    char a[] = "Hello world";
    char b[] = "woowrd";
    int x[26] = {0};
    int i;
    int index;

    for (i = 0; a[i] != '\0'; i++) {
        index = a[i] - 'a';
        if (index > 26) {
            //capital char
            index = a[i] - 'A';
        }
        x[index]++;
    }

    for (i = 0; b[i] != '\0'; i++) {
        index = b[i] - 'a';
        if (index > 26) {
            //capital char
            index = b[i] - 'A';
        }

        if (x[index] > 0)
            x[index] = -1;
    }

    printf("Common characters in '%s' and '%s' are ", a, b);
    for (i = 0; i < 26; i++) {
        if (x[i] < 0)
            printf("%c", 'a'+i);
    }
    printf("\n");
}

#6


1  

int count(string a, string b) 
{

   int i,c[26]={0},c1[26]={};

   for(i=0;i<a.length();i++)
   {
        if(97<=a[i]&&a[i]<=123)
        c[a[i]-97]++;
   }
   for(i=0;i<b.length();i++)
   {
       if(97<=b[i]&&b[i]<=123)
        c1[b[i]-97]++;
    } 
    int s=0;
    for(i=0;i<26;i++)
    {
        s=s+abs(c[i]+c1[i]-(c[i]-c1[i]));

    }   

    return (s);  
}

This is much easier and better solution

这是更容易和更好的解决方案

#7


0  

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
    if (std::find(s2.begin(), s2.end(), *i) != s2.end())
   {
    dest.push_back(*i);
   }
}

taken from here

取自这里

#8


0  

C implementation to run in O(n) time and constant space.

C实现在O(n)时间和常量空间中运行。

#define ALPHABETS_COUNT 26 
int commonChars(char *s1, char *s2)
{
    int c_count = 0, i; 
    int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};

    /* Compute the number of occurances of each character */
    while (*s1) arr1[*s1++-'a'] += 1;
    while (*s2) arr2[*s2++-'a'] += 1;       

    /* Increment count based on match found */
    for(i=0; i<ALPHABETS_COUNT; i++) {
        if(arr1[i] == arr2[i]) c_count += arr1[i];
        else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
        else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
    }

    return c_count;

}

#9


0  

First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.

首先,您的代码不在O(n ^ 2)中运行,它以O(nm)运行,其中n和m是每个字符串的长度。

You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.

你可以在O(n + m)中完成,但不是更好,因为你必须经历每个字符串,至少一次,以查看一个字符是否同时存在。

An example in C++, assuming:

C ++中的一个例子,假设:

  • ASCII characters
  • All characters included (letters, numbers, special, spaces, etc...)
  • 包括所有字符(字母,数字,特殊,空格等......)

  • Case sensitive

std::vector<char> strIntersect(std::string const&s1, std::string const&s2){

    std::vector<bool> presents(256, false);  //Assuming ASCII
    std::vector<char> intersection;

    for (auto c : s1) {
        presents[c] = true;
    }
    for (auto c : s2) {
        if (presents[c]){
            intersection.push_back(c);
            presents[c] = false;
        }
    }
    return intersection;
}

int main() {
    std::vector<char> result; 
    std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
    std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";

    //Expected: "S a i n t   R o q u e s d g h l , b c m r z p"

    result = strIntersect(s1, s2);
    for (auto c : result) {
         std::cout << c << " ";
    }
    std::cout << std::endl;

    return 0;
}

#10


-1  

can be easily done using the concept of "catching" which is a sub-algorithm of hashing.

可以使用“捕获”的概念轻松完成,“捕获”是散列的子算法。

#1


8  

This is very simple. Take two int arrays freq1 and freq2. Initialize all its elements to 0. Then read your strings and store the frequencies of the characters to these arrays. After that compare the arrays freq1 and freq2 to find the common characters.

这很简单。取两个int数组freq1和freq2。将其所有元素初始化为0.然后读取字符串并将字符的频率存储到这些数组中。之后比较数组freq1和freq2以找到常见字符。

#2


3  

Your current code is O(n^3) because of the O(n) strlens and produces incorrect results, for example on "aa", "aa" (which your code will return 4).

您的当前代码为O(n ^ 3),因为O(n)条带并产生不正确的结果,例如“aa”,“aa”(您的代码将返回4)。

This code counts letters in common (each letter being counted at most once) in O(n).

此代码在O(n)中计算共同的字母(每个字母最多计算一次)。

int common(const char *a, const char *b) {
    int table[256] = {0};
    int result = 0;
    for (; *a; a++)table[*a]++;
    for (; *b; b++)result += (table[*b]-- > 0);
    return result;
}

Depending on how you define "letters in common", you may have different logic. Here's some testcases for the definition I'm using (which is size of the multiset intersection).

根据您如何定义“共同字母”,您可能有不同的逻辑。这是我正在使用的定义的一些测试用例(这是multiset交集的大小)。

int main(int argc, char *argv[]) {
    struct { const char *a, *b; int want; } cases[] = {
        {"a", "a", 1},
        {"a", "b", 0},
        {"a", "aa", 1},
        {"aa", "a", 1},
        {"ccc", "cccc", 3},
        {"aaa", "aaa", 3},
        {"abc", "cba", 3},
        {"aasa", "asad", 3},
    };
    int fail = 0;
    for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
        int got = common(cases[i].a, cases[i].b);
        if (got != cases[i].want) {
            fail = 1;
            printf("common(%s, %s) = %d, want %d\n",
                   cases[i].a, cases[i].b, got, cases[i].want);
        }
    }
    return fail;
}

#3


3  

It can be done in O(n) time with constant space.

它可以在O(n)时间内以恒定的空间完成。

The pseudo code goes like this :

伪代码如下:

int map1[26], map2[26];
int common_chars = 0;

for c1 in string1:
    map1[c1]++;

for c2 in string2:
    map2[c2]++;

for i in 1 to 26:
    common_chars += min(map1[i], map2[i]);

#4


1  

You can do it with 2n:

你可以用2n做到这一点:

int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;

for( i=0; i<len1; i++ )
{
    allChars[ (unsigned char) s1[i] ] = 1;
}

for( i=0; i<len2; i++ )
{
    if( allChars[ (unsigned char) s1[i] ] == 1 )
    {
        allChars[ (unsigned char) s2[i] ] = 2;
    }
}

for( i=0; i<256; i++ )
{
    if( allChars[i] == 2 )
    {
        cout << allChars[i] << endl;
        count++;
    }
}

#5


1  

Following code traverses each sting only once. So the complexity is O(n). One of the assumptions is that the upper and lower cases are considered same.

以下代码仅遍历每个sting一次。所以复杂性是O(n)。其中一个假设是上下案例被认为是相同的。

 #include<stdio.h>

int main() {
    char a[] = "Hello world";
    char b[] = "woowrd";
    int x[26] = {0};
    int i;
    int index;

    for (i = 0; a[i] != '\0'; i++) {
        index = a[i] - 'a';
        if (index > 26) {
            //capital char
            index = a[i] - 'A';
        }
        x[index]++;
    }

    for (i = 0; b[i] != '\0'; i++) {
        index = b[i] - 'a';
        if (index > 26) {
            //capital char
            index = b[i] - 'A';
        }

        if (x[index] > 0)
            x[index] = -1;
    }

    printf("Common characters in '%s' and '%s' are ", a, b);
    for (i = 0; i < 26; i++) {
        if (x[i] < 0)
            printf("%c", 'a'+i);
    }
    printf("\n");
}

#6


1  

int count(string a, string b) 
{

   int i,c[26]={0},c1[26]={};

   for(i=0;i<a.length();i++)
   {
        if(97<=a[i]&&a[i]<=123)
        c[a[i]-97]++;
   }
   for(i=0;i<b.length();i++)
   {
       if(97<=b[i]&&b[i]<=123)
        c1[b[i]-97]++;
    } 
    int s=0;
    for(i=0;i<26;i++)
    {
        s=s+abs(c[i]+c1[i]-(c[i]-c1[i]));

    }   

    return (s);  
}

This is much easier and better solution

这是更容易和更好的解决方案

#7


0  

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
    if (std::find(s2.begin(), s2.end(), *i) != s2.end())
   {
    dest.push_back(*i);
   }
}

taken from here

取自这里

#8


0  

C implementation to run in O(n) time and constant space.

C实现在O(n)时间和常量空间中运行。

#define ALPHABETS_COUNT 26 
int commonChars(char *s1, char *s2)
{
    int c_count = 0, i; 
    int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};

    /* Compute the number of occurances of each character */
    while (*s1) arr1[*s1++-'a'] += 1;
    while (*s2) arr2[*s2++-'a'] += 1;       

    /* Increment count based on match found */
    for(i=0; i<ALPHABETS_COUNT; i++) {
        if(arr1[i] == arr2[i]) c_count += arr1[i];
        else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
        else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
    }

    return c_count;

}

#9


0  

First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.

首先,您的代码不在O(n ^ 2)中运行,它以O(nm)运行,其中n和m是每个字符串的长度。

You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.

你可以在O(n + m)中完成,但不是更好,因为你必须经历每个字符串,至少一次,以查看一个字符是否同时存在。

An example in C++, assuming:

C ++中的一个例子,假设:

  • ASCII characters
  • All characters included (letters, numbers, special, spaces, etc...)
  • 包括所有字符(字母,数字,特殊,空格等......)

  • Case sensitive

std::vector<char> strIntersect(std::string const&s1, std::string const&s2){

    std::vector<bool> presents(256, false);  //Assuming ASCII
    std::vector<char> intersection;

    for (auto c : s1) {
        presents[c] = true;
    }
    for (auto c : s2) {
        if (presents[c]){
            intersection.push_back(c);
            presents[c] = false;
        }
    }
    return intersection;
}

int main() {
    std::vector<char> result; 
    std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
    std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";

    //Expected: "S a i n t   R o q u e s d g h l , b c m r z p"

    result = strIntersect(s1, s2);
    for (auto c : result) {
         std::cout << c << " ";
    }
    std::cout << std::endl;

    return 0;
}

#10


-1  

can be easily done using the concept of "catching" which is a sub-algorithm of hashing.

可以使用“捕获”的概念轻松完成,“捕获”是散列的子算法。