Using SSE to compute Hamming distances to several strings

Time: 2022-04-25 19:14:13

I have n strings of 8-bit characters, all of the same length (say m), and another string s of that same length. I need to compute the Hamming distance from s to each of the other strings. In plain C, something like:

unsigned char strings[n][m];
unsigned char s[m];
int distances[n];

for(i=0; i<n; i++) {
  distances[i] = 0;
  for(j=0; j<m; j++) {
    if(strings[i][j] != s[j])
      distances[i]++;
  }
}

I would like to use SIMD instructions with gcc to perform this computation more efficiently. I have read that PCMPISTRI in SSE 4.2 can be useful, and my target computer supports that instruction set, so I would prefer a solution using SSE 4.2.

EDIT:

I wrote the following function to compute the Hamming distance between two strings:

#include <string.h>     // memcpy
#include <nmmintrin.h>  // SSE 4.2: _mm_cmpestrm, _mm_popcnt_u64

// Count the set bits in a 128-bit value, 64 bits at a time
static inline int popcnt128(__m128i n) {
  const __m128i n_hi = _mm_unpackhi_epi64(n, n);
  return _mm_popcnt_u64(_mm_cvtsi128_si64(n)) + _mm_popcnt_u64(_mm_cvtsi128_si64(n_hi));
}

int HammingDist(const unsigned char *p1, const unsigned char *p2, const int len) {
#define MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | _SIDD_BIT_MASK | _SIDD_NEGATIVE_POLARITY)
  __m128i smm1, smm2, ResultMask;

  int iters = len / 16;
  int diffs = 0;
  int i;

  for(i=0; i<iters; i++) {
    // Load inside the loop so nothing is read past the last full block
    smm1 = _mm_loadu_si128 ((const __m128i*) p1);
    smm2 = _mm_loadu_si128 ((const __m128i*) p2);
    ResultMask = _mm_cmpestrm (smm1,16,smm2,16,MODE);

    diffs += popcnt128(ResultMask);
    p1 = p1+16;
    p2 = p2+16;
  }

  int mod = len % 16;
  if(mod>0) {
     // Copy the tail into zeroed buffers to avoid reading beyond the strings
     unsigned char t1[16] = {0}, t2[16] = {0};
     memcpy(t1, p1, mod);
     memcpy(t2, p2, mod);
     smm1 = _mm_loadu_si128 ((const __m128i*) t1);
     smm2 = _mm_loadu_si128 ((const __m128i*) t2);
     ResultMask = _mm_cmpestrm (smm1,mod,smm2,mod,MODE);
     diffs += popcnt128(ResultMask);
  }

  return diffs;
}

So I can solve my problem by means of:

for(i=0; i<n; i++) {
  distances[i] = HammingDist(s, strings[i], m);
}

Is this the best I can do or can I use the fact that one of the strings compared is always the same? In addition, should I do some alignment on my arrays to improve performance?

ANOTHER ATTEMPT

Following Harold's recommendation, I have written the following code:

void _SSE_hammingDistances(const ByteP str, const ByteP strings, int *ds, const int n, const int m) {
    int iters = m / 16;

    __m128i *smm1, *smm2, diffs;

    for(int j=0; j<n; j++) {
        smm1 = (__m128i*)  str;
        smm2 = (__m128i*)  &strings[j*(m+1)]; // m+1, as strings are '\0' terminated

        diffs =  _mm_setzero_si128();

        for (int i = 0; i < iters; i++) {
            diffs = _mm_add_epi8(diffs, _mm_cmpeq_epi8(*smm1, *smm2));
            smm1 += 1;
            smm2 += 1;
        }

        int s = m;
        signed char *ptr = (signed char *) &diffs;
        for(int p=0; p<16; p++) {
            s += *ptr;
            ptr++;
        }

        *ds = s;
        ds++;
    }
}

However, I have not been able to do the final addition of the bytes in the __m128i using psadbw. Can anyone please help me with that?

1 answer

#1

Here's an improved version of your latest routine, which uses PSADBW (_mm_sad_epu8) to eliminate the scalar code:

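#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  // SSE2

void hammingDistances_SSE(const uint8_t * str, const uint8_t * strings, int * const ds, const int n, const int m)
{
    const int iters = m / 16;

    assert((m & 15) == 0);      // m must be a multiple of 16
    assert(iters <= 255);       // each 8-bit counter wraps after 255 matches

    for (int j = 0; j < n; j++)
    {
        const uint8_t * const t = &strings[j*(m+1)]; // m+1, as strings are '\0' terminated

        __m128i diffs = _mm_setzero_si128();

        for (int i = 0; i < iters; i++)
        {
            // Load the next 16 bytes of each string (unaligned: rows are m+1 bytes apart)
            const __m128i smm1 = _mm_loadu_si128((const __m128i*)&str[16*i]);
            const __m128i smm2 = _mm_loadu_si128((const __m128i*)&t[16*i]);
            // cmpeq gives 0xFF (-1) per equal byte, so subtracting it counts matches per byte lane
            diffs = _mm_sub_epi8(diffs, _mm_cmpeq_epi8(smm1, smm2));
        }

        // PSADBW against zero sums the 16 byte counters into two 16-bit partial sums
        diffs = _mm_sad_epu8(diffs, _mm_setzero_si128());
        ds[j] = m - (_mm_extract_epi16(diffs, 0) + _mm_extract_epi16(diffs, 4));
    }
}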
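
The routine above requires m to be a multiple of 16, and its per-byte counters can only hold values up to 255. If either restriction is a problem, one possible variant is sketched below. It is only a sketch under some assumptions: the names hsum_epu8, hamming_sse2 and hamming_distances and the stride parameter are made up for illustration and are not part of the question's code. It sums the byte counters with PSADBW after every batch of at most 255 blocks, and handles the last m % 16 bytes with a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8, _mm_sub_epi8, _mm_sad_epu8 */

/* Sum the 16 unsigned byte lanes of v using PSADBW against zero. */
static inline int hsum_epu8(__m128i v)
{
    const __m128i sums = _mm_sad_epu8(v, _mm_setzero_si128());
    return _mm_extract_epi16(sums, 0) + _mm_extract_epi16(sums, 4);
}

/* Hypothetical helper: Hamming distance between two byte strings of length m. */
static int hamming_sse2(const uint8_t *a, const uint8_t *b, int m)
{
    assert(m >= 0);
    int matches = 0;
    int i = 0;

    while (m - i >= 16) {
        /* Process at most 255 blocks per batch so no 8-bit lane can wrap. */
        int blocks = (m - i) / 16;
        if (blocks > 255) blocks = 255;

        __m128i acc = _mm_setzero_si128();
        for (int k = 0; k < blocks; k++, i += 16) {
            const __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
            const __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
            /* cmpeq gives 0xFF (-1) per equal byte; subtracting adds 1 per match. */
            acc = _mm_sub_epi8(acc, _mm_cmpeq_epi8(va, vb));
        }
        matches += hsum_epu8(acc);
    }

    /* Scalar tail for the last m % 16 bytes. */
    for (; i < m; i++)
        matches += (a[i] == b[i]);

    return m - matches;
}

/* Distances from s to each of n strings stored contiguously, stride bytes apart. */
static void hamming_distances(const uint8_t *s, const uint8_t *strings,
                              size_t stride, int n, int m, int *ds)
{
    for (int j = 0; j < n; j++)
        ds[j] = hamming_sse2(s, strings + (size_t)j * stride, m);
}
```

With the '\0'-terminated layout from the question, the call would be hamming_distances(str, strings, m + 1, n, m, ds). The loads are unaligned because rows that are m+1 bytes apart cannot all be 16-byte aligned. If you can pad each row to a multiple of 16 bytes, aligned loads become an option, though whether that helps measurably is something to benchmark on your target machine.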
