
Q: Computing the product of values selected by the bits of another vector, using SIMD

Stack Overflow user

Asked 2019-11-08 01:17:32

1 answer · 119 views · 0 followers · score 2

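I have two vectors: a vector a of N doubles, and a vector b of ceil(N/8) unsigned chars. The goal is to compute the product of some of the values in a. b is read bit by bit, and each bit says whether the corresponding double in a is included in the product.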
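The bit layout the question's loops rely on fits in a few lines: element i of a is controlled by bit i % 8 of byte i / 8 of b, counting from the least-significant bit. A minimal sketch; packIndices and isSelected are hypothetical helper names, not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds b for n elements of a, where `selected` lists the
// indices to include. Element i maps to bit i % 8 (least-significant bit first)
// of byte i / 8, so the result holds ceil(n / 8) bytes.
std::vector<unsigned char> packIndices(std::size_t n, const std::vector<std::size_t>& selected)
{
    std::vector<unsigned char> bytes((n + 7) / 8, 0);
    for (std::size_t i : selected)
    {
        assert(i < n);
        bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}

// Hypothetical helper: true if element i of a takes part in the product.
bool isSelected(const std::vector<unsigned char>& bytes, std::size_t i)
{
    return (bytes[i / 8] >> (i % 8)) & 1u;
}
```

For example, packIndices(10, {2, 7, 9}) yields two bytes: 0x84 (bits 2 and 7 of the first byte) and 0x02 (bit 1 of the second byte, i.e. element 9).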

  // Let's create some data
  unsigned nbBits  = 1e7;
  unsigned nbBytes = nbBits / 8;                 // number of complete bytes
  unsigned char nbBitsInLastByte = nbBits % 8;   // bits in a trailing partial byte (0 if there is none)
  assert(nbBits == nbBytes * 8 + nbBitsInLastByte);
  std::vector<double> a(nbBits, 0.999999);   // In practice the values of a vary; this is just an easy-to-build example
  std::vector<unsigned char> b(nbBytes + (nbBitsInLastByte ? 1 : 0), 0); // ceil(nbBits/8) bytes. I am not using `vector<bool>` or `bitset`, for my own reasons!
  assert(b.size() == (a.size() + 7) / 8);

  // Set a few bits to true
  for (unsigned byte = 0 ; byte < (nbBytes-1) ; byte+=2)
  {
    b[byte] |= 1 << 2; // set bit 2 (zero-based counting) to 'true'
    b[byte] |= 1 << 7; // set bit 7, the last bit of the byte, to 'true'
                //  ^ This is the bit index
  }

As mentioned above, my goal is to compute the product of the values in a whose corresponding bit in b is true. It can be done as follows:

  // Initialize the variable we want to compute
  double product = 1.0;

  // Product over the complete bytes
  for (unsigned byte = 0 ; byte < nbBytes ; ++byte)
  {
    for (unsigned bit = 0 ; bit < 8 ; ++bit) // inner loop could be manually unrolled
    {
      if((b[byte] >> bit) & 1) // gets the bit value
        product *= a[byte*8+bit];
    }
  }

  // Product over the trailing partial byte (skipped when nbBitsInLastByte == 0)
  for (unsigned bit = 0 ; bit < nbBitsInLastByte ; ++bit)
  {
    if((b[nbBytes] >> bit) & 1) // gets the bit value
      product *= a[nbBytes*8+bit];
  }
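A branch-free scalar loop is a useful reference before reaching for intrinsics. Multiplying the running product by 1.0 leaves it unchanged, so selecting a[i] or 1.0 and always multiplying gives the same result, in the same order, as the loops above, and a single loop over i also covers a trailing partial byte. A sketch under those assumptions; conditionalProductScalar is a hypothetical name, and whether the compiler really emits a branch-free select (conditional move or blend) is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference: multiplies a[i] into the result whenever
// bit i % 8 of b[i / 8] is set. b must hold ceil(n / 8) bytes; only indices
// below n are visited, so a partial last byte needs no special case.
double conditionalProductScalar(const double* a, const unsigned char* b, std::size_t n)
{
    double product = 1.0;
    for (std::size_t i = 0; i < n; ++i)
    {
        const bool selected = (b[i / 8] >> (i % 8)) & 1u;
        // Multiply by 1.0 for unselected elements instead of branching.
        product *= selected ? a[i] : 1.0;
    }
    return product;
}
```

Keeping this function around makes it easy to check any SIMD version against it on the same inputs.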

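This product computation is the slowest part of my code. I am wondering whether explicitly vectorizing it with SIMD would help here. I have been looking at the functions provided in "xmmintrin.h", but I don't know much about SIMD and couldn't find anything that helps. Can you help me?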

Tags: performance, sse, simd, c++

1 answer

Stack Overflow user

Answered 2019-11-21 23:53:00

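If you don't care about the order of the multiplications, this is pretty simple. The key is the _mm_blendv_pd intrinsic (the BLENDVPD instruction) from SSE4.1, which lets you make the loop completely branchless.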
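To make the selection rule concrete: _mm_blendv_pd(a, b, mask) takes each lane from b where the sign bit of the matching mask lane is set, and from a otherwise. That is why an all-ones compare result (sign bit set) keeps oldProduct in the code below. A minimal demonstration with made-up values; blendDemo is a hypothetical name, and __attribute__((target("sse4.1"))) is a GCC/Clang extension that enables SSE4.1 for this one function (compiling with -msse4.1 works too).

```cpp
#include <cassert>
#include <smmintrin.h> // SSE4.1 intrinsics, including _mm_blendv_pd

// Hypothetical demo of the lane-selection rule of _mm_blendv_pd.
__attribute__((target("sse4.1")))
void blendDemo(double out[2])
{
    const __m128d a    = _mm_set_pd(20.0, 10.0);   // lane 0 = 10, lane 1 = 20
    const __m128d b    = _mm_set_pd(40.0, 30.0);   // lane 0 = 30, lane 1 = 40
    const __m128d mask = _mm_set_pd(0.0, -0.0);    // lane 0: sign bit set; lane 1: clear
    _mm_storeu_pd(out, _mm_blendv_pd(a, b, mask)); // out = {30, 20}
}
```

Only the sign bit of each mask lane matters, so any compare result (all-ones or all-zeros) works as a mask.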

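#include <cassert>
#include <cstddef>
#include <cstdint>
#include <smmintrin.h> // SSE4.1: _mm_blendv_pd, _mm_cmpeq_epi64

// Load 2 double values from the source pointer and conditionally multiply them into the product.
// Returns the new product.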
template<int startIdx>
inline __m128d product2( const double* pSource, __m128i mask, __m128d oldProduct )
{
    // Multiply values unconditionally
    const __m128d source = _mm_loadu_pd( pSource + startIdx );
    const __m128d newProduct = _mm_mul_pd( source, oldProduct );

    // We only call product2 with 4 different template arguments.
    // There are 16 vector registers in 64-bit mode, enough to keep all 4 different `maskAndBits` constants in registers.
    constexpr int64_t bit1 = 1 << startIdx;
    constexpr int64_t bit2 = 1 << ( startIdx + 1 );
    const __m128i maskAndBits = _mm_set_epi64x( bit2, bit1 ); // lane 0 tests bit1, lane 1 tests bit2 (_mm_set_epi64x takes the high lane first)
    mask = _mm_and_si128( mask, maskAndBits );

    // All-ones (a NaN bit pattern with the sign bit set) where the AND result is 0, i.e. the bit was not set; all-zeros (0.0) where the bit was set
    const __m128d maskDouble = _mm_castsi128_pd( _mm_cmpeq_epi64( mask, _mm_setzero_si128() ) );

    // _mm_blendv_pd (SSE4.1) does the actual masking: lanes whose mask sign bit is set keep oldProduct, the others take newProduct
    return _mm_blendv_pd( newProduct, oldProduct, maskDouble );
}

double conditionalProducts( const double* ptr, const uint8_t* masks, size_t size )
{
    // The size must be a multiple of 8: round your input size down and multiply the last few values the old (scalar) way.
    assert( 0 == size % 8 );
    __m128d prod = _mm_set1_pd( 1.0 );
    const double* const end = ptr + size;
    while( ptr < end )
    {
        // Broadcast the mask byte into 64-bit integer lanes
        const __m128i mask = _mm_set1_epi64x( *masks );
        // Compute the conditional products of 8 values
        prod = product2<0>( ptr, mask, prod );
        prod = product2<2>( ptr, mask, prod );
        prod = product2<4>( ptr, mask, prod );
        prod = product2<6>( ptr, mask, prod );
        // Advance the pointers
        ptr += 8;
        masks++;
    }
    // Multiply two lanes together
    prod = _mm_mul_sd( prod, _mm_shuffle_pd( prod, prod, 0b11 ) );
    return _mm_cvtsd_f64( prod );
}
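The loop above threads every product2 call through the single accumulator prod, so each mask byte costs four dependent multiply-and-blend steps. A sketch of a different arrangement, assuming only SSE2 (baseline on x86-64): select a[i] or 1.0 per lane with and/andnot/or, always multiply, and spread the four pairs of each byte over four independent accumulators; the last n % 8 elements go through a scalar loop. conditionalProductsSse2, pairMask and selectOrOne are hypothetical names, and this select-then-multiply scheme is a swapped-in technique, not the answer's code. Like the answer, it reorders the multiplications, so the result can differ from the scalar loop in the last bits.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h> // SSE2 only

// All-ones lane where the bit is set: lane 0 tests bit startIdx, lane 1 tests
// bit startIdx + 1. SSE2 has no 64-bit compare, so each bit is tested in both
// 32-bit halves of its 64-bit lane.
inline __m128d pairMask(int maskByte, int startIdx)
{
    const int bit1 = 1 << startIdx;
    const int bit2 = 1 << (startIdx + 1);
    const __m128i bits = _mm_set_epi32(bit2, bit2, bit1, bit1);
    const __m128i masked = _mm_and_si128(_mm_set1_epi32(maskByte), bits);
    return _mm_castsi128_pd(_mm_cmpeq_epi32(masked, bits));
}

// values where mask is all-ones, 1.0 where mask is all-zeros
inline __m128d selectOrOne(__m128d values, __m128d mask)
{
    return _mm_or_pd(_mm_and_pd(mask, values), _mm_andnot_pd(mask, _mm_set1_pd(1.0)));
}

double conditionalProductsSse2(const double* a, const unsigned char* b, std::size_t n)
{
    __m128d acc0 = _mm_set1_pd(1.0), acc1 = acc0, acc2 = acc0, acc3 = acc0;
    const std::size_t fullBytes = n / 8;
    for (std::size_t i = 0; i < fullBytes; ++i)
    {
        const int m = b[i];
        const double* p = a + i * 8;
        // Four independent dependency chains, one per pair of elements.
        acc0 = _mm_mul_pd(acc0, selectOrOne(_mm_loadu_pd(p + 0), pairMask(m, 0)));
        acc1 = _mm_mul_pd(acc1, selectOrOne(_mm_loadu_pd(p + 2), pairMask(m, 2)));
        acc2 = _mm_mul_pd(acc2, selectOrOne(_mm_loadu_pd(p + 4), pairMask(m, 4)));
        acc3 = _mm_mul_pd(acc3, selectOrOne(_mm_loadu_pd(p + 6), pairMask(m, 6)));
    }
    // Combine the accumulators, then the two lanes.
    __m128d acc = _mm_mul_pd(_mm_mul_pd(acc0, acc1), _mm_mul_pd(acc2, acc3));
    acc = _mm_mul_sd(acc, _mm_unpackhi_pd(acc, acc));
    double product = _mm_cvtsd_f64(acc);
    // Scalar tail for the last n % 8 elements (partial last byte).
    for (std::size_t i = fullBytes * 8; i < n; ++i)
        if ((b[i / 8] >> (i % 8)) & 1)
            product *= a[i];
    return product;
}
```

Whether four accumulators beat one depends on the CPU's multiply latency versus throughput and on what the compiler does with the loop, so timing both against the scalar loop on real data is the reliable check.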

Score: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/58753959
