问如何在C++中加速矩阵乘法？
EN

Stack Overflow用户

提问于 2018-04-12 00:19:05

回答 2查看 0关注 0票数 0

我用这个简单的算法执行矩阵乘法。为了更灵活，我使用对象作为矩阵，其中包含动态创建的数组。

这个解决方案与我的第一个静态数组相比要慢4倍。我能做些什么来加快数据访问速度？我不想改变算法。

 matrix mult_std(matrix a, matrix b) {
 matrix c(a.dim(), false, false);
 for (int i = 0; i < a.dim(); i++)
  for (int j = 0; j < a.dim(); j++) {
   int sum = 0;
   for (int k = 0; k < a.dim(); k++)
    sum += a(i,k) * b(k,j);
   c(i,j) = sum;
  }

 return c;
}

编辑

我在下面添加了完整的源代码，并尝试了一些建议：

互换k和j循环迭代->性能改进
申报dim()和operator()()如inline->业绩改善
通过Const引用传递参数->性能损失！为什么？所以我不用它。

现在的表演几乎和以前的色情剧一样。也许应该有更多的改进。

但是我有另一个问题：函数中有一个内存错误。mult_strassen(...)。为什么?

terminate called after throwing an instance of 'std::bad_alloc'

编辑

以下是一些结果。比较了标准算法(STD)、j和k循环的交换阶数(SWAP)和块大小为13(块)的阻塞算法。

回答 2

Stack Overflow用户

发布于 2018-04-12 09:06:08

通过Const引用传递参数，以便以以下方式开始：

matrix mult_std(matrix const& a, matrix const& b) {

为了给你更多的细节，我们需要知道其他方法使用的细节。

要回答为什么原来的方法要快4倍，我们需要看原来的方法。

这个问题无疑是你的，因为这个问题已经解决了一百万次了。

同样的，当问这种问题的时候总向可编译的源提供适当的输入，这样我们实际上可以构建并运行代码看看发生了什么。

如果没有代码，我们只是在猜测。

编辑

修复了原始C代码中的主要错误(缓冲区溢出)

在一个公平的比较中，我更新了代码以并行运行测试：

 // INCLUDES -------------------------------------------------------------------
 #include <stdlib.h>
 #include <stdio.h>
 #include <sys/time.h>
 #include <time.h>

 // DEFINES -------------------------------------------------------------------
 // The original problem was here. The MAXDIM was 500. But we were using arrays
 // that had a size of 512 in each dimension. This caused a buffer overrun that
 // the dim variable and caused it to be reset to 0. The result of this was causing
 // the multiplication loop to fall out before it had finished (as the loop was
 // controlled by this global variable.
 //
 // Everything now uses the MAXDIM variable directly.
 // This of course gives the C code an advantage as the compiler can optimize the
 // loop explicitly for the fixed size arrays and thus unroll loops more efficiently.
 #define MAXDIM 512
 #define RUNS 10

 // MATRIX FUNCTIONS ----------------------------------------------------------
 class matrix
 {
 public:
 matrix(int dim)
       : dim_(dim)
 {
         data_ = new int[dim_ * dim_];

 }

     inline int dim() const {
                         return dim_;
                 }
                 inline int& operator()(unsigned row, unsigned col) {
                         return data_[dim_*row + col];
                 }

                 inline int operator()(unsigned row, unsigned col) const {
                         return data_[dim_*row + col];
                 }

 private:
     int dim_;
     int* data_;
 };

// ---------------------------------------------------
 void random_matrix(int (&matrix)[MAXDIM][MAXDIM]) {
         for (int r = 0; r < MAXDIM; r++)
                 for (int c = 0; c < MAXDIM; c++)
                         matrix[r][c] = rand() % 100;
 }
 void random_matrix_class(matrix& matrix) {
         for (int r = 0; r < matrix.dim(); r++)
                 for (int c = 0; c < matrix.dim(); c++)
                         matrix(r, c) = rand() % 100;
 }

 template<typename T, typename M>
 float run(T f, M const& a, M const& b, M& c)
 {
         float time = 0;

         for (int i = 0; i < RUNS; i++) {
                 struct timeval start, end;
                 gettimeofday(&start, NULL);
                 f(a,b,c);
                 gettimeofday(&end, NULL);

                 long s = start.tv_sec * 1000 + start.tv_usec / 1000;
                 long e = end.tv_sec * 1000 + end.tv_usec / 1000;

                 time += e - s;
         }
         return time / RUNS;
 }
 // SEQ MULTIPLICATION ----------------------------------------------------------
  int* mult_seq(int const(&a)[MAXDIM][MAXDIM], int const(&b)[MAXDIM][MAXDIM], int (&z)[MAXDIM][MAXDIM]) {
          for (int r = 0; r < MAXDIM; r++) {
                  for (int c = 0; c < MAXDIM; c++) {
                          z[r][c] = 0;
                          for (int i = 0; i < MAXDIM; i++)
                                  z[r][c] += a[r][i] * b[i][c];
                  }
          }
  }
  void mult_std(matrix const& a, matrix const& b, matrix& z) {
          for (int r = 0; r < a.dim(); r++) {
                  for (int c = 0; c < a.dim(); c++) {
                          z(r,c) = 0;
                          for (int i = 0; i < a.dim(); i++)
                                  z(r,c) += a(r,i) * b(i,c);
                  }
          }
  }

  // MAIN ------------------------------------------------------------------------
  using namespace std;
  int main(int argc, char* argv[]) {
          srand(time(NULL));

          int matrix_a[MAXDIM][MAXDIM];
          int matrix_b[MAXDIM][MAXDIM];
          int matrix_c[MAXDIM][MAXDIM];
          random_matrix(matrix_a);
          random_matrix(matrix_b);
          printf("%d ", MAXDIM);
          printf("%f \n", run(mult_seq, matrix_a, matrix_b, matrix_c));

          matrix a(MAXDIM);
          matrix b(MAXDIM);
          matrix c(MAXDIM);
          random_matrix_class(a);
          random_matrix_class(b);
          printf("%d ", MAXDIM);
          printf("%f \n", run(mult_std, a, b, c));

          return 0;
  }

现在的结果是：

$ g++ t1.cpp
$ ./a.exe
512 1270.900000
512 3308.800000

$ g++ -O3 t1.cpp
$ ./a.exe
512 284.900000
512 622.000000

从这一点我们可以看到，当完全优化C++代码时，C代码的速度大约是C++代码的两倍。我在代码中找不到原因。

票数 0

Stack Overflow用户

发布于 2018-04-12 09:46:59

说到加速，如果交换k和j循环迭代：

matrix mult_std(matrix a, matrix b) {
 matrix c(a.dim(), false, false);
 for (int i = 0; i < a.dim(); i++)
  for (int k = 0; k < a.dim(); k++)
   for (int j = 0; j < a.dim(); j++)  // swapped order
    c(i,j) += a(i,k) * b(k,j);

 return c;
}

那是因为k内最内循环上的索引将导致缓存丢失。b每次迭代。带着j作为最内部的索引，两者c和b是连续访问的，而a待在原地。

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/-100008075

复制

相似问题

问如何在C++中加速矩阵乘法？
EN

回答 2

Stack Overflow用户

编辑

Stack Overflow用户

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在C++中加速矩阵乘法？EN

回答 2

Stack Overflow用户

编辑

Stack Overflow用户

社区

活动

资源

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问如何在C++中加速矩阵乘法？
EN