Q: SHA256 performance optimization in C

Stack Overflow user

Asked 2013-08-31 08:40:07

Answers: 4, Views: 20.9K, Followers: 0, Votes: 17

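I need to hash a large database of values quite often, so I need a fast implementation of a SHA-2 hasher. I'm currently using SHA256.

The sha256_transform implementation I'm using right now is the C code below.

I have profiled my code, and this snippet takes 96% of the computing time per hash, which makes this function critical to my goals.

It operates on a 64-byte binary string named data[] and outputs the result in ctx->state.

I'm asking for a faster version of this function. Keep in mind that even slight modifications can affect speed negatively.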

#define uchar unsigned char
#define uint unsigned int

#define ROTLEFT(a,b) (((a) << (b)) | ((a) >> (32-(b))))
#define ROTRIGHT(a,b) (((a) >> (b)) | ((a) << (32-(b))))

#define CH(x,y,z) (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define EP0(x) (ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22))
#define EP1(x) (ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25))
#define SIG0(x) (ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ ((x) >> 3))
#define SIG1(x) (ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ ((x) >> 10))

void sha256_transform(SHA256_CTX *ctx, uchar data[]) {
    uint a,b,c,d,e,f,g,h,i,j,t1,t2,m[64];

    a = ctx->state[0];
    b = ctx->state[1];
    c = ctx->state[2];
    d = ctx->state[3];
    e = ctx->state[4];
    f = ctx->state[5];
    g = ctx->state[6];
    h = ctx->state[7];

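    /* Cast before shifting: data[j] promotes to int, and shifting a byte >= 0x80 left by 24 overflows it. */
    for (i=0,j=0; i < 16; i++, j += 4)
        m[i] = ((uint)data[j] << 24) | (data[j+1] << 16) | (data[j+2] << 8) | (data[j+3]);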

    for ( ; i < 64; i++)
        m[i] = SIG1(m[i-2]) + m[i-7] + SIG0(m[i-15]) + m[i-16];

    for (i = 0; i < 64; ++i) {
        t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];
        t2 = EP0(a) + MAJ(a,b,c);
        h = g;
        g = f;
        f = e;
        e = d + t1;
        d = c;
        c = b;
        b = a;
        a = t1 + t2;
    }

    ctx->state[0] += a;
    ctx->state[1] += b;
    ctx->state[2] += c;
    ctx->state[3] += d;
    ctx->state[4] += e;
    ctx->state[5] += f;
    ctx->state[6] += g;
    ctx->state[7] += h;
}
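One common direction for speeding up a transform like the one above is to keep only a 16-word rolling window of the message schedule instead of m[64], unroll the round loop eight rounds at a time so that the a..h shuffle becomes a renaming of arguments, and use the cheaper equivalent forms of CH and MAJ. The sketch below is not from the original post. The function name sha256_transform_unrolled, the minimal SHA256_CTX layout (only state[8]) and the inlined k[] table (the standard SHA-256 round constants) are assumptions made so that the block compiles on its own. Whether it is actually faster depends on the compiler and CPU, so it should be measured before being adopted.

```c
#include <stdint.h>

/* Assumed minimal context: the real SHA256_CTX has more fields,
   but the transform only touches state[]. */
typedef struct { uint32_t state[8]; } SHA256_CTX;

/* Standard SHA-256 round constants (FIPS 180-4). */
static const uint32_t k[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

#define ROTR(a,b)  (((a) >> (b)) | ((a) << (32-(b))))
#define CH(x,y,z)  ((z) ^ ((x) & ((y) ^ (z))))          /* same truth table as (x&y)^(~x&z) */
#define MAJ(x,y,z) (((x) & (y)) | ((z) & ((x) | (y))))  /* same truth table as the 3-way XOR form */
#define EP0(x)  (ROTR(x,2) ^ ROTR(x,13) ^ ROTR(x,22))
#define EP1(x)  (ROTR(x,6) ^ ROTR(x,11) ^ ROTR(x,25))
#define SIG0(x) (ROTR(x,7) ^ ROTR(x,18) ^ ((x) >> 3))
#define SIG1(x) (ROTR(x,17) ^ ROTR(x,19) ^ ((x) >> 10))

/* One round. Instead of shifting a..h, the caller rotates the argument order:
   the register passed as d receives the new e, the one passed as h the new a. */
#define ROUND(a,b,c,d,e,f,g,h,i) do { \
    uint32_t t1 = (h) + EP1(e) + CH(e,f,g) + k[(i)] + w[(i) & 15]; \
    (d) += t1; \
    (h) = t1 + EP0(a) + MAJ(a,b,c); \
} while (0)

void sha256_transform_unrolled(SHA256_CTX *ctx, const unsigned char data[64])
{
    uint32_t a = ctx->state[0], b = ctx->state[1], c = ctx->state[2], d = ctx->state[3];
    uint32_t e = ctx->state[4], f = ctx->state[5], g = ctx->state[6], h = ctx->state[7];
    uint32_t w[16];
    unsigned i, j;

    /* Load the block as 16 big-endian words. */
    for (i = 0; i < 16; i++)
        w[i] = ((uint32_t)data[4*i] << 24) | ((uint32_t)data[4*i+1] << 16) |
               ((uint32_t)data[4*i+2] << 8) | (uint32_t)data[4*i+3];

    for (i = 0; i < 64; i += 8) {
        /* From round 16 on, extend the schedule in place: slot j & 15 still
           holds word j-16, so adding the other three terms yields word j. */
        if (i >= 16)
            for (j = i; j < i + 8; j++)
                w[j & 15] += SIG1(w[(j - 2) & 15]) + w[(j - 7) & 15] + SIG0(w[(j - 15) & 15]);

        ROUND(a,b,c,d,e,f,g,h,i);
        ROUND(h,a,b,c,d,e,f,g,i+1);
        ROUND(g,h,a,b,c,d,e,f,i+2);
        ROUND(f,g,h,a,b,c,d,e,i+3);
        ROUND(e,f,g,h,a,b,c,d,i+4);
        ROUND(d,e,f,g,h,a,b,c,i+5);
        ROUND(c,d,e,f,g,h,a,b,i+6);
        ROUND(b,c,d,e,f,g,h,a,i+7);
    }

    ctx->state[0] += a; ctx->state[1] += b; ctx->state[2] += c; ctx->state[3] += d;
    ctx->state[4] += e; ctx->state[5] += f; ctx->state[6] += g; ctx->state[7] += h;
}
```

It is called the same way as the original sha256_transform. A quick sanity check is to feed it the single padded block for the message "abc" with the standard initial state and compare ctx->state against the published SHA-256 test vector before timing anything.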

sha256

optimization

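Stack Overflow user

Answered 2015-01-26 20:30:55

Update 2

You really should use Intel's ISA-L_crypto, which is Intel's reference library for cryptography. The original post linked to Intel's old reference code, which has since been absorbed into ISA-L_crypto.

Using the example below, my laptop gets ~4 GB/s per core: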

$ git clone http://github.com/01org/isa-l_crypto
$ cd isa-l_crypto
$ ./autogen.sh && ./configure
$ make -j 16
$ cd sha256_mb
$ gcc sha256_mb_vs_ossl_perf.c -march=native -O3 -Wall -I../include ../.libs/libisal_crypto.a -lcrypto
$ ./a.out
sha256_openssl_cold: runtime =     511833 usecs, bandwidth 640 MB in 0.5118 sec = 1311.15 MB/s
multibinary_sha256_cold: runtime =     172098 usecs, bandwidth 640 MB in 0.1721 sec = 3899.46 MB/s
Multi-buffer sha256 test complete 32 buffers of 1048576 B with 20 iterations
 multibinary_sha256_ossl_perf: Pass
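The bandwidth figures in the output above appear to follow from simple arithmetic: 32 buffers of 1048576 B over 20 iterations is 671088640 bytes (so the "640 MB" is counted in units of 2^20 bytes), and dividing bytes by microseconds gives MB/s in units of 10^6 bytes per second. A minimal sketch of that calculation follows. The helper names total_bytes and mb_per_sec are hypothetical and are not part of isa-l_crypto.

```c
#include <stdint.h>

/* Hypothetical helper (not from isa-l_crypto):
   total bytes hashed = buffers * buffer size * iterations. */
static uint64_t total_bytes(uint64_t buffers, uint64_t buf_size, uint64_t iterations)
{
    return buffers * buf_size * iterations;
}

/* Hypothetical helper: bytes per microsecond is numerically the same
   as 10^6 bytes per second, i.e. MB/s. */
static double mb_per_sec(uint64_t bytes, uint64_t usecs)
{
    return (double)bytes / (double)usecs;
}
```

With the numbers from the run, mb_per_sec(total_bytes(32, 1048576, 20), 511833) comes out at about 1311 MB/s for OpenSSL, and the same call with 172098 gives about 3899 MB/s for the multi-buffer code, a ratio of roughly 3x.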

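Original post

This is the Intel reference implementation:

v2.zip

The code is described in:

http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/sha-256-implementations-paper.html

I get about 350 MB/s on a Haswell-based Xeon microprocessor (E5-2650 v3). It is implemented in assembly and takes advantage of Intel AES-NI.

Old update:

The latest Intel reference implementation for SHA (now part of ISA-L_crypto) is located at:

Votes: 7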


Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/18546244
