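This development is done in user mode on Windows.
I have two (potentially quite large) buffers, and I want to know how many bytes differ between them.
I wrote my own version that simply checks byte by byte, but it turned out rather slow, which is undesirable when comparing a few hundred megabytes. I know I could optimize it in many different ways, but this seems like a common problem that probably already has an optimized solution, and I can't hope to optimize it as effectively as an expert would.
Maybe my Googling is inadequate, but I can't find any C or C++ function that counts the number of differing bytes between two buffers. Is there a built-in function in the C standard library, WinAPI, or the C++ standard library that I don't know about? Or do I need to optimize this manually?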
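For reference, the byte-by-byte baseline described above can be written with standard algorithms. This is only a sketch of that naive approach, not a dedicated library function for the task; the name `CountDifferentBytesNaive` is hypothetical, and whether the compiler vectorizes it depends on the compiler and flags.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>

// Hypothetical helper: counts positions where the two buffers hold different bytes.
// std::inner_product applies not_equal_to to each byte pair (giving true/false)
// and accumulates the results with plus, so the sum is the number of differing bytes.
inline std::size_t CountDifferentBytesNaive(const char* buf1,
                                            const char* buf2,
                                            std::size_t size) {
    return std::inner_product(buf1, buf1 + size, buf2, std::size_t{0},
                              std::plus<>(), std::not_equal_to<>());
}
```

This reads both buffers exactly once, so it makes a reasonable correctness reference to check any faster version against.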
Posted on 2022-04-01 02:38:10
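In the end, I wrote this (possibly somewhat poorly) optimized code to do the job. I hoped the compiler would vectorize it under the hood, but unfortunately that doesn't seem to happen, and I didn't want to dig into SIMD intrinsics by hand. As a result, some of my bit tricks may actually make it slower, but it is still fast enough that it accounts for only 4% of my code's runtime (almost all of which is memcmp). Whether or not it could be better, it is good enough for me.
I'll note that this was designed to be fast for my use case, where I expect only a few rare differences.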
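For anyone who does want to try the hand-written SIMD route mentioned above, here is a minimal sketch, assuming an x86 target with SSE2. It is not what I used and I have not benchmarked it; the name `CountDifferentBytesSse2` is hypothetical. On targets without SSE2 it falls back to the plain byte loop.

```cpp
#include <bitset>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper: counts differing bytes, 16 bytes per iteration where SSE2 exists.
inline std::size_t CountDifferentBytesSse2(const char* buf1,
                                           const char* buf2,
                                           std::size_t size) {
    std::size_t res = 0;
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    for (; i + 16 <= size; i += 16) {
        // Unaligned loads, so the buffers need no particular alignment.
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf1 + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf2 + i));
        // cmpeq sets each equal byte to 0xFF; movemask packs the top bit of
        // every byte into a 16-bit mask, one bit per byte.
        unsigned equal_mask =
            static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(a, b)));
        // Differing bytes = 16 minus the number of equal bytes.
        res += 16 - std::bitset<16>(equal_mask).count();
    }
#endif
    // Scalar tail (or the whole buffer when SSE2 is unavailable).
    for (; i < size; ++i) {
        res += (buf1[i] != buf2[i]);
    }
    return res;
}
```

The same page-level memcmp pre-check used in my code would still apply in front of this, since it skips identical pages cheaply.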
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sal.h>  /* SAL annotations such as _In_reads_bytes_ (MSVC) */

inline size_t ComputeDifferenceSmall(
    _In_reads_bytes_(size) char* buf1,
    _In_reads_bytes_(size) char* buf2,
    size_t size) {
    /* size should be <= 0x1000 bytes */
    /* In my case, I expect frequent differences if any at all are present. */
    size_t res = 0;
    /* Compare 8 bytes at a time, reading at offset i in both buffers. */
    for (size_t i = 0; i < (size & ~size_t{7}); i += 8) {
        uint64_t a, b;
        /* memcpy avoids unaligned and aliasing-violating reads. */
        std::memcpy(&a, buf1 + i, sizeof a);
        std::memcpy(&b, buf2 + i, sizeof b);
        uint64_t diff1 = a ^ b;
        if (!diff1) continue;
        /* Bit fiddle to make each byte 1 if they're different and 0 if the same */
        diff1 = ((diff1 & 0xF0F0F0F0F0F0F0F0ULL) >> 4) | (diff1 & 0x0F0F0F0F0F0F0F0FULL);
        diff1 = ((diff1 & 0x0C0C0C0C0C0C0C0CULL) >> 2) | (diff1 & 0x0303030303030303ULL);
        diff1 = ((diff1 & 0x0202020202020202ULL) >> 1) | (diff1 & 0x0101010101010101ULL);
        /* Sum the eight bytes (each 0 or 1); the total is at most 8 */
        diff1 = (diff1 >> 32) + (diff1 & 0xFFFFFFFFULL);
        diff1 = (diff1 >> 16) + (diff1 & 0xFFFFULL);
        diff1 = (diff1 >> 8) + (diff1 & 0xFFULL);
        res += diff1;
    }
    /* Handle the remaining tail bytes one at a time. */
    for (size_t i = (size & ~size_t{7}); i < size; i++) {
        res += (buf1[i] != buf2[i]);
    }
    return res;
}
size_t ComputeDifference(
    _In_reads_bytes_(size) char* buf1,
    _In_reads_bytes_(size) char* buf2,
    size_t size) {
    size_t res = 0;
    /* I expect most pages to be identical, and both buffers should be page aligned if
     * larger than a page. memcmp has more optimizations than I'll ever come up with,
     * so I can just use that to determine if I need to check for differences
     * in the page. */
    for (size_t pn = 0; pn < (size & ~0xFFF); pn += 0x1000) {
        if (memcmp(&buf1[pn], &buf2[pn], 0x1000)) {
            res += ComputeDifferenceSmall(&buf1[pn], &buf2[pn], 0x1000);
        }
    }
    return res + ComputeDifferenceSmall(
        &buf1[size & ~0xFFF], &buf2[size & ~0xFFF], size & 0xFFF);
}
https://stackoverflow.com/questions/71700685