I wrote a simple implementation of a ticket lock. The locking part looks like this:
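#include <cstdint>
#include <immintrin.h>  // _mm_pause

struct ticket {
    uint16_t next_ticket;
    uint16_t now_serving;
};
void lock(ticket* tkt) {
    // Atomically take the next ticket number.
    const uint16_t my_ticket =
        __sync_fetch_and_add(&tkt->next_ticket, 1);
    // Spin until our number is being served.
    while (tkt->now_serving != my_ticket) {
        _mm_pause();
        // Compiler barrier: forces now_serving to be re-read on every iteration.
        __asm__ __volatile__("":::"memory");
    }
}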
Then I realized that I could write this code with std::atomics instead of the gcc intrinsics:
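#include <atomic>
#include <cstdint>
#include <immintrin.h>  // _mm_pause

struct atom_ticket {
    std::atomic<uint16_t> next_ticket;
    std::atomic<uint16_t> now_serving;
};
void lock(atom_ticket* tkt) {
    // Atomically take the next ticket number.
    const uint16_t my_ticket =
        tkt->next_ticket.fetch_add(1, std::memory_order_relaxed);
    // Spin until our number is being served; the atomic load is re-issued each iteration.
    while (tkt->now_serving.load(std::memory_order_relaxed) != my_ticket) {
        _mm_pause();
    }
}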
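They generate almost identical assembly, but the latter produces an extra movzwl instruction. Why is this extra mov there? Is there a better, more correct way to write lock()?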
Assembly output with -march=native -O3:
0000000000000000 <lock(ticket*)>:
0: b8 01 00 00 00 mov $0x1,%eax
5: 66 f0 0f c1 07 lock xadd %ax,(%rdi)
a: 66 39 47 02 cmp %ax,0x2(%rdi)
e: 74 08 je 18 <lock(ticket*)+0x18>
10: f3 90 pause
12: 66 39 47 02 cmp %ax,0x2(%rdi)
16: 75 f8 jne 10 <lock(ticket*)+0x10>
18: f3 c3 repz retq
1a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000020 <lock(atom_ticket*)>:
20: ba 01 00 00 00 mov $0x1,%edx
25: 66 f0 0f c1 17 lock xadd %dx,(%rdi)
2a: 48 83 c7 02 add $0x2,%rdi
2e: eb 02 jmp 32 <lock(atom_ticket*)+0x12>
30: f3 90 pause
=> 32: 0f b7 07 movzwl (%rdi),%eax <== ???
35: 66 39 c2 cmp %ax,%dx
38: 75 f6 jne 30 <lock(atom_ticket*)+0x10>
3a: f3 c3 repz retq
Why not just use cmp (%rdi),%dx directly?
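On the "more correct" part of the question: with memory_order_relaxed on the now_serving load, acquiring the lock does not by itself make the previous holder's writes visible to the new holder. A minimal sketch, assuming x86 (for _mm_pause) and C++11, uses an acquire load in the spin loop and pairs it with a release store in an unlock() function. unlock() is a hypothetical addition here, since the question only shows lock().

```cpp
#include <atomic>
#include <cstdint>
#include <immintrin.h>  // _mm_pause (x86 only, assumed)

struct atom_ticket {
    std::atomic<uint16_t> next_ticket{0};
    std::atomic<uint16_t> now_serving{0};
};

void lock(atom_ticket* tkt) {
    // Taking a ticket publishes no data, so relaxed ordering suffices here.
    const uint16_t my_ticket =
        tkt->next_ticket.fetch_add(1, std::memory_order_relaxed);
    // Acquire: once we observe our number, everything the previous holder
    // wrote before its release store in unlock() is visible to us.
    while (tkt->now_serving.load(std::memory_order_acquire) != my_ticket) {
        _mm_pause();
    }
}

// Hypothetical unlock(): only the current holder writes now_serving,
// so a plain load followed by a release store is enough (no RMW needed).
void unlock(atom_ticket* tkt) {
    const uint16_t next = static_cast<uint16_t>(
        tkt->now_serving.load(std::memory_order_relaxed) + 1);
    tkt->now_serving.store(next, std::memory_order_release);
}
```

Typical use is lock(&tkt); ...critical section...; unlock(&tkt); from each thread. The 16-bit counters wrap at 65536, which is harmless as long as fewer than 65536 threads wait at once, because only equality is tested. On x86 an acquire load is an ordinary mov, so this change should not by itself affect the movzwl codegen asked about above.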
Posted on 2015-11-17 10:47:52
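In the first version,
12: 66 39 47 02 cmp %ax,0x2(%rdi)
this cmp is effectively a mov and a cmp combined into one instruction (it most likely decodes into two micro-ops internally).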
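The atomic variant performs a separate read of now_serving with
32: 0f b7 07 movzwl (%rdi),%eax
and then compares it with
35: 66 39 c2 cmp %ax,%dx
so in both cases each loop iteration does one load and one compare; the atomic version just spells the load out as its own instruction.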
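For readers who want to stay with GCC builtins as in the first version, the legacy __sync family has a newer __atomic counterpart that takes an explicit memory order. A minimal sketch under the same assumptions (x86, GCC or Clang): because now_serving is read with __atomic_load_n, the load is an actual atomic operation that GCC re-issues on every iteration in practice, so the empty asm memory clobber is no longer needed. unlock() is again a hypothetical addition.

```cpp
#include <cstdint>
#include <immintrin.h>  // _mm_pause (x86 only, assumed)

struct ticket {
    uint16_t next_ticket;
    uint16_t now_serving;
};

void lock(ticket* tkt) {
    // Atomically take a ticket; ordering comes from the acquire load below.
    const uint16_t my_ticket =
        __atomic_fetch_add(&tkt->next_ticket, 1, __ATOMIC_RELAXED);
    // Atomic acquire load replaces the plain read plus compiler barrier.
    while (__atomic_load_n(&tkt->now_serving, __ATOMIC_ACQUIRE) != my_ticket) {
        _mm_pause();
    }
}

// Hypothetical unlock(): the holder hands the lock to the next ticket.
void unlock(ticket* tkt) {
    const uint16_t next = static_cast<uint16_t>(
        __atomic_load_n(&tkt->now_serving, __ATOMIC_RELAXED) + 1);
    __atomic_store_n(&tkt->now_serving, next, __ATOMIC_RELEASE);
}
```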
https://stackoverflow.com/questions/33284236