sh,
I agree it does look like a bug: the Tahiti compiler should be using the exec mask but doesn't. This is easy to test because on Cayman the compiler does use the exec mask and the loop works correctly, looping 64 times (for a single wave), while on Tahiti the code fails to loop. The code should execute the same way on both platforms. For simplicity, I replaced the f(a) function, and the code becomes:
__kernel void test(__global int * ptr) {
    int value; int new_value;
    do {
        value = atomic_add(ptr, 0);
        new_value = value + 1;
    } while (value != atom_cmpxchg(ptr, value, new_value));
}
During each pass through the loop, the first thread to execute atom_cmpxchg() changes the value of *ptr, causing the remaining threads to fail atom_cmpxchg() and loop again. The Cayman code uses the exec mask to track which threads are still active/looping, and loops until all 64 bits of exec are 0. The Tahiti code can only branch back if vcc == 0, which never happens because at least one thread succeeds on every pass, so the code never loops.
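To make the difference concrete, here is a rough host-side C sketch (not GPU code) that models one 64-lane wave running the loop both ways. Everything in it is my own illustration: the names run_body, simulate_exec_mask and simulate_vccz_branch are made up, and it assumes the lanes issue their compare-and-swap in lane order, which the hardware does not guarantee.

```c
#include <stdint.h>

#define WAVE_SIZE 64

/* Outcome of one simulated wave. */
typedef struct {
    int passes;      /* number of times the loop body ran */
    int final_value; /* value left in *ptr afterwards */
} wave_result;

/* One pass of the loop body for the lanes set in `active`.
 * Every active lane reads the same value, then the lanes issue their
 * compare-and-swap one after another (assumed lane order), so only the
 * first one succeeds. Returns the mask of lanes whose swap succeeded,
 * i.e. what v_cmp_eq_i32 would produce. */
static uint64_t run_body(uint64_t active, int *mem)
{
    int value = *mem;                 /* atomic_add(ptr, 0) */
    uint64_t succeeded = 0;
    for (int lane = 0; lane < WAVE_SIZE; ++lane) {
        if (!((active >> lane) & 1u))
            continue;                 /* lane is masked off */
        if (*mem == value) {          /* atom_cmpxchg(ptr, value, value + 1) */
            *mem = value + 1;
            succeeded |= (uint64_t)1 << lane;
        }
    }
    return succeeded;
}

/* Exec-mask style loop: lanes that succeeded are removed from exec,
 * and the wave keeps looping until no lane is left. */
wave_result simulate_exec_mask(void)
{
    int mem = 0;
    int passes = 0;
    uint64_t exec = ~(uint64_t)0;     /* all 64 lanes active */
    while (exec != 0) {
        uint64_t done = run_body(exec, &mem);
        exec &= ~done;                /* finished lanes stop looping */
        ++passes;
    }
    return (wave_result){ passes, mem };
}

/* Uniform branch on vcc == 0 (like s_cbranch_vccz): the wave only loops
 * again if NO lane succeeded, so it exits after the first pass. */
wave_result simulate_vccz_branch(void)
{
    int mem = 0;
    int passes = 0;
    uint64_t exec = ~(uint64_t)0;
    uint64_t vcc;
    do {
        vcc = run_body(exec, &mem);
        ++passes;
    } while (vcc == 0);
    return (wave_result){ passes, mem };
}
```

Under those assumptions, simulate_exec_mask() runs 64 passes and leaves 64 in memory, while simulate_vccz_branch() runs a single pass and leaves 1, which matches the behavior described above for Cayman and Tahiti respectively.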
Another test: you can make the code work correctly on Tahiti by forcing the compiler to use the exec mask. Place this bogus line just before the do loop:
ptr=ptr+(get_local_id(0)&0xff000000);
This prevents the compiler from knowing the value of ptr, so it uses the exec mask. In reality, ptr is not changed, because the masked local ID is always 0.
drallan
//Disassembly without ptr=ptr+(get_local_id(0)&0xff000000);
  s_buffer_load_dword s0, s[8:11], 0x00
label_0001:
  s_waitcnt 0x0000
  v_mov_b32 v0, s0
  v_mov_b32 v1, 0
  buffer_atomic_add v1, v0, s[4:7], 0 offen glc
  s_waitcnt vmcnt(0)
  v_add_i32 v2, vcc, 1, v1
  v_mov_b32 v3, v1
  buffer_atomic_cmpswap v[2:3], v0, s[4:7], 0 offen glc
  s_waitcnt vmcnt(0)
  v_cmp_eq_i32 vcc, v1, v2
  s_cbranch_vccz label_0001
  s_branch label_0010
  s_branch label_0001
label_0010:
  s_endpgm
//Disassembly with ptr=ptr+(get_local_id(0)&0xff000000);
label_000E:
  s_waitcnt 0x0000
  v_mov_b32 v1, 0
  buffer_atomic_add v1, v0, s[4:7], 0 offen glc
  s_waitcnt vmcnt(0)
  v_add_i32 v2, vcc, 1, v1
  v_mov_b32 v3, v1
  buffer_atomic_cmpswap v[2:3], v0, s[4:7], 0 offen glc
  s_waitcnt vmcnt(0)
  v_cmp_eq_i32 s[8:9], v1, v2
  s_mov_b64 s[10:11], exec
  s_and_b64 exec, s[10:11], s[8:9]
  s_andn2_b64 s[2:3], s[2:3], exec
  s_cbranch_scc0 label_0021
  s_mov_b64 exec, s[10:11]
  s_and_b64 exec, exec, s[2:3]
  s_branch label_000E
label_0021:
  s_mov_b64 exec, s[0:1]
  s_endpgm