Channel: AMD Developer Forums: Message List - Possible bug with atom_cmpxchg

Re: Possible bug with atom_cmpxchg


sh,

 

I agree, it does look like a bug: the Tahiti compiler should be using the exec mask but doesn't. This is easy to test, because on Cayman the compiler does use the exec mask and the loop works correctly, looping 64 times (for a single wave), while on Tahiti the code fails to loop. The code should execute the same on both platforms. For simplicity, I replaced the f(a) function, and the code becomes:

 

__kernel void test(__global int *ptr) {
    int value;
    int new_value;

    do {
        value = atomic_add(ptr, 0);    // atomic read of *ptr
        new_value = value + 1;
    } while (value != atom_cmpxchg(ptr, value, new_value));    // retry if *ptr changed in between
}

 

On each pass through the loop, the first thread to execute atom_cmpxchg() changes the value of *ptr, so the remaining threads fail their atom_cmpxchg() and have to loop again. The Cayman code uses the exec mask to track which threads are still looping, and loops until all 64 bits of exec are set to 0. The Tahiti code only branches back if vcc == 0, that is, if no thread's compare succeeded. Since at least one thread succeeds on each pass, that doesn't happen, so the code never loops.
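To make the difference concrete, here is a minimal host-side sketch in plain C that models one 64-lane wave running the loop above. It is not the compiler's actual output, and the names (run_body, simulate_exec_mask, simulate_vccz_only) are made up for illustration. The assumption is that the lanes of a wave are serialized on the compare-and-swap, so only the first active lane succeeds per pass.

```c
#include <assert.h>
#include <stdint.h>

#define WAVE_SIZE 64

/* Models one lane's atom_cmpxchg on a plain int: returns the old value. */
static int cmpxchg(int *p, int expected, int desired) {
    int old = *p;
    if (old == expected) *p = desired;
    return old;
}

/* One pass through the loop body for every lane set in `exec`.
 * Returns a mask of the lanes whose compare-and-swap succeeded
 * (the role played by the v_cmp_eq_i32 result). */
static uint64_t run_body(int *ptr, uint64_t exec) {
    int value[WAVE_SIZE] = {0};
    uint64_t ok = 0;

    /* atomic_add(ptr, 0): every active lane reads the same value. */
    for (int lane = 0; lane < WAVE_SIZE; ++lane)
        if ((exec >> lane) & 1) value[lane] = *ptr;

    /* atom_cmpxchg: lanes are serialized, so only the first one still
     * finds *ptr equal to the value it read. */
    for (int lane = 0; lane < WAVE_SIZE; ++lane)
        if (((exec >> lane) & 1) &&
            cmpxchg(ptr, value[lane], value[lane] + 1) == value[lane])
            ok |= (uint64_t)1 << lane;

    return ok;
}

/* Exec-mask loop: retire the lanes that succeeded and repeat while any
 * lane is still active. Returns the final counter value. */
int simulate_exec_mask(void) {
    int counter = 0;
    uint64_t exec = ~(uint64_t)0;
    while (exec != 0) {
        uint64_t ok = run_body(&counter, exec);
        assert(ok != 0);   /* some active lane always makes progress */
        exec &= ~ok;
    }
    return counter;
}

/* vccz-only loop: branch back only when NO lane succeeded (vcc == 0).
 * Returns the final counter value. */
int simulate_vccz_only(void) {
    int counter = 0;
    uint64_t vcc;
    do {
        vcc = run_body(&counter, ~(uint64_t)0);
    } while (vcc == 0);
    return counter;
}
```

Under these assumptions, simulate_exec_mask() ends with the counter at 64, one increment per lane, while simulate_vccz_only() leaves the loop after the first pass with the counter at 1.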

 

Another test: you can make the code work correctly on Tahiti by forcing the compiler to use the exec mask. To do that, place this bogus line just before the do loop:

 

ptr=ptr+(get_local_id(0)&0xff000000);

 

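This prevents the compiler from knowing that ptr has the same value in every thread, so it will use the exec mask. In reality, ptr is not changed: the local id is always far smaller than 0x01000000, so the masked value is always 0.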
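As a quick sanity check on that claim, the following plain C sketch evaluates the same mask on the host for every local id in a work-group. The helper names are hypothetical, and the work-group size you pass in is an assumption (256 is used here only as an example).

```c
#include <stddef.h>

/* Offset that the bogus line adds to ptr for one work-item. */
size_t workaround_offset(size_t local_id) {
    return local_id & (size_t)0xff000000u;
}

/* Returns 1 if the offset is zero for every local id in [0, local_size). */
int offset_is_always_zero(size_t local_size) {
    for (size_t id = 0; id < local_size; ++id) {
        if (workaround_offset(id) != 0) return 0;
    }
    return 1;
}
```

offset_is_always_zero(256) returns 1, so the line is a no-op at run time. The offset only becomes nonzero for ids of 0x01000000 and above, which a local id should never reach.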

 

drallan

 

//Disassembly without ptr=ptr+(get_local_id(0)&0xff000000);
 s_buffer_load_dword  s0, s[8:11], 0x00                   
label_0001:
 s_waitcnt     0x0000                                      
 v_mov_b32     v0, s0
 v_mov_b32     v1, 0
 buffer_atomic_add  v1, v0, s[4:7], 0 offen glc            
 s_waitcnt     vmcnt(0)                                    
 v_add_i32     v2, vcc, 1, v1                              
 v_mov_b32     v3, v1                                      
 buffer_atomic_cmpswap  v[2:3], v0, s[4:7], 0 offen glc    
 s_waitcnt     vmcnt(0)                                    
 v_cmp_eq_i32  vcc, v1, v2                                 
 s_cbranch_vccz  label_0001                                
 s_branch      label_0010                                  
 s_branch      label_0001                                  
label_0010:
 s_endpgm 

 

//Disassembly with ptr=ptr+(get_local_id(0)&0xff000000);
label_000E:
 s_waitcnt     0x0000                                      
 v_mov_b32     v1, 0                                       
 buffer_atomic_add  v1, v0, s[4:7], 0 offen glc            
 s_waitcnt     vmcnt(0)                                    
 v_add_i32     v2, vcc, 1, v1                              
 v_mov_b32     v3, v1                                      
 buffer_atomic_cmpswap  v[2:3], v0, s[4:7], 0 offen glc    
 s_waitcnt     vmcnt(0)                                    
 v_cmp_eq_i32  s[8:9], v1, v2                              
 s_mov_b64     s[10:11], exec                              
 s_and_b64     exec, s[10:11], s[8:9]                      
 s_andn2_b64   s[2:3], s[2:3], exec
 s_cbranch_scc0 label_0021
 s_mov_b64     exec, s[10:11]                              
 s_and_b64     exec, exec, s[2:3]                          
 s_branch      label_000E                                  
label_0021:
 s_mov_b64     exec, s[0:1]
 s_endpgm

 

