Is it actually miscompiling, or is it just caching the read of value and therefore always giving you the wrong behaviour? Neither of the reads is globally coherent, so either may be served from L1. I don't know how the atomic operation behaves with respect to L1, but at the very least, if one lane updates the value atomically, there is no guarantee that other lanes will see the new value until they perform an atomic read of their own.
Does making "value = *ptr" an atomic read fix the behaviour? Try replacing that line with "value = atomic_add(ptr, 0);".
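A minimal sketch of that swap, assuming a kernel where each work-item reads a shared global int. The kernel name read_shared, the out buffer and the surrounding code are hypothetical, not your actual kernel; only the "atomic_add(ptr, 0)" line is the suggestion itself. It assumes OpenCL 1.1 or later, where atomic_add on a __global int is a built-in. Adding 0 leaves *ptr unchanged but returns its current value through the atomic path instead of a plain, possibly cached, load.

```c
// Hypothetical kernel for illustration only; names are assumptions.
// Requires OpenCL 1.1+ (32-bit global atomics are core from 1.1).
__kernel void read_shared(__global int *ptr, __global int *out)
{
    // Original pattern: a plain load, which may be satisfied from a
    // non-coherent cache line and miss another lane's atomic update.
    // int value = *ptr;

    // Suggested pattern: an atomic add of 0 does not change *ptr but
    // returns its current value via an atomic read.
    int value = atomic_add(ptr, 0);

    out[get_global_id(0)] = value;
}
```

If the real code polls this value in a loop waiting for another work-item to update it, keep in mind that a GPU spin-wait can still hang if the writing work-item is not guaranteed to make progress (for example, if it sits in the same wavefront as the waiting lanes), regardless of whether the read is atomic.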
It does look as though the compiler isn't updating the masks, but that may be a side effect of assumptions it knows it can make about the read operation.