1. 13 3月, 2015 1 次提交
    • P
      queue: fix QSLIST_INSERT_HEAD_ATOMIC race · 2120465f
      Paolo Bonzini 提交于
      There is a not-so-subtle race in QSLIST_INSERT_HEAD_ATOMIC.
      
      Because atomic_cmpxchg returns the old value instead of a success flag,
      QSLIST_INSERT_HEAD_ATOMIC was checking for success by comparing against
      the second argument to atomic_cmpxchg.  Unfortunately, this only works
      if the second argument is a local or thread-local variable.
      
      If it is in memory, it can be subject to common subexpression elimination
      (and then everything's fine) or reloaded after the atomic_cmpxchg,
      depending on the compiler's whims.  If the latter happens, the race can
      happen.  A thread can sneak in, doing something on elm->field.sle_next
      after the atomic_cmpxchg and before the comparison.  This causes a wrong
      failure, and then two threads are using "elm" at the same time.  In the
      case discovered by Christian, the sequence was likely something like this:
      
          thread 1                   | thread 2
          QSLIST_INSERT_HEAD_ATOMIC  |
            atomic_cmpxchg succeeds  |
            elm added to list        |
                                     | steal release_pool
                                     | QSLIST_REMOVE_HEAD
                                     | elm removed from list
                                     | ...
                                     | QSLIST_INSERT_HEAD_ATOMIC
                                     |   (overwrites sle_next)
            spurious failure         |
            atomic_cmpxchg succeeds  |
            elm added to list again  |
                                     |
          steal release_pool         |
          QSLIST_REMOVE_HEAD         |
          elm removed again          |
      
      The last three steps could be done by a third thread as well.
      A reproducer that failed in a matter of seconds is as follows:
      
      - the guest has 32 VCPUs on a 28 core host (hyperthreading was enabled),
        memory was 16G just to err on the safe side (the host has 64G, but hey
        at least you need no s390)
      
      - the guest has 24 null-aio virtio-blk devices using dataplane
        (-object iothread,id=ioN -drive if=none,id=blkN,driver=null-aio,size=500G
        -device virtio-blk-pci,iothread=ioN,drive=blkN)
      
      - the guest also has a single network interface.  It's only doing loopback
        tests so slirp vs. tap and the model doesn't matter.
      
      - the guest is running fio with the following script:
      
           [global]
           rw=randread
           blocksize=16k
           ioengine=libaio
           runtime=10m
           buffered=0
           fallocate=none
           time_based
           iodepth=32
      
           [virtio1a]
           filename=/dev/block/252\:16
      
           [virtio1b]
           filename=/dev/block/252\:16
      
           ...
      
           [virtio24a]
           filename=/dev/block/252\:384
      
           [virtio24b]
           filename=/dev/block/252\:384
      
           [listen1]
           protocol=tcp
           ioengine=net
           port=12345
           listen
           rw=read
           bs=4k
           size=1000g
      
           [connect1]
           protocol=tcp
           hostname=localhost
           ioengine=net
           port=12345
           protocol=tcp
           rw=write
           startdelay=1
           size=1000g
      
           ...
      
           [listen8]
           protocol=tcp
           ioengine=net
           port=12352
           listen
           rw=read
           bs=4k
           size=1000g
      
           [connect8]
           protocol=tcp
           hostname=localhost
           ioengine=net
           port=12352
           rw=write
           startdelay=1
           size=1000g
      
      Moral of the story: I should refrain from writing more clever stuff.
      At least it looks like it is not too clever to be undebuggable.
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Message-id: 1426002357-6889-1-git-send-email-pbonzini@redhat.com
      Fixes: c740ad92Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      2120465f
  2. 12 3月, 2015 31 次提交
  3. 11 3月, 2015 8 次提交