1. 08 Apr, 2014 · 17 commits
  2. 30 Mar, 2014 · 1 commit
    • rbd: drop an unsafe assertion · 638c323c
      Alex Elder authored
      Olivier Bonvalet reported having repeated crashes due to a failed
      assertion he was hitting in rbd_img_obj_callback():
      
          Assertion failure in rbd_img_obj_callback() at line 2165:
      	rbd_assert(which >= img_request->next_completion);
      
      With a lot of help from Olivier with reproducing the problem
      we were able to determine the object and image requests had
      already been completed (and often freed) at the point the
      assertion failed.
      
      There was a great deal of discussion on the ceph-devel mailing list
      about this.  The problem only arose when there were two (or more)
      object requests in an image request, and the problem was always
      seen when the second request was being completed.
      
      The problem is due to a race in the window between setting the
      "done" flag on an object request and checking the image request's
      next completion value.  When the first object request completes, it
      checks to see if its successor request is marked "done", and if
      so, that request is also completed.  In the process, the image
      request's next_completion value is updated to reflect that both
      the first and second requests are completed.  By the time the
      second request is able to check the next_completion value, it
      has been set to a value *greater* than its own "which" value,
      which caused an assertion to fail.
      
      Fix this problem by skipping over any completion processing
      unless the completing object request is the next one expected.
      Test only for inequality (not >=), and eliminate the bad
      assertion.
      Tested-by: Olivier Bonvalet <ob@daevel.fr>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Sage Weil <sage@inktank.com>
      Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      638c323c
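
      A hedged sketch of the approach described above (the structures and
      field names are illustrative, not the actual rbd code): only the
      object request whose index matches next_completion walks the
      completion chain; any other completer just marks itself done and
      returns.

          #include <linux/spinlock.h>
          #include <linux/types.h>

          struct obj_request { u32 which; bool done; };

          struct img_request {
                  spinlock_t completion_lock;
                  u32 next_completion;
                  u32 obj_request_count;
                  struct obj_request **obj_requests;
          };

          static void obj_request_complete(struct img_request *img_req,
                                           struct obj_request *obj_req)
          {
                  u32 which = obj_req->which;

                  spin_lock(&img_req->completion_lock);
                  obj_req->done = true;

                  /* Not the next expected completion: leave the work to the
                   * in-order completer instead of asserting on ordering. */
                  if (which != img_req->next_completion) {
                          spin_unlock(&img_req->completion_lock);
                          return;
                  }

                  /* Sweep forward over every request already marked done. */
                  while (which < img_req->obj_request_count &&
                         img_req->obj_requests[which]->done) {
                          /* hand the request to its callback / drop refs here */
                          which++;
                  }
                  img_req->next_completion = which;
                  spin_unlock(&img_req->completion_lock);
          }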
  3. 24 Mar, 2014 · 1 commit
    • virtio-blk: base queue-depth on virtqueue ringsize or module param · fc4324b4
      Rusty Russell authored
      Venkatesh spake thus:
      
        virtio-blk set the default queue depth to 64 requests, which was
        insufficient for high-IOPS devices. Instead set the blk-queue depth to
        the device's virtqueue depth divided by two (each I/O requires at least
        two VQ entries).
      
      But behold, Ted added a module parameter:
      
        Also allow the queue depth to be something which can be set at module
        load time or via a kernel boot-time parameter, for
        testing/benchmarking purposes.
      
      And I rewrote it substantially, mainly to take
      VIRTIO_RING_F_INDIRECT_DESC into account.
      
      As QEMU sets the vq size for PCI to 128, Venkatesh's patch wouldn't
      have made a change.  This version does (since QEMU also offers
      VIRTIO_RING_F_INDIRECT_DESC).
      Inspired-by: "Theodore Ts'o" <tytso@mit.edu>
      Based-on-the-true-story-of: Venkatesh Srinivas <venkateshs@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtio-dev@lists.oasis-open.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Frank Swiderski <fes@google.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      fc4324b4
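
      The sizing rule that results is simple arithmetic: unless the module
      parameter overrides it, the queue depth becomes the virtqueue ring
      size, halved when VIRTIO_RING_F_INDIRECT_DESC is absent because each
      request then needs at least two ring entries. A hedged sketch of that
      calculation (helper and parameter names are illustrative, not the
      actual virtio_blk.c code):

          #include <linux/module.h>
          #include <linux/virtio.h>
          #include <linux/virtio_config.h>
          #include <linux/virtio_ring.h>

          /* 0 means "derive the depth from the virtqueue size". */
          static unsigned int queue_depth;
          module_param(queue_depth, uint, 0444);

          static unsigned int blk_queue_depth_for(struct virtio_device *vdev,
                                                  struct virtqueue *vq)
          {
                  if (queue_depth)
                          return queue_depth;     /* explicit override */

                  /* With indirect descriptors a request consumes a single ring
                   * entry, so the whole ring can serve as the queue depth. */
                  if (virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
                          return virtqueue_get_vring_size(vq);

                  /* Otherwise each request needs at least two entries. */
                  return virtqueue_get_vring_size(vq) / 2;
          }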
  4. 15 Mar, 2014 · 1 commit
    • blk-mq: allow blk_mq_init_commands() to return failure · 95363efd
      Jens Axboe authored
      If drivers do dynamic allocation in the hardware command init
      path, then we need to be able to handle and return failures.
      
      And if they do allocations or mappings in the init command path,
      then we need a cleanup function to free up that space at exit
      time. So add blk_mq_free_commands() as the cleanup function.
      
      This is required for the mtip32xx driver conversion to blk-mq.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      95363efd
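
      The driver-side shape this implies, as a hedged sketch (the callback
      prototypes are hypothetical and do not reproduce the blk-mq
      interfaces of that era): a per-command init that allocates and can
      therefore fail, paired with a free callback that undoes the work for
      every command at teardown.

          #include <linux/slab.h>
          #include <linux/errno.h>

          /* Illustrative per-command driver state. */
          struct my_cmd_priv {
                  void *bounce_buf;
          };

          /* May allocate, therefore may fail; the init path must be able to
           * report that failure instead of ignoring it. */
          static int my_init_command(struct my_cmd_priv *cmd)
          {
                  cmd->bounce_buf = kzalloc(4096, GFP_KERNEL);
                  return cmd->bounce_buf ? 0 : -ENOMEM;
          }

          /* Symmetric cleanup, run for each command at exit time. */
          static void my_free_command(struct my_cmd_priv *cmd)
          {
                  kfree(cmd->bounce_buf);
                  cmd->bounce_buf = NULL;
          }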
  5. 14 Mar, 2014 · 9 commits
    • mtip32xx: mtip_async_complete() bug fixes · 5eb9291c
      Sam Bradshaw authored
      This patch fixes 2 issues in the fast completion path:
      1) Possible double completions / double dma_unmap_sg() calls due to lack
      of atomicity in the check and subsequent dereference of the upper layer
      callback function. Fixed with cmpxchg before unmap and callback.
      2) Regression in unaligned IO constraining workaround for p420m devices.
      Fixed by checking if IO is unaligned and using proper semaphore if so.
      Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      5eb9291c
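
      A hedged sketch of the cmpxchg idea behind fix (1); the structure and
      field names are illustrative, not the actual mtip32xx code. Whichever
      path wins the atomic swap of the stored callback pointer is the only
      one allowed to unmap and complete, so a racing second completion sees
      NULL and backs off.

          #include <linux/atomic.h>
          #include <linux/dma-mapping.h>
          #include <linux/scatterlist.h>

          struct my_command {
                  void (*async_callback)(void *data, int status);
                  void *async_data;
                  struct scatterlist *sg;
                  int scatter_ents;
                  enum dma_data_direction direction;
          };

          static void my_complete_command(struct device *dev,
                                          struct my_command *cmd, int status)
          {
                  void (*cb)(void *, int) = cmd->async_callback;

                  /* Claim the completion atomically: only the caller that swaps
                   * the callback to NULL may unmap and invoke it, so a racing
                   * second completion finds NULL and does nothing. */
                  if (!cb || cmpxchg(&cmd->async_callback, cb, NULL) != cb)
                          return;

                  dma_unmap_sg(dev, cmd->sg, cmd->scatter_ents, cmd->direction);
                  cb(cmd->async_data, status);
          }

      Note that the scatterlist is unmapped before the upper-layer callback
      runs, which is also the ordering the next entry below enforces.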
    • mtip32xx: Unmap the DMA segments before completing the IO request · 368c89d7
      Felipe Franciosi authored
      If the buffers are unmapped after completing a request, then stale data
      might be in the request.
      Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      368c89d7
    • mtip32xx: Set queue bounce limit · 1044b1bb
      Felipe Franciosi authored
      We need to set the queue bounce limit during the device initialization to
      prevent excessive bouncing on 32 bit architectures.
      Signed-off-by: Felipe Franciosi <felipe@paradoxo.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      1044b1bb
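
      A hedged sketch of where such a call sits (names illustrative, not
      the mtip32xx code): during queue setup the driver tells the block
      layer the highest address the device can DMA to, so only pages above
      that limit get bounced.

          #include <linux/blkdev.h>
          #include <linux/dma-mapping.h>
          #include <linux/pci.h>

          static void my_setup_queue_limits(struct request_queue *q,
                                            struct pci_dev *pdev)
          {
                  /* Bounce anything the device cannot address directly. */
                  blk_queue_bounce_limit(q, dma_get_mask(&pdev->dev));
          }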
    • nvme: Use pci_enable_msi_range() and pci_enable_msix_range() · be577fab
      Alexander Gordeev authored
      As a result of the deprecation of the MSI-X/MSI enablement
      functions pci_enable_msix() and pci_enable_msi_block(), all
      drivers using these two interfaces need to be updated to use the
      new pci_enable_msi_range() or pci_enable_msi_exact()
      and pci_enable_msix_range() or pci_enable_msix_exact()
      interfaces.
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      be577fab
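
      A hedged sketch of the conversion pattern for the MSI-X side (device
      and bookkeeping names are illustrative): pci_enable_msix_range()
      returns the number of vectors actually granted within [minvec,
      maxvec], or a negative errno, which replaces the old retry loop
      around pci_enable_msix().

          #include <linux/pci.h>

          static int my_setup_msix(struct pci_dev *pdev,
                                   struct msix_entry *entries, int nr_wanted)
          {
                  int i, nvec;

                  for (i = 0; i < nr_wanted; i++)
                          entries[i].entry = i;

                  /* Accept anything between 1 and nr_wanted vectors. */
                  nvec = pci_enable_msix_range(pdev, entries, 1, nr_wanted);
                  if (nvec < 0)
                          return nvec;    /* caller falls back to MSI or INTx */

                  return nvec;            /* vectors actually allocated */
          }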
    • cciss: Fallback to MSI rather than to INTx if MSI-X failed · 371ff93a
      Alexander Gordeev authored
      Currently the driver falls back to INTx mode when MSI-X
      initialization failed. This is a suboptimal behaviour
      for chips that also support MSI. This update changes that
      behaviour and falls back to MSI mode in case MSI-X mode
      initialization failed.
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: iss_storagedev@hp.com
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      371ff93a
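
      A hedged sketch of the intended fallback order (illustrative, not the
      cciss code itself): try MSI-X first, then plain MSI, and only then
      stay on the legacy INTx pin interrupt.

          #include <linux/pci.h>

          enum my_irq_mode { MY_IRQ_INTX, MY_IRQ_MSI, MY_IRQ_MSIX };

          static enum my_irq_mode my_setup_interrupts(struct pci_dev *pdev,
                                                      struct msix_entry *entries,
                                                      int nvec)
          {
                  /* entries[].entry is assumed to be pre-initialised. */
                  if (pci_enable_msix_range(pdev, entries, nvec, nvec) > 0)
                          return MY_IRQ_MSIX;     /* all requested vectors */

                  if (!pci_enable_msi(pdev))
                          return MY_IRQ_MSI;      /* single MSI vector */

                  return MY_IRQ_INTX;             /* legacy pin interrupt */
          }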
    • swim3: fix interruptible_sleep_on race · 106fd892
      Arnd Bergmann authored
      interruptible_sleep_on is racy and going away. This replaces the one
      caller in the swim3 driver with the equivalent race-free
      wait_event_interruptible call. Since we're here already, this
      also fixes the case where we get interrupted from atomic context,
      which used to just spin in the loop.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      106fd892
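
      A hedged before/after sketch of this class of conversion (the wait
      queue and condition are illustrative): interruptible_sleep_on() could
      miss a wakeup that arrives between checking the condition and going
      to sleep, while wait_event_interruptible() re-tests the condition as
      part of the sleep itself.

          #include <linux/wait.h>
          #include <linux/sched.h>

          static DECLARE_WAIT_QUEUE_HEAD(my_wait);
          static int my_condition;        /* set by the interrupt handler */

          static int wait_for_device(void)
          {
                  /* Racy, removed API:
                   *
                   *     while (!my_condition)
                   *             interruptible_sleep_on(&my_wait);
                   */

                  /* Race-free replacement: the condition is rechecked after
                   * every wakeup, and a signal ends the wait early. */
                  return wait_event_interruptible(my_wait, my_condition);
          }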
    • ataflop: fix sleep_on races · 7b8a3d22
      Arnd Bergmann authored
      sleep_on() is inherently racy, and has been deprecated for a long time.
      This fixes two instances in the atari floppy driver:
      
      * fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the
        regular mutex since it gets released in interrupt context. The
        open-coded version using wait_event() and cmpxchg() is equivalent
        to the existing code but does the checks atomically, and we can
        now safely check the condition with irqs enabled.
      
      * format_wait becomes a completion, which is the natural structure
        here. The format ioctl waits for the background task to either
        complete or abort.
      
      This does not attempt to fix the preexisting bug of calling schedule
      with local interrupts disabled.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7b8a3d22
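
      A hedged sketch of the two replacement structures named above (flag
      and variable names are illustrative, not the ataflop code): a lock
      flag claimed with cmpxchg() under wait_event(), which may safely be
      released from interrupt context, plus a completion for the format
      path.

          #include <linux/wait.h>
          #include <linux/completion.h>
          #include <linux/atomic.h>

          static DECLARE_WAIT_QUEUE_HEAD(fdc_wait_q);
          static unsigned long fdc_busy;          /* 0 = free, 1 = held */
          static DECLARE_COMPLETION(format_done);

          /* Open-coded mutex: unlike a regular mutex it may be released from
           * irq context; cmpxchg() makes the test-and-set atomic. */
          static void fdc_lock(void)
          {
                  wait_event(fdc_wait_q, cmpxchg(&fdc_busy, 0, 1) == 0);
          }

          static void fdc_unlock(void)    /* callable from the irq handler */
          {
                  fdc_busy = 0;
                  wake_up(&fdc_wait_q);
          }

          /* Format path: the ioctl sleeps until the background task signals
           * that the format completed or was aborted. */
          static int wait_for_format(void)
          {
                  return wait_for_completion_interruptible(&format_done);
          }

          static void format_finished(void)       /* from the background task */
          {
                  complete(&format_done);
          }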
    • DAC960: remove sleep_on usage · 9c552e1d
      Arnd Bergmann authored
      sleep_on and its variants are going away. The use of sleep_on() in
      DAC960_V2_ExecuteUserCommand seems to be bogus: by the time we get
      there, the command has completed already and we just enter the
      timeout. Based on this interpretation, I concluded that we can
      replace it with a simple msleep(1000) and rearrange the code around
      it slightly.
      
      The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent
      to the race-free version using wait_event_interruptible_timeout.
      I left the driver to return -EINTR rather than -ERESTARTSYS to preserve
      the timeout behavior.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      9c552e1d
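
      A hedged sketch of the second conversion (names illustrative):
      wait_event_interruptible_timeout() returns 0 on timeout, a positive
      count of remaining jiffies on success, and -ERESTARTSYS on a signal,
      so preserving the driver's historical -EINTR means translating that
      last case explicitly.

          #include <linux/wait.h>
          #include <linux/errno.h>

          static DECLARE_WAIT_QUEUE_HEAD(gam_wait);
          static int command_completed;   /* set when the controller answers */

          static int wait_for_command(long timeout_jiffies)
          {
                  long ret = wait_event_interruptible_timeout(gam_wait,
                                                              command_completed,
                                                              timeout_jiffies);
                  if (ret > 0)
                          return 0;               /* completed in time */
                  if (ret == 0)
                          return -ETIMEDOUT;      /* illustrative mapping */
                  return -EINTR;                  /* keep the old return code */
          }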
    • mtip32xx: Use pci_enable_msi() instead of pci_enable_msi_range() · c94efe36
      Alexander Gordeev authored
      Commit "mtip32xx: Use pci_enable_msix_range() instead of
      pci_enable_msix()" was unnecessary, since pci_enable_msi()
      function is not deprecated and is still preferable for
      enabling the single MSI mode. This update reverts usage of
      pci_enable_msi() function.
      
      Besides, the changelog for that commit was bogus, since the
      mtip32xx driver uses MSI interrupts, not MSI-X.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: linux-pci@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c94efe36
  6. 13 Mar, 2014 · 1 commit
  7. 11 Mar, 2014 · 1 commit
    • mtip32xx: fix bad use of smp_processor_id() · 7f328908
      Jens Axboe authored
      mtip_pci_probe() dumps the current CPU when loaded, but it does
      so in a preemptible context. Hence smp_processor_id() correctly
      warns:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/155
      caller is mtip_pci_probe+0x53/0x880 [mtip32xx]
      
      Switch to raw_smp_processor_id(), since it's just informational
      and persistent accuracy isn't important.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7f328908
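
      A hedged sketch of the distinction (the function is illustrative, not
      the mtip32xx probe): smp_processor_id() is only meaningful with
      preemption disabled, so a purely informational message printed from
      preemptible probe context should use raw_smp_processor_id().

          #include <linux/smp.h>
          #include <linux/printk.h>

          static void report_probe_cpu(void)
          {
                  /* CONFIG_DEBUG_PREEMPT would warn about this form, since we
                   * may migrate to another CPU right after reading the id:
                   *
                   *     pr_info("probed on CPU %d\n", smp_processor_id());
                   */

                  /* The value is informational only, so the raw variant is fine. */
                  pr_info("probed on CPU %d\n", raw_smp_processor_id());
          }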
  8. 07 Mar, 2014 · 2 commits
    • nvme: don't use PREPARE_WORK · 9ca97374
      Tejun Heo authored
      PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
      and a nasty surprise in terms of reentrancy guarantee as workqueue
      considers work items to be different if they don't have the same work
      function.
      
      nvme_dev->reset_work is multiplexed with multiple work functions.
      Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and
      always use it as the work function and update the users to set the
      ->reset_workfn field instead of overriding the work function using
      PREPARE_WORK().
      
      It would probably be best to route this with other related updates
      through the workqueue tree.
      
      Compile tested.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: linux-nvme@lists.infradead.org
      9ca97374
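
      A hedged sketch of the wrapper pattern described above (structure and
      field names follow the changelog but are simplified): the work item is
      always registered with one fixed work function, which dispatches
      through a driver-owned function pointer, so re-targeting never needs
      PREPARE_WORK().

          #include <linux/workqueue.h>

          struct my_dev {
                  struct work_struct reset_work;
                  void (*reset_workfn)(struct work_struct *work);
          };

          /* The one and only work function the workqueue ever sees. */
          static void my_reset_workfn(struct work_struct *work)
          {
                  struct my_dev *dev = container_of(work, struct my_dev,
                                                    reset_work);

                  dev->reset_workfn(work);        /* dispatch to current handler */
          }

          static void my_dev_init(struct my_dev *dev,
                                  void (*initial)(struct work_struct *))
          {
                  dev->reset_workfn = initial;
                  INIT_WORK(&dev->reset_work, my_reset_workfn);
          }

          /* Re-targeting later just updates the pointer, not the work item:
           *
           *     dev->reset_workfn = other_handler;
           *     queue_work(my_wq, &dev->reset_work);
           */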
    • floppy: don't use PREPARE_[DELAYED_]WORK · 75ddb38f
      Tejun Heo authored
      PREPARE_[DELAYED_]WORK() are being phased out.  They have few users
      and a nasty surprise in terms of reentrancy guarantee as workqueue
      considers work items to be different if they don't have the same work
      function.
      
      floppy has been multiplexing floppy_work and fd_timer with multiple
      work functions.  Introduce floppy_work_workfn() and fd_timer_workfn()
      which invoke floppy_work_fn and fd_timer_fn respectively and always
      use the two functions as the work functions and update the users to
      set floppy_work_fn and fd_timer_fn instead of overriding work
      functions using PREPARE_[DELAYED_]WORK().
      
      It would probably be best to route this with other related updates
      through the workqueue tree.
      
      Lightly tested using qemu.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      75ddb38f
  9. 04 Mar, 2014 · 2 commits
    • zram: avoid null access when fail to alloc meta · db5d711e
      Minchan Kim authored
      zram_meta_alloc() could fail, so the caller should check it.
      Otherwise, the system will hang.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      db5d711e
    • mm: close PageTail race · 668f9abb
      David Rientjes authored
      Commit bf6bddf1 ("mm: introduce compaction and migration for
      ballooned pages") introduces page_count(page) into memory compaction
      which dereferences page->first_page if PageTail(page).
      
      This results in a very rare NULL pointer dereference on the
      aforementioned page_count(page).  Indeed, anything that does
      compound_head(), including page_count() is susceptible to racing with
      prep_compound_page() and seeing a NULL or dangling page->first_page
      pointer.
      
      This patch uses Andrea's implementation of compound_trans_head()
      that deals with such a race and makes it the default compound_head()
      implementation.  This includes a read memory barrier that ensures
      that, if PageTail(head) is true, we return a head page that is
      neither NULL nor dangling.  The patch then adds a store memory
      barrier to prep_compound_page() to ensure page->first_page is set.
      
      This is the safest way to ensure we see the head page that we are
      expecting; PageTail(page) is already in the unlikely() path, and the
      memory barriers are unfortunately required.
      
      Hugetlbfs is the exception; we don't enforce a store memory barrier
      during init since no race is possible.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      668f9abb
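
      A hedged sketch of the barrier pairing described above, written
      against the 3.x-era page->first_page field and simplified (the helper
      names are not the real mm/ functions; later kernels encode the head
      pointer differently):

          #include <linux/mm.h>
          #include <asm/barrier.h>

          /* Reader side, the shape of the default compound_head() after this
           * patch: re-check PageTail() after the read barrier so a racing
           * prep/free cannot hand back a NULL or dangling head pointer. */
          static struct page *my_compound_head(struct page *page)
          {
                  if (unlikely(PageTail(page))) {
                          struct page *head = page->first_page;

                          smp_rmb();
                          if (likely(PageTail(page)))
                                  return head;
                  }
                  return page;
          }

          /* Writer side, the shape of prep_compound_page(): publish
           * first_page before the tail bit can be observed. */
          static void my_prep_tail(struct page *head, struct page *tail)
          {
                  tail->first_page = head;
                  smp_wmb();              /* pairs with the smp_rmb() above */
                  __SetPageTail(tail);
          }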
  10. 22 Feb, 2014 · 5 commits