1. 01 11月, 2015 6 次提交
    • M
      dm: eliminate unused "bioset" process for each bio-based DM device · dbba42d8
      Mikulas Patocka 提交于
      Commit 54efd50b ("block: make
      generic_make_request handle arbitrarily sized bios") makes it possible
      for block devices to process large bios.  In doing so that commit
      allocates a new queue->bio_split bioset for each block device, this
      bioset is used for allocating bios when the driver needs to split large
      bios.
      
      Each bioset allocates a workqueue process, thus the above commit
      increases the number of processes allocated per block device.
      
      DM doesn't need the queue->bio_split bioset, thus we can deallocate it.
      This reduces the number of allocated processes per bio-based DM device
      from 3 to 2.  Also remove the call to blk_queue_split(), it is not
      needed because DM does its own splitting.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      dbba42d8
    • M
      dm: convert ffs to __ffs · a3d939ae
      Mikulas Patocka 提交于
      ffs counts bit starting with 1 (for the least significant bit), __ffs
      counts bits starting with 0. This patch changes various occurrences of ffs
      to __ffs and removes subtraction of 1 from the result.
      
      Note that __ffs (unlike ffs) is not defined when called with zero
      argument, but it is not called with zero argument in any of these cases.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      a3d939ae
    • J
      dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() · 6f65985e
      Julia Lawall 提交于
      Remove DM's unneeded NULL tests before calling these destroy functions,
      now that they check for NULL, thanks to these v4.3 commits:
      3942d299 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
      4e3ca3e0 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      // </smpl>
      Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      6f65985e
    • C
      dm: add support for passing through persistent reservations · 71cdb697
      Christoph Hellwig 提交于
      This adds support to pass through persistent reservation requests
      similar to the existing ioctl handling, and with the same limitations,
      e.g. devices may only have a single target attached.
      
      This is mostly intended for multipathing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      71cdb697
    • C
      dm: refactor ioctl handling · e56f81e0
      Christoph Hellwig 提交于
      This moves the call to blkdev_ioctl and the argument checking to DM core
      code, and only leaves a callout to find the block device to operate on
      in the targets.  This simplifies the code and allows us to pass through
      ioctl-like command using other methods in the next patch.
      
      Also split out a helper around calling the prepare_ioctl method that
      will be reused for persistent reservation handling.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e56f81e0
    • M
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938
      Mauricio Faria de Oliveira 提交于
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      47796938
  2. 30 10月, 2015 1 次提交
    • M
      dm: initialize non-blk-mq queue data before queue is used · ad5f498f
      Mikulas Patocka 提交于
      Commit bfebd1cd ("dm: add full blk-mq
      support to request-based DM") moves the initialization of the fields
      backing_dev_info.congested_fn, backing_dev_info.congested_data and
      queuedata from the function dm_init_md_queue (that is called when the
      device is created) to dm_init_old_md_queue (that is called after the
      device type is determined).
      
      There is no locking when accessing these variables, thus it is possible
      for other parts of the kernel to briefly see this data in a transient
      state (e.g. queue->backing_dev_info.congested_fn initialized and
      md->queue->backing_dev_info.congested_data uninitialized, resulting in
      passing an incorrect parameter to the function dm_any_congested).
      
      This queue data is left initialized for blk-mq devices even though they
      that don't use it.
      
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v4.1+
      ad5f498f
  3. 22 10月, 2015 5 次提交
  4. 15 10月, 2015 2 次提交
  5. 13 10月, 2015 2 次提交
  6. 10 10月, 2015 15 次提交
  7. 04 10月, 2015 6 次提交
    • L
      Linux 4.3-rc4 · 049e6dde
      Linus Torvalds 提交于
      049e6dde
    • L
      Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile · 30c44659
      Linus Torvalds 提交于
      Pull strscpy string copy function implementation from Chris Metcalf.
      
      Chris sent this during the merge window, but I waffled back and forth on
      the pull request, which is why it's going in only now.
      
      The new "strscpy()" function is definitely easier to use and more secure
      than either strncpy() or strlcpy(), both of which are horrible nasty
      interfaces that have serious and irredeemable problems.
      
      strncpy() has a useless return value, and doesn't NUL-terminate an
      overlong result.  To make matters worse, it pads a short result with
      zeroes, which is a performance disaster if you have big buffers.
      
      strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
      the insane NUL padding, but having a differently broken return value
      which returns the original length of the source string.  Which means
      that it will read characters past the count from the source buffer, and
      you have to trust the source to be properly terminated.  It also makes
      error handling fragile, since the test for overflow is unnecessarily
      subtle.
      
      strscpy() avoids both these problems, guaranteeing the NUL termination
      (but not excessive padding) if the destination size wasn't zero, and
      making the overflow condition very obvious by returning -E2BIG.  It also
      doesn't read past the size of the source, and can thus be used for
      untrusted source data too.
      
      So why did I waffle about this for so long?
      
      Every time we introduce a new-and-improved interface, people start doing
      these interminable series of trivial conversion patches.
      
      And every time that happens, somebody does some silly mistake, and the
      conversion patch to the improved interface actually makes things worse.
      Because the patch is mindnumbing and trivial, nobody has the attention
      span to look at it carefully, and it's usually done over large swatches
      of source code which means that not every conversion gets tested.
      
      So I'm pulling the strscpy() support because it *is* a better interface.
      But I will refuse to pull mindless conversion patches.  Use this in
      places where it makes sense, but don't do trivial patches to fix things
      that aren't actually known to be broken.
      
      * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
        tile: use global strscpy() rather than private copy
        string: provide strscpy()
        Make asm/word-at-a-time.h available on all architectures
      30c44659
    • L
      Merge tag 'md/4.3-fixes' of git://neil.brown.name/md · 15ecf9a9
      Linus Torvalds 提交于
      Pull md fixes from Neil Brown:
       "Assorted fixes for md in 4.3-rc.
      
        Two tagged for -stable, and one is really a cleanup to match and
        improve kmemcache interface.
      
      * tag 'md/4.3-fixes' of git://neil.brown.name/md:
        md/bitmap: don't pass -1 to bitmap_storage_alloc.
        md/raid1: Avoid raid1 resync getting stuck
        md: drop null test before destroy functions
        md: clear CHANGE_PENDING in readonly array
        md/raid0: apply base queue limits *before* disk_stack_limits
        md/raid5: don't index beyond end of array in need_this_block().
        raid5: update analysis state for failed stripe
        md: wait for pending superblock updates before switching to read-only
      15ecf9a9
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 0d877081
      Linus Torvalds 提交于
      Pull MIPS updates from Ralf Baechle:
       "This week's round of MIPS fixes:
         - Fix JZ4740 build
         - Fix fallback to GFP_DMA
         - FP seccomp in case of ENOSYS
         - Fix bootmem panic
         - A number of FP and CPS fixes
         - Wire up new syscalls
         - Make sure BPF assembler objects can properly be disassembled
         - Fix BPF assembler code for MIPS I"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: scall: Always run the seccomp syscall filters
        MIPS: Octeon: Fix kernel panic on startup from memory corruption
        MIPS: Fix R2300 FP context switch handling
        MIPS: Fix octeon FP context switch handling
        MIPS: BPF: Fix load delay slots.
        MIPS: BPF: Do all exports of symbols with FEXPORT().
        MIPS: Fix the build on jz4740 after removing the custom gpio.h
        MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT
        MIPS: CPS: Don't include MT code in non-MT kernels.
        MIPS: CPS: Stop dangling delay slot from has_mt.
        MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
        MIPS: Wire up userfaultfd and membarrier syscalls.
      0d877081
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3e519dde
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
       "This update contains:
      
         - Fix for a long standing race affecting /proc/irq/NNN
      
         - One line fix for ARM GICV3-ITS counting the wrong data
      
         - Warning silencing in ARM GICV3-ITS.  Another GCC trying to be
           overly clever issue"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3-its: Count additional LPIs for the aliased devices
        irqchip/gic-v3-its: Silence warning when its_lpi_alloc_chunks gets inlined
        genirq: Fix race in register_irq_proc()
      3e519dde
    • M
      MIPS: scall: Always run the seccomp syscall filters · d218af78
      Markos Chandras 提交于
      The MIPS syscall handler code used to return -ENOSYS on invalid
      syscalls. Whilst this is expected, it caused problems for seccomp
      filters because the said filters never had the change to run since
      the code returned -ENOSYS before triggering them. This caused
      problems on the chromium testsuite for filters looking for invalid
      syscalls. This has now changed and the seccomp filters are always
      run even if the syscall is invalid. We return -ENOSYS once we
      return from the seccomp filters. Moreover, similar codepaths have
      been merged in the process which simplifies somewhat the overall
      syscall code.
      Signed-off-by: NMarkos Chandras <markos.chandras@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/11236/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      d218af78
  8. 03 10月, 2015 3 次提交