1. 01 Nov, 2015 6 commits
    •
      dm: eliminate unused "bioset" process for each bio-based DM device · dbba42d8
      Mikulas Patocka committed
      Commit 54efd50b ("block: make generic_make_request handle arbitrarily
      sized bios") makes it possible for block devices to process large bios.
      In doing so, that commit allocates a new queue->bio_split bioset for each
      block device; this bioset is used for allocating bios when the driver
      needs to split large bios.
      
      Each bioset allocates a workqueue process, thus the above commit
      increases the number of processes allocated per block device.
      
      DM doesn't need the queue->bio_split bioset, thus we can deallocate it.
      This reduces the number of allocated processes per bio-based DM device
      from 3 to 2.  Also remove the call to blk_queue_split(); it is not
      needed because DM does its own splitting.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    •
      dm: convert ffs to __ffs · a3d939ae
      Mikulas Patocka committed
      ffs counts bits starting with 1 (for the least significant bit), __ffs
      counts bits starting with 0. This patch changes various occurrences of ffs
      to __ffs and removes subtraction of 1 from the result.
      
      Note that __ffs (unlike ffs) is not defined when called with zero
      argument, but it is not called with zero argument in any of these cases.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
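The off-by-one relationship between ffs and __ffs described in the commit above can be illustrated in userspace. This is only a sketch under stated assumptions: `model_ffs0`, `shift_old` and `shift_new` are hypothetical names, POSIX `ffs()` stands in for the kernel's ffs, and the GCC/Clang builtin `__builtin_ctzl` stands in for the kernel's `__ffs`.

```c
#include <assert.h>
#include <strings.h>   /* POSIX ffs(): 1-based index of the lowest set bit */

/*
 * Userspace stand-in for the kernel's __ffs(): 0-based index of the least
 * significant set bit. As with __ffs(), the result is undefined for a zero
 * argument, so the caller must rule that case out first.
 * (model_ffs0 is a hypothetical name; __builtin_ctzl is a GCC/Clang builtin.)
 */
static unsigned long model_ffs0(unsigned long word)
{
    assert(word != 0);
    return (unsigned long)__builtin_ctzl(word);
}

/* Old pattern: 1-based ffs() followed by a subtraction of 1. */
static unsigned int shift_old(unsigned int chunk_size)
{
    return (unsigned int)ffs((int)chunk_size) - 1;
}

/* New pattern: the 0-based result is used directly. */
static unsigned int shift_new(unsigned int chunk_size)
{
    return (unsigned int)model_ffs0(chunk_size);
}
```

For any non-zero power-of-two size both helpers should agree; the difference is that the second one has no defined result for zero, which is why the commit message points out that none of the converted call sites pass zero.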
    •
      dm: drop NULL test before kmem_cache_destroy() and mempool_destroy() · 6f65985e
      Julia Lawall committed
      Remove DM's unneeded NULL tests before calling these destroy functions,
      now that they check for NULL, thanks to these v4.3 commits:
      3942d299 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
      4e3ca3e0 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
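The idea behind the semantic patch above, a destroy function that checks for NULL itself so that callers can drop their own test, might look like this in plain C. It is a userspace sketch only: `demo_pool`, `demo_pool_create`, `demo_pool_destroy` and `demo_destroy_calls` are hypothetical names and do not correspond to the kernel's mempool or slab implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace pool type, standing in for a kernel mempool. */
struct demo_pool {
    void *elements;
};

/* Counts real teardowns; exists only to make the behaviour observable. */
static int demo_destroy_calls;

static struct demo_pool *demo_pool_create(size_t bytes)
{
    struct demo_pool *pool;

    assert(bytes > 0);
    pool = malloc(sizeof(*pool));
    if (!pool)
        return NULL;
    pool->elements = malloc(bytes);
    if (!pool->elements) {
        free(pool);
        return NULL;
    }
    return pool;
}

/* Tolerates NULL, so callers do not need "if (pool != NULL)" around it. */
static void demo_pool_destroy(struct demo_pool *pool)
{
    if (!pool)
        return;
    free(pool->elements);
    free(pool);
    demo_destroy_calls++;
}
```

With the check inside the destroy function, error paths can call it unconditionally for every pointer they might have allocated.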
    •
      dm: add support for passing through persistent reservations · 71cdb697
      Christoph Hellwig committed
      This adds support to pass through persistent reservation requests
      similar to the existing ioctl handling, and with the same limitations,
      e.g. devices may only have a single target attached.
      
      This is mostly intended for multipathing.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    •
      dm: refactor ioctl handling · e56f81e0
      Christoph Hellwig committed
      This moves the call to blkdev_ioctl and the argument checking to DM core
      code, and only leaves a callout to find the block device to operate on
      in the targets.  This simplifies the code and allows us to pass through
      ioctl-like commands using other methods in the next patch.
      
      Also split out a helper around calling the prepare_ioctl method that
      will be reused for persistent reservation handling.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    •
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938
      Mauricio Faria de Oliveira committed
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the CAP_SYS_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
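The error path traced in step 6 of the revert above can be condensed into a small userspace model. This is a simplified sketch, not the kernel code: `demo_verify_ioctl` and `demo_vfs_ioctl` are hypothetical names, the verification step keeps only the two checks quoted in the call chain (the parts elided there are left out here as well), and `DEMO_ENOIOCTLCMD` mirrors the kernel-internal ENOIOCTLCMD value of 515.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal error code that must not reach userspace (value 515). */
#define DEMO_ENOIOCTLCMD 515

/*
 * Models the two checks quoted from the verification step: a whole-device
 * block device passes, and so does a caller holding CAP_SYS_RAWIO.
 * Everything else is rejected with the internal error code.
 */
static int demo_verify_ioctl(int have_bdev, int is_whole_device, int has_rawio)
{
    if (have_bdev && is_whole_device)
        return 0;
    if (has_rawio)
        return 0;
    return -DEMO_ENOIOCTLCMD;
}

/* Models the translation done at the VFS boundary before returning. */
static int demo_vfs_ioctl(int driver_result)
{
    assert(driver_result <= 0);   /* kernel convention: 0 or negative errno */
    if (driver_result == -DEMO_ENOIOCTLCMD)
        return -ENOTTY;
    return driver_result;
}
```

With no active path there is no block device to check, so an unprivileged caller ends up with ENOTTY ("Inappropriate ioctl for device"), while a caller with CAP_SYS_RAWIO passes the check and goes on to the queueing behaviour shown in step 5.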
  2. 30 Oct, 2015 1 commit
    •
      dm: initialize non-blk-mq queue data before queue is used · ad5f498f
      Mikulas Patocka committed
      Commit bfebd1cd ("dm: add full blk-mq
      support to request-based DM") moves the initialization of the fields
      backing_dev_info.congested_fn, backing_dev_info.congested_data and
      queuedata from the function dm_init_md_queue (that is called when the
      device is created) to dm_init_old_md_queue (that is called after the
      device type is determined).
      
      There is no locking when accessing these variables, thus it is possible
      for other parts of the kernel to briefly see this data in a transient
      state (e.g. queue->backing_dev_info.congested_fn initialized and
      md->queue->backing_dev_info.congested_data uninitialized, resulting in
      passing an incorrect parameter to the function dm_any_congested).
      
      This queue data is left initialized for blk-mq devices even though they
      don't use it.
      
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v4.1+
  3. 22 Oct, 2015 3 commits
  4. 15 Oct, 2015 2 commits
  5. 13 Oct, 2015 2 commits
  6. 10 Oct, 2015 11 commits
  7. 03 Oct, 2015 3 commits
  8. 02 Oct, 2015 12 commits
    •
      md/bitmap: don't pass -1 to bitmap_storage_alloc. · da6fb7a9
      NeilBrown committed
      Passing -1 to bitmap_storage_alloc() causes page->index to be set to
      -1, which is quite problematic.
      
      So only pass ->cluster_slot if mddev_is_clustered().
      
      Fixes: b97e9257 ("Use separate bitmaps for each nodes in the cluster")
      Cc: stable@vger.kernel.org (v4.1+)
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      md/raid1: Avoid raid1 resync getting stuck · e8ff8bf0
      Jes Sorensen committed
      close_sync() needs to set conf->next_resync to a large but safe value
      below MaxSector and use it to determine whether or not to set
      start_next_window in wait_barrier().
      
      Solution suggested by Neil Brown.
      Reported-by: Nate Dailey <nate.dailey@stratus.com>
      Tested-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      md: drop null test before destroy functions · 644df1a8
      Julia Lawall committed
      Remove unneeded NULL test.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      // </smpl>
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      md: clear CHANGE_PENDING in readonly array · d4929add
      Shaohua Li committed
      If an array has more faulty disks than its allowed degraded number, the
      array enters error handling. It will be marked as read-only with
      MD_CHANGE_PENDING/RECOVERY_NEEDED set. But currently recovery doesn't
      clear the CHANGE_PENDING bit for a read-only array.  If MD_CHANGE_PENDING
      is set for a raid5 array, all returned IO will be held on a list until the
      bit is cleared. But recovery never clears this bit, so the IO is always
      in a pending state and never finishes. This has bad effects, e.g. the
      upper layer can't get an IO error and the array can't be stopped.
      
      Fixes: c3cce6cd ("md/raid5: ensure device failure recorded before write request returns.")
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      md/raid0: apply base queue limits *before* disk_stack_limits · 66eefe5d
      NeilBrown committed
      Calling e.g. blk_queue_max_hw_sectors() after calls to
      disk_stack_limits() discards the settings determined by
      disk_stack_limits().
      So we need to make those calls first.
      
      Fixes: 199dc6ed ("md/raid0: update queue parameter in a safer location.")
      Cc: stable@vger.kernel.org (v2.6.35+ - please apply with 199dc6ed).
      Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
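The ordering problem described in the raid0 commit above can be shown with a toy model. This is a heavily simplified sketch, not the block layer: `demo_limits` and the `demo_*` helpers are hypothetical names, a single field stands in for the queue limits, the setter is assumed to overwrite the field (as the commit message implies), and stacking is assumed to only ever shrink it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, heavily simplified stand-in for a queue's limits. */
struct demo_limits {
    unsigned int max_hw_sectors;
};

/* Models a setter that simply overwrites the field with the given value. */
static void demo_set_max_hw_sectors(struct demo_limits *lim, unsigned int sectors)
{
    assert(sectors > 0);
    lim->max_hw_sectors = sectors;
}

/* Models stacking: the top-level limit may only shrink to fit a member disk. */
static void demo_stack_limits(struct demo_limits *top, const struct demo_limits *disk)
{
    if (disk->max_hw_sectors < top->max_hw_sectors)
        top->max_hw_sectors = disk->max_hw_sectors;
}

/* Problematic order: the base value applied last discards the stacked result. */
static unsigned int demo_stack_then_set(unsigned int base, unsigned int disk_max)
{
    struct demo_limits top = { .max_hw_sectors = UINT_MAX };
    struct demo_limits disk = { .max_hw_sectors = disk_max };

    demo_stack_limits(&top, &disk);
    demo_set_max_hw_sectors(&top, base);
    return top.max_hw_sectors;
}

/* Fixed order: base value first, then stacking narrows it as needed. */
static unsigned int demo_set_then_stack(unsigned int base, unsigned int disk_max)
{
    struct demo_limits top = { .max_hw_sectors = UINT_MAX };
    struct demo_limits disk = { .max_hw_sectors = disk_max };

    demo_set_max_hw_sectors(&top, base);
    demo_stack_limits(&top, &disk);
    return top.max_hw_sectors;
}
```

In this model, a member disk limited to 256 sectors is respected only when the base value of 1024 is applied before stacking.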
    •
      md/raid5: don't index beyond end of array in need_this_block(). · 36707bb2
      NeilBrown committed
      While need_this_block() probably shouldn't be called when there
      are more than 2 failed devices, we really don't want it to try
      indexing beyond the end of the failed_num[] or fdev[] arrays.
      
      So limit the loops to at most 2 iterations.
      Reported-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
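The loop bound described in the commit above ("limit the loops to at most 2 iterations") can be sketched as follows. This is an illustrative model only: `demo_state`, `DEMO_MAX_TRACKED` and `demo_is_recorded_failure` are hypothetical names, and the struct keeps just a failure count and a two-entry array, assuming the count can exceed the array size.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TRACKED 2   /* number of failed devices the array can record */

/* Hypothetical, reduced analysis state. */
struct demo_state {
    int failed;                         /* may be larger than DEMO_MAX_TRACKED */
    int failed_num[DEMO_MAX_TRACKED];   /* indices of recorded failed devices */
};

/* Returns 1 if disk_idx is one of the recorded failed devices, else 0. */
static int demo_is_recorded_failure(const struct demo_state *s, int disk_idx)
{
    int i;

    assert(s != NULL);
    /* The second condition keeps the loop inside failed_num[] even when
     * more failures were counted than the array can hold. */
    for (i = 0; i < s->failed && i < DEMO_MAX_TRACKED; i++)
        if (s->failed_num[i] == disk_idx)
            return 1;
    return 0;
}
```

Without the second loop condition, a count of 3 would read one element past the end of the two-entry array.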
    •
      raid5: update analysis state for failed stripe · ebda780b
      Shaohua Li committed
      handle_failed_stripe() makes the stripe fail, i.e. all IO will return
      with a failure, but it doesn't update stripe_head_state. Later,
      handle_stripe() has special handling for raid6 for handle_stripe_fill().
      That check before handle_stripe_fill() doesn't skip the failed stripe
      and we get a kernel crash in need_this_block().  This patch clears the
      analysis state to make sure no functions are wrongly called after
      handle_failed_stripe().
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      md: wait for pending superblock updates before switching to read-only · 88724bfa
      NeilBrown committed
      If a superblock update is pending, wait for it to complete before
      letting md_set_readonly() switch to readonly.
      Otherwise we might lose important information about a device having
      failed.
      
      For external arrays, waiting for superblock updates can wait on
      user-space, so in that case, just return an error.
      Reported-and-tested-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    •
      drm/dp/mst: add some defines for logical/physical ports · ccf03d69
      Dave Airlie committed
      This just removes the magic number.
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    •
      drm/dp/mst: drop cancel work sync in the mstb destroy path (v2) · 274d8352
      Dave Airlie committed
      Since commit 9eb1e57f ("drm/dp/mst: make sure mst_primary mstb is valid
      in work function"), we validate the mstb structs in the work function,
      and doing that takes a reference. So we should never get here with the
      work function running using the mstb device, only if the work
      function hasn't run yet or is running for another mstb.

      So we don't need to sync the work here; this was causing the
      lockdep spew below.
      
      [  +0.000160] =============================================
      [  +0.000001] [ INFO: possible recursive locking detected ]
      [  +0.000002] 3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1 Tainted: G        W      ------------
      [  +0.000001] ---------------------------------------------
      [  +0.000001] kworker/4:2/1262 is trying to acquire lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b29a5>] flush_work+0x5/0x2e0
      [  +0.000007]
      but task is already holding lock:
      [  +0.000001]  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]
      other info that might help us debug this:
      [  +0.000001]  Possible unsafe locking scenario:
      
      [  +0.000002]        CPU0
      [  +0.000000]        ----
      [  +0.000001]   lock((&mgr->work));
      [  +0.000002]   lock((&mgr->work));
      [  +0.000001]
       *** DEADLOCK ***
      
      [  +0.000001]  May be due to missing lock nesting notation
      
      [  +0.000002] 2 locks held by kworker/4:2/1262:
      [  +0.000001]  #0:  (events_long){.+.+.+}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000004]  #1:  ((&mgr->work)){+.+.+.}, at: [<ffffffff810b57e4>] process_one_work+0x1b4/0x710
      [  +0.000003]
      stack backtrace:
      [  +0.000003] CPU: 4 PID: 1262 Comm: kworker/4:2 Tainted: G        W      ------------   3.10.0-320.el7.rhel72.stable.backport.3.x86_64.debug #1
      [  +0.000001] Hardware name: LENOVO 20EGS0R600/20EGS0R600, BIOS GNET71WW (2.19 ) 02/05/2015
      [  +0.000008] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
      [  +0.000001]  ffffffff82c26c90 00000000a527b914 ffff88046399bae8 ffffffff816fe04d
      [  +0.000004]  ffff88046399bb58 ffffffff8110f47f ffff880461438000 0001009b840fc003
      [  +0.000002]  ffff880461438a98 0000000000000000 0000000804dc26e1 ffffffff824a2c00
      [  +0.000003] Call Trace:
      [  +0.000004]  [<ffffffff816fe04d>] dump_stack+0x19/0x1b
      [  +0.000004]  [<ffffffff8110f47f>] __lock_acquire+0x115f/0x1250
      [  +0.000002]  [<ffffffff8110fd49>] lock_acquire+0x99/0x1e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000002]  [<ffffffff810b29ee>] flush_work+0x4e/0x2e0
      [  +0.000002]  [<ffffffff810b29a5>] ? flush_work+0x5/0x2e0
      [  +0.000004]  [<ffffffff81025905>] ? native_sched_clock+0x35/0x80
      [  +0.000002]  [<ffffffff81025959>] ? sched_clock+0x9/0x10
      [  +0.000002]  [<ffffffff810da1f5>] ? local_clock+0x25/0x30
      [  +0.000002]  [<ffffffff8110dca9>] ? mark_held_locks+0xb9/0x140
      [  +0.000003]  [<ffffffff810b4ed5>] ? __cancel_work_timer+0x95/0x160
      [  +0.000002]  [<ffffffff810b4ee8>] __cancel_work_timer+0xa8/0x160
      [  +0.000002]  [<ffffffff810b4fb0>] cancel_work_sync+0x10/0x20
      [  +0.000007]  [<ffffffffa0160d17>] drm_dp_destroy_mst_branch_device+0x27/0x120 [drm_kms_helper]
      [  +0.000006]  [<ffffffffa0163968>] drm_dp_mst_link_probe_work+0x78/0xa0 [drm_kms_helper]
      [  +0.000002]  [<ffffffff810b5850>] process_one_work+0x220/0x710
      [  +0.000002]  [<ffffffff810b57e4>] ? process_one_work+0x1b4/0x710
      [  +0.000005]  [<ffffffff810b5e5b>] worker_thread+0x11b/0x3a0
      [  +0.000003]  [<ffffffff810b5d40>] ? process_one_work+0x710/0x710
      [  +0.000002]  [<ffffffff810beced>] kthread+0xed/0x100
      [  +0.000003]  [<ffffffff810bec00>] ? insert_kthread_work+0x80/0x80
      [  +0.000003]  [<ffffffff817121d8>] ret_from_fork+0x58/0x90
      
      v2: add flush_work.
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    •
      drm/dp/mst: split connector registration into two parts (v2) · d9515c5e
      Dave Airlie committed
      In order to cache the EDID properly for tiled displays, we
      need to retrieve it before we register the connector with
      userspace, otherwise userspace can call get resources
      and try and get the edid before we've even cached it.
      
      This fixes some problems when hotplugging MST monitors
      with X/mutter running, as mutter seems to get 0 modes
      for one of the monitors in the tile.

      v2: fix warning in radeon;
      handle tile setting in cached path rather than
      get edid path.
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: stable@vger.kernel.org
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    •
      drm/dp/mst: update the link_address_sent before sending the link address (v3) · 68d8c9fc
      Dave Airlie committed
      Update the state before sending the msg to close it.
      
      v2: reset value if return indicates we haven't sent the msg.
      v3: just clean the code up.
      Pointed out by Adam J Richter on

      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91481
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
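The ordering described in the last commit above (set the state before sending, and reset it if the message was not sent) can be sketched in plain C. This models only the behaviour stated in the commit message, not the actual DRM code: `demo_branch`, `demo_send_fn`, `demo_send_link_address` and the two sender helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced branch-device state. */
struct demo_branch {
    bool link_address_sent;
};

/* Hypothetical transport hook: returns 0 on success, negative on failure. */
typedef int (*demo_send_fn)(struct demo_branch *mstb);

static int demo_send_link_address(struct demo_branch *mstb, demo_send_fn send)
{
    int ret;

    assert(mstb && send);
    mstb->link_address_sent = true;        /* update state before sending */
    ret = send(mstb);
    if (ret < 0)
        mstb->link_address_sent = false;   /* not sent after all: reset */
    return ret;
}

/* Sender that succeeds only if the flag was already set when it ran. */
static int demo_send_ok(struct demo_branch *mstb)
{
    return mstb->link_address_sent ? 0 : -1;
}

/* Sender that always fails, to exercise the reset path. */
static int demo_send_fail(struct demo_branch *mstb)
{
    (void)mstb;
    return -5;
}
```

Setting the flag first means that anything reacting to the message already sees the updated state, while the reset keeps a failed send from leaving the flag stale.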