1. 21 July 2016, 12 commits
    • dm snap: add fake origin_direct_access · f6e629bd
      Authored by Toshi Kani
      A DAX-capable mapped device is marked DM_TYPE_DAX_BIO_BASED,
      which supports both DAX and bio-based operations.  dm-snap
      needs to work with a DAX-capable device when bio-based
      operation is used.
      
      Add a fake origin_direct_access() to the origin device so that
      the origin device is also marked DM_TYPE_DAX_BIO_BASED for a
      DAX-capable device.  This allows the target's DM table to be
      extended.  dm-snap works normally when bio-based operation is
      used.
      
      dm-snap does not support DAX operation, so mounting a target
      device or snapshot device with the dax option fails.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm stripe: add DAX support · beec25b4
      Authored by Toshi Kani
      Change dm-stripe to implement the direct_access function,
      stripe_direct_access(), which maps the bdev and sector and
      calls the direct_access function of its physical target device.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm error: add DAX support · f8df1fdf
      Authored by Mike Snitzer
      Allow the error target to replace an existing DAX-enabled target.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm linear: add DAX support · 84b22f83
      Authored by Toshi Kani
      Change dm-linear to implement the direct_access function,
      linear_direct_access(), which maps the sector and calls the
      direct_access function of its physical target device.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add infrastructure for DAX support · 545ed20e
      Authored by Toshi Kani
      Change the mapped device to implement the direct_access function,
      dm_blk_direct_access(), which calls a target's direct_access
      function.  'struct target_type' is extended with a target
      direct_access interface.  This function limits the directly
      accessible size to the dm_target's limit with max_io_len().
      
      Add dm_table_supports_dax() to iterate over all targets and
      associated block devices to check for DAX support.  To add DAX
      support to a DM target, the target need only implement the
      direct_access function.
      
      Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that the
      mapped device supports DAX and is bio-based.  This new type is used
      to ensure that all target devices have DAX support and remain that
      way after QUEUE_FLAG_DAX is set on the mapped device.
      
      At initial table load, QUEUE_FLAG_DAX is set on the mapped device
      when its type is set to DM_TYPE_DAX_BIO_BASED.  Any subsequent
      table load to the mapped device must have the same type, or else it
      fails per the check in table_load().
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • Merge remote-tracking branch 'jens/for-4.8/core' into dm-4.8 · e9ccb945
      Authored by Mike Snitzer
      DM's DAX support depends on block core's newly added QUEUE_FLAG_DAX.
    • block: do not merge requests without consulting with io scheduler · 72ef799b
      Authored by Tahsin Erdogan
      Before merging a bio into an existing request, the io scheduler is
      called to get its approval first.  However, requests that come from
      a plug flush may get merged by the block layer without consulting
      the io scheduler.
      
      In the case of CFQ, this can cause fairness problems.  For
      instance, if a request gets merged into a low-weight cgroup's
      request, the high-weight cgroup now depends on the low-weight
      cgroup to get scheduled.  If the high-weight cgroup needs that io
      request to complete before submitting more requests, then it will
      also lose its timeslice.
      
      The following script demonstrates the problem.  Group g1 has a low
      weight; g2 and g3 have equal high weights, but g2's requests are
      adjacent to g1's requests, so they are subject to merging.  Due to
      these merges, g2 gets a poor disk time allocation.
      
      cat > cfq-merge-repro.sh << "EOF"
      #!/bin/bash
      set -e
      
      IO_ROOT=/mnt-cgroup/io
      
      mkdir -p $IO_ROOT
      
      if ! mount | grep -qw $IO_ROOT; then
        mount -t cgroup none -oblkio $IO_ROOT
      fi
      
      cd $IO_ROOT
      
      for i in g1 g2 g3; do
        if [ -d $i ]; then
          rmdir $i
        fi
      done
      
      mkdir g1 && echo 10 > g1/blkio.weight
      mkdir g2 && echo 495 > g2/blkio.weight
      mkdir g3 && echo 495 > g3/blkio.weight
      
      RUNTIME=10
      
      (echo $BASHPID > g1/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=0k &> /dev/null)&
      
      (echo $BASHPID > g2/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=64k &> /dev/null)&
      
      (echo $BASHPID > g3/cgroup.procs &&
       fio --readonly --name name1 --filename /dev/sdb \
           --rw read --size 64k --bs 64k --time_based \
           --runtime=$RUNTIME --offset=256k &> /dev/null)&
      
      sleep $((RUNTIME+1))
      
      for i in g1 g2 g3; do
        echo ---- $i ----
        cat $i/blkio.time
      done
      
      EOF
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 162
      ---- g2 ----
      8:16 165
      ---- g3 ----
      8:16 686
      
      After applying the patch:
      
      # ./cfq-merge-repro.sh
      ---- g1 ----
      8:16 90
      ---- g2 ----
      8:16 445
      ---- g3 ----
      8:16 471
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Fix spelling in a source code comment · 68bdf1ac
      Authored by Bart Van Assche
      Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: expose QUEUE_FLAG_DAX in sysfs · ea6ca600
      Authored by Yigal Korman
      Provides the ability to identify DAX-enabled devices from
      userspace.
      Signed-off-by: Yigal Korman <yigal@plexistor.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: add QUEUE_FLAG_DAX for devices to advertise their DAX support · 163d4baa
      Authored by Toshi Kani
      Currently, the presence of direct_access() in
      block_device_operations indicates DAX support on a block device.
      Because block_device_operations is instantiated as 'const', this
      DAX capability cannot be enabled conditionally.
      
      In preparation for supporting DAX on device-mapper devices, add
      QUEUE_FLAG_DAX to the request_queue flags so devices can advertise
      their DAX support.  This allows the DAX capability to be set based
      on how the mapped device is composed.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <linux-s390@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • dm thin: fix a race condition between discarding and provisioning a block · 2a0fbffb
      Authored by Joe Thornber
      The discard passdown was being issued after the block was unmapped,
      which meant the block could be reprovisioned whilst the passdown
      discard was still in flight.
      
      We can only identify unshared blocks (safe to pass a discard down
      to) once they're unmapped and their ref count hits zero.  Block ref
      counts are now used to guard against concurrent allocation of
      blocks that are being discarded.  So now we unmap the block, issue
      passdown discards, and then immediately increment the ref counts
      for regions that have been discarded via passdown (this is safe
      because allocation occurs within the same thread).  We then
      decrement the ref counts once the passdown discard IO is complete
      -- signaling that these blocks may now be allocated.
      
      This fixes the potential for corruption that was reported here:
      https://www.redhat.com/archives/dm-devel/2016-June/msg00311.html
      Reported-by: Dennis Yang <dennisyang@qnap.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm btree: fix a bug in dm_btree_find_next_single() · e7e0f730
      Authored by Joe Thornber
      dm_btree_find_next_single() can short-circuit the search for a block
      with a return of -ENODATA if all entries are higher than the search key
      passed to lower_bound().
      
      This hasn't been a problem because of the way the btree has been used by
      DM thinp.  But it must be fixed now in preparation for fixing the race
      in DM thinp's handling of simultaneous block discard vs allocation.
      Otherwise, once that fix is in place, some of the blocks in a discard
      would not be unmapped as expected.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 19 July 2016, 28 commits