1. 30 Aug, 2017 14 commits
  2. 29 Aug, 2017 5 commits
  3. 26 Aug, 2017 6 commits
  4. 24 Aug, 2017 5 commits
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      74d46992
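The partition remapping described in this commit can be pictured with a small user-space model. This is only a sketch, not the kernel code: the class names, the `disk` and `partno` fields, and the `remap_partition` helper are illustrative assumptions. It shows why a gendisk plus a partition index is enough to address I/O: the remap step adds the partition's start sector and then retargets the request at the whole disk (index 0).

```python
from dataclasses import dataclass


@dataclass
class Partition:
    start_sect: int  # first sector of the partition on the whole disk
    nr_sects: int    # partition length in sectors


@dataclass
class Gendisk:
    name: str
    partitions: dict  # partition index -> Partition; index 0 is the whole disk


@dataclass
class Bio:
    disk: Gendisk  # hypothetical stand-in for the gendisk pointer
    partno: int    # hypothetical stand-in for the partition index
    sector: int    # sector relative to the addressed partition
    nr_sects: int


def remap_partition(bio: Bio) -> Bio:
    """Translate a partition-relative bio into a whole-disk bio."""
    if bio.partno == 0:
        return bio  # already addressed to the whole disk
    part = bio.disk.partitions[bio.partno]
    if bio.sector + bio.nr_sects > part.nr_sects:
        raise ValueError("I/O beyond end of partition")
    bio.sector += part.start_sect
    bio.partno = 0
    return bio


disk = Gendisk("sda", {0: Partition(0, 1 << 20), 2: Partition(2048, 4096)})
bio = remap_partition(Bio(disk, partno=2, sector=10, nr_sects=8))
print(bio.sector, bio.partno)
```

Note that nothing in the model needs an open device node, which mirrors the commit's point that the I/O path only needs the disk and an index.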
    • B
      skd: Change default interrupt mode to MSI-X · 744353b6
      Bart Van Assche committed
      Since MSI support on some motherboards is unreliable, change the
      default interrupt mode from MSI to MSI-X. This patch prevents the
      following message from appearing sporadically in the kernel logs
      of my test setup:
      
      do_IRQ: 3.193 No irq handler for vector
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      744353b6
    • B
      skd: Avoid double completions in case of a timeout · f2fe4459
      Bart Van Assche committed
      Prevent normal request completion and the timeout handler from
      running concurrently by calling blk_mq_complete_request() instead
      of blk_mq_end_request() from skd_end_request(). Prevent the block
      layer from reusing a request while the firmware is still
      processing it. Convert skd_softirq_done() to blk-mq. Pass the
      pointer to skd_softirq_done() to the block layer core through
      blk_mq_ops.complete instead of by calling blk_queue_softirq_done().
      Pass the pointer to skd_timed_out() to the block layer core
      through blk_mq_ops.timeout instead of by calling
      blk_queue_timed_out(). The timeout handler has been tested as
      follows:
      
          echo 1 > /sys/block/skd0/io-timeout-fail &&
          (cd /sys/kernel/debug/fail_io_timeout &&
            echo 100 > probability &&
            echo N > task-filter &&
            echo 1 > times)
      
      Fixes: commit a74d5b76 ("skd: Switch to block layer timeout mechanism")
      Reported-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f2fe4459
    • B
      skd: Inline skd_process_request() · c39c6c77
      Bart Van Assche committed
      This patch does not change any functionality but makes the skd
      driver code more similar to that of other blk-mq kernel drivers.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c39c6c77
    • B
      skd: Report completion mismatches once · 49f16e2f
      Bart Van Assche committed
      This patch removes one debug statement but otherwise does not change
      any functionality.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      49f16e2f
  5. 23 Aug, 2017 10 commits
    • S
      nullb: badblocks support · 2f54a613
      Shaohua Li committed
      Sometimes a disk has broken tracks whose data is inaccessible, while
      data in other parts can be accessed normally. MD RAID supports such
      disks, but we don't have a good way to test it, because we can't
      control which part of a physical disk is bad. For a virtual disk, this
      can be easily controlled.
      
      This patch adds a new 'badblock' attribute. Configure it like this:
          echo "+1-100" > xxx/badblock
      marks sectors [1-100] as bad blocks.
          echo "-20-30" > xxx/badblock
      marks sectors [20-30] as good again.
      
      If bad blocks are accessed, the nullb disk returns an I/O error. Other
      parts of the disk can be accessed normally.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2f54a613
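The 'badblock' attribute syntax from this commit ("+start-end" to mark bad, "-start-end" to mark good) can be modelled in user space as follows. This is a sketch under assumptions: the `BadBlocks` class and its method names are hypothetical, a plain set replaces whatever structure the driver uses, and ranges are treated as inclusive as the message's "[1-100]" notation suggests.

```python
import re


class BadBlocks:
    """Toy model of a per-disk bad-sector list (not the driver's code)."""

    def __init__(self):
        self.bad = set()

    def write_attr(self, text: str) -> None:
        """Apply one request such as '+1-100' or '-20-30'."""
        m = re.fullmatch(r"([+-])(\d+)-(\d+)", text.strip())
        if m is None:
            raise ValueError(f"malformed badblock request: {text!r}")
        op, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start > end:
            raise ValueError("range start exceeds range end")
        sectors = range(start, end + 1)  # inclusive range
        if op == "+":
            self.bad.update(sectors)
        else:
            self.bad.difference_update(sectors)

    def io_ok(self, sector: int, nr_sects: int) -> bool:
        """Return False if the I/O touches any bad sector."""
        return self.bad.isdisjoint(range(sector, sector + nr_sects))


bb = BadBlocks()
bb.write_attr("+1-100")
bb.write_attr("-20-30")
print(bb.io_ok(20, 11), bb.io_ok(15, 10))
```

An I/O that overlaps a bad range by even one sector fails as a whole in this model; whether the driver fails the whole request or only part of it is not stated in the commit message.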
    • S
      nullb: emulate cache · deb78b41
      Shaohua Li committed
      Software must flush the disk cache to guarantee data safety. To check
      whether software flushes the disk cache correctly, we must know the
      behavior of the disk. But physical disk behavior is uncontrollable:
      even if software doesn't issue the flush, the disk probably flushes on
      its own. This patch tries to emulate a cache in the test disk.
      
      All writes go to a cache first; when the cache is full, we flush some
      data to disk storage. A flush request flushes all data in the cache to
      disk storage. A FUA write goes to the memory store directly and
      revalidates the data in the cache. If there is a power failure (by
      writing to the power attribute, 'echo 0 > disk_name/power'), we discard
      all data in the cache but preserve the data in disk storage. Later we
      can power on the disk again as usual (write 1 to the 'power' attribute),
      then check data integrity and verify that software does everything
      correctly.
      
      A new attribute 'cache_size' (in MB) is added to configure cache size.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      deb78b41
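The cache behaviour described above can be sketched as a small user-space model. Everything here is an assumption made for illustration: the `CachedDisk` class, the capacity counted in sectors rather than MB, the oldest-first eviction (the message only says "some data" is flushed), and the reading of "revalidate" as dropping the stale cached copy on a FUA write.

```python
class CachedDisk:
    """Toy write-back cache in front of a persistent store."""

    def __init__(self, cache_sectors: int):
        self.cache_sectors = cache_sectors
        self.cache = {}  # volatile: sector -> data, kept in insertion order
        self.store = {}  # persistent: sector -> data

    def write(self, sector, data, fua=False):
        if fua:
            # FUA bypasses the cache; drop any stale cached copy (assumption).
            self.store[sector] = data
            self.cache.pop(sector, None)
            return
        self.cache.pop(sector, None)
        self.cache[sector] = data
        # Cache full: write back the oldest entries (assumed policy).
        while len(self.cache) > self.cache_sectors:
            oldest = next(iter(self.cache))
            self.store[oldest] = self.cache.pop(oldest)

    def flush(self):
        """A flush request writes back everything in the cache."""
        self.store.update(self.cache)
        self.cache.clear()

    def read(self, sector):
        if sector in self.cache:
            return self.cache[sector]
        return self.store.get(sector)

    def power_off(self):
        """Power failure: cached data is lost, stored data survives."""
        self.cache.clear()


d = CachedDisk(cache_sectors=2)
d.write(1, "a")
d.write(2, "b")
d.power_off()
print(d.read(1), d.read(2))
```

A test built on this idea writes data, cuts power with or without a preceding flush, and then checks which writes survived.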
    • S
      nullb: bandwidth control · eff2c4f1
      Shaohua Li committed
      In tests, we usually want a controllable disk speed. For example, in a
      RAID array, we'd like some disks to be fast and some to be slow. MD
      RAID actually has a feature for this. To test the feature, we'd like
      to make the disk run at a specific speed.
      
      Block throttling can probably be used for this purpose, but it requires
      cgroup setup. Here we just implement a simple throttling mechanism in
      the driver. There is slight fluctuation in the mechanism, but it's good
      enough for testing.
      
      To configure the bandwidth cap, the user sets the 'mbps' attribute,
      whose unit is MB/s.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eff2c4f1
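One simple way to cap bandwidth, in the spirit of the "simple throttling mechanism" mentioned above, is a byte budget that is refilled at a fixed interval. The sketch below is an assumption about how such a scheme could look, not the driver's implementation: the `Throttle` class, the 50 ms refill interval, and the rule of admitting a request whenever any budget remains are all hypothetical choices. The last rule lets the budget overshoot slightly, which is one plausible source of the fluctuation the message mentions.

```python
class Throttle:
    """Toy bandwidth cap: a byte budget refilled once per interval."""

    def __init__(self, mbps: int, interval_ms: int = 50):
        # Bytes allowed per refill interval for the requested MB/s cap.
        self.quota = mbps * 1024 * 1024 * interval_ms // 1000
        self.budget = self.quota

    def try_submit(self, nbytes: int) -> bool:
        """Admit the request if any budget is left; may overdraw slightly."""
        if self.budget <= 0:
            return False  # caller must wait for the next refill
        self.budget -= nbytes
        return True

    def tick(self) -> None:
        """Called once per interval (a timer in a real implementation)."""
        self.budget = self.quota


t = Throttle(mbps=1)
admitted = sum(t.try_submit(4096) for _ in range(20))
print(admitted)
```

Requests rejected by `try_submit` would be retried after the next `tick`, so the long-run throughput stays close to the configured cap.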
    • S
      nullb: support discard · 306eb6b4
      Shaohua Li committed
      Discard makes sense for a memory backed disk. It is also useful for
      testing whether the upper layer supports discard correctly.
      
      The user configures the 'discard' attribute to enable/disable discard
      support.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      306eb6b4
    • S
      nullb: support memory backed store · 5bcd0e0c
      Shaohua Li committed
      This adds a memory backed store to nullb.
      
      The user configures the 'memory_backed' attribute for this. By default,
      a nullb disk doesn't use a memory backed store.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5bcd0e0c
    • S
      nullb: use ida to manage index · 94bc02e3
      Shaohua Li committed
      We now create disks dynamically. Manage the disk index with an ida to
      avoid bumping the index up too much.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      94bc02e3
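The point of using an ID allocator instead of a growing counter is that indexes released by removed disks get reused. A minimal user-space sketch of a smallest-free-ID allocator follows; the `IdAllocator` class is hypothetical and uses a heap of released IDs rather than the kernel's ida data structure.

```python
import heapq


class IdAllocator:
    """Toy allocator that always hands out the smallest free ID."""

    def __init__(self):
        self._next = 0   # smallest ID never handed out so far
        self._free = []  # min-heap of released IDs (all below _next)

    def alloc(self) -> int:
        if self._free:
            return heapq.heappop(self._free)
        idx = self._next
        self._next += 1
        return idx

    def free(self, idx: int) -> None:
        if idx >= self._next or idx in self._free:
            raise ValueError(f"ID {idx} is not allocated")
        heapq.heappush(self._free, idx)


ida = IdAllocator()
a, b, c = ida.alloc(), ida.alloc(), ida.alloc()
ida.free(b)
print(ida.alloc(), ida.alloc())
```

With a plain counter, the second pair of allocations would have produced 3 and 4; here the released index 1 is reused first.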
    • S
      nullb: add interface to power on disk · cedcafad
      Shaohua Li committed
      A device created through the nullb configfs interface isn't powered on
      by default. After configuring the device, the user can do 'echo 1 >
      xxx/nullb/device_name/power' to power on the device, which creates a
      disk. xxx/nullb/device_name/index is the disk index, so if the index
      is 2, the newly created disk is named /dev/nullb2. Note that the
      'index' is only valid after the disk is powered on.
      
      'echo 0 > xxx/nullb/device_name/power' removes the disk. Note that this
      doesn't remove the device. To remove the device, the user should do
      'rmdir xxx/nullb/device_name'. Removing the device removes the disk too.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cedcafad
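The two-phase lifecycle (create a device, then power it on to get a disk) can be summarised with a toy state model. This sketch only mirrors the rules stated in the commit message; the `NullbConfig` class, its method names, and the smallest-free-index policy are assumptions, and no real configfs access takes place.

```python
class NullbConfig:
    """Toy model of the device/disk lifecycle (no real configfs I/O)."""

    def __init__(self):
        self.devices = {}   # device name -> disk index, or None if powered off
        self._used = set()  # disk indexes currently in use

    def mkdir(self, name: str) -> None:
        """Create a device; no disk exists yet."""
        self.devices[name] = None

    def set_power(self, name: str, on: bool) -> None:
        idx = self.devices[name]
        if on and idx is None:
            idx = 0
            while idx in self._used:  # pick the smallest free index (assumed)
                idx += 1
            self._used.add(idx)
            self.devices[name] = idx
        elif not on and idx is not None:
            self._used.discard(idx)   # the disk goes away, the device stays
            self.devices[name] = None

    def disk_path(self, name: str):
        idx = self.devices[name]
        return None if idx is None else f"/dev/nullb{idx}"

    def rmdir(self, name: str) -> None:
        """Removing the device removes its disk too."""
        self.set_power(name, False)
        del self.devices[name]


cfg = NullbConfig()
cfg.mkdir("a")
print(cfg.disk_path("a"))
cfg.set_power("a", True)
print(cfg.disk_path("a"))
```

The first lookup yields no disk path because the device has not been powered on, which matches the note that 'index' is only valid after power-on.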
    • S
      nullb: add configfs interface · 3bf2bd20
      Shaohua Li committed
      Add a configfs interface for nullb. The configfs interface is more
      flexible and makes per-disk configuration easy.
      
      Configuration looks something like this:
          mount -t configfs none /mnt
      
      Checking which features the driver supports:
          cat /mnt/nullb/features
      
      The 'features' attribute is for future extension. We will probably add
      new features to the driver; userspace can check this attribute to find
      the supported features.
      
      Create/remove a device:
          mkdir/rmdir /mnt/nullb/a
      
      Then configure the device by setting attributes under /mnt/nullb/a. Most
      of the module parameters nullb supports are converted to attributes:
          size; /* device size in MB */
          completion_nsec; /* time in ns to complete a request */
          submit_queues; /* number of submission queues */
          home_node; /* home node for the device */
          queue_mode; /* block interface */
          blocksize; /* block size */
          irqmode; /* IRQ completion handler */
          hw_queue_depth; /* queue depth */
          use_lightnvm; /* register as a LightNVM device */
          blocking; /* blocking blk-mq device */
          use_per_node_hctx; /* use per-node allocation for hardware context */
      
      Note that creating a device doesn't create a disk immediately. Creating
      a disk is done in two phases: create a device and then power on the
      device. The next patch will introduce device power on.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3bf2bd20
    • S
      nullb: factor disk parameters · 2984c868
      Shaohua Li committed
      When we switch to the configfs interface, each disk could have a
      different configuration. To prepare for the change, we move most disk
      settings to a separate data structure. The existing module parameter
      interface is kept. 'nr_devices' and 'shared_tags' don't make sense as
      per-disk settings, so they remain global settings.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2984c868
    • D
      skd: error pointer dereference in skd_cons_disk() · 92d499d4
      Dan Carpenter committed
      My initial impulse was to check for IS_ERR_OR_NULL() but when I looked
      at this code a bit more closely, we should only need to check for
      IS_ERR().
      
      The blk_mq_alloc_tag_set() returns negative error codes and zero on
      success so we can just do an "if (rc) goto err_out;".  It's better to
      preserve the error code anyhow.  The blk_mq_init_queue() returns error
      pointers on failure, it never returns NULL.  We can also remove the
      "q = NULL;" at the start because that's no longer needed.
      
      Fixes: ca33dd92 ("skd: Convert to blk-mq")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      92d499d4