1. 23 Aug, 2017 8 commits
    • nullb: emulate cache · deb78b41
      Shaohua Li committed
      Software must flush the disk cache to guarantee data safety. To check
      whether software flushes the disk cache correctly, we must know how the
      disk behaves. But physical disk behavior is uncontrollable: even if
      software doesn't issue the flush, the disk may well flush on its own.
      This patch emulates a cache in the test disk.
      
      All writes go to a cache first; when the cache is full, we flush some
      data to disk storage. A flush request flushes all data in the cache to
      disk storage. A FUA write goes to the memory store directly and
      invalidates the corresponding data in the cache. If there is a power
      failure (triggered by writing to the power attribute, 'echo 0 >
      disk_name/power'), we discard all data in the cache but preserve the
      data in disk storage. Later we can power on the disk again as usual
      (write 1 to the 'power' attribute), then check data integrity and
      verify that software does everything correctly.
      
      A new attribute 'cache_size' (in MB) is added to configure cache size.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
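The cache semantics described in this commit message can be modelled in userspace. The following is a minimal sketch, not the driver's code: the `toy_*` names, the fixed sizes, and the "evict the lowest-numbered cached sector" policy are all assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSECTORS    8  /* size of the toy disk, in sectors (assumed) */
#define CACHE_SLOTS 2  /* cache capacity, in sectors (assumed) */

struct toy_disk {
    int    store[NSECTORS];   /* durable "disk storage" */
    int    cache[NSECTORS];   /* volatile cached copies */
    bool   cached[NSECTORS];  /* true if cache[i] holds unflushed data */
    size_t ncached;           /* number of sectors currently cached */
};

static void toy_init(struct toy_disk *d)
{
    memset(d, 0, sizeof(*d));
}

/* Write one cached sector back to storage and drop it from the cache. */
static void toy_evict(struct toy_disk *d, size_t s)
{
    d->store[s] = d->cache[s];
    d->cached[s] = false;
    d->ncached--;
}

/* Flush request: write back every cached sector. */
static void toy_flush(struct toy_disk *d)
{
    for (size_t s = 0; s < NSECTORS; s++)
        if (d->cached[s])
            toy_evict(d, s);
}

/* Normal write: lands in the cache; a full cache first evicts one sector. */
static void toy_write(struct toy_disk *d, size_t s, int val)
{
    if (!d->cached[s]) {
        if (d->ncached == CACHE_SLOTS) {
            /* Arbitrary policy for the sketch: evict the lowest sector. */
            for (size_t v = 0; v < NSECTORS; v++) {
                if (d->cached[v]) {
                    toy_evict(d, v);
                    break;
                }
            }
        }
        d->cached[s] = true;
        d->ncached++;
    }
    d->cache[s] = val;
}

/* FUA write: goes straight to storage and drops any stale cached copy. */
static void toy_write_fua(struct toy_disk *d, size_t s, int val)
{
    if (d->cached[s]) {
        d->cached[s] = false;
        d->ncached--;
    }
    d->store[s] = val;
}

/* Power failure: the cache contents are lost, storage is preserved. */
static void toy_power_fail(struct toy_disk *d)
{
    memset(d->cached, 0, sizeof(d->cached));
    d->ncached = 0;
}

/* Read: prefer the cached copy, fall back to storage. */
static int toy_read(const struct toy_disk *d, size_t s)
{
    return d->cached[s] ? d->cache[s] : d->store[s];
}
```

In this model, a plain write followed by `toy_power_fail()` loses the data, while a write followed by `toy_flush()` or a `toy_write_fua()` survives it. That is the property a test against the emulated cache would check.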
    • nullb: bandwidth control · eff2c4f1
      Shaohua Li committed
      In testing, we usually want a controllable disk speed. For example, in
      a RAID array, we'd like some disks to be fast and some to be slow. MD
      RAID actually has a feature for this. To test the feature, we'd like to
      make the disk run at a specific speed.
      
      Block throttling could probably be used for this purpose, but it
      requires cgroup setup. Here we just implement a simple throttling
      mechanism in the driver. There is slight fluctuation in the mechanism,
      but it's good enough for testing.
      
      To configure the bandwidth cap, the user sets the 'mbps' attribute,
      which is in MB/s.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
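One simple way to cap bandwidth is a per-tick byte budget that a timer refills. The sketch below illustrates that idea in userspace and is not taken from the driver: the `throttle_*` names and the 50 ticks per second refill rate are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TICKS_PER_SEC 50ULL  /* assumed refill rate: one tick every 20 ms */

struct throttle {
    uint64_t bytes_per_tick;  /* budget granted at each tick */
    int64_t  budget;          /* bytes left in the current tick */
};

/* Convert an 'mbps' cap (MB/s) into a per-tick byte budget. */
static void throttle_init(struct throttle *t, unsigned int mbps)
{
    t->bytes_per_tick = (uint64_t)mbps * (1ULL << 20) / TICKS_PER_SEC;
    t->budget = (int64_t)t->bytes_per_tick;
}

/*
 * Returns true if the request may be dispatched now, false if it has to
 * wait for the next tick. The last request of a tick may overshoot the
 * budget, which is one source of slight fluctuation.
 */
static bool throttle_charge(struct throttle *t, uint32_t bytes)
{
    if (t->budget <= 0)
        return false;
    t->budget -= bytes;
    return true;
}

/* Timer callback: start a new tick with a full budget. */
static void throttle_tick(struct throttle *t)
{
    t->budget = (int64_t)t->bytes_per_tick;
}
```

With `mbps` set to 50, each 20 ms tick allows 1 MiB; once that is consumed, further requests are held back until `throttle_tick()` runs.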
    • nullb: support discard · 306eb6b4
      Shaohua Li committed
      Discard makes sense for a memory-backed disk. It is also useful for
      testing whether upper layers support discard correctly.
      
      The user sets the 'discard' attribute to enable or disable discard
      support.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nullb: support memory backed store · 5bcd0e0c
      Shaohua Li committed
      This adds a memory-backed store to nullb.
      
      The user sets the 'memory_backed' attribute to enable it. By default, a
      nullb disk doesn't use a memory-backed store.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nullb: use ida to manage index · 94bc02e3
      Shaohua Li committed
      We now create disks dynamically. Manage the disk index with an IDA to
      avoid bumping the index up too much.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
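The benefit of an ID allocator over a plain counter is that freed indexes are reused, so disk numbers stay small as disks come and go. The following userspace sketch shows that behaviour with a 64-bit bitmap; it is an illustration only, and the `id_alloc`/`id_free` names and the 64-ID limit are assumptions rather than the kernel's IDA API.

```c
#include <stdint.h>

#define MAX_IDS 64  /* capacity of this toy allocator (assumed) */

static uint64_t used_ids;  /* bit i set means index i is in use */

/* Returns the smallest free index, or -1 if all are taken. */
static int id_alloc(void)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!(used_ids & (1ULL << i))) {
            used_ids |= 1ULL << i;
            return i;
        }
    }
    return -1;
}

/* Releases an index so a later allocation can reuse it. */
static void id_free(int id)
{
    if (id >= 0 && id < MAX_IDS)
        used_ids &= ~(1ULL << id);
}
```

After allocating 0, 1 and 2 and freeing 1, the next allocation returns 1 again, whereas a monotonically increasing counter would hand out 3.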
    • nullb: add interface to power on disk · cedcafad
      Shaohua Li committed
      A device created through the nullb configfs interface isn't powered on
      by default. After configuring the device, the user can do 'echo 1 >
      xxx/nullb/device_name/power' to power on the device, which creates a
      disk. xxx/nullb/device_name/index is the disk index, so if the index
      is 2, the newly created disk is named /dev/nullb2. Note that 'index'
      is only valid after the disk is powered on.
      
      'echo 0 > xxx/nullb/device_name/power' removes the disk. Note that this
      doesn't remove the device. To remove the device, the user should do
      'rmdir xxx/nullb/device_name'. Removing the device removes the disk too.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
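The device/disk distinction above amounts to a small state machine: a device exists from creation until removal, while a disk (and a valid index) exists only while the device is powered on. The sketch below models that in userspace; the `toy_*` names and the naive index counter are assumptions for illustration, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_device {
    bool powered;  /* true while a disk exists for this device */
    int  index;    /* disk index; meaningful only while powered */
};

static int toy_next_index;  /* naive counter standing in for index allocation */

/* Models writing 1 (on) or 0 (off) to the 'power' attribute. */
static void toy_set_power(struct toy_device *dev, bool on)
{
    if (on && !dev->powered) {
        dev->index = toy_next_index++;  /* the disk is created here */
        dev->powered = true;
    } else if (!on && dev->powered) {
        dev->powered = false;           /* disk removed, device stays */
    }
}

/* Writes the disk path into buf; returns -1 when there is no disk. */
static int toy_disk_path(const struct toy_device *dev, char *buf, size_t len)
{
    if (!dev->powered)
        return -1;
    return snprintf(buf, len, "/dev/nullb%d", dev->index);
}
```

Asking for the disk path before power-on or after power-off fails in this model, mirroring the note that 'index' is only valid while the disk is powered on.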
    • nullb: add configfs interface · 3bf2bd20
      Shaohua Li committed
      Add a configfs interface for nullb. The configfs interface is more
      flexible and makes per-disk configuration easy.
      
      Configuration looks something like this:
      mount -t configfs none /mnt
      
      Checking which features the driver supports:
      cat /mnt/nullb/features
      
      The 'features' attribute is for future extension. We will probably add
      new features to the driver; userspace can check this attribute to find
      the supported features.
      
      Create/remove a device:
      mkdir/rmdir /mnt/nullb/a
      
      Then configure the device by setting attributes under /mnt/nullb/a. Most
      of the module parameters nullb supports are converted to attributes:
      size; /* device size in MB */
      completion_nsec; /* time in ns to complete a request */
      submit_queues; /* number of submission queues */
      home_node; /* home node for the device */
      queue_mode; /* block interface */
      blocksize; /* block size */
      irqmode; /* IRQ completion handler */
      hw_queue_depth; /* queue depth */
      use_lightnvm; /* register as a LightNVM device */
      blocking; /* blocking blk-mq device */
      use_per_node_hctx; /* use per-node allocation for hardware context */
      
      Note that creating a device doesn't create a disk immediately. Creating
      a disk is done in two phases: create a device, then power on the
      device. The next patch introduces device power-on.
      
      Based on original patch from Kyungchan Koh
      Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
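Because configfs attributes are ordinary files, a test harness can set them by writing strings, just as a shell would with echo. The helper below is a hypothetical sketch of that: `nullb_set_attr` is an invented name, and the directory passed in is whatever path the device was created under (for example /mnt/nullb/a from the message above).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Writes `value` to the attribute file <dev_dir>/<attr>.
 * Returns 0 on success and -1 on failure. The result of fclose() is
 * checked as well, because buffered data is only written out at that
 * point and a rejected value may be reported there.
 */
static int nullb_set_attr(const char *dev_dir, const char *attr,
                          const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;  /* path did not fit in the buffer */

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would create the device directory first (with mkdir), then call, for instance, `nullb_set_attr("/mnt/nullb/a", "size", "1024")` for each attribute it wants to change.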
    • nullb: factor disk parameters · 2984c868
      Shaohua Li committed
      When we switch to the configfs interface, each disk can have a different
      configuration. To prepare for the change, we move most disk settings to
      a separate data structure. The existing module parameter interface is
      kept. 'nr_devices' and 'shared_tags' don't make sense as per-disk
      settings, so they remain global settings.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 08 Aug, 2017 2 commits
  3. 06 Jul, 2017 1 commit
  4. 21 Jun, 2017 1 commit
    • null_blk: add support for shared tags · 82f402fe
      Jens Axboe committed
      Some storage drivers need to share tag sets between devices. It's
      useful to be able to model that with null_blk, to find hangs or
      performance issues.
      
      Add a 'shared_tags' bool module parameter. If it is set to true and
      nr_devices is bigger than 1, all devices allocated will share the same
      tag set.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
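Sharing a tag set means that several devices draw request tags from one common pool, so load on one device can starve another. The userspace sketch below models that with a small bitmap; the `toy_*` names and the 32-tag limit are assumptions and do not reflect the blk-mq data structures.

```c
struct toy_tag_set {
    unsigned long busy;   /* bit i set means tag i is in flight */
    unsigned int  depth;  /* number of tags; at most 32 in this sketch */
};

struct toy_dev {
    struct toy_tag_set *tags;  /* may point at a set shared with others */
};

/* Takes the lowest free tag from the device's tag set, or returns -1. */
static int toy_get_tag(struct toy_dev *dev)
{
    struct toy_tag_set *ts = dev->tags;

    for (unsigned int i = 0; i < ts->depth; i++) {
        if (!(ts->busy & (1UL << i))) {
            ts->busy |= 1UL << i;
            return (int)i;
        }
    }
    return -1;
}

/* Returns a tag to the set once the request has completed. */
static void toy_put_tag(struct toy_dev *dev, int tag)
{
    if (tag >= 0 && (unsigned int)tag < dev->tags->depth)
        dev->tags->busy &= ~(1UL << tag);
}
```

With two devices pointing at the same two-entry set, one request on each device exhausts the pool, and a further request on either device has to wait until some tag is returned.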
  5. 09 Jun, 2017 2 commits
    • blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Christoph Hellwig committed
      Use the same values that are used for request completion errors as the
      return value from ->queue_rq.  BLK_STS_RESOURCE is special-cased to
      cause a requeue, and all the others are completed as-is.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
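The dispatch rule described here (one status value triggers a requeue, everything else is treated as final) can be sketched in userspace as follows. All `toy_*` names are hypothetical stand-ins; this is not the blk-mq code, and the two sample drivers exist only to exercise the switch.

```c
enum toy_status  { TOY_STS_OK, TOY_STS_RESOURCE, TOY_STS_IOERR };
enum toy_outcome { TOY_ISSUED, TOY_REQUEUED, TOY_COMPLETED };

struct toy_request {
    enum toy_status result;  /* completion status, set when completed */
};

typedef enum toy_status (*toy_queue_rq_fn)(struct toy_request *rq);

/* Calls the driver hook and acts on the status it returns. */
static enum toy_outcome toy_dispatch(toy_queue_rq_fn queue_rq,
                                     struct toy_request *rq)
{
    enum toy_status sts = queue_rq(rq);

    switch (sts) {
    case TOY_STS_OK:
        return TOY_ISSUED;     /* the driver owns the request now */
    case TOY_STS_RESOURCE:
        return TOY_REQUEUED;   /* temporary shortage: try again later */
    default:
        rq->result = sts;      /* any other status completes the request */
        return TOY_COMPLETED;
    }
}

/* Sample driver that is always out of resources. */
static enum toy_status toy_busy_driver(struct toy_request *rq)
{
    (void)rq;
    return TOY_STS_RESOURCE;
}

/* Sample driver that always fails the request. */
static enum toy_status toy_failing_driver(struct toy_request *rq)
{
    (void)rq;
    return TOY_STS_IOERR;
}
```

Because the hook returns the same status type that completion uses, the default branch can store the value directly without translating it.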
    • block: introduce new block status code type · 2a842aca
      Christoph Hellwig committed
      Currently we use normal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have an
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return something that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruit to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
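The idea of a dedicated status type with conversion helpers can be illustrated in userspace. In the sketch below, the status is wrapped in a struct so that the compiler rejects passing a plain int where a status is expected; this only loosely imitates what sparse does with __bitwise. The `toy_*` names and the small set of codes are assumptions, not the kernel's definitions.

```c
#include <errno.h>

/* Distinct type: an int cannot be passed where a status is expected. */
typedef struct { unsigned char code; } toy_status_t;

enum { TOY_OK, TOY_NOSPC, TOY_TIMEOUT, TOY_RESOURCE, TOY_IOERR };

/* Maps a negative errno value to a status; unknown errors become IOERR. */
static toy_status_t toy_errno_to_status(int err)
{
    toy_status_t s;

    switch (err) {
    case 0:          s.code = TOY_OK;       break;
    case -ENOSPC:    s.code = TOY_NOSPC;    break;
    case -ETIMEDOUT: s.code = TOY_TIMEOUT;  break;
    case -ENOMEM:    s.code = TOY_RESOURCE; break;
    default:         s.code = TOY_IOERR;    break;
    }
    return s;
}

/* Maps a status back to a negative errno value. */
static int toy_status_to_errno(toy_status_t s)
{
    switch (s.code) {
    case TOY_OK:       return 0;
    case TOY_NOSPC:    return -ENOSPC;
    case TOY_TIMEOUT:  return -ETIMEDOUT;
    case TOY_RESOURCE: return -ENOMEM;
    default:           return -EIO;
    }
}
```

The round trip is lossy by design: an errno with no dedicated status, such as -EINVAL, comes back as -EIO, which reflects the "very limited set" of errors mentioned above.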
  6. 21 Apr, 2017 2 commits
  7. 20 Apr, 2017 1 commit
  8. 31 Mar, 2017 2 commits
  9. 01 Feb, 2017 1 commit
    • block: fold cmd_type into the REQ_OP_ space · aebf526b
      Christoph Hellwig committed
      Instead of keeping two levels of indirection for request types, fold it
      all into the operations.  The little caveat here is that previously
      cmd_type only applied to struct request, while the request and bio op
      fields were set to plain REQ_OP_READ/WRITE even for passthrough
      operations.
      
      Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
      private requests, although it has to add two for each so that we
      can communicate the data in/out nature of the request.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  10. 31 Jan, 2017 2 commits
  11. 26 Dec, 2016 1 commit
    • ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner committed
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where the seconds and nanoseconds parts of a time
      value need to be converted. For anything where the seconds argument is
      0, this is pointless and can be replaced with a simple assignment.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
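Why the call is pointless with a zero seconds argument is easy to see from a simplified model of the conversion. The sketch below uses hypothetical `toy_` names, treats the time value as a plain 64-bit nanosecond count, and leaves out any overflow handling the real helper may perform.

```c
#include <stdint.h>

typedef int64_t toy_ktime_t;  /* time value as a nanosecond count */

#define TOY_NSEC_PER_SEC 1000000000LL

/* Combines a seconds part and a nanoseconds part into one value. */
static toy_ktime_t toy_ktime_set(int64_t secs, unsigned long nsecs)
{
    return secs * TOY_NSEC_PER_SEC + (int64_t)nsecs;
}
```

With `secs` equal to 0 the multiplication contributes nothing, so `toy_ktime_set(0, n)` is the same as assigning `n` directly.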
  12. 16 Nov, 2016 1 commit
  13. 21 Sep, 2016 2 commits
    • lightnvm: control life of nvm_dev in driver · b0b4e09c
      Matias Bjørling committed
      LightNVM compatible device drivers do not have a method to expose
      LightNVM specific sysfs entries.
      
      To enable LightNVM sysfs entries to be exposed, lightnvm device
      drivers require a struct device to attach them to. To allow both the
      actual device driver and lightnvm sysfs entries to coexist, the device
      driver tracks the lifetime of the nvm_dev structure.
      
      This patch refactors NVMe and null_blk to handle the lifetime of struct
      nvm_dev, which eliminates the need for struct gendisk when a lightnvm
      compatible device is provided.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • null_blk: refactor to support non-gendisk devices · 9ae2d0aa
      Matias Bjørling committed
      With LightNVM enabled devices, the gendisk structure is not exposed
      to the user. This hides the device driver specific sysfs entries, and
      prevents binding of LightNVM geometry information to the device.
      
      Refactor the device registration process, so that gendisk and
      non-gendisk devices are easily managed.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  14. 15 Sep, 2016 1 commit
  15. 21 Jul, 2016 1 commit
  16. 19 Mar, 2016 1 commit
  17. 11 Feb, 2016 1 commit
    • null_blk: oops when initializing without lightnvm · a514379b
      Matias Bjørling committed
      If the LightNVM subsystem is not compiled into the kernel and the
      null_blk device driver requests lightnvm to be initialized, the call to
      nvm_register fails and the null_add_dev function cleans up the
      initialization. However, at this point the null block device has
      already been added to the nullb_list, so a second cleanup occurs after
      the function has returned, which leads to a double call to
      blk_cleanup_queue.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  18. 05 Feb, 2016 1 commit
    • lightnvm: allow to force mm initialization · bf643185
      Matias Bjørling committed
      The system block allows the device to initialize with its configured
      media manager. The system block is written to disk, and read again when
      the media manager is determined. For this to work, the backend must
      store the data. Device drivers such as null_blk do not have any backend
      storage. This patch allows the media manager to be initialized without a
      storage backend.
      
      It also fixes incorrect configuration of capabilities in null_blk, as it
      does not support the get/set bad block interface.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  19. 14 Jan, 2016 1 commit
    • null_blk: use sector_div instead of do_div · e93d12ae
      Arnd Bergmann committed
      Dividing a sector_t number should be done using sector_div rather than do_div
      to optimize the 32-bit sector_t case, and with the latest do_div optimizations,
      we now get a compile-time warning for this:
      
      arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int *}' but argument is of type 'sector_t * {aka long unsigned int *}'
      drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast
      
      This changes the newly added code to use sector_div. It is a simplified version
      of the original patch, as Linus Torvalds pointed out that we should not be using
      an expensive division function in the first place.
      
      This version was suggested by Matias Bjorling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matias Bjorling <m@bjorling.me>
      Fixes: b2b7e001 ("null_blk: register as a LightNVM device")
      Signed-off-by: Jens Axboe <axboe@fb.com>
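Both helpers named in this commit follow the same calling convention: the dividend is passed by reference and overwritten with the quotient, and the remainder is the return value. The userspace sketch below models that convention only; the `toy_` names are hypothetical, and the fixed 64-bit sector type is an assumption (the point of the kernel helper is that the sector type may be narrower on 32-bit builds).

```c
#include <stdint.h>

typedef uint64_t toy_sector_t;  /* assumed 64-bit sector type */

/*
 * Divides *n by base in place and returns the remainder,
 * mirroring the in-place division convention described above.
 */
static uint32_t toy_sector_div(toy_sector_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```

Passing a pointer to a variable of the wrong width is exactly what the quoted compiler warning complains about, since the helper writes the quotient back through that pointer.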
  20. 12 Jan, 2016 1 commit
    • lightnvm: refactor end_io functions for sync · 91276162
      Matias Bjørling committed
      To implement sync I/O support within the LightNVM core, the end_io
      functions are refactored to take an end_io function pointer instead of
      testing for an initialized media manager and then calling its end_io
      function.
      
      Sync I/O can then be implemented using a callback that signals I/O
      completion. This is similar to the logic found in blk_to_execute_io().
      By implementing it this way, the underlying device's I/O submission
      logic is abstracted away from core, targets, and media managers.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  21. 29 Dec, 2015 1 commit
  22. 23 Dec, 2015 1 commit
  23. 09 Dec, 2015 1 commit
  24. 08 Dec, 2015 1 commit
  25. 02 Dec, 2015 3 commits