1. 26 May 2017, 2 commits
  2. 21 Apr 2017, 4 commits
  3. 09 Apr 2017, 1 commit
  4. 06 Apr 2017, 2 commits
  5. 04 Apr 2017, 1 commit
  6. 02 Mar 2017, 1 commit
    • nvme: Complete all stuck requests · 302ad8cc
      Authored by Keith Busch
      If the nvme driver is shutting down its controller, the driver will
      not start the queues up again, preventing blk-mq's hot CPU notifier
      from making forward progress.
      
      To fix that, this patch starts a request_queue freeze when the driver
      resets a controller so that no new requests may enter. After the IO
      queues are restarted, the driver waits for the freeze to complete so
      that the queue references can be reinitialized when it unfreezes the
      queues.
      
      If the driver is doing a safe shutdown, it waits for the controller
      to successfully complete all inflight requests so that we don't fail
      them unnecessarily. Once the controller has been disabled, the queues
      are restarted to force the remaining entered requests to end in
      failure so that blk-mq's hot CPU notifier may progress, as sketched
      below.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      302ad8cc
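      A minimal sketch of this ordering, assuming the simplified reset
      path below: nvme_start_freeze(), nvme_wait_freeze() and
      nvme_unfreeze() are the helpers this patch introduces, while
      nvme_dev_disable() and the reinitialization step stand in for the
      real PCIe reset work.

          static void nvme_reset_sketch(struct nvme_dev *dev)
          {
                  /* Freeze first: no new requests may enter the queues. */
                  nvme_start_freeze(&dev->ctrl);

                  /* Unsafe shutdown: disable the controller, then let the
                   * restarted queues fail the remaining entered requests. */
                  nvme_dev_disable(dev, false);

                  /* ... reinitialize the controller, restart IO queues ... */

                  /* Wait for the queue usage counters to drain, then
                   * unfreeze so the queue references are reinitialized. */
                  nvme_wait_freeze(&dev->ctrl);
                  nvme_unfreeze(&dev->ctrl);
          }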
  7. 23 Feb 2017, 2 commits
    • nvme: Enable autonomous power state transitions · c5552fde
      Authored by Andy Lutomirski
      NVMe devices can advertise multiple power states.  These states can
      be either "operational" (the device is fully functional but possibly
      slow) or "non-operational" (the device is asleep until woken up).
      Some devices can automatically enter a non-operational state when
      idle for a specified amount of time and then automatically wake back
      up when needed.
      
      The hardware configuration is a table.  For each state, an entry in
      the table indicates the next deeper non-operational state, if any,
      to autonomously transition to and the idle time required before
      transitioning.
      
      This patch teaches the driver to program APST so that each successive
      non-operational state will be entered after an idle time equal to 100%
      of the total latency (entry plus exit) associated with that state, as
      sketched below.
      The maximum acceptable latency is controlled using dev_pm_qos
      (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
      states with total latency greater than this value will not be used.
      As a special case, setting the latency tolerance to 0 will disable
      APST entirely.  On hardware without APST support, the sysfs file will
      not be exposed.
      
      The latency tolerance for newly-probed devices is set by the module
      parameter nvme_core.default_ps_max_latency_us.
      
      In theory, the device can expose a "default" APST table, but this
      doesn't seem to function correctly on my device (Samsung 950), nor
      does it seem particularly useful.  There is also an optional
      mechanism by which a configuration can be "saved" so it will be
      automatically loaded on reset.  This can be configured from
      userspace, but it doesn't seem useful to support in the driver.
      
      On my laptop, enabling APST seems to save nearly 1W.
      
      The hardware tables can be decoded in userspace with nvme-cli.
      'nvme id-ctrl /dev/nvmeN' will show the power state table and
      'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
      configuration.
      
      This feature is quirked off on a known-buggy Samsung device.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c5552fde
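      A hedged sketch of the table construction, assuming the Identify
      power state descriptors in ctrl->psd and the dev_pm_qos-derived
      limit in ctrl->ps_max_latency_us; the real nvme_configure_apst()
      carries more bookkeeping. Each 64-bit entry packs the target idle
      transition power state (bits 7:3) and the idle time prior to
      transition, in milliseconds (bits 31:8).

          struct nvme_feat_auto_pst {
                  __le64 entries[32];     /* one entry per power state */
          };

          static void nvme_apst_sketch(struct nvme_ctrl *ctrl,
                                       struct nvme_feat_auto_pst *table)
          {
                  u64 target = 0;
                  int state;

                  /* Walk from the deepest state up, chaining each state
                   * to the next deeper non-operational state that fits
                   * within the latency budget. */
                  for (state = ctrl->npss; state >= 0; state--) {
                          u64 total_us =
                                  le32_to_cpu(ctrl->psd[state].entry_lat) +
                                  le32_to_cpu(ctrl->psd[state].exit_lat);

                          if (target)
                                  table->entries[state] = cpu_to_le64(target);

                          /* Only non-operational states within the latency
                           * tolerance may become transition targets. */
                          if (!(ctrl->psd[state].flags &
                                NVME_PS_FLAGS_NON_OP_STATE))
                                  continue;
                          if (total_us > ctrl->ps_max_latency_us)
                                  continue;

                          /* Idle time = 100% of entry+exit latency. */
                          target = (u64)state << 3 |
                                   DIV_ROUND_UP(total_us, 1000) << 8;
                  }
                  /* The table is then installed via Set Features, FID 0x0c. */
          }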
    • nvme: Add a quirk mechanism that uses identify_ctrl · bd4da3ab
      Authored by Andy Lutomirski
      Currently, all NVMe quirks are based on PCI IDs.  Add a mechanism to
      define quirks based on identify_ctrl's vendor id, model number,
      and/or firmware revision, as sketched below.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      bd4da3ab
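      A sketch of the matching structure this commit introduces; the
      field names follow the commit's description, and the sample entry
      is hypothetical.

          struct nvme_core_quirk_entry {
                  u16 vid;                /* vendor id from identify_ctrl */
                  const char *mn;         /* model number; NULL matches any */
                  const char *fr;         /* firmware rev; NULL matches any */
                  unsigned long quirks;   /* NVME_QUIRK_* bits to apply */
          };

          static const struct nvme_core_quirk_entry core_quirks[] = {
                  {       /* hypothetical example entry */
                          .vid    = 0x1234,
                          .mn     = "SOME MODEL STRING",
                          .fr     = "FW1.0",
                          .quirks = NVME_QUIRK_NO_APST,
                  },
          };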
  8. 18 Feb 2017, 2 commits
  9. 07 Feb 2017, 1 commit
  10. 31 Jan 2017, 1 commit
  11. 14 Jan 2017, 1 commit
  12. 21 Dec 2016, 1 commit
    • nvme: simplify stripe quirk · e6282aef
      Authored by Keith Busch
      Some OEMs believe they own the Identify Controller vendor-specific
      region and will repurpose it with their own values. While not common,
      we can't rely on the PCI VID:DID to tell us how to decode the field
      we reserved for the stripe size, so we need to do something else for
      the list of devices using this quirk.
      
      The field was supposed to allow flexibility on the device's back-end
      striping, but that never materialized; the chunk is always the same
      as MDTS in the products subscribing to this quirk, so this patch
      removes the stripe_size field and sets the chunk to the max hardware
      transfer size for the devices using this quirk, as sketched below.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      e6282aef
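      The result reduces to roughly the sketch below, using the existing
      blk_queue_chunk_sectors() block layer helper so IOs never straddle
      a chunk boundary.

          static void nvme_set_chunk_sketch(struct nvme_ctrl *ctrl,
                                            struct nvme_ns *ns)
          {
                  /* With the quirk, the chunk is simply the max transfer
                   * size; the vendor-specific field is no longer decoded. */
                  if (ctrl->quirks & NVME_QUIRK_STRIPE_SIZE)
                          blk_queue_chunk_sectors(ns->queue,
                                                  ctrl->max_hw_sectors);
          }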
  13. 09 Dec 2016, 1 commit
    • block: improve handling of the magic discard payload · f9d03f96
      Authored by Christoph Hellwig
      Instead of allocating a single unused biovec for discard requests, send
      them down without any payload.  Instead we allow the driver to add a
      "special" payload using a biovec embedded into struct request (unioned
      over other fields never used while in the driver), and overload the
      number of segments for this case (a driver-side sketch follows this
      entry).
      
      This has a couple of advantages:
      
       - we don't have to allocate the bio_vec
       - the amount of special casing for discard requests in the block
         layer is significantly reduced
       - using this same scheme for other request types is trivial,
         which will be important for implementing the new WRITE_ZEROES
         op on devices where it actually requires a payload (e.g. SCSI)
       - we can get rid of playing games with the request length, as
         we'll never touch it and completions will work just fine
       - it will allow us to support ranged discard operations in the
         future by merging non-contiguous discard bios into a single
         request
       - last but not least it removes a lot of code
      
      This patch is the common base for my WIP series for ranged discards
      and for removing discard_zeroes_data in favor of always using
      REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f9d03f96
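      On the driver side, attaching the special payload looks roughly
      like this sketch of a single-range NVMe discard setup (modeled on
      the DSM path; freeing the range buffer on completion is omitted).

          static int nvme_setup_discard_sketch(struct nvme_ns *ns,
                                               struct request *req,
                                               struct nvme_command *cmnd)
          {
                  struct nvme_dsm_range *range;

                  range = kmalloc(sizeof(*range), GFP_ATOMIC);
                  if (!range)
                          return -ENOMEM;

                  range->cattr = cpu_to_le32(0);
                  range->nlb   = cpu_to_le32(blk_rq_bytes(req) >> ns->lba_shift);
                  range->slba  = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));

                  cmnd->dsm.opcode     = nvme_cmd_dsm;
                  cmnd->dsm.nsid       = cpu_to_le32(ns->ns_id);
                  cmnd->dsm.nr         = 0;       /* zero-based: one range */
                  cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);

                  /* The payload rides in the biovec embedded in struct
                   * request; the flag makes it count as one segment. */
                  req->special_vec.bv_page   = virt_to_page(range);
                  req->special_vec.bv_offset = offset_in_page(range);
                  req->special_vec.bv_len    = sizeof(*range);
                  req->rq_flags |= RQF_SPECIAL_PAYLOAD;
                  return 0;
          }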
  14. 30 Nov 2016, 1 commit
  15. 11 Nov 2016, 2 commits
    • nvme: don't pass the full CQE to nvme_complete_async_event · 7bf58533
      Authored by Christoph Hellwig
      We only need the status and result fields, and passing them explicitly
      makes life a lot easier for the Fibre Channel transport, which doesn't
      have a full CQE for the fast-path case (see the sketch below).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7bf58533
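      The interface change, as a sketch (union nvme_result comes from the
      struct nvme_request commit in the next entry):

          /* old: void nvme_complete_async_event(struct nvme_ctrl *ctrl,
           *                                     struct nvme_completion *cqe); */
          void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
                                         union nvme_result *res);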
    • nvme: introduce struct nvme_request · d49187e9
      Authored by Christoph Hellwig
      This adds a shared per-request structure for all NVMe I/O.  The
      structure is embedded as the first member of each NVMe transport
      driver's request private data and allows common functionality to be
      implemented across the drivers (see the sketch below).
      
      The first use is to replace the current abuse of the SCSI command
      passthrough fields in struct request for the NVMe command passthrough,
      but it will grow more fields to allow implementing things like common
      abort handlers in the future.
      
      The passthrough commands are handled by having a pointer to the SQE
      (struct nvme_command) in struct nvme_request, and the union of the
      possible result fields, which had to be turned from an anonymous
      into a named union for that purpose.  This avoids having to pass
      a reference to a full CQE around and thus makes checking the result
      a lot more lightweight.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      d49187e9
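      The structure and its accessor, roughly as the commit adds them; a
      transport must place struct nvme_request first in its request
      private data for nvme_req() to be valid.

          struct nvme_request {
                  struct nvme_command     *cmd;   /* SQE for passthrough */
                  union nvme_result       result; /* named result union */
          };

          static inline struct nvme_request *nvme_req(struct request *req)
          {
                  return blk_mq_rq_to_pdu(req);
          }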
  16. 25 Sep 2016, 1 commit
  17. 21 Sep 2016, 2 commits
    • lightnvm: expose device geometry through sysfs · 40267efd
      Authored by Simon A. F. Lund
      For a host to access an Open-Channel SSD, it has to know its geometry,
      so that it writes and reads at the appropriate device bounds.
      
      Currently, the geometry information is kept within the kernel, and not
      exported to user-space for consumption. This patch exposes the
      configuration through sysfs and enables user-space libraries, such as
      liblightnvm, to use the sysfs implementation to get the geometry of an
      Open-Channel SSD.
      
      The sysfs entries are stored within the device hierarchy and can be
      found using the "lightnvm" device type (one attribute is sketched
      after this entry).
      
      An example configuration looks like this:
      
      /sys/class/nvme/
      └── nvme0n1
         ├── capabilities: 3
         ├── device_mode: 1
         ├── erase_max: 1000000
         ├── erase_typ: 1000000
         ├── flash_media_type: 0
         ├── media_capabilities: 0x00000001
         ├── media_type: 0
         ├── multiplane: 0x00010101
         ├── num_blocks: 1022
         ├── num_channels: 1
         ├── num_luns: 4
         ├── num_pages: 64
         ├── num_planes: 1
         ├── page_size: 4096
         ├── prog_max: 100000
         ├── prog_typ: 100000
         ├── read_max: 10000
         ├── read_typ: 10000
         ├── sector_oob_size: 0
         ├── sector_size: 4096
         ├── media_manager: gennvm
         ├── ppa_format: 0x380830082808001010102008
         ├── vendor_opcode: 0
         ├── max_phys_secs: 64
         └── version: 1
      Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      40267efd
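      Each geometry attribute reduces to a standard read-only device
      attribute, roughly as sketched below; dev_to_nvm_dev() and the
      identity field name are assumptions for illustration.

          static ssize_t num_channels_show(struct device *dev,
                                           struct device_attribute *attr,
                                           char *buf)
          {
                  /* dev_to_nvm_dev() is a hypothetical helper resolving
                   * the nvm_dev behind a "lightnvm"-type device. */
                  struct nvm_dev *ndev = dev_to_nvm_dev(dev);

                  return scnprintf(buf, PAGE_SIZE, "%u\n",
                                   ndev->identity.num_ch);
          }
          static DEVICE_ATTR_RO(num_channels);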
    • lightnvm: control life of nvm_dev in driver · b0b4e09c
      Authored by Matias Bjørling
      LightNVM-compatible device drivers do not have a method to expose
      LightNVM-specific sysfs entries.
      
      To enable LightNVM sysfs entries to be exposed, lightnvm device
      drivers require a struct device to attach them to. To allow both the
      actual device driver and the lightnvm sysfs entries to coexist, the
      device driver tracks the lifetime of the nvm_dev structure.
      
      This patch refactors NVMe and null_blk to handle the lifetime of
      struct nvm_dev, which eliminates the need for struct gendisk when a
      lightnvm-compatible device is provided.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b0b4e09c
  18. 15 Sep 2016, 1 commit
  19. 13 Jul 2016, 1 commit
    • nvme: Limit command retries · f80ec966
      Authored by Keith Busch
      Many controller implementations will return errors for commands that
      will not succeed, but without the DNR bit set. The driver previously
      retried these commands an unlimited number of times until the command
      timeout was exceeded, which takes an unnecessarily long period of
      time.
      
      This patch limits the number of retries a command can have, defaulting
      to 5 but user-tunable at load time or runtime (see the sketch below).
      
      The struct request's 'retries' field is used to track the number of
      retries attempted. This is in contrast with SCSI's use of the field,
      which indicates how many retries are allowed.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f80ec966
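      A sketch of the policy, assuming the helper below; the real check
      also honors the request's failfast flags and the command deadline.
      NVME_SC_DNR is the spec's Do Not Retry status bit.

          static unsigned char nvme_max_retries = 5;
          module_param_named(max_retries, nvme_max_retries, byte, 0644);
          MODULE_PARM_DESC(max_retries,
                           "max number of retries a command may have");

          static bool nvme_req_needs_retry_sketch(struct request *req,
                                                  u16 status)
          {
                  if (status & NVME_SC_DNR)
                          return false;   /* controller: do not retry */
                  if (req->retries >= nvme_max_retries)
                          return false;   /* retry budget exhausted */
                  req->retries++;         /* counts attempts, unlike SCSI */
                  return true;
          }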
  20. 12 Jul 2016, 1 commit
    • nvme/quirk: Add a delay before checking for adapter readiness · 54adc010
      Authored by Guilherme G. Piccoli
      When disabling the controller, the specification says the register
      NVME_REG_CC should be written and then the driver needs to wait for
      the adapter to be ready, which is checked by reading another register
      bit (NVME_CSTS_RDY). This check is bounded by a timeout, and if the
      timeout is reached the driver gives up and removes the adapter from
      the system.
      
      After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
      (HGST adapter) ends up being removed if we issue a reset_controller,
      because the driver keeps verifying NVME_REG_CSTS until the timeout is
      reached. This patch adds the necessary quirk for this adapter by
      introducing a delay before nvme_wait_ready(), so the reset procedure
      can complete. The quirk is needed because just increasing the timeout
      is not enough for this adapter - the driver must wait before it
      starts reading the NVME_REG_CSTS register on this specific device
      (see the sketch below).
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      54adc010
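      The quirk reduces to a sleep in the disable path, roughly as below;
      the 2000 ms figure is illustrative and the register write is
      simplified to the ops callback.

          #define NVME_QUIRK_DELAY_AMOUNT 2000    /* ms */

          static int nvme_disable_ctrl_sketch(struct nvme_ctrl *ctrl, u64 cap)
          {
                  ctrl->ctrl_config &= ~NVME_CC_ENABLE;
                  ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);

                  /* Affected devices need time before CSTS.RDY polling
                   * starts, or the readiness timeout will be exceeded. */
                  if (ctrl->quirks & NVME_QUIRK_DELAY_BEFORE_CHK_RDY)
                          msleep(NVME_QUIRK_DELAY_AMOUNT);

                  return nvme_wait_ready(ctrl, cap, false);
          }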
  21. 08 Jul 2016, 1 commit
  22. 06 Jul 2016, 4 commits
    • nvme: add keep-alive support · 038bd4cb
      Authored by Sagi Grimberg
      Periodic keep-alive is a mandatory feature in NVMe over Fabrics, and
      optional in NVMe 1.2.1 for PCIe.  This patch adds a periodic keep-alive
      sent from the host to verify that the controller is still responsive,
      and vice versa.  The keep-alive timeout is user-defined (via the
      keep_alive_tmo connection parameter) and defaults to 5 seconds.
      
      In order to avoid a race condition where the host sends a keep-alive
      competing with the target side keep-alive timeout expiration, the host
      adds a grace period of 10 seconds when publishing the keep-alive timeout
      to the target.
      
      If a keep-alive fails (or times out), transport-specific error
      recovery kicks in (see the sketch below).
      
      For now only NVMe over Fabrics is wired up to support keep alive, but
      we can add PCIe support easily once controllers actually supporting it
      become available.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Steve Wise <swise@chelsio.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      038bd4cb
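      A hedged sketch of the host side: a delayed work item issues the
      Keep Alive command every kato seconds. In the real driver the
      re-arm happens from the command's completion handler; this sketch
      folds it into the work function.

          static void nvme_keep_alive_work_sketch(struct work_struct *work)
          {
                  struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
                                                        struct nvme_ctrl,
                                                        ka_work);

                  if (nvme_keep_alive(ctrl)) {
                          /* Submission failed: hand off to the transport's
                           * error recovery instead of re-arming. */
                          dev_err(ctrl->device, "keep-alive failed\n");
                          return;
                  }

                  schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
          }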
    • nvme-fabrics: add a generic NVMe over Fabrics library · 07bfcd09
      Authored by Christoph Hellwig
      The NVMe over Fabrics library provides an interface for both the
      transports and the nvme core to handle fabrics-specific commands and
      attributes independent of the underlying transport.
      
      In addition, the fabrics library adds a misc device interface that
      allows actually creating a fabrics controller, as we can't just
      autodiscover it like in the PCI case.  The nvme-cli utility has been
      enhanced to use this interface to support fabric connect and
      discovery (see the sketch below).
      
      Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>,
      Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>,
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      07bfcd09
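      The misc device interface amounts to a character device whose write
      handler parses option strings and creates the controller; the fops
      callbacks named below stand in for the library's parsing and
      connect logic.

          static const struct file_operations nvmf_dev_fops = {
                  .owner   = THIS_MODULE,
                  .write   = nvmf_dev_write,   /* parse options, connect */
                  .read    = nvmf_dev_read,    /* report created ctrl */
                  .release = nvmf_dev_release,
          };

          static struct miscdevice nvmf_misc = {
                  .minor = MISC_DYNAMIC_MINOR,
                  .name  = "nvme-fabrics",     /* /dev/nvme-fabrics */
                  .fops  = &nvmf_dev_fops,
          };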
    • nvme: add fabrics sysfs attributes · 1a353d85
      Authored by Ming Lin
      - delete_controller: This attribute allows deleting a controller.
        A driver is not obligated to support it (pci doesn't), so it is
        created only if the driver supports it. The new fabrics drivers
        will support it (essentially a disconnect operation).
      
        Usage:
        echo > /sys/class/nvme/nvme0/delete_controller
      
      - subsysnqn: This attribute shows the subsystem NQN of the configured
        device. If a driver does not implement the get_subsysnqn method, the
        file will not appear in sysfs (see the visibility sketch after this
        entry).
      
      - transport: This attribute shows the transport name. Added a "name"
        field to struct nvme_ctrl_ops.
      
        For loop,
        cat /sys/class/nvme/nvme0/transport
        loop
      
        For RDMA,
        cat /sys/class/nvme/nvme0/transport
        rdma
      
        For PCIe,
        cat /sys/class/nvme/nvme0/transport
        pcie
      
      - address: This attribute shows the controller address. Fabrics
        drivers that implement get_address can show the address of the
        connected controller.
      
        example:
        cat /sys/class/nvme/nvme0/address
        traddr=192.168.2.2,trsvcid=1023
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      1a353d85
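      A sketch of the conditional visibility described above: a sysfs
      is_visible hook hides the optional attributes whenever the
      transport does not implement the corresponding nvme_ctrl_ops
      method.

          static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
                                                    struct attribute *a, int n)
          {
                  struct device *dev = container_of(kobj, struct device, kobj);
                  struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

                  if (a == &dev_attr_delete_controller.attr &&
                      !ctrl->ops->delete_ctrl)
                          return 0;       /* transport can't disconnect */
                  if (a == &dev_attr_subsysnqn.attr &&
                      !ctrl->ops->get_subsysnqn)
                          return 0;
                  if (a == &dev_attr_address.attr &&
                      !ctrl->ops->get_address)
                          return 0;
                  return a->mode;
          }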
    • nvme: Modify and export sync command submission for fabrics · eb71f435
      Authored by Christoph Hellwig
      NVMe over Fabrics will use __nvme_submit_sync_cmd in the transport
      drivers and requires a few tweaks to it.  For that we export it and
      add a few more parameters (see the sketch below):
      
      1. allow passing a queue ID to the block layer
      
         For the NVMe over Fabrics connect command we need to be able to specify a
         queue ID that we want to send the command on.  Add a qid parameter to
         the relevant functions to enable this behavior.
      
      2. allow submitting at_head commands
      
         In cases where we want to (re)connect to a controller
         where we have inflight queued commands we want to first
         connect and only then allow the other queued commands to
         be kicked. This prevents failures in controller resets
         and reconnects.
      
      3. allow passing flags to blk_mq_allocate_request
      
         For both the Fabrics connect command and the keep-alive feature in NVMe 1.2.1 we
         want to be able to use reserved requests.
      Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      eb71f435
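      The extended prototype, as a sketch of the three additions: qid
      selects the hardware queue, at_head jumps queued commands on
      (re)connect, and flags such as BLK_MQ_REQ_RESERVED reach
      blk_mq_alloc_request().

          int __nvme_submit_sync_cmd(struct request_queue *q,
                                     struct nvme_command *cmd,
                                     struct nvme_completion *cqe,
                                     void *buffer, unsigned int bufflen,
                                     unsigned int timeout,
                                     int qid, int at_head, int flags);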
  23. 08 Jun 2016, 2 commits
  24. 18 May 2016, 1 commit
  25. 02 May 2016, 3 commits