1. 12 Jan, 2017 1 commit
    • G
      nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too · b5a10c5f
      Guilherme G. Piccoli authored
      Commit 54adc010 ("nvme/quirk: Add a delay before checking for adapter
      readiness") introduced a quirk to adapters that cannot read the bit
      NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
      need a delay or else the action of reading the bit NVME_CSTS_RDY could
      somehow corrupt adapter's registers state and it never recovers.
      
      When this quirk was added, we checked ctrl->tagset in order to avoid
      quirking in probe time, supposing we would never require such delay
      during probe. Well, it was too optimistic; we in fact need this quirk
      at probe time in some cases, like after a kexec.
      
      In some experiments, after abnormal shutdown of machine (aka power cord
      unplug), we booted into our bootloader in Power, which is a Linux kernel,
      and kexec'ed into another distro. If this kexec is too quick, we end up
      reaching the probe of NVMe adapter in that distro when adapter is in
      bad state (not fully initialized on our bootloader). What happens next
      is that nvme_wait_ready() is unable to complete, except if the quirk is
      enabled.
      
      So, this patch removes the original ctrl->tagset verification in order
      to enable the quirk even on probe time.
      
      Fixes: 54adc010 ("nvme/quirk: Add a delay before checking for adapter readiness")
      Reported-by: Andrew Byrne <byrneadw@ie.ibm.com>
      Reported-by: Jaime A. H. Gomez <jahgomez@mx1.ibm.com>
      Reported-by: Zachary D. Myers <zdmyers@us.ibm.com>
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Acked-by: Jeffrey Lien <Jeff.Lien@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      b5a10c5f
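A minimal sketch of the condition this commit changes, written as a user-space model rather than the driver code itself. The struct, the flag value, and both function names are hypothetical; only the idea (the delay used to be gated on `ctrl->tagset`, and now depends on the quirk alone) comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's quirk bit. */
#define QUIRK_DELAY_BEFORE_CHK_RDY (1u << 0)

/* Hypothetical, reduced view of a controller. */
struct ctrl_model {
    unsigned int quirks; /* quirk bits set for this adapter */
    bool has_tagset;     /* false while the controller is still being probed */
};

/* Before the fix: no delay at probe time, because no tagset exists yet. */
static bool delay_needed_before_fix(const struct ctrl_model *c)
{
    return (c->quirks & QUIRK_DELAY_BEFORE_CHK_RDY) && c->has_tagset;
}

/* After the fix: the quirk alone decides, so probe time is covered too. */
static bool delay_needed_after_fix(const struct ctrl_model *c)
{
    return (c->quirks & QUIRK_DELAY_BEFORE_CHK_RDY) != 0;
}
```

For a quirked adapter that is still probing (the kexec case described above), the first function returns false and the second returns true; for adapters without the quirk both return false.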
  2. 21 Dec, 2016 1 commit
    • K
      nvme: simplify stripe quirk · e6282aef
      Keith Busch authored
      Some OEMs believe they own the Identify Controller vendor specific
      region and will repurpose it with their own values. While not common,
      we can't rely on the PCI VID:DID to tell us how to decode the field
      we reserved for this as the stripe size so we need to do something else
      for the list of devices using this quirk.
      
      The field was supposed to allow flexibility on the device's back-end
      striping, but it turned out that never materialized; the chunk is always
      the same as MDTS in the products subscribing to this quirk, so this
      patch removes the stripe_size field and sets the chunk to the max hw
      transfer size for the devices using this quirk.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      e6282aef
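The arithmetic behind "sets the chunk to the max hw transfer size" can be sketched as follows. This is an illustration under assumptions, not the driver code: the flag value and function names are hypothetical, MDTS is assumed to be non-zero (zero means "no limit" in the NVMe specification), and the page shift is assumed to be at least 9.

```c
#include <stdint.h>

/* Hypothetical flag value standing in for the stripe quirk. */
#define QUIRK_STRIPE_SIZE (1u << 0)

/*
 * MDTS is a power-of-two count of minimum-size memory pages.
 * Convert it to 512-byte sectors: 2^mdts pages * 2^page_shift bytes / 512.
 */
static uint32_t mdts_to_max_hw_sectors(uint8_t mdts, unsigned int page_shift)
{
    return UINT32_C(1) << (mdts + page_shift - 9);
}

/* With the quirk set, the chunk size is simply the max transfer size. */
static uint32_t chunk_sectors(unsigned int quirks, uint32_t max_hw_sectors)
{
    return (quirks & QUIRK_STRIPE_SIZE) ? max_hw_sectors : 0;
}
```

With MDTS = 5 and 4 KiB pages (shift 12), the limit is 256 sectors, i.e. 128 KiB, and a quirked device gets that same value as its chunk size.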
  3. 14 Dec, 2016 1 commit
    • L
      Revert "nvme: add support for the Write Zeroes command" · cdb98c26
      Linus Torvalds authored
      This reverts commit 6d31e3ba.
      
      This causes bootup problems for me both on my laptop and my desktop.
      What they have in common is that they have NVMe disks with dm-crypt, but
      it's not the same controller, so it's not controller-specific.
      
      Jens does not see it on his machine (also NVMe), so it's presumably
      something that triggers just on bootup.  Possibly related to dm-crypt
      and the fact that I mark my luks volume with "allow-discards" in
      /etc/crypttab.
      
      It's 100% repeatable for me, which made it fairly straightforward to
      bisect the problem to this commit. Small mercies.
      
      So we don't know what the reason is yet, but the revert is needed to get
      things going again.
      Acked-by: Jens Axboe <axboe@fb.com>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cdb98c26
  4. 09 Dec, 2016 1 commit
    • C
      block: improve handling of the magic discard payload · f9d03f96
      Christoph Hellwig authored
      Instead of allocating a single unused biovec for discard requests, send
      them down without any payload.  Instead we allow the driver to add a
      "special" payload using a biovec embedded into struct request (unioned
      over other fields never used while in the driver), and overloading
      the number of segments for this case.
      
      This has a couple of advantages:
      
       - we don't have to allocate the bio_vec
       - the amount of special casing for discard requests in the block
         layer is significantly reduced
       - using this same scheme for other request types is trivial,
         which will be important for implementing the new WRITE_ZEROES
         op on devices where it actually requires a payload (e.g. SCSI)
       - we can get rid of playing games with the request length, as
         we'll never touch it and completions will work just fine
       - it will allow us to support ranged discard operations in the
         future by merging non-contiguous discard bios into a single
         request
       - last but not least it removes a lot of code
      
      This patch is the common base for my WIP series for ranged discards and to
      remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
      so it would be good to get it in quickly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f9d03f96
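The "special payload" scheme described in this commit can be modelled roughly as below. This is a user-space sketch under assumptions: every type, field, and flag name here is hypothetical, and the real struct request has many more members. It shows only the two ideas from the message: a vector unioned over fields the driver never uses, and a segment count that is overloaded when the flag is set.

```c
#include <stddef.h>

/* Hypothetical flag marking a request that carries a driver-supplied payload. */
#define RQF_SPECIAL_PAYLOAD_MODEL (1u << 0)

/* Hypothetical, reduced stand-in for a bio_vec. */
struct vec_model {
    void *buf;
    unsigned int len;
};

struct request_model {
    unsigned int flags;
    unsigned short nr_phys_segments; /* segments built from the bios */
    union {
        struct { void *prev, *next; } sched_link; /* unused while the driver owns the request */
        struct vec_model special_vec;             /* driver-supplied payload */
    } u;
};

/* The driver attaches its payload without allocating a separate vector. */
static void set_special_payload(struct request_model *rq, void *buf, unsigned int len)
{
    rq->u.special_vec.buf = buf;
    rq->u.special_vec.len = len;
    rq->flags |= RQF_SPECIAL_PAYLOAD_MODEL;
}

/* The segment count is overloaded: a special payload is always one segment. */
static unsigned short nr_segments(const struct request_model *rq)
{
    if (rq->flags & RQF_SPECIAL_PAYLOAD_MODEL)
        return 1;
    return rq->nr_phys_segments;
}
```

A discard request that arrives with no payload reports zero segments until the driver calls `set_special_payload()`, after which it reports exactly one.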
  5. 06 Dec, 2016 1 commit
  6. 01 Dec, 2016 1 commit
  7. 30 Nov, 2016 1 commit
  8. 16 Nov, 2016 1 commit
  9. 11 Nov, 2016 2 commits
    • C
      nvme: don't pass the full CQE to nvme_complete_async_event · 7bf58533
      Christoph Hellwig authored
      We only need the status and result fields, and passing them explicitly
      makes life a lot easier for the Fibre Channel transport which doesn't
      have a full CQE for the fast path case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7bf58533
    • C
      nvme: introduce struct nvme_request · d49187e9
      Christoph Hellwig authored
      This adds a shared per-request structure for all NVMe I/O.  This structure
      is embedded as the first member in all NVMe transport drivers request
      private data and allows to implement common functionality between the
      drivers.
      
      The first use is to replace the current abuse of the SCSI command
      passthrough fields in struct request for the NVMe command passthrough,
      but it will grow a few more fields to allow implementing things
      like common abort handlers in the future.
      
      The passthrough commands are handled by having a pointer to the SQE
      (struct nvme_command) in struct nvme_request, and the union of the
      possible result fields, which had to be turned from an anonymous
      into a named union for that purpose.  This avoids having to pass
      a reference to a full CQE around and thus makes checking the result
      a lot more lightweight.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      d49187e9
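Why the shared structure has to be the first member can be seen in a small model. The type names below carry a `_model` suffix because they are hypothetical simplifications; the commit message confirms only that the structure holds a pointer to the SQE and a named union of result fields, and that each transport embeds it first in its per-request private data.

```c
#include <stdint.h>

/* Hypothetical, reduced stand-in for the submission queue entry. */
struct nvme_command_model {
    uint8_t opcode;
};

/* Named union of the possible result fields. */
union nvme_result_model {
    uint16_t u16;
    uint32_t u32;
    uint64_t u64;
};

/* Shared per-request structure. */
struct nvme_request_model {
    struct nvme_command_model *cmd;
    union nvme_result_model result;
};

/* Hypothetical transport-private data: the shared part must stay first. */
struct transport_pdu_model {
    struct nvme_request_model req;
    int transport_private;
};

/*
 * Common code can recover the shared part from any transport's private
 * data without knowing the transport type, because a pointer to a struct
 * also points to its first member.
 */
static struct nvme_request_model *to_nvme_req(void *pdu)
{
    return (struct nvme_request_model *)pdu;
}
```

Common completion code can then read `to_nvme_req(pdu)->result` for any transport instead of being handed a full CQE.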
  10. 03 Nov, 2016 4 commits
  11. 20 Oct, 2016 2 commits
  12. 12 Oct, 2016 1 commit
  13. 25 Sep, 2016 1 commit
  14. 21 Sep, 2016 3 commits
    • S
      lightnvm: expose device geometry through sysfs · 40267efd
      Simon A. F. Lund authored
      For a host to access an Open-Channel SSD, it has to know its geometry,
      so that it writes and reads at the appropriate device bounds.
      
      Currently, the geometry information is kept within the kernel, and not
      exported to user-space for consumption. This patch exposes the
      configuration through sysfs and enables user-space libraries, such as
      liblightnvm, to use the sysfs implementation to get the geometry of an
      Open-Channel SSD.
      
      The sysfs entries are stored within the device hierarchy, and can be
      found using the "lightnvm" device type.
      
      An example configuration looks like this:
      
      /sys/class/nvme/
      └── nvme0n1
         ├── capabilities: 3
         ├── device_mode: 1
         ├── erase_max: 1000000
         ├── erase_typ: 1000000
         ├── flash_media_type: 0
         ├── media_capabilities: 0x00000001
         ├── media_type: 0
         ├── multiplane: 0x00010101
         ├── num_blocks: 1022
         ├── num_channels: 1
         ├── num_luns: 4
         ├── num_pages: 64
         ├── num_planes: 1
         ├── page_size: 4096
         ├── prog_max: 100000
         ├── prog_typ: 100000
         ├── read_max: 10000
         ├── read_typ: 10000
         ├── sector_oob_size: 0
         ├── sector_size: 4096
         ├── media_manager: gennvm
         ├── ppa_format: 0x380830082808001010102008
         ├── vendor_opcode: 0
         ├── max_phys_secs: 64
         └── version: 1
      Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      40267efd
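A user-space consumer of these entries has to cope with both decimal values (`num_blocks: 1022`) and hexadecimal ones (`multiplane: 0x00010101`). The helper below is a hypothetical sketch of such parsing using only the C standard library; it is not taken from liblightnvm, and reading the attribute file itself is left out.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse the text of one sysfs attribute into an unsigned long.
 * Base 0 lets strtoul accept both "1022" and "0x00010101".
 * Returns 0 on success, -1 if the text is not a number or does not fit.
 */
static int parse_attr(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(text, &end, 0);
    if (errno != 0 || end == text)
        return -1;
    /* Allow the trailing newline that sysfs reads usually end with. */
    if (*end != '\0' && *end != '\n')
        return -1;
    *out = value;
    return 0;
}
```

String attributes such as `media_manager: gennvm` are rejected, and so is the 96-bit `ppa_format` value from the example, which does not fit in an `unsigned long` and would need to be handled as a string or a wider type.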
    • M
      lightnvm: control life of nvm_dev in driver · b0b4e09c
      Matias Bjørling authored
      LightNVM compatible device drivers do not have a method to expose
      LightNVM specific sysfs entries.
      
      To enable LightNVM sysfs entries to be exposed, lightnvm device
      drivers require a struct device to attach it to. To allow both the
      actual device driver and lightnvm sysfs entries to coexist, the device
      driver tracks the lifetime of the nvm_dev structure.
      
      This patch refactors NVMe and null_blk to handle the lifetime of struct
      nvm_dev, which eliminates the need for struct gendisk when a lightnvm
      compatible device is provided.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b0b4e09c
    • M
      nvme: refactor namespaces to support non-gendisk devices · ac81bfa9
      Matias Bjørling authored
      With LightNVM enabled namespaces, the gendisk structure is not exposed
      to the user. This prevents LightNVM users from accessing the NVMe device
      driver specific sysfs entries, and LightNVM namespace geometry.
      
      Refactor the revalidation process, so that a namespace, instead of a
      gendisk, is revalidated. This allows later patches to wire the
      sysfs entries up to a non-gendisk namespace.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      ac81bfa9
  15. 15 Sep, 2016 1 commit
  16. 24 Aug, 2016 1 commit
  17. 15 Aug, 2016 1 commit
  18. 21 Jul, 2016 2 commits
  19. 14 Jul, 2016 2 commits
    • N
      NVMe: don't allocate unused nvme_major · b09dcf58
      NeilBrown authored
      When alloc_disk(0) is used, the ->major number is ignored.  All device
      numbers are allocated with a major of BLOCK_EXT_MAJOR.
      
      So remove all references to nvme_major.
      
      [akpm@linux-foundation.org: one unregister_blkdev() was missed]
      Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b09dcf58
    • K
      nvme: Remove RCU namespace protection · 32f0c4af
      Keith Busch authored
      We can't sleep with RCU read lock held, but we need to do potentially
      blocking stuff to namespace queues when iterating the list. This patch
      removes the RCU locking and holds a mutex instead.
      
      To prevent deadlocks, this patch removes holding the mutex during
      namespace scanning and removal. The unlocked namespace scanning is made
      safe by holding a reference to the namespace being scanned.
      
      List iteration that does IO has to be unlocked to allow error recovery.
      The caller must ensure the list can not be manipulated during such an
      event, so this patch adds a comment explaining this requirement to the
      only function that iterates an unlocked list. All callers currently
      meet this requirement, so no further changes required.
      
      List iterations that do not do IO can safely use the lock since it couldn't
      block recovery from missing forced IO completions.
      
      Reported-by: Ming Lin <mlin at kernel.org>
      [fixes 0bf77e9d nvme: switch to RCU freeing the namespace]
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      32f0c4af
  20. 13 Jul, 2016 1 commit
    • K
      nvme: Limit command retries · f80ec966
      Keith Busch authored
      Many controller implementations will return errors to commands that will
      not succeed, but without the DNR bit set. The driver previously retried
      these commands an unlimited number of times until the command timeout
      has been exceeded, which takes an unnecessarily long period of time.
      
      This patch limits the number of retries a command can have, defaulting
      to 5, but is user tunable at load or runtime.
      
      The struct request's 'retries' field is used to track the number of
      retries attempted. This is in contrast with scsi's use of this field,
      which indicates how many retries are allowed.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      f80ec966
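The retry rule described in this commit can be sketched as a small predicate. The names below are hypothetical and the command timeout check that the message also mentions is left out for brevity; the DNR bit position (0x4000 in the status value) is an assumption about how the driver encodes the status field, and the default of 5 comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the Do Not Retry bit in the status value. */
#define SC_DNR_MODEL 0x4000u

/* Hypothetical, reduced view of a completed command. */
struct cmd_model {
    uint16_t status;      /* completion status, including the DNR bit */
    unsigned int retries; /* retries already attempted (not retries allowed) */
};

/* Tunable limit; the commit message gives 5 as the default. */
static unsigned int max_retries = 5;

/* Retry only if the controller did not forbid it and the budget is not spent. */
static bool needs_retry(const struct cmd_model *cmd)
{
    if (cmd->status & SC_DNR_MODEL)
        return false;
    return cmd->retries < max_retries;
}
```

Note that `retries` counts attempts already made, which matches the commit's remark that this differs from SCSI's use of the field as an allowance.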
  21. 12 Jul, 2016 1 commit
    • G
      nvme/quirk: Add a delay before checking for adapter readiness · 54adc010
      Guilherme G. Piccoli authored
      When disabling the controller, the specification says the register
      NVME_REG_CC should be written and then the driver needs to wait for the
      adapter to be ready, which is checked by reading another register
      bit (NVME_CSTS_RDY). There's a timeout validation in this checking,
      so in case this timeout is reached the driver gives up and removes
      the adapter from the system.
      
      After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
      (HGST adapter) ends up being removed if we issue a reset_controller,
      because driver keeps verifying the NVME_REG_CSTS until the timeout is
      reached. This patch adds a necessary quirk for this adapter, by
      introducing a delay before nvme_wait_ready(), so the reset procedure
      is able to be completed. This quirk is needed because just increasing
      the timeout is not enough in case of this adapter - the driver must
      wait before start reading NVME_REG_CSTS register on this specific
      device.
      Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      54adc010
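Why a longer timeout alone does not help, while a delay before the first status read does, can be shown with a toy simulation. Everything here is hypothetical: the adapter model, its timing values, and the simulated clock are invented for illustration, and the only behaviour taken from the commit messages is that an early read of the status register leaves the adapter in a state it never recovers from.

```c
#include <stdbool.h>

/* Toy adapter with a simulated millisecond clock. */
struct adapter_model {
    unsigned int now_ms;        /* simulated time */
    unsigned int cc_written_ms; /* when the control register was written */
    unsigned int settle_ms;     /* status reads earlier than this are harmful */
    unsigned int done_ms;       /* when the status bit reflects the new state */
    bool wedged;                /* set by an early read, never cleared */
};

static void sim_sleep(struct adapter_model *a, unsigned int ms)
{
    a->now_ms += ms;
}

/* Model of one status read: too early a read wedges the adapter for good. */
static bool status_says_done(struct adapter_model *a)
{
    unsigned int elapsed = a->now_ms - a->cc_written_ms;

    if (elapsed < a->settle_ms)
        a->wedged = true;
    return !a->wedged && elapsed >= a->done_ms;
}

/* Write the control register, optionally delay, then poll until timeout. */
static int disable_ctrl(struct adapter_model *a, bool quirk,
                        unsigned int delay_ms, unsigned int timeout_ms)
{
    unsigned int deadline;

    a->cc_written_ms = a->now_ms; /* stands in for the register write */
    if (quirk)
        sim_sleep(a, delay_ms);   /* the quirk: wait before the first read */

    deadline = a->now_ms + timeout_ms;
    while (!status_says_done(a)) {
        if (a->now_ms >= deadline)
            return -1;            /* the driver gives up on the adapter */
        sim_sleep(a, 100);
    }
    return 0;
}
```

In this model, an adapter with `settle_ms = 1500` fails without the quirk no matter how large `timeout_ms` is, because the very first poll wedges it, and succeeds when a 2000 ms delay precedes the first read.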
  22. 08 Jul, 2016 1 commit
  23. 06 Jul, 2016 6 commits
  24. 28 Jun, 2016 1 commit
    • D
      block: convert to device_add_disk() · 0d52c756
      Dan Williams authored
      For block drivers that specify a parent device, convert them to use
      device_add_disk().
      
      This conversion was done with the following semantic patch:
      
          @@
          struct gendisk *disk;
          expression E;
          @@
      
          - disk->driverfs_dev = E;
          ...
          - add_disk(disk);
          + device_add_disk(E, disk);
      
          @@
          struct gendisk *disk;
          expression E1, E2;
          @@
      
          - disk->driverfs_dev = E1;
          ...
          E2 = disk;
          ...
          - add_disk(E2);
          + device_add_disk(E1, E2);
      
      ...plus some manual fixups for a few missed conversions.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0d52c756
  25. 12 Jun, 2016 2 commits