提交 · 986994a27587efd8ce4c595cab89b570f7475359 · openanolis / cloud-kernel

23 2月, 2017 3 次提交

nvme: Use CNS as 8-bit field and avoid endianness conversion · 986994a2

由 Parav Pandit 提交于 1月 26, 2017

This patch defines CNS field as 8-bit field and avoids cpu_to/from_le
conversions.
Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS
for readability (don't rely on the fact that NVME_ID_CNS_NS = 0).
Reviewed-by: NMax Gurtovoy <maxg@mellanox.com>
Signed-off-by: NParav Pandit <parav@mellanox.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

986994a2

nvme: add semicolon in nvme_command setting · 778f067c

由 Max Gurtovoy 提交于 1月 26, 2017

Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NMax Gurtovoy <maxg@mellanox.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

778f067c

nvme: Make controller state visible via sysfs · 8432bdb2

由 Sagi Grimberg 提交于 11月 28, 2016

Easier for debugging and testing state machine
transitions.
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8432bdb2

18 2月, 2017 2 次提交

nvme: Check for Security send/recv support before issuing commands. · 8a9ae523

由 Scott Bauer 提交于 2月 17, 2017

We need to verify that the controller supports the security
commands before actually trying to issue them.
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
[hch: moved the check so that we don't call into the OPAL code if not
      supported]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8a9ae523

block/sed-opal: allocate struct opal_dev dynamically · 4f1244c8

由 Christoph Hellwig 提交于 2月 17, 2017

Insted of bloating the containing structure with it all the time this
allocates struct opal_dev dynamically.  Additionally this allows moving
the definition of struct opal_dev into sed-opal.c.  For this a new
private data field is added to it that is passed to the send/receive
callback.  After that a lot of internals can be made private as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NScott Bauer <scott.bauer@intel.com>
Reviewed-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

4f1244c8

15 2月, 2017 1 次提交

Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN · e225c20e

由 Scott Bauer 提交于 2月 14, 2017

When CONFIG_KASAN is enabled, compilation fails:

block/sed-opal.c: In function 'sed_ioctl':
block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]

Moved all the ioctl structures off the stack and dynamically allocate
using _IOC_SIZE()

Fixes: 455a7b23 ("block: Add Sed-opal library")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e225c20e

09 2月, 2017 1 次提交

nvme: support ranged discard requests · b35ba01e

由 Christoph Hellwig 提交于 2月 08, 2017

NVMe supports up to 256 ranges per DSM command, so wire up support
for ranged discards up to that limit.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

b35ba01e

07 2月, 2017 1 次提交

nvme: Add Support for Opal: Unlock from S3 & Opal Allocation/Ioctls · a98e58e5

由 Scott Bauer 提交于 2月 03, 2017

This patch implements the necessary logic to unlock an Opal
enabled device coming back from an S3.

The patch also implements the SED/Opal allocation necessary to support
the opal ioctls.
Signed-off-by: NScott Bauer <scott.bauer@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a98e58e5

01 2月, 2017 1 次提交

block: fold cmd_type into the REQ_OP_ space · aebf526b

由 Christoph Hellwig 提交于 1月 31, 2017

Instead of keeping two levels of indirection for requests types, fold it
all into the operations.  The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, althought it has to add two for each so that we
can communicate the data in/out nature of the request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

aebf526b

31 1月, 2017 1 次提交

lightnvm: add ioctls for vector I/Os · 84d4add7

由 Matias Bjørling 提交于 1月 31, 2017

Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.

For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.

The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.
Signed-off-by: NMatias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: NSimon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: NMatias Bjørling <matias@cnexlabs.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

84d4add7

12 1月, 2017 1 次提交

nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too · b5a10c5f

由 Guilherme G. Piccoli 提交于 12月 28, 2016

Commit 54adc010 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk to adapters that cannot read the bit
NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
need a delay or else the action of reading the bit NVME_CSTS_RDY could
somehow corrupt adapter's registers state and it never recovers.

When this quirk was added, we checked ctrl->tagset in order to avoid
quirking in probe time, supposing we would never require such delay
during probe. Well, it was too optimistic; we in fact need this quirk
at probe time in some cases, like after a kexec.

In some experiments, after abnormal shutdown of machine (aka power cord
unplug), we booted into our bootloader in Power, which is a Linux kernel,
and kexec'ed into another distro. If this kexec is too quick, we end up
reaching the probe of NVMe adapter in that distro when adapter is in
bad state (not fully initialized on our bootloader). What happens next
is that nvme_wait_ready() is unable to complete, except if the quirk is
enabled.

So, this patch removes the original ctrl->tagset verification in order
to enable the quirk even on probe time.

Fixes: 54adc010 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: NAndrew Byrne <byrneadw@ie.ibm.com>
Reported-by: NJaime A. H. Gomez <jahgomez@mx1.ibm.com>
Reported-by: NZachary D. Myers <zdmyers@us.ibm.com>
Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: NJeffrey Lien <Jeff.Lien@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

b5a10c5f

21 12月, 2016 1 次提交

nvme: simplify stripe quirk · e6282aef

由 Keith Busch 提交于 12月 19, 2016

Some OEMs believe they own the Identify Controller vendor specific
region and will repurpose it with their own values. While not common,
we can't rely on the PCI VID:DID to tell use how to decode the field
we reserved for this as the stripe size so we need to do something else
for the list of devices using this quirk.

The field was supposed to allow flexibility on the device's back-end
striping, but it turned out that never materialized; the chunk is always
the same as MDTS in the products subscribing to this quirk, so this
patch removes the stripe_size field and sets the chunk to the max hw
transfer size for the devices using this quirk.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

e6282aef

14 12月, 2016 1 次提交

Revert "nvme: add support for the Write Zeroes command" · cdb98c26

由 Linus Torvalds 提交于 12月 13, 2016

This reverts commit 6d31e3ba.

This causes bootup problems for me both on my laptop and my desktop.
What they have in common is that they have NVMe disks with dm-crypt, but
it's not the same controller, so it's not controller-specific.

Jens does not see it on his machine (also NVMe), so it's presumably
something that triggers just on bootup.  Possibly related to dm-crypt
and the fact that I mark my luks volume with "allow-discards" in
/etc/crypttab.

It's 100% repeatable for me, which made it fairly straightforward to
bisect the problem to this commit. Small mercies.

So we don't know what the reason is yet, but the revert is needed to get
things going again.
Acked-by: NJens Axboe <axboe@fb.com>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cdb98c26

09 12月, 2016 1 次提交

block: improve handling of the magic discard payload · f9d03f96

由 Christoph Hellwig 提交于 12月 08, 2016

Instead of allocating a single unused biovec for discard requests, send
them down without any payload.  Instead we allow the driver to add a
"special" payload using a biovec embedded into struct request (unioned
over other fields never used while in the driver), and overloading
the number of segments for this case.

This has a couple of advantages:

 - we don't have to allocate the bio_vec
 - the amount of special casing for discard requests in the block
   layer is significantly reduced
 - using this same scheme for other request types is trivial,
   which will be important for implementing the new WRITE_ZEROES
   op on devices where it actually requires a payload (e.g. SCSI)
 - we can get rid of playing games with the request length, as
   we'll never touch it and completions will work just fine
 - it will allow us to support ranged discard operations in the
   future by merging non-contiguous discard bios into a single
   request
 - last but not least it removes a lot of code

This patch is the common base for my WIP series for ranges discards and to
remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
so it would be good to get it in quickly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f9d03f96

06 12月, 2016 1 次提交

nvme-fabrics: set sqe.command_id in core not transports · 721b3917

由 James Smart 提交于 10月 21, 2016

Currently, core.c sets command_id only on rd/wr commands, leaving it to
the transport to set it again to ensure the request had a command id.

Move location of set in core so applies to all commands.
Remove transport sets.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

721b3917

01 12月, 2016 1 次提交

nvme: add support for the Write Zeroes command · 6d31e3ba

由 Chaitanya Kulkarni 提交于 11月 30, 2016

Allow write zeroes operations (REQ_OP_WRITE_ZEROES) on the block
device, if the device supports optional command bit set for write
zeroes. Add support to setup write zeroes command. Set maximum possible
write zeroes sectors in one write zeroes command according to
nvme write zeroes command definition.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

6d31e3ba

30 11月, 2016 1 次提交

nvme: lightnvm: attach lightnvm sysfs to nvme block device · 3dc87dd0

由 Matias Bjørling 提交于 11月 28, 2016

Previously, LBA read and write were not supported in the lightnvm
specification. Now that it supports it, lets use the traditional
NVMe gendisk, and attach the lightnvm sysfs geometry export.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

3dc87dd0

16 11月, 2016 1 次提交

nvme: untangle 0 and BLK_MQ_RQ_QUEUE_OK · bac0000a

由 Omar Sandoval 提交于 11月 15, 2016

Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having
specific values. No functional change.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

bac0000a

11 11月, 2016 2 次提交

nvme: don't pass the full CQE to nvme_complete_async_event · 7bf58533

由 Christoph Hellwig 提交于 11月 10, 2016

We only need the status and result fields, and passing them explicitly
makes life a lot easier for the Fibre Channel transport which doesn't
have a full CQE for the fast path case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

7bf58533

nvme: introduce struct nvme_request · d49187e9

由 Christoph Hellwig 提交于 11月 10, 2016

This adds a shared per-request structure for all NVMe I/O.  This structure
is embedded as the first member in all NVMe transport drivers request
private data and allows to implement common functionality between the
drivers.

The first use is to replace the current abuse of the SCSI command
passthrough fields in struct request for the NVMe command passthrough,
but it will grow a field more fields to allow implementing things
like common abort handlers in the future.

The passthrough commands are handled by having a pointer to the SQE
(struct nvme_command) in struct nvme_request, and the union of the
possible result fields, which had to be turned from an anonymous
into a named union for that purpose.  This avoids having to pass
a reference to a full CQE around and thus makes checking the result
a lot more lightweight.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

d49187e9

03 11月, 2016 4 次提交

nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code · a6eaa884

由 Bart Van Assche 提交于 10月 28, 2016

Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of
QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations
that became superfluous because of this change. Change
blk_queue_stopped() tests into blk_mq_queue_stopped().

This patch fixes a race condition: using queue_flag_clear_unlocked()
is not safe if any other function that manipulates the queue flags
can be called concurrently, e.g. blk_cleanup_queue().
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

a6eaa884

nvme: Fix a race condition related to stopping queues · 3174dd33

由 Bart Van Assche 提交于 10月 28, 2016

Avoid that nvme_queue_rq() is still running when nvme_stop_queues()
returns.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Keith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

3174dd33

blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request() · 2b053aca

由 Bart Van Assche 提交于 10月 28, 2016

Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
are followed by kicking the requeue list. Hence add an argument to
these two functions that allows to kick the requeue list. This was
proposed by Christoph Hellwig.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

2b053aca

blk-mq: Remove blk_mq_cancel_requeue_work() · 9b7dd572

由 Bart Van Assche 提交于 10月 28, 2016

Since blk_mq_requeue_work() no longer restarts stopped queues
canceling requeue work is no longer needed to prevent that a
stopped queue would be restarted. Hence remove this function.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

9b7dd572

20 10月, 2016 2 次提交

nvme: use symbolic constants for CNS values · fa606826

由 Christoph Hellwig 提交于 9月 30, 2016

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

fa606826

nvme: Add tertiary number to NVME_VS · 8ef2074d

由 Gabriel Krisman Bertazi 提交于 10月 19, 2016

NVMe 1.2.1 specification adds a tertiary element to the version number.
This updates the macro and its callers to include the final number and
fixup a single place in nvmet where the version was generated manually.
Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

8ef2074d

12 10月, 2016 1 次提交

nvme: Stop probing a removed device · 0df1e4f5

由 Keith Busch 提交于 10月 11, 2016

There is no reason the nvme controller can ever return all 1's from
reading the CSTS register. This patch returns an error if we observe
that status. Without this, we may incorrectly proceed with controller
initialization and unnecessarilly rely on error handling to clean this.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

0df1e4f5

25 9月, 2016 1 次提交

nvme: Pass pointers, not dma addresses, to nvme_get/set_features() · 1a6fe74d

由 Andy Lutomirski 提交于 9月 16, 2016

Any user I can imagine that needs a buffer at all will want to pass
a pointer directly.  There are no currently callers that use
buffers, so this change is painless, and it will make it much easier
to start using features that use buffers (e.g. APST).
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Tested-by: NJay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

1a6fe74d

21 9月, 2016 3 次提交

lightnvm: expose device geometry through sysfs · 40267efd

由 Simon A. F. Lund 提交于 9月 16, 2016

For a host to access an Open-Channel SSD, it has to know its geometry,
so that it writes and reads at the appropriate device bounds.

Currently, the geometry information is kept within the kernel, and not
exported to user-space for consumption. This patch exposes the
configuration through sysfs and enables user-space libraries, such as
liblightnvm, to use the sysfs implementation to get the geometry of an
Open-Channel SSD.

The sysfs entries are stored within the device hierarchy, and can be
found using the "lightnvm" device type.

An example configuration looks like this:

/sys/class/nvme/
└── nvme0n1
   ├── capabilities: 3
   ├── device_mode: 1
   ├── erase_max: 1000000
   ├── erase_typ: 1000000
   ├── flash_media_type: 0
   ├── media_capabilities: 0x00000001
   ├── media_type: 0
   ├── multiplane: 0x00010101
   ├── num_blocks: 1022
   ├── num_channels: 1
   ├── num_luns: 4
   ├── num_pages: 64
   ├── num_planes: 1
   ├── page_size: 4096
   ├── prog_max: 100000
   ├── prog_typ: 100000
   ├── read_max: 10000
   ├── read_typ: 10000
   ├── sector_oob_size: 0
   ├── sector_size: 4096
   ├── media_manager: gennvm
   ├── ppa_format: 0x380830082808001010102008
   ├── vendor_opcode: 0
   ├── max_phys_secs: 64
   └── version: 1
Signed-off-by: NSimon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

40267efd

lightnvm: control life of nvm_dev in driver · b0b4e09c

由 Matias Bjørling 提交于 9月 16, 2016

LightNVM compatible device drivers does not have a method to expose
LightNVM specific sysfs entries.

To enable LightNVM sysfs entries to be exposed, lightnvm device
drivers require a struct device to attach it to. To allow both the
actual device driver and lightnvm sysfs entries to coexist, the device
driver tracks the lifetime of the nvm_dev structure.

This patch refactors NVMe and null_blk to handle the lifetime of struct
nvm_dev, which eliminates the need for struct gendisk when a lightnvm
compatible device is provided.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

b0b4e09c

nvme: refactor namespaces to support non-gendisk devices · ac81bfa9

由 Matias Bjørling 提交于 9月 16, 2016

With LightNVM enabled namespaces, the gendisk structure is not exposed
to the user. This prevents LightNVM users from accessing the NVMe device
driver specific sysfs entries, and LightNVM namespace geometry.

Refactor the revalidation process, so that a namespace, instead of a
gendisk, is revalidated. This later allows patches to wire up the
sysfs entries up to a non-gendisk namespace.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

ac81bfa9

15 9月, 2016 1 次提交

nvme: remove the post_scan callout · b5af7f2f

由 Christoph Hellwig 提交于 9月 14, 2016

No need now that we don't have to reverse engineer the irq affinity.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

b5af7f2f

24 8月, 2016 1 次提交

nvme: Fix nvme_get/set_features() with a NULL result pointer · 9b47f77a

由 Andy Lutomirski 提交于 8月 24, 2016

nvme_set_features() callers seem to expect that passing NULL as the
result pointer is acceptable.  Teach nvme_set_features() not to try to
write to the NULL address.

For symmetry, make the same change to nvme_get_features(), despite the
fact that all current callers pass a valid result pointer.

I assume that this bug hasn't been reported in practice because
the callers that pass NULL are all in the SCSI translation layer
and no one uses the relevant operations.

Cc: stable@vger.kernel.org
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

9b47f77a

15 8月, 2016 1 次提交

nvme: Prevent controller state invalid transition · f6b6a28e

由 Gabriel Krisman Bertazi 提交于 7月 29, 2016

Acquiring the nvme_ctrl lock before reading ctrl->state in
nvme_change_ctrl_state() should prevent a theoretical invalid state
transition, in the event of two threads racing inside that function.

I haven't been able to observe this happening with the current code, and
the current state machine seems to be simple enough to not be
affected by these invalid transitions, but future modifications could
make it more likely to happen.
Signed-off-by: NGabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: NSagi Grimberg <sag@grimberg.me>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

f6b6a28e

21 7月, 2016 2 次提交

nvme: initialize variable before logical OR'ing it · fa9a89fc

由 Jay Freyensee 提交于 7月 20, 2016

It is typically not good coding or secure coding practice
to logical OR a variable without an initialization value first.
Here on this line:

integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;

BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable
never set to an initial value. This patch fixes that.
Signed-off-by: NJay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: NMing Lin <ming.l@samsung.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

fa9a89fc

block: ensure bios return from blk_get_request are properly initialized · 0c4de0f3

由 Christoph Hellwig 提交于 7月 19, 2016

blk_get_request is used for BLOCK_PC and similar passthrough requests.
Currently we always need to call blk_rq_set_block_pc or an open coded
version of it to allow appending bios using the request mapping helpers
later on, which is a somewhat awkward API.  Instead move the
initialization part of blk_rq_set_block_pc into blk_get_request, so that
we always have a safe to use request.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

0c4de0f3

14 7月, 2016 2 次提交

NVMe: don't allocate unused nvme_major · b09dcf58

由 NeilBrown 提交于 7月 13, 2016

When alloc_disk(0) is used, the ->major number is ignored.  All device
numbers are allocated with a major of BLOCK_EXT_MAJOR.

So remove all references to nvme_major.

[akpm@linux-foundation.org: one unregister_blkdev() was missed]
Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@nobleSigned-off-by: NNeilBrown <neilb@suse.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Maxim Levitsky <maximlevitsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

b09dcf58

nvme: Remove RCU namespace protection · 32f0c4af

由 Keith Busch 提交于 7月 13, 2016

We can't sleep with RCU read lock held, but we need to do potentially
blocking stuff to namespace queues when iterating the list. This patch
removes the RCU locking and holds a mutex instead.

To prevent deadlocks, this patch removes holding the mutex during
namespace scanning and removal. The unlocked namespace scanning is made
safe by holding a reference to the namespace being scanned.

List iteration that does IO has to be unlocked to allow error recovery.
The caller must ensure the list can not be manipulated during such an
event, so this patch adds a comment explaining this requirement to the
only function that iterates an unlocked list. All callers currently
meet this requirement, so no further changes required.

List iterations that do not do IO can safely use the lock since it couldn't
block recovery from missing forced IO completions.

Reported-by: Ming Lin <mlin at kernel.org>
[fixes 0bf77e9d nvme: switch to RCU freeing the namespace]
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

32f0c4af

13 7月, 2016 1 次提交

nvme: Limit command retries · f80ec966

由 Keith Busch 提交于 7月 12, 2016

Many controller implementations will return errors to commands that will
not succeed, but without the DNR bit set. The driver previously retried
these commands an unlimited number of times until the command timeout
has exceeded, which takes an unnecessarilly long period of time.

This patch limits the number of retries a command can have, defaulting
to 5, but is user tunable at load or runtime.

The struct request's 'retries' field is used to track the number of
retries attempted. This is in contrast with scsi's use of this field,
which indicates how many retries are allowed.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f80ec966

12 7月, 2016 1 次提交

nvme/quirk: Add a delay before checking for adapter readiness · 54adc010

由 Guilherme G. Piccoli 提交于 6月 14, 2016

When disabling the controller, the specification says the register
NVME_REG_CC should be written and then driver needs to wait the
adapter to be ready, which is checked by reading another register
bit (NVME_CSTS_RDY). There's a timeout validation in this checking,
so in case this timeout is reached the driver gives up and removes
the adapter from the system.

After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003)
(HGST adapter) end up being removed if we issue a reset_controller,
because driver keeps verifying the NVME_REG_CSTS until the timeout is
reached. This patch adds a necessary quirk for this adapter, by
introducing a delay before nvme_wait_ready(), so the reset procedure
is able to be completed. This quirk is needed because just increasing
the timeout is not enough in case of this adapter - the driver must
wait before start reading NVME_REG_CSTS register on this specific
device.
Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

54adc010

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功