- 05 Nov 2014, 2 commits
-
Committed by Keith Busch

Submits NVMe asynchronous event requests, up to the controller maximum or the number of possible event types (8), whichever is smaller. Events successfully returned by the controller are logged.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Joe Perches

Use the zeroing function instead of dma_alloc_coherent & memset(,0,).

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
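
A minimal sketch of the substitution, written as a generic kernel helper; `dev`, `size` and `dma_addr` stand in for whatever arguments the driver actually passes:

```c
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

/*
 * Sketch only: kernels of this era provide dma_zalloc_coherent(), which
 * replaces dma_alloc_coherent() followed by memset(buf, 0, size).
 */
static void *alloc_zeroed_dma_buffer(struct device *dev, size_t size,
				     dma_addr_t *dma_addr)
{
	return dma_zalloc_coherent(dev, size, dma_addr, GFP_KERNEL);
}
```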
-
- 05 Oct 2014, 1 commit
-
Committed by Mike Snitzer

Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set QUEUE_FLAG_NONROT. Historically, all block devices have automatically made entropy contributions. But as previously stated in commit e2e1a148 ("block: add sysfs knob for turning off disk entropy contributions"):

- On SSD disks, the completion times aren't as random as they are for rotational drives. So it's questionable whether they should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.

There are more reliable sources for randomness than non-rotational block devices. From a security perspective it is better to err on the side of caution than to allow entropy contributions from unreliable "random" sources.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
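
A sketch of the flag pattern on the namespace's request_queue, using the unlocked flag helpers found in kernels of this era:

```c
#include <linux/blkdev.h>

/* Sketch: configure queue flags for an SSD-class (non-rotational) device. */
static void configure_ssd_queue_flags(struct request_queue *q)
{
	/* No seek penalty ... */
	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
	/* ... and do not feed completion timing into the entropy pool. */
	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
}
```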
-
- 13 Jun 2014, 1 commit
-
Committed by Keith Busch

There is a potential deadlock if a CPU event occurs during nvme probe, since probe registers for hot-cpu notification. This fixes the race by having the module register for notification outside of probe rather than having each device register. The actual work is done in a scheduled work queue instead of in the notifier, since assigning IO queues may block if the driver creates additional queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 04 Jun 2014, 6 commits
-
Committed by Matthew Wilcox

It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
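
A sketch of how a variable can be renamed while the user-visible parameter name stays the same; the prefixed variable name and default value here are illustrative:

```c
#include <linux/module.h>
#include <linux/moduleparam.h>

/* The global gets a driver prefix; the module parameter is still "io_timeout". */
static unsigned char nvme_io_timeout = 30;
module_param_named(io_timeout, nvme_io_timeout, byte, 0644);
MODULE_PARM_DESC(io_timeout, "timeout in seconds for I/O");
```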
-
Committed by Sam Bradshaw

Recently, a new sysfs control "iostats" was added to selectively enable or disable IO statistics collection for request queues. This patch hooks that control. IO statistics collection is rather expensive on large, multi-node machines with drives pushing millions of IOPS. Having the ability to disable collection when it is not needed can improve throughput significantly. As a data point, on a quad E5-4640, I see more than 50% throughput improvement when IO statistics accounting is disabled during heavily multi-threaded, small-block random read benchmarks where device performance is in the million-IOPS-plus range.

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
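
A sketch of the gating idea on the submission side, using the per-partition accounting helpers of this era; the function name is illustrative of where the driver hooks the check:

```c
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/genhd.h>

/* Sketch: only account per-partition IO stats when sysfs "iostats" is enabled. */
static void nvme_start_io_acct(struct bio *bio)
{
	struct gendisk *disk = bio->bi_bdev->bd_disk;
	const int rw = bio_data_dir(bio);
	int cpu;

	if (!blk_queue_io_stat(disk->queue))
		return;		/* "iostats" is 0: skip the expensive accounting */

	cpu = part_stat_lock();
	part_round_stats(cpu, &disk->part0);
	part_stat_inc(cpu, &disk->part0, ios[rw]);
	part_stat_add(cpu, &disk->part0, sectors[rw], bio_sectors(bio));
	part_inc_in_flight(&disk->part0, rw);
	part_stat_unlock();
}
```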
-
Committed by Keith Busch

The routines to get and lock nvme queues required the caller to "put" or "unlock" them even when getting one returned NULL. This patch fixes that.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

Signed-off-by: Keith Busch <keith.busch@intel.com>
[made admin_timeout static]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

This was originally set to 4 times the IO timeout, but that was when the IO timeout was 5 seconds instead of 30. Twenty seconds for total time to failure seemed more reasonable than 2 minutes for most, but other users have requested to make this a module parameter instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[renamed the module parameter to retry_time]
[made retry_time static]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Santosh Y

The kmalloc() used by nvme_alloc_iod() to allocate memory for 'iod' can fail, so check the return value.

Signed-off-by: Santosh Y <santosh.sy@samsung.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 28 May 2014, 1 commit
-
Committed by Keith Busch

Quiesce and shut down the device prior to reset, then restart the device and resume IO afterwards.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
-
- 10 May 2014, 1 commit
-
Committed by Matthew Wilcox

Since _nvme_check_size() wasn't being called from anywhere, the compiler was optimising it away ... along with all the link-time build failures that would result if any of the structures were the wrong size. Call it from nvme_exit() for no particular reason.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 05 May 2014, 6 commits
-
Committed by Keith Busch

It is possible a filesystem may send a flush-flagged bio with write data. There is no such composite NVMe command, so the driver sends the flush and the write separately. The device is allowed to execute these commands in any order, so it was possible for the driver to end the bio after the write completed but while the flush was still active. We don't want to let a filesystem believe a flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix this, this patch splits the flush and write into chained bios.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
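
A sketch of the chaining idea only, not the driver's actual submission path: `nvme_submit_flush()` and `nvme_submit_write()` below are hypothetical stand-ins, and the essential point is that bio_chain() keeps the original write bio from completing until the data-less flush bio has also completed:

```c
#include <linux/bio.h>
#include <linux/fs.h>

/* Sketch: split a REQ_FLUSH bio that also carries write data into two bios. */
static int split_flush_write(struct nvme_queue *nvmeq, struct bio *bio)
{
	struct bio *flush = bio_alloc(GFP_KERNEL, 0);	/* empty flush bio */

	if (!flush)
		return -ENOMEM;

	flush->bi_rw = WRITE_FLUSH;	/* data-less flush */
	bio_chain(flush, bio);		/* 'bio' completes only after 'flush' does */

	nvme_submit_flush(nvmeq, flush);	/* hypothetical helper */
	nvme_submit_write(nvmeq, bio);		/* hypothetical helper */
	return 0;
}
```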
-
Committed by Keith Busch

This configures an nvme request_queue as flush capable if the device has a volatile write cache present.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
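
A sketch of the idea, assuming `id` is the controller's Identify data and bit 0 of `id->vwc` reports a volatile write cache; whether FUA is also advertised is a separate decision:

```c
#include <linux/blkdev.h>
#include <linux/nvme.h>

/* Sketch: mark the namespace queue flush-capable when a volatile write cache exists. */
static void configure_flush(struct request_queue *q, struct nvme_id_ctrl *id)
{
	if (id->vwc & 1)
		blk_queue_flush(q, REQ_FLUSH);	/* REQ_FUA could be OR'd in if supported */
}
```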
-
Committed by Keith Busch

Add tracepoints for bio_complete and block_split to nvme to help gather IO information using blktrace and blkparse.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
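
A sketch of the completion-side hook; the tracepoint prototype shown (queue, bio, error) matches kernels of this era, and the wrapper function name is illustrative:

```c
#include <linux/blkdev.h>
#include <trace/events/block.h>

/* Sketch: emit the block_bio_complete tracepoint before ending the bio. */
static void nvme_bio_done(struct bio *bio, int error)
{
	trace_block_bio_complete(bdev_get_queue(bio->bi_bdev), bio, error);
	bio_endio(bio, error);
}
```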
-
Committed by Keith Busch

If a misbehaving device posts a CQE with a command id < depth, but for one that was never allocated, the command info will have its callback function set to NULL and we don't want to try invoking that.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Matthew Wilcox

Help people diagnose what is going wrong at initialisation time by printing out which command has gone wrong and what the device returned. Also fix the error message printed while waiting for reset.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
-
Committed by Matthew Wilcox

Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 11 Apr 2014, 6 commits
-
Committed by Keith Busch

For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod can conveniently be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fixed checkpatch issue with long line]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

Some programs require HDIO_GETGEO to work, which requires that we implement getgeo.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
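
A sketch of a getgeo implementation; the synthetic geometry values are illustrative, since NVMe devices have no real CHS geometry and only need to report something plausible for HDIO_GETGEO:

```c
#include <linux/blkdev.h>
#include <linux/genhd.h>
#include <linux/hdreg.h>
#include <linux/module.h>

/* Sketch: report a made-up geometry derived from the disk capacity. */
static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
{
	geo->heads = 1 << 6;
	geo->sectors = 1 << 5;
	geo->cylinders = get_capacity(bdev->bd_disk) >> 11;
	return 0;
}

/* Wired up through the driver's block_device_operations. */
static const struct block_device_operations nvme_fops_sketch = {
	.owner	= THIS_MODULE,
	.getgeo	= nvme_getgeo,
};
```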
-
Committed by Dan McLeran

Done to ensure nvme_thread is not running when there are no devices to poll.

Signed-off-by: Dan McLeran <daniel.mcleran@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

Increase the default timeout to 30 seconds to match SCSI.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[use byte instead of ushort]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

Registers with hot-cpu notification to rebalance, and potentially allocate additional, IO queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
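
A sketch of the hot-cpu hook using the notifier interface of this era; `rebalance_io_queues()` is a hypothetical stand-in for the driver's queue-reassignment routine:

```c
#include <linux/cpu.h>
#include <linux/notifier.h>

/* Sketch: rebalance IO queues when a CPU comes online or goes away. */
static int nvme_cpu_notify(struct notifier_block *self,
			   unsigned long action, void *hcpu)
{
	switch (action) {
	case CPU_ONLINE:
	case CPU_DEAD:
		rebalance_io_queues();	/* hypothetical helper */
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block nvme_nb = {
	.notifier_call	= nvme_cpu_notify,
};

/* Registered once, e.g. register_hotcpu_notifier(&nvme_nb); */
```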
-
Committed by Keith Busch

The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map a qid to a cpu. This provides a convenient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in such situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 24 Mar 2014, 5 commits
-
Committed by Matthew Wilcox

Checkpatch has started warning against using DEFINE_PCI_DEVICE_TABLE, so replace it. Also update the copyright date and bump the module version number to 0.9.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
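
A sketch of the open-coded ID table that replaces the DEFINE_PCI_DEVICE_TABLE() macro; NVMe devices are matched by PCI class code rather than by vendor/device IDs:

```c
#include <linux/module.h>
#include <linux/pci.h>

/* Plain 'static const struct pci_device_id' instead of DEFINE_PCI_DEVICE_TABLE(). */
static const struct pci_device_id nvme_id_table[] = {
	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
	{ 0, }
};
MODULE_DEVICE_TABLE(pci, nvme_id_table);
```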
-
Committed by Keith Busch

This adds RCU-protected access to a queue in the nvme IOCTL path to fix potential races between a surprise removal and queue usage in nvme_submit_sync_cmd. The fix holds rcu_read_lock() to prevent the nvme_queue from being freed while this path is executing; since it can't sleep under the lock, this path will no longer wait for an available command id should they all be in use at the time a passthrough IOCTL request is received.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

This adds RCU-protected access to nvme_queue to fix a race between a surprise removal freeing the queue and a thread with an open reference on an NVMe block device using that queue. The queues do not need to be RCU-protected during the initialization or shutdown parts, so I've added a helper function for raw dereferencing to get around the sparse errors. There is still a hole in the IOCTL path for the same problem, which is fixed in a subsequent patch.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
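
A sketch of the reader side of this pattern, assuming `dev->queues[]` holds RCU-protected pointers (the field name is an assumption); init/teardown paths would use a raw-dereference helper instead:

```c
#include <linux/errno.h>
#include <linux/rcupdate.h>

/* Sketch: look up and use a queue without racing against surprise removal. */
static int use_queue(struct nvme_dev *dev, int qid)
{
	struct nvme_queue *nvmeq;

	rcu_read_lock();
	nvmeq = rcu_dereference(dev->queues[qid]);
	if (!nvmeq) {
		rcu_read_unlock();
		return -ENODEV;		/* queue already torn down */
	}
	/* ... use nvmeq here; no sleeping while under rcu_read_lock() ... */
	rcu_read_unlock();
	return 0;
}
```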
-
Committed by Keith Busch

If an NVMe device becomes ready but fails to create IO queues, the driver creates a character device handle so the device can be managed. The device reference count needs to be initialized before creating the character device.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Jingoo Han

Add CONFIG_PM_SLEEP to the suspend/resume functions to fix the following build warnings when CONFIG_PM_SLEEP is not selected. This is because sleep PM callbacks defined by SIMPLE_DEV_PM_OPS are only used when CONFIG_PM_SLEEP is enabled.

drivers/block/nvme-core.c:2541:12: warning: 'nvme_suspend' defined but not used [-Wunused-function]
drivers/block/nvme-core.c:2550:12: warning: 'nvme_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
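
A sketch of the guard: SIMPLE_DEV_PM_OPS only references the callbacks when CONFIG_PM_SLEEP is set, so the callback definitions are wrapped in the same condition to avoid the unused-function warnings (bodies here are placeholders):

```c
#include <linux/pm.h>

#ifdef CONFIG_PM_SLEEP
static int nvme_suspend(struct device *dev)
{
	/* ... quiesce the controller ... */
	return 0;
}

static int nvme_resume(struct device *dev)
{
	/* ... restart the controller ... */
	return 0;
}
#endif

static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume);
```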
-
- 14 Mar 2014, 1 commit
-
Committed by Alexander Gordeev

As a result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces.

Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
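
A sketch of the MSI-X conversion; the entry array and vector count arguments are the driver's own, with names here chosen for illustration:

```c
#include <linux/pci.h>

/* Sketch: replace the old retry loop around pci_enable_msix(). */
static int setup_msix(struct pci_dev *pdev, struct msix_entry *entry,
		      int nr_io_queues)
{
	int vecs;

	/* Accept anywhere between 1 and nr_io_queues vectors. */
	vecs = pci_enable_msix_range(pdev, entry, 1, nr_io_queues);
	if (vecs < 0)
		return vecs;	/* caller falls back to MSI or INTx */
	return vecs;		/* number of vectors actually granted */
}
```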
-
- 07 Mar 2014, 1 commit
-
Committed by Tejun Heo

PREPARE_[DELAYED_]WORK() are being phased out. They have few users and a nasty surprise in terms of reentrancy guarantees, as workqueue considers work items to be different if they don't have the same work function. nvme_dev->reset_work is multiplexed with multiple work functions. Introduce nvme_reset_workfn(), which invokes nvme_dev->reset_workfn; always use it as the work function, and update the users to set the ->reset_workfn field instead of overriding the work function using PREPARE_WORK(). It would probably be best to route this with other related updates through the workqueue tree. Compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-nvme@lists.infradead.org
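
A sketch of the indirection described above; the `reset_work`/`reset_workfn` names follow the commit message, while the retargeting example on the last line is hypothetical:

```c
#include <linux/kernel.h>
#include <linux/workqueue.h>

/* Fixed work function: dispatch to whichever handler is currently selected. */
static void nvme_reset_workfn(struct work_struct *work)
{
	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);

	dev->reset_workfn(work);
}

/* At init:      INIT_WORK(&dev->reset_work, nvme_reset_workfn);              */
/* To retarget:  dev->reset_workfn = some_recovery_handler;  (no PREPARE_WORK) */
```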
-
- 03 Feb 2014, 1 commit
-
Committed by Keith Busch

An nvme block device may have open references when the device is removed. New commands may still be sent on the removed device, so we need to ref count the opens, return errors for new commands, and not free the namespace and nvme_dev until all references are closed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
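
A sketch of the reference counting, assuming a kref embedded in struct nvme_dev (the field and helper names are illustrative):

```c
#include <linux/kref.h>
#include <linux/slab.h>

/* Only reached once the last opener and the remove path have all dropped their refs. */
static void nvme_free_dev(struct kref *kref)
{
	struct nvme_dev *dev = container_of(kref, struct nvme_dev, kref);

	kfree(dev);
}

static void nvme_dev_get(struct nvme_dev *dev)
{
	kref_get(&dev->kref);			/* taken on open */
}

static void nvme_dev_put(struct nvme_dev *dev)
{
	kref_put(&dev->kref, nvme_free_dev);	/* dropped on release and on remove */
}
```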
-
- 30 Jan 2014, 1 commit
-
Committed by Matthew Wilcox

We need to initialise the work_struct when we initialise the rest of the struct nvme_dev; otherwise we'll hit a lockdep warning when we remove the device. Use PREPARE_WORK to change the function pointer instead of INIT_WORK.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
- 28 Jan 2014, 7 commits
-
Committed by Matthew Wilcox

On larger systems with many drives, it may help debugging to know which queue is tied to which interrupt, just by looking at /proc/interrupts.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
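
A sketch of the naming scheme: give each queue's vector a distinct name so it shows up identifiably in /proc/interrupts. The per-queue `irqname` buffer and `dev->instance` field are assumptions about the driver's structures:

```c
#include <linux/interrupt.h>

/* Sketch: request the queue's vector under a "nvme<ctrl>q<qid>" name. */
static int request_queue_irq(struct nvme_dev *dev, struct nvme_queue *nvmeq,
			     int qid, int vector, irq_handler_t handler)
{
	snprintf(nvmeq->irqname, sizeof(nvmeq->irqname), "nvme%dq%d",
		 dev->instance, qid);
	return request_irq(vector, handler, IRQF_SHARED, nvmeq->irqname, nvmeq);
}
```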
-
Committed by Keith Busch

We need to shut down the device cleanly when the system is being shut down. This was in an earlier patch but was inadvertently lost during a rewrite.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

Disable the admin queue if the device fails during initialization so the queue's irq is freed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[rewritten to use nvme_free_queues]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Matthew Wilcox

Some users need more than 64 partitions per device. Rather than simply increasing the number of partitions, switch to the dynamic partition allocation scheme. This means that minor numbers are not stable across boots, but since major numbers aren't either, I cannot see this being a significant problem.

Tested-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

This attempts to delete all IO queues at the same time, asynchronously, on shutdown. This is necessary for a present device that is not responding; previously a shutdown operation would take 2 minutes per queue pair to time out before moving on to the next queue, making a device removal appear to take a very long time or appear "hung", as reported by users. In the previous worst case, a removal could be stuck forever until a kill signal was given if there were more than 32 queue pairs, since it would run out of admin command IDs after over an hour of timed-out sync commands (the admin queue depth is 64). This patch will wait for the admin command timeout for all commands to complete, so the worst case for an unresponsive controller is now 60 seconds, though that still seems like a long time. Since this adds another way to take queues offline, some duplicate code resulted, so I moved it into more convenient functions.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[make functions static, correct line length and whitespace issues]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Committed by Keith Busch

This adds checks to see if the nvme pci device was removed. The check reads the status register for the value of -1, which it should never be unless the device is no longer present. If a user performs a surprise removal on an nvme device, the driver will be notified either by the pci driver remove callback, if the platform's slot is capable of this event, or by reading the device BAR status register, which will indicate controller failure and trigger a reset. Either way, the device is not present, so all outstanding commands would time out. This avoids sending queue deletion commands to a drive that isn't present and failing after ioremap, significantly speeding up surprise removal; previously this took over 2 minutes per IO queue pair created, but now removing the device completes within a few seconds.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
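
A sketch of the presence check; `dev->bar->csts` refers to the controller status register mapped through the device BAR, and the helper name is illustrative. A surprise-removed PCI device reads back all-ones:

```c
#include <linux/io.h>

/* Sketch: treat an all-ones CSTS read as "device is gone". */
static bool nvme_device_gone(struct nvme_dev *dev)
{
	return readl(&dev->bar->csts) == 0xffffffff;
}
```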
-
Committed by Keith Busch

Send an nvme abort command to IO requests that have timed out on an initialized device. If the command is not returned after another timeout, schedule the controller for reset.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fix endianness issues]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
-