提交 · d6f66210f4b1aa2f5944f0e34e0f8db44f499f92 · openeuler / Kernel

03 11月, 2020 4 次提交

nvme-tcp: avoid race between time out and tear down · d6f66210

由 Chao Leng 提交于 10月 22, 2020

Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Signed-off-by: NChao Leng <lengchao@huawei.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

d6f66210

nvme-rdma: avoid race between time out and tear down · 3017013d

由 Chao Leng 提交于 10月 22, 2020

Now use teardown_lock to serialize for time out and tear down. This may
cause abnormal: first cancel all request in tear down, then time out may
complete the request again, but the request may already be freed or
restarted.

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue. At the same time we need to delete
teardown_lock.
Signed-off-by: NChao Leng <lengchao@huawei.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3017013d

nvme: introduce nvme_sync_io_queues · 04800fbf

由 Chao Leng 提交于 10月 22, 2020

Introduce sync io queues for some scenarios which just only need sync
io queues not sync all queues.
Signed-off-by: NChao Leng <lengchao@huawei.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

04800fbf

Revert "nvme-pci: remove last_sq_tail" · 38210800

由 Keith Busch 提交于 10月 30, 2020

Multiple CPUs may be mapped to the same hctx, allowing mulitple
submission contexts to attempt commit_rqs(). We need to verify we're
not writing the same doorbell value multiple times since that's a spec
violation.

Revert commit 54b2fcee.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1878596Reported-by: N"B.L. Jones" <brandon.gustav@googlemail.com>
Signed-off-by: NKeith Busch <kbusch@kernel.org>

38210800

29 10月, 2020 4 次提交

xsysace: use platform_get_resource() and platform_get_irq_optional() · 7cb6e22b

由 Andy Shevchenko 提交于 10月 27, 2020

Use platform_get_resource() to fetch the memory resource and
platform_get_irq_optional() to get optional IRQ instead of
open-coded variants.

IRQ is not supposed to be changed at runtime, so there is
no functional change in ace_fsm_yieldirq().

On the other hand we now take first resources instead of last ones
to proceed. I can't imagine how broken should be firmware to have
a garbage in the first resource slots. But if it the case, it needs
to be documented.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: NMichal Simek <michal.simek@xilinx.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7cb6e22b

null_blk: Fix locking in zoned mode · aa1c09cb

由 Damien Le Moal 提交于 10月 29, 2020

When the zoned mode is enabled in null_blk, Serializing read, write
and zone management operations for each zone is necessary to protect
device level information for managing zone resources (zone open and
closed counters) as well as each zone condition and write pointer
position. Commit 35bc10b2 ("null_blk: synchronization fix for
zoned device") introduced a spinlock to implement this serialization.
However, when memory backing is also enabled, GFP_NOIO memory
allocations are executed under the spinlock, resulting in might_sleep()
warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
the nullb->lock spinlock. This nested use of irq locks wrecks the irq
enabled/disabled state.

Fix all this by introducing a bitmap for per-zone lock, with locking
implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
This locking mechanism allows keeping a zone locked while executing
null_process_cmd(), serializing all operations to the zone while
allowing to sleep during memory backing allocation with GFP_NOIO.
Device level zone resource management information is protected using
a spinlock which is not held while executing null_process_cmd();

Fixes: 35bc10b2 ("null_blk: synchronization fix for zoned device")
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aa1c09cb

null_blk: Fix zone reset all tracing · f9c91042

由 Damien Le Moal 提交于 10月 29, 2020

In the cae of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
ignored and the operation is applied to all sequential zones. For these
commands, tracing the effect of the command using the command sector to
determine the target zone is thus incorrect.

Fix null_zone_mgmt() zone condition tracing in the case of
REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
not already empty.

Fixes: 766c3297 ("null_blk: add trace in null_blk_zoned.c")
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f9c91042

nbd: don't update block size after device is started · b40813dd

由 Ming Lei 提交于 10月 28, 2020

Mounted NBD device can be resized, one use case is rbd-nbd.

Fix the issue by setting up default block size, then not touch it
in nbd_size_update() any more. This kind of usage is aligned with loop
which has same use case too.

Cc: stable@vger.kernel.org
Fixes: c8a83a6b ("nbd: Use set_blocksize() to set device blocksize")
Reported-by: Nlining <lining2020x@163.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jan Kara <jack@suse.cz>
Tested-by: Nlining <lining2020x@163.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b40813dd

28 10月, 2020 1 次提交

null_blk: synchronization fix for zoned device · 35bc10b2

由 Kanchan Joshi 提交于 9月 28, 2020

Parallel write,read,zone-mgmt operations accessing/altering zone state
and write-pointer may get into race. Avoid the situation by using a new
spinlock for zoned device.
Concurrent zone-appends (on a zone) returning same write-pointer issue
is also avoided using this lock.

Cc: stable@vger.kernel.org
Fixes: e0489ed5 ("null_blk: Support REQ_OP_ZONE_APPEND")
Signed-off-by: NKanchan Joshi <joshi.k@samsung.com>
Reviewed-by: NDamien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

35bc10b2

27 10月, 2020 7 次提交

nvmet: fix a NULL pointer dereference when tracing the flush command · 3c3751f2

由 Chaitanya Kulkarni 提交于 10月 22, 2020

When target side trace in turned on and flush command is issued from the
host it results in the following Oops.

[  856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068
[  856.790686] #PF: supervisor read access in kernel mode
[  856.791262] #PF: error_code(0x0000) - not-present page
[  856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0
[  856.792527] Oops: 0000 [#1] SMP NOPTI
[  856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G           OE     5.9.0nvme-5.9+ #71
[  856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
[  856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet]
[  856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0
[  856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246
[  856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000
[  856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034
[  856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe
[  856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034
[  856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440
[  856.802667] FS:  00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
[  856.803635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0
[  856.805283] Call Trace:
[  856.805613]  nvmet_req_init+0x27c/0x480 [nvmet]
[  856.806200]  nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop]
[  856.806862]  blk_mq_dispatch_rq_list+0x123/0x7b0
[  856.807459]  ? kvm_sched_clock_read+0x14/0x30
[  856.808025]  __blk_mq_sched_dispatch_requests+0xc7/0x170
[  856.808708]  blk_mq_sched_dispatch_requests+0x30/0x60
[  856.809372]  __blk_mq_run_hw_queue+0x70/0x100
[  856.809935]  __blk_mq_delay_run_hw_queue+0x156/0x170
[  856.810574]  blk_mq_run_hw_queue+0x86/0xe0
[  856.811104]  blk_mq_sched_insert_request+0xef/0x160
[  856.811733]  blk_execute_rq+0x69/0xc0
[  856.812212]  ? blk_mq_rq_ctx_init+0xd0/0x230
[  856.812784]  nvme_execute_passthru_rq+0x57/0x130 [nvme_core]
[  856.813461]  nvme_submit_user_cmd+0xeb/0x300 [nvme_core]
[  856.814099]  nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core]
[  856.814752]  blkdev_ioctl+0x1dc/0x2c0
[  856.815197]  block_ioctl+0x3f/0x50
[  856.815606]  __x64_sys_ioctl+0x84/0xc0
[  856.816074]  do_syscall_64+0x33/0x40
[  856.816533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  856.817168] RIP: 0033:0x7f6a222ed107
[  856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8
[  856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107
[  856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003
[  856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005
[  856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3
[  856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900
[  856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel

Move the nvmet_req_init() tracepoint after we parse the command in
nvmet_req_init() so that we can get rid of the duplicate
nvmet_find_namespace() call.
Rename __assign_disk_name() ->  __assign_req_name(). Now that we call
tracepoint after parsing the command simplify the newly added
__assign_req_name() which fixes this bug.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

3c3751f2

nvme-fc: remove nvme_fc_terminate_io() · ac9b820e

由 James Smart 提交于 10月 23, 2020

__nvme_fc_terminate_io() is now called by only 1 place, in reset_work.
Consoldate and move the functionality of terminate_io into reset_work.

In reset_work, rather than calling the create_association directly,
schedule the connect work element to do its thing. After scheduling,
flush the connect work element to continue with semantic of not
returning until connect has been attempted at least once.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

ac9b820e

nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery · 95ced8a2

由 James Smart 提交于 10月 23, 2020

nvme_fc_error_recovery() special cases handling when in CONNECTING state
and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
special cases CONNECTING state and calls the routine to abort outstanding
ios.

Simplify the sequence by putting the call to abort outstanding I/Os
directly in nvme_fc_error_recovery.

Move the location of __nvme_fc_abort_outstanding_ios(), and
nvme_fc_terminate_exchange() which is called by it, to avoid adding
function prototypes for nvme_fc_error_recovery().
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

95ced8a2

nvme-fc: remove err_work work item · 9c2bb257

由 James Smart 提交于 10月 23, 2020

err_work was created to handle errors (mainly I/O timeouts) while in
CONNECTING state. The flag for err_work_active is also unneeded.

Remove err_work_active and err_work.  The actions to abort I/Os are moved
inline to nvme_error_recovery().
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

9c2bb257

nvme-fc: track error_recovery while connecting · caf1cbe3

由 James Smart 提交于 10月 23, 2020

Whenever there are errors during CONNECTING, the driver recovers by
aborting all outstanding ios and counts on the io completion to fail them
and thus the connection/association they are on. However, the connection
failure depends on a failure state from the core routines. Not all
commands that are issued by the core routine are guaranteed to cause a
failure of the core routine. They may be treated as a failure status and
the status is then ignored.

As such, whenever the transport enters error_recovery while CONNECTING,
it will set a new flag indicating an association failed. The
create_association routine which creates and initializes the controller,
will monitor the state of the flag as well as the core routine error
status and ensure the association fails if there was an error.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

caf1cbe3

nvme-rdma: handle unexpected nvme completion data length · 25c1ca6e

由 zhenwei pi 提交于 10月 25, 2020

Receiving a zero length message leads to the following warnings because
the CQE is processed twice:

refcount_t: underflow; use-after-free.
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28

RIP: 0010:refcount_warn_saturate+0xd9/0xe0
Call Trace:
 <IRQ>
 nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
 __ib_process_cq+0x76/0x150 [ib_core]
 ...

Sanity check the received data length, to avoids this.

Thanks to Chao Leng & Sagi for suggestions.
Signed-off-by: Nzhenwei pi <pizhenwei@bytedance.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

25c1ca6e

nvme: ignore zone validate errors on subsequent scans · 8685699c

由 Keith Busch 提交于 10月 23, 2020

Revalidating nvme zoned namespaces requires IO commands, and there are
controller states that prevent IO. For example, a sanitize in progress
is required to fail all IO, but we don't want to remove a namespace
we've previously added just because the controller is in such a state.
Suppress the error in this case.
Reported-by: NMichael Nguyen <michael.nguyen@wdc.com>
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

8685699c

26 10月, 2020 2 次提交

treewide: Convert macro and uses of __section(foo) to __section("foo") · 33def849

由 Joe Perches 提交于 10月 21, 2020

Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using the script at:

https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.plSigned-off-by: NJoe Perches <joe@perches.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: NMiguel Ojeda <ojeda@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

33def849

mm: remove kzfree() compatibility definition · 23224e45

由 Eric Biggers 提交于 10月 23, 2020

Commit 453431a5 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.

Since then a few more instances of kzfree() have slipped in.

Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23224e45

25 10月, 2020 2 次提交

i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs · 8058d699

由 Hans de Goede 提交于 10月 14, 2020

Commit 21653a41 ("i2c: core: Call i2c_acpi_install_space_handler()
before i2c_acpi_register_devices()")'s intention was to only move the
acpi_install_address_space_handler() call to the point before where
the ACPI declared i2c-children of the adapter where instantiated by
i2c_acpi_register_devices().

But i2c_acpi_install_space_handler() had a call to
acpi_walk_dep_device_list() hidden (that is I missed it) at the end
of it, so as an unwanted side-effect now acpi_walk_dep_device_list()
was also being called before i2c_acpi_register_devices().

Move the acpi_walk_dep_device_list() call to the end of
i2c_acpi_register_devices(), so that it is once again called *after*
the i2c_client-s hanging of the adapter have been created.

This fixes the Microsoft Surface Go 2 hanging at boot.

Fixes: 21653a41 ("i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209627Reported-by: NRainer Finke <rainer@finke.cc>
Reported-by: NKieran Bingham <kieran.bingham@ideasonboard.com>
Suggested-by: NMaximilian Luz <luzmaximilian@gmail.com>
Tested-by: NKieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Signed-off-by: NWolfram Sang <wsa@kernel.org>

8058d699

random32: make prandom_u32() output unpredictable · c51f8f88

由 George Spelvin 提交于 8月 09, 2020

Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output.  An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.

It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack.  Oops.

This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key.  (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.)  Speed is prioritized over security;
attacks are rare, while performance is always wanted.

Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.

Commit f227e3ec ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution.  This patch replaces
it.
Reported-by: NAmit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec ("random32: update the net random state on interrupt and activity")
Signed-off-by: NGeorge Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec; moved SIPROUND definitions
  to prandom.h for later use; merged George's prandom_seed() proposal;
  inlined siprand_u32(); replaced the net_rand_state[] array with 4
  members to fix a build issue; cosmetic cleanups to make checkpatch
  happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: NWilly Tarreau <w@1wt.eu>

c51f8f88

24 10月, 2020 2 次提交

ata: pata_ns87415.c: Document support on parisc with superio chip · 2e34ae02

由 Helge Deller 提交于 10月 23, 2020

I tested this driver on my HP PA-RISC C3000 workstation and it does
work with the built-in TEAC CD-532E-B CD-ROM drive.
So drop the TODO item and adjust the file header.
Signed-off-by: NHelge Deller <deller@gmx.de>

2e34ae02

ata: fix some kernel-doc markups · 94bd5719

由 Mauro Carvalho Chehab 提交于 10月 23, 2020

Some functions have different names between their prototypes
and the kernel-doc markup.
Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

94bd5719

23 10月, 2020 14 次提交

nvme-fc: shorten reconnect delay if possible for FC · f673714a

由 James Smart 提交于 10月 16, 2020

We've had several complaints about a 10s reconnect delay (the default)
when there was an error while there is connectivity to a subsystem.
The max_reconnects and reconnect_delay are set in common code prior to
calling the transport to create the controller.

This change checks if the default reconnect delay is being used, and if
so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
It does so by calculating the controller loss tmo window, changing the
value of the reconnect delay, and then recalculating the maximum number
of reconnect attempts allowed.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

f673714a

nvme-fc: wait for queues to freeze before calling update_hr_hw_queues · 88e837ed

由 James Smart 提交于 10月 16, 2020

On reconnect, the code currently does not freeze the controller before
possibly updating the number hw queues for the controller.

Add the freeze before updating the number of hw queues.  Note: the queues
are already started and remain started through the reconnect.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

88e837ed

nvme-fc: fix error loop in create_hw_io_queues · 514a6dc9

由 James Smart 提交于 10月 16, 2020

The loop that backs out of hw io queue creation continues through index
0, which corresponds to the admin queue as well.

Fix the loop so it only proceeds through indexes 1..n which correspond to
I/O queues.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Reviewed-by: NHimanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

514a6dc9

nvme-fc: fix io timeout to abort I/O · 52793d62

由 James Smart 提交于 10月 16, 2020

Currently, an I/O timeout unconditionally invokes
nvme_fc_error_recovery() which checks for LIVE or CONNECTING state.  If
live, the routine resets the controller which initiates a reconnect -
which is valid.  If CONNECTING, err_work is scheduled.  Err_work then
calls the terminate_io routine, which also checks for CONNECTING and
noops any further action on outstanding I/O.  The result is nothing
happened to the timed out io.  As such, if the command was dropped on
the wire, it will never timeout / complete, and the connect process
will hang.

Change the behavior of the io timeout routine to unconditionally abort
the I/O.  I/O completion handling will note that an io failed due to an
abort and will terminate the connection / association as needed.  If the
abort was unable to happen, continue with a call to
nvme_fc_error_recovery(). To ensure something different happens in
nvme_fc_error_recovery() rework it so at it will abort all I/Os on the
association to force a failure.

As I/O aborts now may occur outside of delete_association, counting for
completion must be wary and only count those aborted during
delete_association when TERMIO is set on the controller.
Signed-off-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

52793d62

xen/events: unmask a fifo event channel only if it was masked · eabe7417

由 Juergen Gross 提交于 10月 22, 2020

Unmasking an event channel with fifo events channels being used can
require a hypercall to be made, so try to avoid that by checking
whether the event channel was really masked.
Suggested-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-5-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

eabe7417

xen/events: only register debug interrupt for 2-level events · d04b1ae5

由 Juergen Gross 提交于 10月 22, 2020

xen_debug_interrupt() is specific to 2-level event handling. So don't
register it with fifo event handling being active.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-4-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

d04b1ae5

xen/events: make struct irq_info private to events_base.c · 7e14cde1

由 Juergen Gross 提交于 10月 22, 2020

The struct irq_info of Xen's event handling is used only for two
evtchn_ops functions outside of events_base.c. Those two functions
can easily be switched to avoid that usage.

This allows to make struct irq_info and its related access functions
private to events_base.c.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-3-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

7e14cde1

xen: remove no longer used functions · 58940487

由 Juergen Gross 提交于 10月 22, 2020

With the switch to the lateeoi model for interdomain event channels
some functions are no longer in use. Remove them.
Suggested-by: NJan Beulich <jbeulich@suse.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Reviewed-by: NJan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-2-jgross@suse.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

58940487

hil/parisc: Disable HIL driver when it gets stuck · 879bc2d2

由 Helge Deller 提交于 10月 19, 2020

When starting a HP machine with HIL driver but without an HIL keyboard
or HIL mouse attached, it may happen that data written to the HIL loop
gets stuck (e.g. because the transaction queue is full).  Usually one
will then have to reboot the machine because all you see is and endless
output of:
 Transaction add failed: transaction already queued?

In the higher layers hp_sdc_enqueue_transaction() is called to queued up
a HIL packet. This function returns an error code, and this patch adds
the necessary checks for this return code and disables the HIL driver if
further packets can't be sent.

Tested on a HP 730 and a HP 715/64 machine.
Signed-off-by: NHelge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>

879bc2d2

ACPI: utils: remove unreachable breaks · abcba2e1

由 Tom Rix 提交于 10月 19, 2020

A break following a return statement is pointless, so drop all of
the breaks following return statements from this file.
Signed-off-by: NTom Rix <trix@redhat.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

abcba2e1

PM: sleep: remove unreachable break · d298787d

由 Tom Rix 提交于 10月 19, 2020

A break following a return statement is pointless, so drop it.
Signed-off-by: NTom Rix <trix@redhat.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

d298787d

PM: AVS: Drop the avs directory and the corresponding Kconfig · 785b5bb4

由 Ulf Hansson 提交于 10月 06, 2020

All avs drivers have now been moved to their corresponding soc specific
directories. Additionally, they don't depend on the POWER_AVS Kconfig
anymore. Therefore, let's simply drop the drivers/power/avs directory
altogether.

Cc: Nishanth Menon <nm@ti.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NNishanth Menon <nm@ti.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

785b5bb4

null_blk: use zone status for max active/open · fd78874b

由 Keith Busch 提交于 10月 22, 2020

The block layer provides special status codes when requests go beyond
the zone resource limits. Use these codes instead of the generic IOERR
for requests that exceed the max active or open limits the null_blk
device was configured with so that applications know how these special
conditions should be handled.
Signed-off-by: NKeith Busch <kbusch@kernel.org>
Reviewed-by: NNiklas Cassel <niklas.cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fd78874b

PM: AVS: qcom-cpr: Move the driver to the qcom specific drivers · a7305e68

由 Ulf Hansson 提交于 10月 06, 2020

The avs drivers are all SoC specific drivers that doesn't share any code.
Instead they are located in a directory, mostly to keep similar
functionality together. From a maintenance point of view, it makes better
sense to collect SoC specific drivers like these, into the SoC specific
directories.

Therefore, let's move the qcom-cpr driver to the qcom directory.
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: NNiklas Cassel <nks@flawful.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

a7305e68

22 10月, 2020 4 次提交

nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru · 150dfb6c

由 Chaitanya Kulkarni 提交于 10月 20, 2020

By default, we set the passthru request allocation flag such that it
returns the error in the following code path and we fail the I/O when
BLK_MQ_REQ_NOWAIT is used for request allocation :-

nvme_alloc_request()
 blk_mq_alloc_request()
  blk_mq_queue_enter()
   if (flag & BLK_MQ_REQ_NOWAIT)
        return -EBUSY; <-- return if busy.

On some controllers using BLK_MQ_REQ_NOWAIT ends up in I/O error where
the controller is perfectly healthy and not in a degraded state.

Block layer request allocation does allow us to wait instead of
immediately returning the error when we BLK_MQ_REQ_NOWAIT flag is not
used. This has shown to fix the I/O error problem reported under
heavy random write workload.

Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation
which resolves this issue.
Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

150dfb6c

nvmet: cleanup nvmet_passthru_map_sg() · 5e063101

由 Logan Gunthorpe 提交于 10月 16, 2020

Clean up some confusing elements of nvmet_passthru_map_sg() by returning
early if the request is greater than the maximum bio size. This allows
us to drop the sg_cnt variable.

This should not result in any functional change but makes the code
clearer and more understandable. The original code allocated a truncated
bio then would return EINVAL when bio_add_pc_page() filled that bio. The
new code just returns EINVAL early if this would happen.

Fixes: c1fef73f ("nvmet: add passthru code to process commands")
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Suggested-by: NDouglas Gilbert <dgilbert@interlog.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

5e063101

nvmet: limit passthru MTDS by BIO_MAX_PAGES · df06047d

由 Logan Gunthorpe 提交于 10月 16, 2020

nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain
so the effective maximum transfer should also be limitted by
BIO_MAX_PAGES (presently this works out to 1MB).

For PCI passthru devices the max_sectors would typically be more
limitting than BIO_MAX_PAGES, but this may not be true for all passthru
devices.

Fixes: c1fef73f ("nvmet: add passthru code to process commands")
Suggested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

df06047d

nvmet: fix uninitialized work for zero kato · 85bd23f3

由 zhenwei pi 提交于 10月 15, 2020

When connecting a controller with a zero kato value using the following
command line

   nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0

the warning below can be reproduced:

WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
with trace:
  mod_delayed_work_on+0x59/0x90
  nvmet_update_cc+0xee/0x100 [nvmet]
  nvmet_execute_prop_set+0x72/0x80 [nvmet]
  nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
  nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
  ...

This is caused by queuing up an uninitialized work.  Althrough the
keep-alive timer is disabled during allocating the controller (fixed in
0d3b6a8d), ka_work still has a chance to run (called by
nvmet_start_ctrl).

Fixes: 0d3b6a8d ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
Signed-off-by: Nzhenwei pi <pizhenwei@bytedance.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

85bd23f3

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功