1. 03 Nov, 2020 4 commits
  2. 29 Oct, 2020 4 commits
    • xsysace: use platform_get_resource() and platform_get_irq_optional() · 7cb6e22b
      Committed by Andy Shevchenko
      Use platform_get_resource() to fetch the memory resource and
      platform_get_irq_optional() to get optional IRQ instead of
      open-coded variants.
      
      IRQ is not supposed to be changed at runtime, so there is
      no functional change in ace_fsm_yieldirq().
      
      On the other hand, we now take the first resources instead of the last
      ones. I can't imagine how broken firmware would have to be to have
      garbage in the first resource slots. But if that is the case, it needs
      to be documented.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix locking in zoned mode · aa1c09cb
      Committed by Damien Le Moal
      When zoned mode is enabled in null_blk, serializing read, write
      and zone management operations for each zone is necessary to protect
      device level information for managing zone resources (zone open and
      closed counters) as well as each zone condition and write pointer
      position. Commit 35bc10b2 ("null_blk: synchronization fix for
      zoned device") introduced a spinlock to implement this serialization.
      However, when memory backing is also enabled, GFP_NOIO memory
      allocations are executed under the spinlock, resulting in might_sleep()
      warnings. Furthermore, the zone_lock spinlock is locked/unlocked using
      spin_lock_irq/spin_unlock_irq, similarly to the memory backing code with
      the nullb->lock spinlock. This nested use of irq locks wrecks the irq
      enabled/disabled state.
      
      Fix all this by introducing a bitmap for per-zone lock, with locking
      implemented using wait_on_bit_lock_io() and clear_and_wake_up_bit().
      This locking mechanism allows keeping a zone locked while executing
      null_process_cmd(), serializing all operations to the zone while
      allowing to sleep during memory backing allocation with GFP_NOIO.
      Device level zone resource management information is protected using
      a spinlock which is not held while executing null_process_cmd().
      
      Fixes: 35bc10b2 ("null_blk: synchronization fix for zoned device")
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
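The per-zone bitmap lock described in this commit can be illustrated with a userspace analogue. This is a minimal sketch under stated assumptions, not the driver's code: the kernel uses wait_on_bit_lock_io() and clear_and_wake_up_bit(), which sleep and wake waiters, whereas this sketch simply yields the CPU while waiting. `NR_ZONES`, `zone_trylock()`, `zone_lock()` and `zone_unlock()` are hypothetical names chosen for illustration.

```c
#include <limits.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_ZONES 128u /* hypothetical zone count for this sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((NR_ZONES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One lock bit per zone; a set bit means the zone is locked. */
static atomic_ulong zone_locks[NR_WORDS];

/* Try once to take the lock bit of zone zno (must be < NR_ZONES). */
bool zone_trylock(unsigned int zno)
{
	unsigned long mask = 1UL << (zno % BITS_PER_WORD);
	unsigned long old = atomic_fetch_or_explicit(
		&zone_locks[zno / BITS_PER_WORD], mask, memory_order_acquire);

	/* We own the lock only if the bit was clear before our update. */
	return (old & mask) == 0;
}

/* Wait until the zone lock is ours. The holder may block for a long
 * time (for example in a memory allocation), so yield instead of
 * busy-spinning. The kernel sleeps on a wait queue here instead. */
void zone_lock(unsigned int zno)
{
	while (!zone_trylock(zno))
		sched_yield();
}

/* Clear the lock bit with release ordering. */
void zone_unlock(unsigned int zno)
{
	unsigned long mask = 1UL << (zno % BITS_PER_WORD);

	atomic_fetch_and_explicit(&zone_locks[zno / BITS_PER_WORD], ~mask,
				  memory_order_release);
}
```

Because the lock is a bit rather than a spinlock, the holder is allowed to block while the zone stays locked, which is the property the commit relies on for GFP_NOIO allocations.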
    • null_blk: Fix zone reset all tracing · f9c91042
      Committed by Damien Le Moal
      In the case of the REQ_OP_ZONE_RESET_ALL operation, the command sector is
      ignored and the operation is applied to all sequential zones. For these
      commands, tracing the effect of the command using the command sector to
      determine the target zone is thus incorrect.
      
      Fix null_zone_mgmt() zone condition tracing in the case of
      REQ_OP_ZONE_RESET_ALL to apply tracing to all sequential zones that are
      not already empty.
      
      Fixes: 766c3297 ("null_blk: add trace in null_blk_zoned.c")
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
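The behaviour described above (reset every sequential zone that is not already empty, and trace each zone actually reset rather than the zone at the command sector) could look roughly like the following. This is an illustrative userspace sketch; the struct layout, enum names and the `trace_fn` callback are assumptions made for the example and are not the null_blk definitions.

```c
#include <stddef.h>

enum zone_type { ZONE_CONVENTIONAL, ZONE_SEQUENTIAL };
enum zone_cond { ZONE_EMPTY, ZONE_OPEN, ZONE_CLOSED, ZONE_FULL };

/* Hypothetical per-zone state for this sketch. */
struct zone {
	enum zone_type type;
	enum zone_cond cond;
	unsigned long long start; /* first sector of the zone */
	unsigned long long wp;    /* write pointer */
};

/* Hypothetical trace hook: called once per zone that was reset. */
typedef void (*trace_fn)(size_t zno, const struct zone *z);

/* Reset all sequential zones that are not already empty and trace each
 * one. Returns the number of zones reset. */
size_t reset_all_zones(struct zone *zones, size_t nr_zones, trace_fn trace)
{
	size_t nr_reset = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		struct zone *z = &zones[i];

		/* Conventional and already-empty zones are left untouched. */
		if (z->type != ZONE_SEQUENTIAL || z->cond == ZONE_EMPTY)
			continue;

		z->cond = ZONE_EMPTY;
		z->wp = z->start;
		if (trace)
			trace(i, z); /* trace the zone that changed */
		nr_reset++;
	}
	return nr_reset;
}
```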
    • nbd: don't update block size after device is started · b40813dd
      Committed by Ming Lei
      A mounted NBD device can be resized; one use case is rbd-nbd.
      
      Fix the issue by setting up a default block size and no longer touching
      it in nbd_size_update(). This usage is aligned with loop, which has the
      same use case.
      
      Cc: stable@vger.kernel.org
      Fixes: c8a83a6b ("nbd: Use set_blocksize() to set device blocksize")
      Reported-by: lining <lining2020x@163.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jan Kara <jack@suse.cz>
      Tested-by: lining <lining2020x@163.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 28 Oct, 2020 1 commit
  4. 27 Oct, 2020 7 commits
    • nvmet: fix a NULL pointer dereference when tracing the flush command · 3c3751f2
      Committed by Chaitanya Kulkarni
      When target-side tracing is turned on and a flush command is issued from
      the host, it results in the following Oops:
      
      [  856.789724] BUG: kernel NULL pointer dereference, address: 0000000000000068
      [  856.790686] #PF: supervisor read access in kernel mode
      [  856.791262] #PF: error_code(0x0000) - not-present page
      [  856.791863] PGD 6d7110067 P4D 6d7110067 PUD 66f0ad067 PMD 0
      [  856.792527] Oops: 0000 [#1] SMP NOPTI
      [  856.792950] CPU: 15 PID: 7034 Comm: nvme Tainted: G           OE     5.9.0nvme-5.9+ #71
      [  856.793790] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e3214
      [  856.794956] RIP: 0010:trace_event_raw_event_nvmet_req_init+0x13e/0x170 [nvmet]
      [  856.795734] Code: 41 5c 41 5d c3 31 d2 31 f6 e8 4e 9b b8 e0 e9 0e ff ff ff 49 8b 55 00 48 8b 38 8b 0
      [  856.797740] RSP: 0018:ffffc90001be3a60 EFLAGS: 00010246
      [  856.798375] RAX: 0000000000000000 RBX: ffff8887e7d2c01c RCX: 0000000000000000
      [  856.799234] RDX: 0000000000000020 RSI: 0000000057e70ea2 RDI: ffff8887e7d2c034
      [  856.800088] RBP: ffff88869f710578 R08: ffff888807500d40 R09: 00000000fffffffe
      [  856.800951] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff8887e7d2c034
      [  856.801807] R13: ffff88869f7105c8 R14: 0000000000000040 R15: ffff88869f710440
      [  856.802667] FS:  00007f6a22bd8780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
      [  856.803635] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  856.804367] CR2: 0000000000000068 CR3: 00000006d73e0000 CR4: 00000000003506e0
      [  856.805283] Call Trace:
      [  856.805613]  nvmet_req_init+0x27c/0x480 [nvmet]
      [  856.806200]  nvme_loop_queue_rq+0xcb/0x1d0 [nvme_loop]
      [  856.806862]  blk_mq_dispatch_rq_list+0x123/0x7b0
      [  856.807459]  ? kvm_sched_clock_read+0x14/0x30
      [  856.808025]  __blk_mq_sched_dispatch_requests+0xc7/0x170
      [  856.808708]  blk_mq_sched_dispatch_requests+0x30/0x60
      [  856.809372]  __blk_mq_run_hw_queue+0x70/0x100
      [  856.809935]  __blk_mq_delay_run_hw_queue+0x156/0x170
      [  856.810574]  blk_mq_run_hw_queue+0x86/0xe0
      [  856.811104]  blk_mq_sched_insert_request+0xef/0x160
      [  856.811733]  blk_execute_rq+0x69/0xc0
      [  856.812212]  ? blk_mq_rq_ctx_init+0xd0/0x230
      [  856.812784]  nvme_execute_passthru_rq+0x57/0x130 [nvme_core]
      [  856.813461]  nvme_submit_user_cmd+0xeb/0x300 [nvme_core]
      [  856.814099]  nvme_user_cmd.isra.82+0x11e/0x1a0 [nvme_core]
      [  856.814752]  blkdev_ioctl+0x1dc/0x2c0
      [  856.815197]  block_ioctl+0x3f/0x50
      [  856.815606]  __x64_sys_ioctl+0x84/0xc0
      [  856.816074]  do_syscall_64+0x33/0x40
      [  856.816533]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  856.817168] RIP: 0033:0x7f6a222ed107
      [  856.817617] Code: 44 00 00 48 8b 05 81 cd 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 8
      [  856.819901] RSP: 002b:00007ffca848f058 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [  856.820846] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6a222ed107
      [  856.821726] RDX: 00007ffca848f060 RSI: 00000000c0484e43 RDI: 0000000000000003
      [  856.822603] RBP: 0000000000000003 R08: 000000000000003f R09: 0000000000000005
      [  856.823478] R10: 00007ffca848ece0 R11: 0000000000000202 R12: 00007ffca84912d3
      [  856.824359] R13: 00007ffca848f4d0 R14: 0000000000000002 R15: 000000000067e900
      [  856.825236] Modules linked in: nvme_loop(OE) nvmet(OE) nvme_fabrics(OE) null_blk nvme(OE) nvme_corel
      
      Move the nvmet_req_init() tracepoint after we parse the command in
      nvmet_req_init() so that we can get rid of the duplicate
      nvmet_find_namespace() call.
      Rename __assign_disk_name() to __assign_req_name(). Now that we call the
      tracepoint after parsing the command, simplify the newly added
      __assign_req_name(), which fixes this bug.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: remove nvme_fc_terminate_io() · ac9b820e
      Committed by James Smart
      __nvme_fc_terminate_io() is now called from only one place, in reset_work.
      Consolidate and move the functionality of terminate_io into reset_work.
      
      In reset_work, rather than calling create_association directly,
      schedule the connect work element to do its thing. After scheduling,
      flush the connect work element to preserve the semantic of not
      returning until connect has been attempted at least once.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery · 95ced8a2
      Committed by James Smart
      nvme_fc_error_recovery() special-cases handling when in CONNECTING state
      and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
      special-cases CONNECTING state and calls the routine to abort outstanding
      I/Os.
      
      Simplify the sequence by putting the call to abort outstanding I/Os
      directly in nvme_fc_error_recovery.
      
      Move the location of __nvme_fc_abort_outstanding_ios(), and
      nvme_fc_terminate_exchange() which is called by it, to avoid adding
      function prototypes for nvme_fc_error_recovery().
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: remove err_work work item · 9c2bb257
      Committed by James Smart
      err_work was created to handle errors (mainly I/O timeouts) while in
      CONNECTING state. The flag for err_work_active is also unneeded.
      
      Remove err_work_active and err_work.  The actions to abort I/Os are moved
      inline to nvme_error_recovery().
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: track error_recovery while connecting · caf1cbe3
      Committed by James Smart
      Whenever there are errors during CONNECTING, the driver recovers by
      aborting all outstanding I/Os and counts on the I/O completion to fail them
      and thus the connection/association they are on.  However, the connection
      failure depends on a failure state from the core routines.  Not all
      commands that are issued by the core routine are guaranteed to cause a
      failure of the core routine. They may be treated as a failure status and
      the status is then ignored.
      
      As such, whenever the transport enters error_recovery while CONNECTING,
      it will set a new flag indicating an association failed. The
      create_association routine, which creates and initializes the controller,
      will monitor the state of the flag as well as the core routine error
      status and ensure the association fails if there was an error.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-rdma: handle unexpected nvme completion data length · 25c1ca6e
      Committed by zhenwei pi
      Receiving a zero length message leads to the following warnings because
      the CQE is processed twice:
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 0 PID: 0 at lib/refcount.c:28
      
      RIP: 0010:refcount_warn_saturate+0xd9/0xe0
      Call Trace:
       <IRQ>
       nvme_rdma_recv_done+0xf3/0x280 [nvme_rdma]
       __ib_process_cq+0x76/0x150 [ib_core]
       ...
      
      Sanity check the received data length to avoid this.
      
      Thanks to Chao Leng & Sagi for suggestions.
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
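The kind of length check this commit describes can be sketched in plain C. This is a hedged illustration, not the nvme-rdma code: `struct nvme_cqe_sketch`, `enum recv_result` and `handle_recv()` are hypothetical names, and the only fact taken from NVMe itself is that a completion queue entry is 16 bytes long.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for an NVMe completion queue entry, which is 16 bytes. */
struct nvme_cqe_sketch {
	uint8_t bytes[16];
};

enum recv_result { RECV_OK, RECV_BAD_LENGTH };

/* Validate the received length before interpreting the buffer as a
 * completion. A short (for example zero-length) message is rejected
 * up front, so it is never processed as if it were a valid entry. */
enum recv_result handle_recv(const void *buf, size_t byte_len,
			     struct nvme_cqe_sketch *out)
{
	if (byte_len < sizeof(*out))
		return RECV_BAD_LENGTH;

	memcpy(out, buf, sizeof(*out));
	return RECV_OK;
}
```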
    • nvme: ignore zone validate errors on subsequent scans · 8685699c
      Committed by Keith Busch
      Revalidating nvme zoned namespaces requires IO commands, and there are
      controller states that prevent IO. For example, a sanitize in progress
      is required to fail all IO, but we don't want to remove a namespace
      we've previously added just because the controller is in such a state.
      Suppress the error in this case.
      Reported-by: Michael Nguyen <michael.nguyen@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 26 Oct, 2020 2 commits
  6. 25 Oct, 2020 2 commits
    • i2c: core: Restore acpi_walk_dep_device_list() getting called after registering the ACPI i2c devs · 8058d699
      Committed by Hans de Goede
      The intention of commit 21653a41 ("i2c: core: Call
      i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
      was only to move the acpi_install_address_space_handler() call to a
      point before the ACPI-declared i2c children of the adapter are
      instantiated by i2c_acpi_register_devices().
      
      But i2c_acpi_install_space_handler() had a call to
      acpi_walk_dep_device_list() hidden (that is I missed it) at the end
      of it, so as an unwanted side-effect now acpi_walk_dep_device_list()
      was also being called before i2c_acpi_register_devices().
      
      Move the acpi_walk_dep_device_list() call to the end of
      i2c_acpi_register_devices(), so that it is once again called *after*
      the i2c_clients hanging off the adapter have been created.
      
      This fixes the Microsoft Surface Go 2 hanging at boot.
      
      Fixes: 21653a41 ("i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=209627
      Reported-by: Rainer Finke <rainer@finke.cc>
      Reported-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
      Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
      Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
    • random32: make prandom_u32() output unpredictable · c51f8f88
      Committed by George Spelvin
      Non-cryptographic PRNGs may have great statistical properties, but
      are usually trivially predictable to someone who knows the algorithm,
      given a small sample of their output.  An LFSR like prandom_u32() is
      particularly simple, even if the sample is widely scattered bits.
      
      It turns out the network stack uses prandom_u32() for some things like
      random port numbers which it would prefer are *not* trivially predictable.
      Predictability led to a practical DNS spoofing attack.  Oops.
      
      This patch replaces the LFSR with a homebrew cryptographic PRNG based
      on the SipHash round function, which is in turn seeded with 128 bits
      of strong random key.  (The authors of SipHash have *not* been consulted
      about this abuse of their algorithm.)  Speed is prioritized over security;
      attacks are rare, while performance is always wanted.
      
      Replacing all callers of prandom_u32() is the quick fix.
      Whether to reinstate a weaker PRNG for uses which can tolerate it
      is an open question.
      
      Commit f227e3ec ("random32: update the net random state on interrupt
      and activity") was an earlier attempt at a solution.  This patch replaces
      it.
      Reported-by: Amit Klein <aksecurity@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: tytso@mit.edu
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Marc Plumb <lkml.mplumb@gmail.com>
      Fixes: f227e3ec ("random32: update the net random state on interrupt and activity")
      Signed-off-by: George Spelvin <lkml@sdf.org>
      Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
      [ willy: partial reversal of f227e3ec; moved SIPROUND definitions
        to prandom.h for later use; merged George's prandom_seed() proposal;
        inlined siprand_u32(); replaced the net_rand_state[] array with 4
        members to fix a build issue; cosmetic cleanups to make checkpatch
        happy; fixed RANDOM32_SELFTEST build ]
      Signed-off-by: Willy Tarreau <w@1wt.eu>
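To make the idea of "a PRNG built on the SipHash round function" concrete, here is a small userspace sketch. The round function is the standard SipRound. Everything else is an assumption made for illustration and should not be read as the kernel's implementation: the struct and function names are hypothetical, the seeding borrows SipHash's usual initialization constants, and the choice of two rounds per output with `v1 + v3` as the result is a guess at a reasonable construction.

```c
#include <stdint.h>

/* Hypothetical generator state: four 64-bit words, as in SipHash. */
struct siprand_sketch {
	uint64_t v0, v1, v2, v3;
};

/* Rotate left; r is always in 1..63 in this file. */
static inline uint64_t rol64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));
}

/* One standard SipHash round (SipRound) over the state. */
void sipround(struct siprand_sketch *s)
{
	s->v0 += s->v1; s->v1 = rol64(s->v1, 13); s->v1 ^= s->v0;
	s->v0 = rol64(s->v0, 32);
	s->v2 += s->v3; s->v3 = rol64(s->v3, 16); s->v3 ^= s->v2;
	s->v0 += s->v3; s->v3 = rol64(s->v3, 21); s->v3 ^= s->v0;
	s->v2 += s->v1; s->v1 = rol64(s->v1, 17); s->v1 ^= s->v2;
	s->v2 = rol64(s->v2, 32);
}

/* Assumed seeding: mix a 128-bit key (k0, k1) into the state using
 * SipHash's standard initialization constants. */
void siprand_seed(struct siprand_sketch *s, uint64_t k0, uint64_t k1)
{
	s->v0 = k0 ^ 0x736f6d6570736575ULL;
	s->v1 = k1 ^ 0x646f72616e646f6dULL;
	s->v2 = k0 ^ 0x6c7967656e657261ULL;
	s->v3 = k1 ^ 0x7465646279746573ULL;
}

/* Assumed output step: advance the state by two rounds and fold two
 * state words into a 32-bit result. */
uint32_t siprand_next_u32(struct siprand_sketch *s)
{
	sipround(s);
	sipround(s);
	return (uint32_t)(s->v1 + s->v3);
}
```

Note that an all-zero state is a fixed point of the round function (every add, rotate and XOR of zeros yields zero), which is one reason the state has to be seeded from real key material. In a real deployment the key would come from a strong entropy source rather than fixed values.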
  7. 24 Oct, 2020 2 commits
  8. 23 Oct, 2020 14 commits
  9. 22 Oct, 2020 4 commits
    • nvmet: don't use BLK_MQ_REQ_NOWAIT for passthru · 150dfb6c
      Committed by Chaitanya Kulkarni
      By default, we set the passthru request allocation flag such that it
      returns an error in the following code path, and we fail the I/O when
      BLK_MQ_REQ_NOWAIT is used for request allocation:
      
      nvme_alloc_request()
       blk_mq_alloc_request()
        blk_mq_queue_enter()
         if (flag & BLK_MQ_REQ_NOWAIT)
              return -EBUSY; <-- return if busy.
      
      On some controllers, using BLK_MQ_REQ_NOWAIT ends in an I/O error even
      though the controller is perfectly healthy and not in a degraded state.
      
      Block layer request allocation does allow us to wait instead of
      immediately returning the error when the BLK_MQ_REQ_NOWAIT flag is not
      used. This has been shown to fix the I/O error problem reported under
      heavy random write workload.
      
      Remove the BLK_MQ_REQ_NOWAIT parameter for passthru request allocation
      which resolves this issue.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet: cleanup nvmet_passthru_map_sg() · 5e063101
      Committed by Logan Gunthorpe
      Clean up some confusing elements of nvmet_passthru_map_sg() by returning
      early if the request is greater than the maximum bio size. This allows
      us to drop the sg_cnt variable.
      
      This should not result in any functional change but makes the code
      clearer and more understandable. The original code allocated a truncated
      bio then would return EINVAL when bio_add_pc_page() filled that bio. The
      new code just returns EINVAL early if this would happen.
      
      Fixes: c1fef73f ("nvmet: add passthru code to process commands")
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Suggested-by: Douglas Gilbert <dgilbert@interlog.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvmet: limit passthru MTDS by BIO_MAX_PAGES · df06047d
      Committed by Logan Gunthorpe
      nvmet_passthru_map_sg() only supports mapping a single BIO, not a chain,
      so the effective maximum transfer should also be limited by
      BIO_MAX_PAGES (presently this works out to 1MB).
      
      For PCI passthru devices the max_sectors would typically be more
      limiting than BIO_MAX_PAGES, but this may not be true for all passthru
      devices.
      
      Fixes: c1fef73f ("nvmet: add passthru code to process commands")
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
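The "works out to 1MB" figure can be checked with a little arithmetic. The sketch below assumes BIO_MAX_PAGES is 256 and the page size is 4096 bytes (both typical for kernels of this period, but treated here as assumptions), and uses the NVMe convention that the Maximum Data Transfer Size field (MDTS, spelled MTDS in the commit title) is a power-of-two exponent in units of the controller's minimum memory page size. The function names are hypothetical.

```c
#include <stdint.h>

/* Largest transfer that fits in one bio: pages per bio times page size. */
uint64_t max_single_bio_bytes(uint64_t bio_max_pages, uint64_t page_size)
{
	return bio_max_pages * page_size;
}

/* Convert a byte limit to an MDTS-style exponent: the largest n such
 * that (min_page_size << n) <= max_bytes. Returns 0 if the limit is
 * smaller than one page or min_page_size is 0. */
unsigned int mdts_from_bytes(uint64_t max_bytes, uint64_t min_page_size)
{
	unsigned int mdts = 0;
	uint64_t units;

	if (min_page_size == 0)
		return 0;

	units = max_bytes / min_page_size;
	while (units > 1) {
		units >>= 1;
		mdts++;
	}
	return mdts;
}
```

With the assumed values, 256 pages of 4096 bytes give 1048576 bytes (1 MiB), which corresponds to an exponent of 8 for a 4096-byte minimum page size.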
    • nvmet: fix uninitialized work for zero kato · 85bd23f3
      Committed by zhenwei pi
      When connecting a controller with a zero kato value using the following
      command line
      
         nvme connect -t tcp -n NQN -a ADDR -s PORT --keep-alive-tmo=0
      
      the warning below can be reproduced:
      
      WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90
      with trace:
        mod_delayed_work_on+0x59/0x90
        nvmet_update_cc+0xee/0x100 [nvmet]
        nvmet_execute_prop_set+0x72/0x80 [nvmet]
        nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]
        nvmet_tcp_io_work+0x63f/0xb2d [nvmet_tcp]
        ...
      
      This is caused by queuing up an uninitialized work item.  Although the
      keep-alive timer is disabled while allocating the controller (fixed in
      0d3b6a8d), ka_work still has a chance to run (called by
      nvmet_start_ctrl).
      
      Fixes: 0d3b6a8d ("nvmet: Disable keep-alive timer when kato is cleared to 0h")
      Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>