1. 02 Sep, 2020 4 commits
    • G
      block: Make request_queue.rpm_status an enum · db04e18d
      Geert Uytterhoeven authored
      request_queue.rpm_status is assigned values of the rpm_status enum only,
      so reflect that in its type.
      
      Note that including <linux/pm.h> is (currently) a no-op, as it is
      already included through <linux/genhd.h> and <linux/device.h>, but it is
      better to play it safe.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
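The type change described in the commit above can be illustrated with a minimal userspace sketch. Every name here (`demo_rpm_status`, `demo_queue`, `demo_status_name`) is a hypothetical stand-in, not the kernel's actual definition; the point is only that an enum-typed field documents its legal values and lets the compiler check `switch` coverage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an rpm_status-style enum (names assumed). */
enum demo_rpm_status {
	DEMO_RPM_ACTIVE = 0,
	DEMO_RPM_RESUMING,
	DEMO_RPM_SUSPENDED,
	DEMO_RPM_SUSPENDING,
};

/* Hypothetical queue: the field type now states which values are legal. */
struct demo_queue {
	enum demo_rpm_status rpm_status; /* would previously have been an int */
};

/* With an enum argument, compilers can warn about unhandled cases. */
const char *demo_status_name(enum demo_rpm_status status)
{
	switch (status) {
	case DEMO_RPM_ACTIVE:     return "active";
	case DEMO_RPM_RESUMING:   return "resuming";
	case DEMO_RPM_SUSPENDED:  return "suspended";
	case DEMO_RPM_SUSPENDING: return "suspending";
	}
	assert(!"status outside enum demo_rpm_status");
	return "invalid";
}
```

Leaving out a `default:` label is deliberate in this sketch: it keeps the compiler's unhandled-enumerator warning useful if a new state is added later.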
    • J
      Merge branch 'block-5.9' into for-5.10/block · a98278ec
      Jens Axboe authored
      * block-5.9:
        blk-stat: make q->stats->lock irqsafe
        blk-iocost: ioc_pd_free() shouldn't assume irq disabled
        block: fix locking in bdev_del_partition
        block: release disk reference in hd_struct_free_work
        block: ensure bdi->io_pages is always initialized
        nvme-pci: cancel nvme device request before disabling
        nvme: only use power of two io boundaries
        nvme: fix controller instance leak
        nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
        nvme: Fix NULL dereference for pci nvme controllers
        nvme-rdma: fix reset hang if controller died in the middle of a reset
        nvme-rdma: fix timeout handler
        nvme-rdma: serialize controller teardown sequences
        nvme-tcp: fix reset hang if controller died in the middle of a reset
        nvme-tcp: fix timeout handler
        nvme-tcp: serialize controller teardown sequences
        nvme: have nvme_wait_freeze_timeout return if it timed out
        nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
        nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
    • T
      blk-stat: make q->stats->lock irqsafe · e11d80a8
      Tejun Heo authored
      blk-iocost calls blk_stat_enable_accounting() while holding an irqsafe lock
      which triggers a lockdep splat because q->stats->lock isn't irqsafe. Let's
      make it irqsafe.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: cd006509 ("blk-iocost: account for IO size when testing latencies")
      Cc: stable@vger.kernel.org # v5.8+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • T
      blk-iocost: ioc_pd_free() shouldn't assume irq disabled · 5aeac7c4
      Tejun Heo authored
      ioc_pd_free() grabs irq-safe ioc->lock without ensuring that irq is disabled
      when it can be called with irq disabled or enabled. This has a small chance
      of causing A-A deadlocks and triggers lockdep splats. Use irqsave operations
      instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 7caa4715 ("blkcg: implement blk-iocost")
      Cc: stable@vger.kernel.org # v5.4+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 01 Sep, 2020 3 commits
  3. 31 Aug, 2020 12 commits
    • L
      Linux 5.9-rc3 · f75aef39
      Linus Torvalds authored
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · e43327c7
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
      
       - fix regression in af_alg that affects iwd
      
       - restore polling delay in qat
      
       - fix double free in ingenic on error path
      
       - fix potential build failure in sa2ul due to missing Kconfig dependency
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: af_alg - Work around empty control messages without MSG_MORE
        crypto: sa2ul - add Kconfig selects to fix build error
        crypto: ingenic - Drop kfree for memory allocated with devm_kzalloc
        crypto: qat - add delay before polling mailbox
    • L
      Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · dcc5c6f0
      Linus Torvalds authored
      Pull x86 fixes from Thomas Gleixner:
       "Three interrupt related fixes for X86:
      
         - Move disabling of the local APIC after invoking fixup_irqs() to
           ensure that interrupts which are incoming are noted in the IRR and
           not ignored.
      
         - Unbreak affinity setting.
      
           The rework of the entry code reused the regular exception entry
           code for device interrupts. The vector number is pushed into the
           errorcode slot on the stack which is then lifted into an argument
           and set to -1 because that's regs->orig_ax which is used in quite
           some places to check whether the entry came from a syscall.
      
           But it was overlooked that orig_ax is used in the affinity cleanup
           code to validate whether the interrupt has arrived on the new
           target. It turned out that this vector check is pointless because
           interrupts are never moved from one vector to another on the same
           CPU. That check is a historical leftover from the time where x86
           supported multi-CPU affinities, but is no longer needed with the now
           strict single CPU affinity. Famous last words ...
      
         - Add a missing check for an empty cpumask into the matrix allocator.
      
           The affinity change added a warning to catch the case where an
           interrupt is moved on the same CPU to a different vector. This
           triggers because a condition with an empty cpumask returns an
           assignment from the allocator as the allocator uses for_each_cpu()
           without checking the cpumask for being empty. The historical
           inconsistent for_each_cpu() behaviour of ignoring the cpumask and
           unconditionally claiming that CPU0 is in the mask struck again.
           Sigh.
      
        plus a new entry into the MAINTAINER file for the HPE/UV platform"
      
      * tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
        x86/irq: Unbreak interrupt affinity setting
        x86/hotplug: Silence APIC only after all interrupts are migrated
        MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
    • L
      Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d2283cdc
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "A set of fixes for interrupt chip drivers:
      
         - Revert the platform driver conversion of interrupt chip drivers as
           it turned out to create more problems than it solves.
      
         - Fix a trivial typo in the new module helpers which made probing
           reliably fail.
      
         - Small fixes in the STM32 and MIPS Ingenic drivers
      
         - The TI firmware rework which had badly managed dependencies and had
           to wait post rc1"
      
      * tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/ingenic: Leave parent IRQ unmasked on suspend
        irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
        irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
        irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
        arm64: dts: k3-am65: Update the RM resource types
        arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
        arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
        irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
        irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
        dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
        dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
        irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
        dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
        dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
        firmware: ti_sci: Add support for getting resource with subtype
        firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
        firmware: ti_sci: Drop the device id to resource type translation
    • L
      Merge tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0063a82d
      Linus Torvalds authored
      Pull scheduler fix from Thomas Gleixner:
       "A single fix for the scheduler:
      
         - Make is_idle_task() __always_inline to prevent the compiler from
           putting it out of line into the wrong section because it's used
           inside noinstr sections"
      
      * tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Use __always_inline on is_idle_task()
    • L
      Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b69bea8a
      Linus Torvalds authored
      Pull locking fixes from Thomas Gleixner:
       "A set of fixes for lockdep, tracing and RCU:
      
         - Prevent recursion by using raw_cpu_* operations
      
         - Fixup the interrupt state in the cpu idle code to be consistent
      
         - Push rcu_idle_enter/exit() invocations deeper into the idle path so
           that the lock operations are inside the RCU watching sections
      
         - Move trace_cpu_idle() into generic code so it's called before RCU
           goes idle.
      
         - Handle raw_local_irq* vs. local_irq* operations correctly
      
         - Move the tracepoints out from under the lockdep recursion handling
           which turned out to be fragile and inconsistent"
      
      * tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep,trace: Expose tracepoints
        lockdep: Only trace IRQ edges
        mips: Implement arch_irqs_disabled()
        arm64: Implement arch_irqs_disabled()
        nds32: Implement arch_irqs_disabled()
        locking/lockdep: Cleanup
        x86/entry: Remove unused THUNKs
        cpuidle: Move trace_cpu_idle() into generic code
        cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic
        sched,idle,rcu: Push rcu_idle deeper into the idle path
        cpuidle: Fixup IRQ state
        lockdep: Use raw_cpu_*() for per-cpu variables
    • L
      Merge tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6 · 3edd8db2
      Linus Torvalds authored
      Pull cifs fix from Steve French:
       "DFS fix for referral problem when using SMB1"
      
      * tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix check of tcon dfs in smb1
    • L
      Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 8bb5021c
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Revert our removal of PROT_SAO, at least one user expressed an
         interest in using it on Power9. Instead don't allow it to be used in
         guests unless enabled explicitly at compile time.
      
       - A fix for a crash introduced by a recent change to FP handling.
      
       - Revert a change to our idle code that left Power10 with no idle
         support.
      
       - One minor fix for the new scv system call path to set PPR.
      
       - Fix a crash in our "generic" PMU if branch stack events were enabled.
      
       - A fix for the IMC PMU, to correctly identify host kernel samples.
      
       - The ADB_PMU powermac code was found to be incompatible with
         VMAP_STACK, so make them incompatible in Kconfig until the code can
         be fixed.
      
       - A build fix in drivers/video/fbdev/controlfb.c, and a documentation
         fix.
      
      Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
      Giuseppe Sacco, Madhavan Srinivasan, Milton Miller, Nicholas Piggin,
      Pratik Rajesh Sampat, Randy Dunlap, Shawn Anastasio, Vaidyanathan
      Srinivasan.
      
      * tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU
        Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
        powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
        powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
        powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
        powerpc/64s: scv entry should set PPR
        Documentation/powerpc: fix malformed table in syscall64-abi
        video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
        selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
        powerpc/64s: Disallow PROT_SAO in LPARs by default
        Revert "powerpc/64s: Remove PROT_SAO support"
    • L
      Merge tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 6f0306d1
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Let's try this again...  Here are some USB fixes for 5.9-rc3.
      
        This differs from the previous pull request for this release in that
        the usb gadget patch now does not break some systems, and actually
        does what it was intended to do. Many thanks to Marek Szyprowski for
        quickly noticing and testing the patch from Andy Shevchenko to resolve
        this issue.
      
        Additionally, some more new USB quirks have been added to get some new
        devices to work properly based on user reports.
      
        Other than that, the patches are all here, and they contain:
      
         - usb gadget driver fixes
      
         - xhci driver fixes
      
         - typec fixes
      
         - new quirks and ids
      
         - fixes for USB patches that went into 5.9-rc1.
      
        All of these have been tested in linux-next with no reported issues"
      
      * tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
        usb: storage: Add unusual_uas entry for Sony PSZ drives
        USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
        usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
        USB: gadget: u_f: Unbreak offset calculation in VLAs
        USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
        usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures
        USB: PHY: JZ4770: Fix static checker warning.
        USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
        USB: gadget: u_f: add overflow checks to VLA macros
        xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
        xhci: Do warm-reset when both CAS and XDEV_RESUME are set
        usb: host: xhci: fix ep context print mismatch in debugfs
        usb: uas: Add quirk for PNY Pro Elite
        tools: usb: move to tools buildsystem
        USB: Fix device driver race
        USB: Also match device drivers using the ->match vfunc
        usb: host: xhci-tegra: fix tegra_xusb_get_phy()
        usb: host: xhci-tegra: otg usb2/usb3 port init
        usb: hcd: Fix use after free in usb_hcd_pci_remove()
        usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()
        ...
    • L
      Merge tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras · 42df60fc
      Linus Torvalds authored
      Pull EDAC fix from Borislav Petkov:
       "A fix to properly clear ghes_edac driver state on driver remove so
        that a subsequent load can probe the system properly (Shiju Jose)"
      
      * tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
        EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()
    • L
      Merge tag 'dma-mapping-5.9-2' of git://git.infradead.org/users/hch/dma-mapping · c4011283
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
       "Fix a possibly uninitialized variable (Dan Carpenter)"
      
      * tag 'dma-mapping-5.9-2' of git://git.infradead.org/users/hch/dma-mapping:
        dma-pool: Fix an uninitialized variable bug in atomic_pool_expand()
    • T
      genirq/matrix: Deal with the sillyness of for_each_cpu() on UP · 784a0830
      Thomas Gleixner authored
      Most of the CPU mask operations behave the same way, but on UP
      for_each_cpu() and its variants ignore the cpumask argument and claim that
      CPU0 is always in the mask. This is historical, inconsistent and annoying
      behaviour.
      
      The matrix allocator uses for_each_cpu() and can be called on UP with an
      empty cpumask. The calling code does not expect that this succeeds but
      until commit e027ffff ("x86/irq: Unbreak interrupt affinity setting")
      this went unnoticed. That commit added a WARN_ON() to catch cases which
      move an interrupt from one vector to another on the same CPU. The warning
      triggers on UP.
      
      Add a check for the cpumask being empty to prevent this.
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
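The behaviour described in the commit above can be modelled in userspace. This is a simplified sketch under stated assumptions: the `demo_for_each_cpu_up` macro imitates a uniprocessor iterator that never consults its mask, and both helper functions are hypothetical names, not the matrix allocator's code.

```c
#include <assert.h>

/*
 * Simplified model of a uniprocessor iterator: the mask argument is
 * evaluated but never consulted, so the body always runs once for CPU 0.
 */
#define demo_for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Returns the first CPU the iterator yields, or -1. No emptiness check. */
int demo_pick_cpu_unchecked(unsigned int mask)
{
	int cpu;

	assert((mask & ~1u) == 0); /* UP model: only bit 0 may be set */
	demo_for_each_cpu_up(cpu, mask)
		return cpu; /* reached even when mask == 0 */
	return -1;
}

/* Same search, but an empty mask is rejected before iterating. */
int demo_pick_cpu_checked(unsigned int mask)
{
	if (mask == 0)
		return -1;
	return demo_pick_cpu_unchecked(mask);
}
```

In this model the unchecked variant reports CPU 0 for an empty mask, which is the kind of unexpected success the commit guards against by testing for emptiness first.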
  4. 30 Aug, 2020 7 commits
    • L
      Merge tag 'fallthrough-fixes-5.9-rc3' of... · 1127b219
      Linus Torvalds authored
      Merge tag 'fallthrough-fixes-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
      
      Pull fallthrough fixes from Gustavo A. R. Silva:
       "Fix some minor issues introduced by the recent treewide fallthrough
        conversions:
      
         - Fix indentation issue
      
         - Fix erroneous fallthrough annotation
      
         - Remove unnecessary fallthrough annotation
      
         - Fix code comment changed by fallthrough conversion"
      
      * tag 'fallthrough-fixes-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
        arm64/cpuinfo: Remove unnecessary fallthrough annotation
        media: dib0700: Fix identation issue in dib8096_set_param_override()
        afs: Remove erroneous fallthough annotation
        iio: dpot-dac: fix code comment in dpot_dac_read_raw()
    • L
      fsldma: fix very broken 32-bit ppc ioread64 functionality · 0a4c56c8
      Linus Torvalds authored
      Commit ef91bb19 ("kernel.h: Silence sparse warning in
      lower_32_bits") caused new warnings to show in the fsldma driver, but
      that commit was not to blame: it only exposed some very incorrect code
      that tried to take the low 32 bits of an address.
      
      That made no sense for multiple reasons, the most notable one being that
      that code was intentionally limited to only 32-bit ppc builds, so "only
      low 32 bits of an address" was completely nonsensical.  There were no
      high bits to mask off to begin with.
      
      But even more importantly from a correctness standpoint, turning the
      address into an integer then caused the subsequent address arithmetic to
      be completely wrong too, and the "+1" actually incremented the address
      by one, rather than by four.
      
      Which again was incorrect, since the code was reading two 32-bit values
      and trying to make a 64-bit end result of it all.  Surprisingly, the
      iowrite64() did not suffer from the same odd and incorrect model.
      
      This code has never worked, but it's questionable whether anybody cared:
      of the two users that actually read the 64-bit value (by way of some C
      preprocessor hackery and eventually the 'get_cdar()' inline function),
      one of them explicitly ignored the value, and the other one might just
      happen to work despite the incorrect value being read.
      
      This patch at least makes it not fail the build any more, and makes the
      logic superficially sane.  Whether it makes any difference to the code
      _working_ or not shall remain a mystery.
      Compile-tested-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
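The address-arithmetic problem described in the commit above can be reproduced in plain C. This is a userspace sketch, not the driver code: it reads ordinary memory rather than device registers, all function names are hypothetical, and placing the low word first is an assumption made for the example. It assumes a platform with 8-bit bytes, where `uint32_t` occupies four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes advanced by "+1" once the address has been turned into an integer. */
size_t demo_step_as_integer(const uint32_t *base)
{
	uintptr_t addr = (uintptr_t)base;

	return (size_t)((addr + 1) - addr); /* integer arithmetic: 1 byte */
}

/* Bytes advanced by "+1" while the address is still a 32-bit pointer. */
size_t demo_step_as_pointer(const uint32_t *base)
{
	const uint32_t *next = base + 1; /* pointer arithmetic: one element */

	return (size_t)((const char *)next - (const char *)base);
}

/* Builds a 64-bit value from two adjacent 32-bit words (low word first). */
uint64_t demo_read64_from_halves(const uint32_t *words)
{
	assert(words != NULL);

	uint32_t lo = words[0];
	uint32_t hi = words[1];

	return ((uint64_t)hi << 32) | lo;
}
```

Keeping the address as a `uint32_t` pointer means `+ 1` lands on the second word, which is what a two-read 64-bit access needs.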
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · e77aee13
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "A core fix for ACPI matching and two driver bugfixes"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: iproc: Fix shifting 31 bits
        i2c: rcar: in slave mode, clear NACK earlier
        i2c: acpi: Remove dead code, i.e. i2c_acpi_match_device()
        i2c: core: Don't fail PRP0001 enumeration when no ID table exist
    • L
      Merge tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 1b46b921
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - Disable preemption trace in percpu macros since the lockdep code
         itself uses percpu variables now and it causes recursions.
      
       - Fix kernel space 4-level paging broken by recent vmem rework.
      
      * tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390/vmem: fix vmem_add_range for 4-level paging
        s390: don't trace preemption in percpu macros
    • L
      Merge tag 'for-linus-5.9-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · c8b5563a
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "Two fixes for Xen: one needed for ongoing work to support virtio with
        Xen, and one for a corner case in IRQ handling with Xen"
      
      * tag 'for-linus-5.9-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        arm/xen: Add misuse warning to virt_to_gfn
        xen/xenbus: Fix granting of vmalloc'd memory
        XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.
    • L
      Merge tag 'hwmon-for-v5.9-rc3' of... · e4cad138
      Linus Torvalds authored
      Merge tag 'hwmon-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Fix temperature scale in gsc-hwmon driver
      
       - Fix divide by 0 error in nct7904 driver
      
       - Drop non-existing attribute from pmbus/isl68137 driver
      
       - Fix status check in applesmc driver
      
      * tag 'hwmon-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (gsc-hwmon) Scale temperature to millidegrees
        hwmon: (applesmc) check status earlier.
        hwmon: (nct7904) Correct divide by 0
        hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_1 telemetry for RAA228228
    • J
      Merge branch 'nvme-5.9-rc' of git://git.infradead.org/nvme into block-5.9 · 5d220bcd
      Jens Axboe authored
      Pull NVMe fixes from Sagi:
      
      "- instance leak and io boundary fixes from Keith
       - fc locking fix from Christophe
       - various tcp/rdma reset during traffic fixes from Me
       - pci use-after-free fix from Tong
       - tcp target null deref fix from Ziye"
      
      * 'nvme-5.9-rc' of git://git.infradead.org/nvme:
        nvme-pci: cancel nvme device request before disabling
        nvme: only use power of two io boundaries
        nvme: fix controller instance leak
        nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'
        nvme: Fix NULL dereference for pci nvme controllers
        nvme-rdma: fix reset hang if controller died in the middle of a reset
        nvme-rdma: fix timeout handler
        nvme-rdma: serialize controller teardown sequences
        nvme-tcp: fix reset hang if controller died in the middle of a reset
        nvme-tcp: fix timeout handler
        nvme-tcp: serialize controller teardown sequences
        nvme: have nvme_wait_freeze_timeout return if it timed out
        nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance
        nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu
  5. 29 Aug, 2020 14 commits
    • T
      nvme-pci: cancel nvme device request before disabling · 7ad92f65
      Tong Zhang authored
      This patch fixes a "Trying to free already-free IRQ" warning and a NULL
      pointer dereference that occur when an nvme device times out during
      initialization. The problem happens when nvme_timeout() is called while
      nvme_reset_work() is still executing. The fix sets the flag of the
      offending request to NVME_REQ_CANCELLED before calling
      nvme_dev_disable(), so that __nvme_submit_sync_cmd() returns an error
      code and nvme_submit_sync_cmd() fails gracefully.
      The console output follows.
      
      [   62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller
      [   62.488796] nvme nvme0: could not set timestamp (881)
      [   62.494888] ------------[ cut here ]------------
      [   62.495142] Trying to free already-free IRQ 11
      [   62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370
      [   62.495742] Modules linked in:
      [   62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8
      [   62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [   62.496772] Workqueue: nvme-reset-wq nvme_reset_work
      [   62.497019] RIP: 0010:free_irq+0x1f7/0x370
      [   62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70
      [   62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086
      [   62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000
      [   62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c
      [   62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163
      [   62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00
      [   62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000
      [   62.500140] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [   62.500534] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [   62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   62.501864] Call Trace:
      [   62.501993]  pci_free_irq+0x13/0x20
      [   62.502167]  nvme_reset_work+0x5d0/0x12a0
      [   62.502369]  ? update_load_avg+0x59/0x580
      [   62.502569]  ? ttwu_queue_wakelist+0xa8/0xc0
      [   62.502780]  ? try_to_wake_up+0x1a2/0x450
      [   62.502979]  process_one_work+0x1d2/0x390
      [   62.503179]  worker_thread+0x45/0x3b0
      [   62.503361]  ? process_one_work+0x390/0x390
      [   62.503568]  kthread+0xf9/0x130
      [   62.503726]  ? kthread_park+0x80/0x80
      [   62.503911]  ret_from_fork+0x22/0x30
      [   62.504090] ---[ end trace de9ed4a70f8d71e2 ]---
      [  123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller
      [  123.914670] nvme nvme0: 1/0/0 default/read/poll queues
      [  123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  123.917469] #PF: supervisor write access in kernel mode
      [  123.917725] #PF: error_code(0x0002) - not-present page
      [  123.917976] PGD 0 P4D 0
      [  123.918109] Oops: 0002 [#1] SMP PTI
      [  123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G        W         5.8.0+ #8
      [  123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4
      [  123.919219] Workqueue: nvme-reset-wq nvme_reset_work
      [  123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.922641] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.923032] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  123.924353] Call Trace:
      [  123.924479]  blk_mq_alloc_tag_set+0x137/0x2a0
      [  123.924694]  nvme_reset_work+0xed6/0x12a0
      [  123.924898]  process_one_work+0x1d2/0x390
      [  123.925099]  worker_thread+0x45/0x3b0
      [  123.925280]  ? process_one_work+0x390/0x390
      [  123.925486]  kthread+0xf9/0x130
      [  123.925642]  ? kthread_park+0x80/0x80
      [  123.925825]  ret_from_fork+0x22/0x30
      [  123.926004] Modules linked in:
      [  123.926158] CR2: 0000000000000000
      [  123.926322] ---[ end trace de9ed4a70f8d71e3 ]---
      [  123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80
      [  123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4
      [  123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286
      [  123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000
      [  123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000
      [  123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000
      [  123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000
      [  123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000
      [  123.929715] FS:  0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000
      [  123.930106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0
      [  123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Co-developed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Tong Zhang <ztong0001@gmail.com>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
    • K
      nvme: only use power of two io boundaries · e83d776f
      Keith Busch authored
      The kernel requires a power of two for boundaries because that's the
      only way it can efficiently split commands that cross them. A
      controller, however, may report a non-power of two boundary.
      
      The driver had been rounding the controller's value to one the kernel
      can use, but splitting on the wrong boundary provides no benefit on the
      device side, and incurs additional submission overhead from non-optimal
      splits.
      
      Don't provide any boundary hint if the controller's value can't be used
      and log a warning when first scanning a disk's unreported IO boundary.
      Since the chunk sector logic has grown, move it to a separate function.
      
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      e83d776f
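      The rule described in this commit can be sketched in userspace C. This is a minimal illustration, not the driver's code: `usable_io_boundary` and `sectors_to_boundary` are hypothetical names, and the unit (sectors) is an assumption. It shows why only a power-of-two boundary is useful: the distance to the next boundary reduces to a mask operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n has exactly one bit set (1, 2, 4, 8, ...). */
static bool is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: return the boundary to use as a split hint,
 * or 0 ("no hint") when the reported value is not a power of two.
 * No rounding is attempted, mirroring the commit message.
 */
static uint32_t usable_io_boundary(uint32_t reported_sectors)
{
    return is_power_of_two(reported_sectors) ? reported_sectors : 0;
}

/*
 * With a power-of-two boundary, the number of sectors left before the
 * next boundary is a mask and a subtraction, with no division needed.
 * The caller must pass a non-zero power-of-two boundary.
 */
static uint32_t sectors_to_boundary(uint64_t offset, uint32_t boundary)
{
    return boundary - (uint32_t)(offset & (boundary - 1));
}
```

      For example, a reported boundary of 192 sectors yields no hint at all, while 256 is passed through unchanged.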
    • K
      nvme: fix controller instance leak · 192f6c29
      Keith Busch committed
      If the driver has to unbind from the controller for an early failure
      before the subsystem has been set up, there won't be a subsystem holding
      the controller's instance, so the controller needs to free its own
      instance in this case.
      
      Fixes: 733e4b69 ("nvme: Assign subsys instance from first ctrl")
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      192f6c29
    • C
      nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()' · 70e37988
      Christophe JAILLET committed
      The way 'spin_lock()' and 'spin_lock_irqsave()' are used is not consistent
      in this function.
      
      Use 'spin_lock_irqsave()' here as well, as there is no guarantee that
      interrupts are disabled at that point, judging by the surrounding code.
      
      Fixes: a97ec51b ("nvmet_fc: Rework target side abort handling")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      70e37988
    • S
      nvme: Fix NULL dereference for pci nvme controllers · 7cd49f75
      Sagi Grimberg committed
      PCIe controllers do not have fabric opts; verify that they exist before
      showing the ctrl_loss_tmo or reconnect_delay attributes.
      
      Fixes: 764075fd ("nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs")
      Reported-by: Tobias Markus <tobias@markus-regensburg.de>
      Reviewed-by: Keith Busch <kbusch@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      7cd49f75
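      The check described in this commit amounts to hiding fabrics-only attributes when the controller has no options attached. The following is a simplified userspace model under stated assumptions: the struct layouts and `attr_is_visible` are hypothetical stand-ins, not the kernel's sysfs API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fabrics connection options. */
struct model_fabrics_opts {
    int reconnect_delay;
    int ctrl_loss_tmo;
};

/* Hypothetical controller model: opts stays NULL for a PCIe controller. */
struct model_attr_ctrl {
    const struct model_fabrics_opts *opts;
};

/*
 * Return 1 if the named attribute should be shown, 0 if it must be hidden.
 * Attributes backed by opts are hidden when opts is NULL, so a later
 * "show" handler never dereferences a NULL pointer.
 */
static int attr_is_visible(const struct model_attr_ctrl *ctrl, const char *name)
{
    int fabrics_only = strcmp(name, "ctrl_loss_tmo") == 0 ||
                       strcmp(name, "reconnect_delay") == 0;

    if (fabrics_only && ctrl->opts == NULL)
        return 0;
    return 1;
}
```

      Filtering at visibility time, rather than inside each show handler, keeps the NULL check in one place.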
    • S
      nvme-rdma: fix reset hang if controller died in the middle of a reset · 2362acb6
      Sagi Grimberg committed
      If the controller becomes unresponsive in the middle of a reset, we
      will hang because we are waiting for the freeze to complete, but that
      cannot happen since we have commands that are inflight holding the
      q_usage_counter, and we can't blindly fail requests that time out.
      
      So use a timeout, and if the queue freeze does not complete before
      we unfreeze, fail and let the error handling decide how to
      proceed (either schedule a reconnect or remove the controller).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      2362acb6
    • S
      nvme-rdma: fix timeout handler · 0475a8dc
      Sagi Grimberg committed
      When a request times out in a LIVE state, we simply trigger error
      recovery and let the error recovery handle the request cancellation,
      however when a request times out in a non LIVE state, we make sure to
      complete it immediately as it might block controller setup or teardown
      and prevent forward progress.
      
      However, tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
      for what we actually need, which is just to fence any controller
      teardown that may be running, stop the queue, and cancel the request if
      it is not already completed.
      
      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by calling extra
      queue freeze on controller namespaces, causing unfreeze to not complete
      correctly.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      0475a8dc
    • S
      nvme-rdma: serialize controller teardown sequences · 5110f402
      Sagi Grimberg committed
      In the timeout handler we may need to complete a request because the
      request that timed out may be an I/O that is a part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.
      
      In this case, we could have a potential double completion in case
      a hard-irq or a different competing context triggered error recovery
      and is running inflight request cancellation concurrently with the
      timeout handler.
      
      Protect using a ctrl teardown_lock to serialize contexts that may
      complete a cancelled request due to error recovery or a reset.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      5110f402
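      The double-completion hazard and its fix can be modeled in userspace with a mutex. This is a rough sketch only: a pthread mutex stands in for the controller's teardown_lock, and `model_request`, `model_teardown_ctrl` and `cancel_request` are hypothetical names rather than driver functions.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request model: counts how often it was completed. */
struct model_request {
    bool completed;
    int completions;
};

/* Hypothetical controller model holding the serializing lock. */
struct model_teardown_ctrl {
    pthread_mutex_t teardown_lock;
};

/*
 * Complete the request unless another context already did so.
 * Both the timeout handler and error recovery would go through this
 * path; the lock makes the check-and-complete step atomic, so the
 * request is completed exactly once.
 * Returns true if this call performed the completion.
 */
static bool cancel_request(struct model_teardown_ctrl *ctrl,
                           struct model_request *rq)
{
    bool did_complete = false;

    pthread_mutex_lock(&ctrl->teardown_lock);
    if (!rq->completed) {
        rq->completed = true;
        rq->completions++;
        did_complete = true;
    }
    pthread_mutex_unlock(&ctrl->teardown_lock);

    return did_complete;
}
```

      Whichever context takes the lock first completes the request; the other one sees the completed flag and backs off.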
    • S
      nvme-tcp: fix reset hang if controller died in the middle of a reset · e5c01f4f
      Sagi Grimberg committed
      If the controller becomes unresponsive in the middle of a reset, we will
      hang because we are waiting for the freeze to complete, but that cannot
      happen since we have commands that are inflight holding the
      q_usage_counter, and we can't blindly fail requests that time out.
      
      So use a timeout, and if the queue freeze does not complete before
      we unfreeze, fail and let the error handling decide how to proceed
      (either schedule a reconnect or remove the controller).
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      e5c01f4f
    • S
      nvme-tcp: fix timeout handler · 236187c4
      Sagi Grimberg committed
      When a request times out in a LIVE state, we simply trigger error
      recovery and let the error recovery handle the request cancellation,
      however when a request times out in a non LIVE state, we make sure to
      complete it immediately as it might block controller setup or teardown
      and prevent forward progress.
      
      However, tearing down the entire set of I/O and admin queues causes a
      freeze/unfreeze imbalance (q->mq_freeze_depth) and is really overkill
      for what we actually need, which is just to fence any controller
      teardown that may be running, stop the queue, and cancel the request if
      it is not already completed.
      
      Now that we have the controller teardown_lock, we can safely serialize
      request cancellation. This addresses a hang caused by calling extra
      queue freeze on controller namespaces, causing unfreeze to not complete
      correctly.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      236187c4
    • S
      nvme-tcp: serialize controller teardown sequences · d4d61470
      Sagi Grimberg committed
      In the timeout handler we may need to complete a request because the
      request that timed out may be an I/O that is a part of a serial sequence
      of controller teardown or initialization. In order to complete the
      request, we need to fence any other context that may compete with us
      and complete the request that is timing out.
      
      In this case, we could have a potential double completion in case
      a hard-irq or a different competing context triggered error recovery
      and is running inflight request cancellation concurrently with the
      timeout handler.
      
      Protect using a ctrl teardown_lock to serialize contexts that may
      complete a cancelled request due to error recovery or a reset.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      d4d61470
    • S
      nvme: have nvme_wait_freeze_timeout return if it timed out · 7cf0d7c0
      Sagi Grimberg committed
      Callers can detect whether the wait completed and act accordingly
      (e.g. whether to continue initialization or to fail and schedule
      another initialization attempt).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      7cf0d7c0
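      The calling pattern this enables can be sketched as follows. This is a deterministic userspace model, not the kernel function: `model_queue`, `wait_freeze_timeout` and `reset_ctrl` are hypothetical, time is counted in abstract ticks, and the -1 error value is a placeholder.

```c
/* Hypothetical queue model: inflight requests drain at tick `frozen_at`. */
struct model_queue {
    long frozen_at;
};

/*
 * Wait for all queues to freeze within `timeout` ticks.
 * Returns the remaining budget on success, or 0 if any queue
 * had not frozen by the time the budget ran out.
 */
static long wait_freeze_timeout(const struct model_queue *queues, int nr,
                                long timeout)
{
    long elapsed = 0;

    for (int i = 0; i < nr; i++) {
        if (queues[i].frozen_at >= timeout)
            return 0;                   /* timed out on this queue */
        if (queues[i].frozen_at > elapsed)
            elapsed = queues[i].frozen_at;
    }
    return timeout - elapsed;
}

/*
 * Caller side: use the return value to decide between continuing
 * initialization and failing so that another attempt can be scheduled.
 */
static int reset_ctrl(const struct model_queue *queues, int nr, long timeout)
{
    if (wait_freeze_timeout(queues, nr, timeout) == 0)
        return -1;                      /* placeholder error value */
    return 0;
}
```

      A caller that previously blocked forever on a dead controller can now give up and hand the decision to error handling.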
    • S
      nvme-fabrics: don't check state NVME_CTRL_NEW for request acceptance · d7144f5c
      Sagi Grimberg committed
      NVME_CTRL_NEW should never see any I/O, because in order to start
      initialization it has to transition to NVME_CTRL_CONNECTING and from
      there it will never return to this state.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      d7144f5c
    • Z
      nvmet-tcp: Fix NULL dereference when a connect data comes in h2cdata pdu · a6ce7d7b
      Ziye Yang committed
      When handling commands without in-capsule data, we assign the ttag
      assuming we already have the queue commands array allocated (based
      on the queue size information in the connect data payload). However
      if the connect itself did not send the connect data in-capsule we
      have yet to allocate the queue commands, and we will assign a bogus
      ttag and suffer a NULL dereference when we receive the corresponding
      h2cdata pdu.
      
      Fix this by checking whether the commands are already allocated before
      dereferencing them when handling h2cdata; if they are not, it is
      certainly a connect and we should use the preallocated connect command.
      Signed-off-by: Ziye Yang <ziye.yang@intel.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      a6ce7d7b
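      The lookup described in this fix can be illustrated with a small userspace model. The types and `cmd_for_h2cdata` are hypothetical simplifications of the target's queue state, and the bounds check on the ttag is an addition of this sketch, not something the commit message mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command model. */
struct model_tgt_cmd {
    int id;
};

/* Hypothetical queue model for the target side. */
struct model_tgt_queue {
    struct model_tgt_cmd *cmds;     /* NULL until the queue size is known */
    unsigned int nr_cmds;           /* 0 until cmds has been allocated */
    struct model_tgt_cmd connect;   /* preallocated connect command */
};

/*
 * Pick the command that an incoming h2cdata PDU refers to.
 * If the command array has not been allocated yet, the data can only
 * belong to the connect command, so return the preallocated one
 * instead of indexing through a NULL array.
 */
static struct model_tgt_cmd *cmd_for_h2cdata(struct model_tgt_queue *q,
                                             uint16_t ttag)
{
    if (q->nr_cmds == 0)
        return &q->connect;
    if (ttag >= q->nr_cmds)
        return NULL;                /* sketch-only bounds check */
    return &q->cmds[ttag];
}
```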