1. 24 3月, 2019 40 次提交
    • S
      clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability · e19ca3fe
      Samuel Holland 提交于
      commit c950ca8c35eeb32224a63adc47e12f9e226da241 upstream.
      
      The Allwinner A64 SoC is known[1] to have an unstable architectural
      timer, which manifests itself most obviously in the time jumping forward
      a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
      timer frequency of 24 MHz, implying that the time went slightly backward
      (and this was interpreted by the kernel as it jumping forward and
      wrapping around past the epoch).
      
      Investigation revealed instability in the low bits of CNTVCT at the
      point a high bit rolls over. This leads to power-of-two cycle forward
      and backward jumps. (Testing shows that forward jumps are about twice as
      likely as backward jumps.) Since the counter value returns to normal
      after an indeterminate read, each "jump" really consists of both a
      forward and backward jump from the software perspective.
      
      Unless the kernel is trapping CNTVCT reads, a userspace program is able
      to read the register in a loop faster than it changes. A test program
      running on all 4 CPU cores that reported jumps larger than 100 ms was
      run for 13.6 hours and reported the following:
      
       Count | Event
      -------+---------------------------
        9940 | jumped backward      699ms
         268 | jumped backward     1398ms
           1 | jumped backward     2097ms
       16020 | jumped forward       175ms
        6443 | jumped forward       699ms
        2976 | jumped forward      1398ms
           9 | jumped forward    356516ms
           9 | jumped forward    357215ms
           4 | jumped forward    714430ms
           1 | jumped forward   3578440ms
      
      This works out to a jump larger than 100 ms about every 5.5 seconds on
      each CPU core.
      
      The largest jump (almost an hour!) was the following sequence of reads:
          0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000
      
      Note that the middle bits don't necessarily all read as all zeroes or
      all ones during the anomalous behavior; however the low 10 bits checked
      by the function in this patch have never been observed with any other
      value.
      
      Also note that smaller jumps are much more common, with backward jumps
      of 2048 (2^11) cycles observed over 400 times per second on each core.
      (Of course, this is partially explained by lower bits rolling over more
      frequently.) Any one of these could have caused the 95 year time skip.
      
      Similar anomalies were observed while reading CNTPCT (after patching the
      kernel to allow reads from userspace). However, the CNTPCT jumps are
      much less frequent, and only small jumps were observed. The same program
      as before (except now reading CNTPCT) observed after 72 hours:
      
       Count | Event
      -------+---------------------------
          17 | jumped backward      699ms
          52 | jumped forward       175ms
        2831 | jumped forward       699ms
           5 | jumped forward      1398ms
      
      Further investigation showed that the instability in CNTPCT/CNTVCT also
      affected the respective timer's TVAL register. The following values were
      observed immediately after writing CNVT_TVAL to 0x10000000:
      
       CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
      --------------------+------------+--------------------+-----------------
       0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000
       0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000
       0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000
       0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000
      
      The pattern of errors in CNTV_TVAL seemed to depend on exactly which
      value was written to it. For example, after writing 0x10101010:
      
       CNTVCT             | CNTV_TVAL  | CNTV_CVAL          | CNTV_TVAL Error
      --------------------+------------+--------------------+-----------------
       0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000
       0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000
       0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000
       0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000
       0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000
       0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000
      
      I was also twice able to reproduce the issue covered by Allwinner's
      workaround[4], that writing to TVAL sometimes fails, and both CVAL and
      TVAL are left with entirely bogus values. One was the following values:
      
       CNTVCT             | CNTV_TVAL  | CNTV_CVAL
      --------------------+------------+--------------------------------------
       0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past)
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      
      ========================================================================
      
      Because the CPU can read the CNTPCT/CNTVCT registers faster than they
      change, performing two reads of the register and comparing the high bits
      (like other workarounds) is not a workable solution. And because the
      timer can jump both forward and backward, no pair of reads can
      distinguish a good value from a bad one. The only way to guarantee a
      good value from consecutive reads would be to read _three_ times, and
      take the middle value only if the three values are 1) each unique and
      2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
      if an anomaly is detected.
      
      However, since there is a distinct pattern to the bad values, we can
      optimize the common case (1022/1024 of the time) to a single read by
      simply ignoring values that match the error pattern. This still takes no
      more than 3 cycles in the worst case, and requires much less code. As an
      additional safety check, we still limit the loop iteration to the number
      of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.
      
      For the TVAL registers, the simple solution is to not use them. Instead,
      read or write the CVAL and calculate the TVAL value in software.
      
      Although the manufacturer is aware of at least part of the erratum[4],
      there is no official name for it. For now, use the kernel-internal name
      "UNKNOWN1".
      
      [1]: https://github.com/armbian/build/commit/a08cd6fe7ae9
      [2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/
      [3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26
      [4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272Acked-by: NMaxime Ripard <maxime.ripard@bootlin.com>
      Tested-by: NAndre Przywara <andre.przywara@arm.com>
      Signed-off-by: NSamuel Holland <samuel@sholland.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e19ca3fe
    • S
      clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown · ef8062e2
      Stuart Menefy 提交于
      commit d2f276c8d3c224d5b493c42b6cf006ae4e64fb1c upstream.
      
      When shutting down the timer, ensure that after we have stopped the
      timer any pending interrupts are cleared. This fixes a problem when
      suspending, as interrupts are disabled before the timer is stopped,
      so the timer interrupt may still be asserted, preventing the system
      entering a low power state when the wfi is executed.
      Signed-off-by: NStuart Menefy <stuart.menefy@mathembedded.com>
      Reviewed-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: <stable@vger.kernel.org> # v4.3+
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef8062e2
    • S
      clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR · c1f45c10
      Stuart Menefy 提交于
      commit a5719a40aef956ba704f2aa1c7b977224d60fa96 upstream.
      
      When a timer tick occurs and the clock is in one-shot mode, the timer
      needs to be stopped to prevent it triggering subsequent interrupts.
      Currently this code is in exynos4_mct_tick_clear(), but as it is
      only needed when an ISR occurs move it into exynos4_mct_tick_isr(),
      leaving exynos4_mct_tick_clear() just doing what its name suggests it
      should.
      Signed-off-by: NStuart Menefy <stuart.menefy@mathembedded.com>
      Reviewed-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c1f45c10
    • S
      regulator: s2mpa01: Fix step values for some LDOs · 06607b1b
      Stuart Menefy 提交于
      commit 28c4f730d2a44f2591cb104091da29a38dac49fe upstream.
      
      The step values for some of the LDOs appears to be incorrect, resulting
      in incorrect voltages (or at least, ones which are different from the
      Samsung 3.4 vendor kernel).
      Signed-off-by: NStuart Menefy <stuart.menefy@mathembedded.com>
      Reviewed-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      06607b1b
    • M
      regulator: max77620: Initialize values for DT properties · c288e34d
      Mark Zhang 提交于
      commit 0ab66b3c326ef8f77dae9f528118966365757c0c upstream.
      
      If regulator DT node doesn't exist, its of_parse_cb callback
      function isn't called. Then all values for DT properties are
      filled with zero. This leads to wrong register update for
      FPS and POK settings.
      Signed-off-by: NJinyoung Park <jinyoungp@nvidia.com>
      Signed-off-by: NMark Zhang <markz@nvidia.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c288e34d
    • K
      regulator: s2mps11: Fix steps for buck7, buck8 and LDO35 · 462aee48
      Krzysztof Kozlowski 提交于
      commit 56b5d4ea778c1b0989c5cdb5406d4a488144c416 upstream.
      
      LDO35 uses 25 mV step, not 50 mV.  Bucks 7 and 8 use 12.5 mV step
      instead of 6.25 mV.  Wrong step caused over-voltage (LDO35) or
      under-voltage (buck7 and 8) if regulators were used (e.g. on Exynos5420
      Arndale Octa board).
      
      Cc: <stable@vger.kernel.org>
      Fixes: cb74685e ("regulator: s2mps11: Add samsung s2mps11 regulator driver")
      Signed-off-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      462aee48
    • A
      spi: pxa2xx: Setup maximum supported DMA transfer length · 15ead7e2
      Andy Shevchenko 提交于
      commit ef070b4e4aa25bb5f8632ad196644026c11903bf upstream.
      
      When the commit b6ced294
      
         ("spi: pxa2xx: Switch to SPI core DMA mapping functionality")
      
      switches to SPI core provided DMA helpers, it missed to setup maximum
      supported DMA transfer length for the controller and thus users
      mistakenly try to send more data than supported with the following
      warning:
      
        ili9341 spi-PRP0001:01: DMA disabled for transfer length 153600 greater than 65536
      
      Setup maximum supported DMA transfer length in order to make users know
      the limit.
      
      Fixes: b6ced294 ("spi: pxa2xx: Switch to SPI core DMA mapping functionality")
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      15ead7e2
    • V
      spi: ti-qspi: Fix mmap read when more than one CS in use · e51c5ec9
      Vignesh R 提交于
      commit 673c865efbdc5fec3cc525c46d71844d42c60072 upstream.
      
      Commit 4dea6c9b ("spi: spi-ti-qspi: add mmap mode read support") has
      has got order of parameter wrong when calling regmap_update_bits() to
      select CS for mmap access. Mask and value arguments are interchanged.
      Code will work on a system with single slave, but fails when more than
      one CS is in use. Fix this by correcting the order of parameters when
      calling regmap_update_bits().
      
      Fixes: 4dea6c9b ("spi: spi-ti-qspi: add mmap mode read support")
      Cc: stable@vger.kernel.org
      Signed-off-by: NVignesh R <vigneshr@ti.com>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e51c5ec9
    • A
      netfilter: ipt_CLUSTERIP: fix warning unused variable cn · 0d98ecb1
      Anders Roxell 提交于
      commit 206b8cc514d7ff2b79dd2d5ad939adc7c493f07a upstream.
      
      When CONFIG_PROC_FS isn't set the variable cn isn't used.
      
      net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
      net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
        struct clusterip_net *cn = clusterip_pernet(net);
                              ^~
      
      Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".
      
      Fixes: b12f7bad5ad3 ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
      Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0d98ecb1
    • J
      mmc:fix a bug when max_discard is 0 · 6bd9959a
      Jiong Wu 提交于
      commit d4721339dcca7def04909a8e60da43c19a24d8bf upstream.
      
      The original purpose of the code I fix is to replace max_discard with
      max_trim if max_trim is less than max_discard. When max_discard is 0
      we should replace max_discard with max_trim as well, because
      max_discard equals 0 happens only when the max_do_calc_max_discard
      process is overflowed, so if mmc_can_trim(card) is true, max_discard
      should be replaced by an available max_trim.
      However, in the original code, there are two lines of code interfere
      the right process.
      1) if (max_discard && mmc_can_trim(card))
      when max_discard is 0, it skips the process checking if max_discard
      needs to be replaced with max_trim.
      2) if (max_trim < max_discard)
      the condition is false when max_discard is 0. it also skips the process
      that replaces max_discard with max_trim, in fact, we should replace the
      0-valued max_discard with max_trim.
      Signed-off-by: NJiong Wu <Lohengrin1024@gmail.com>
      Fixes: b305882f (mmc: core: optimize mmc_calc_max_discard)
      Cc: stable@vger.kernel.org # v4.17+
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bd9959a
    • B
      mmc: sdhci-esdhc-imx: fix HS400 timing issue · 2946910e
      BOUGH CHEN 提交于
      commit de0a0decf2edfc5b0c782915f4120cf990a9bd13 upstream.
      
      Now tuning reset will be done when the timing is MMC_TIMING_LEGACY/
      MMC_TIMING_MMC_HS/MMC_TIMING_SD_HS. But for timing MMC_TIMING_MMC_HS,
      we can not do tuning reset, otherwise HS400 timing is not right.
      
      Here is the process of init HS400, first finish tuning in HS200 mode,
      then switch to HS mode and 8 bit DDR mode, finally switch to HS400
      mode. If we do tuning reset in HS mode, this will cause HS400 mode
      lost the tuning setting, which will cause CRC error.
      Signed-off-by: NHaibo Chen <haibo.chen@nxp.com>
      Cc: stable@vger.kernel.org # v4.12+
      Acked-by: NAdrian Hunter <adrian.hunter@intel.com>
      Fixes: d9370424 ("mmc: sdhci-esdhc-imx: reset tuning circuit when power on mmc card")
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2946910e
    • A
      ACPI / device_sysfs: Avoid OF modalias creation for removed device · c19b9673
      Andy Shevchenko 提交于
      commit f16eb8a4b096514ac06fb25bf599dcc792899b3d upstream.
      
      If SSDT overlay is loaded via ConfigFS and then unloaded the device,
      we would like to have OF modalias for, already gone. Thus, acpi_get_name()
      returns no allocated buffer for such case and kernel crashes afterwards:
      
       ACPI: Host-directed Dynamic ACPI Table Unload
       ads7950 spi-PRP0001:00: Dropping the link to regulator.0
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       #PF error: [normal kernel read fault]
       PGD 80000000070d6067 P4D 80000000070d6067 PUD 70d0067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 40 Comm: kworker/u4:2 Not tainted 5.0.0+ #96
       Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
       Workqueue: kacpi_hotplug acpi_device_del_work_fn
       RIP: 0010:create_of_modalias.isra.1+0x4c/0x150
       Code: 00 00 48 89 44 24 18 31 c0 48 8d 54 24 08 48 c7 44 24 10 00 00 00 00 48 c7 44 24 08 ff ff ff ff e8 7a b0 03 00 48 8b 4c 24 10 <0f> b6 01 84 c0 74 27 48 c7 c7 00 09 f4 a5 0f b6 f0 8d 50 20 f6 04
       RSP: 0000:ffffa51040297c10 EFLAGS: 00010246
       RAX: 0000000000001001 RBX: 0000000000000785 RCX: 0000000000000000
       RDX: 0000000000001001 RSI: 0000000000000286 RDI: ffffa2163dc042e0
       RBP: ffffa216062b1196 R08: 0000000000001001 R09: ffffa21639873000
       R10: ffffffffa606761d R11: 0000000000000001 R12: ffffa21639873218
       R13: ffffa2163deb5060 R14: ffffa216063d1010 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffffa2163e000000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000007114000 CR4: 00000000001006f0
       Call Trace:
        __acpi_device_uevent_modalias+0xb0/0x100
        spi_uevent+0xd/0x40
      
       ...
      
      In order to fix above let create_of_modalias() check the status returned
      by acpi_get_name() and bail out in case of failure.
      
      Fixes: 8765c5ba ("ACPI / scan: Rework modalias creation when "compatible" is present")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201381Reported-by: NFerry Toth <fntoth@gmail.com>
      Tested-by: Ferry Toth<fntoth@gmail.com>
      Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: 4.1+ <stable@vger.kernel.org> # 4.1+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c19b9673
    • J
      xen: fix dom0 boot on huge systems · 468ff43f
      Juergen Gross 提交于
      commit 01bd2ac2f55a1916d81dace12fa8d7ae1c79b5ea upstream.
      
      Commit f7c90c2a ("x86/xen: don't write ptes directly in 32-bit
      PV guests") introduced a regression for booting dom0 on huge systems
      with lots of RAM (in the TB range).
      
      Reason is that on those hosts the p2m list needs to be moved early in
      the boot process and this requires temporary page tables to be created.
      Said commit modified xen_set_pte_init() to use a hypercall for writing
      a PTE, but this requires the page table being in the direct mapped
      area, which is not the case for the temporary page tables used in
      xen_relocate_p2m().
      
      As the page tables are completely written before being linked to the
      actual address space instead of set_pte() a plain write to memory can
      be used in xen_relocate_p2m().
      
      Fixes: f7c90c2a ("x86/xen: don't write ptes directly in 32-bit PV guests")
      Cc: stable@vger.kernel.org
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      468ff43f
    • J
      tracing/perf: Use strndup_user() instead of buggy open-coded version · 24d50976
      Jann Horn 提交于
      commit 83540fbc8812a580b6ad8f93f4c29e62e417687e upstream.
      
      The first version of this method was missing the check for
      `ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
      on error, so there was still a small memory leak in the error case.
      Fix it by using strndup_user() instead of open-coding it.
      
      Link: http://lkml.kernel.org/r/20190220165443.152385-1-jannh@google.com
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 0eadcc7a ("perf/core: Fix perf_uprobe_init()")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      24d50976
    • Z
      tracing: Do not free iter->trace in fail path of tracing_open_pipe() · f27077e5
      zhangyi (F) 提交于
      commit e7f0c424d0806b05d6f47be9f202b037eb701707 upstream.
      
      Commit d716ff71 ("tracing: Remove taking of trace_types_lock in
      pipe files") use the current tracer instead of the copy in
      tracing_open_pipe(), but it forget to remove the freeing sentence in
      the error path.
      
      There's an error path that can call kfree(iter->trace) after the iter->trace
      was assigned to tr->current_trace, which would be bad to free.
      
      Link: http://lkml.kernel.org/r/1550060946-45984-1-git-send-email-yi.zhang@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: d716ff71 ("tracing: Remove taking of trace_types_lock in pipe files")
      Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f27077e5
    • T
      tracing: Use strncpy instead of memcpy for string keys in hist triggers · ebca08d7
      Tom Zanussi 提交于
      commit 9f0bbf3115ca9f91f43b7c74e9ac7d79f47fc6c2 upstream.
      
      Because there may be random garbage beyond a string's null terminator,
      it's not correct to copy the the complete character array for use as a
      hist trigger key.  This results in multiple histogram entries for the
      'same' string key.
      
      So, in the case of a string key, use strncpy instead of memcpy to
      avoid copying in the extra bytes.
      
      Before, using the gdbus entries in the following hist trigger as an
      example:
      
        # echo 'hist:key=comm' > /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
        # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist
      
        ...
      
        { comm: ImgDecoder #4                      } hitcount:        203
        { comm: gmain                              } hitcount:        213
        { comm: gmain                              } hitcount:        216
        { comm: StreamTrans #73                    } hitcount:        221
        { comm: mozStorage #3                      } hitcount:        230
        { comm: gdbus                              } hitcount:        233
        { comm: StyleThread#5                      } hitcount:        253
        { comm: gdbus                              } hitcount:        256
        { comm: gdbus                              } hitcount:        260
        { comm: StyleThread#4                      } hitcount:        271
      
        ...
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
        51
      
      After:
      
        # cat /sys/kernel/debug/tracing/events/sched/sched_waking/hist | egrep gdbus | wc -l
        1
      
      Link: http://lkml.kernel.org/r/50c35ae1267d64eee975b8125e151e600071d4dc.1549309756.git.tom.zanussi@linux.intel.com
      
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 79e577cb ("tracing: Support string type key properly")
      Signed-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ebca08d7
    • P
      CIFS: Fix read after write for files with read caching · 43eaa6cc
      Pavel Shilovsky 提交于
      commit 6dfbd84684700cb58b34e8602c01c12f3d2595c8 upstream.
      
      When we have a READ lease for a file and have just issued a write
      operation to the server we need to purge the cache and set oplock/lease
      level to NONE to avoid reading stale data. Currently we do that
      only if a write operation succedeed thus not covering cases when
      a request was sent to the server but a negative error code was
      returned later for some other reasons (e.g. -EIOCBQUEUED or -EINTR).
      Fix this by turning off caching regardless of the error code being
      returned.
      
      The patches fixes generic tests 075 and 112 from the xfs-tests.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43eaa6cc
    • P
      CIFS: Do not skip SMB2 message IDs on send failures · dc8e8ad9
      Pavel Shilovsky 提交于
      commit c781af7e0c1fed9f1d0e0ec31b86f5b21a8dca17 upstream.
      
      When we hit failures during constructing MIDs or sending PDUs
      through the network, we end up not using message IDs assigned
      to the packet. The next SMB packet will skip those message IDs
      and continue with the next one. This behavior may lead to a server
      not granting us credits until we use the skipped IDs. Fix this by
      reverting the current ID to the original value if any errors occur
      before we push the packet through the network stack.
      
      This patch fixes the generic/310 test from the xfs-tests.
      
      Cc: <stable@vger.kernel.org> # 4.19.x
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc8e8ad9
    • P
      CIFS: Do not reset lease state to NONE on lease break · 3ed9f22e
      Pavel Shilovsky 提交于
      commit 7b9b9edb49ad377b1e06abf14354c227e9ac4b06 upstream.
      
      Currently on lease break the client sets a caching level twice:
      when oplock is detected and when oplock is processed. While the
      1st attempt sets the level to the value provided by the server,
      the 2nd one resets the level to None unconditionally.
      This happens because the oplock/lease processing code was changed
      to avoid races between page cache flushes and oplock breaks.
      The commit c11f1df5 ("cifs: Wait for writebacks to complete
      before attempting write.") fixed the races for oplocks but didn't
      apply the same changes for leases resulting in overwriting the
      server granted value to None. Fix this by properly processing
      lease breaks.
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ed9f22e
    • A
      crypto: arm64/aes-ccm - fix bugs in non-NEON fallback routine · 41e2d1c4
      Ard Biesheuvel 提交于
      commit 969e2f59d589c15f6aaf306e590dde16f12ea4b3 upstream.
      
      Commit 5092fcf3 ("crypto: arm64/aes-ce-ccm: add non-SIMD generic
      fallback") introduced C fallback code to replace the NEON routines
      when invoked from a context where the NEON is not available (i.e.,
      from the context of a softirq taken while the NEON is already being
      used in kernel process context)
      
      Fix two logical flaws in the MAC calculation of the associated data.
      Reported-by: NEric Biggers <ebiggers@kernel.org>
      Fixes: 5092fcf3 ("crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback")
      Cc: stable@vger.kernel.org
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41e2d1c4
    • A
      crypto: arm64/aes-ccm - fix logical bug in AAD MAC handling · d5a5bded
      Ard Biesheuvel 提交于
      commit eaf46edf6ea89675bd36245369c8de5063a0272c upstream.
      
      The NEON MAC calculation routine fails to handle the case correctly
      where there is some data in the buffer, and the input fills it up
      exactly. In this case, we enter the loop at the end with w8 == 0,
      while a negative value is assumed, and so the loop carries on until
      the increment of the 32-bit counter wraps around, which is quite
      obviously wrong.
      
      So omit the loop altogether in this case, and exit right away.
      Reported-by: NEric Biggers <ebiggers@kernel.org>
      Fixes: a3fd8210 ("arm64/crypto: AES in CCM mode using ARMv8 Crypto ...")
      Cc: stable@vger.kernel.org
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5a5bded
    • E
      crypto: x86/morus - fix handling chunked inputs and MAY_SLEEP · 66700c89
      Eric Biggers 提交于
      commit 2060e284e9595fc3baed6e035903c05b93266555 upstream.
      
      The x86 MORUS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
      
      Fixes: 56e8e57f ("crypto: morus - Add common SIMD glue code for MORUS")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66700c89
    • E
      crypto: x86/aesni-gcm - fix crash on empty plaintext · 8a9fcf4a
      Eric Biggers 提交于
      commit 3af349639597fea582a93604734d717e59a0e223 upstream.
      
      gcmaes_crypt_by_sg() dereferences the NULL pointer returned by
      scatterwalk_ffwd() when encrypting an empty plaintext and the source
      scatterlist ends immediately after the associated data.
      
      Fix it by only fast-forwarding to the src/dst data scatterlists if the
      data length is nonzero.
      
      This bug is reproduced by the "rfc4543(gcm(aes))" test vectors when run
      with the new AEAD test manager.
      
      Fixes: e8455207 ("crypto: aesni - Update aesni-intel_glue to use scatter/gather")
      Cc: <stable@vger.kernel.org> # v4.17+
      Cc: Dave Watson <davejwatson@fb.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a9fcf4a
    • E
      crypto: x86/aegis - fix handling chunked inputs and MAY_SLEEP · 5d2a5172
      Eric Biggers 提交于
      commit ba6771c0a0bc2fac9d6a8759bab8493bd1cffe3b upstream.
      
      The x86 AEGIS implementations all fail the improved AEAD tests because
      they produce the wrong result with some data layouts.  The issue is that
      they assume that if the skcipher_walk API gives 'nbytes' not aligned to
      the walksize (a.k.a. walk.stride), then it is the end of the data.  In
      fact, this can happen before the end.
      
      Also, when the CRYPTO_TFM_REQ_MAY_SLEEP flag is given, they can
      incorrectly sleep in the skcipher_walk_*() functions while preemption
      has been disabled by kernel_fpu_begin().
      
      Fix these bugs.
      
      Fixes: 1d373d4e ("crypto: x86 - Add optimized AEGIS implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5d2a5172
    • E
      crypto: testmgr - skip crc32c context test for ahash algorithms · 574c19d9
      Eric Biggers 提交于
      commit eb5e6730db98fcc4b51148b4a819fa4bf864ae54 upstream.
      
      Instantiating "cryptd(crc32c)" causes a crypto self-test failure because
      the crypto_alloc_shash() in alg_test_crc32c() fails.  This is because
      cryptd(crc32c) is an ahash algorithm, not a shash algorithm; so it can
      only be accessed through the ahash API, unlike shash algorithms which
      can be accessed through both the ahash and shash APIs.
      
      As the test is testing the shash descriptor format which is only
      applicable to shash algorithms, skip it for ahash algorithms.
      
      (Note that it's still important to fix crypto self-test failures even
       for weird algorithm instantiations like cryptd(crc32c) that no one
       would really use; in fips_enabled mode unprivileged users can use them
       to panic the kernel, and also they prevent treating a crypto self-test
       failure as a bug when fuzzing the kernel.)
      
      Fixes: 8e3ee85e ("crypto: crc32c - Test descriptor context format")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      574c19d9
    • E
      crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · e6c703f1
      Eric Biggers 提交于
      commit b1f6b4bf416b49f00f3abc49c639371cdecaaad1 upstream.
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      For example, in lrw.c, if gf128mul_init_64k_bbe() fails due to lack of
      memory, then priv::table will be left NULL.  After that, encryption with
      that tfm will cause a NULL pointer dereference.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
      key, to prevent the tfm from being used until a new key is set.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: f8d33fac ("crypto: skcipher - prevent using skciphers without setting key")
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e6c703f1
    • E
      crypto: pcbc - remove bogus memcpy()s with src == dest · bb1ae0aa
      Eric Biggers 提交于
      commit 251b7aea34ba3c4d4fdfa9447695642eb8b8b098 upstream.
      
      The memcpy()s in the PCBC implementation use walk->iv as both the source
      and destination, which has undefined behavior.  These memcpy()'s are
      actually unneeded, because walk->iv is already used to hold the previous
      plaintext block XOR'd with the previous ciphertext block.  Thus,
      walk->iv is already updated to its final value.
      
      So remove the broken and unnecessary memcpy()s.
      
      Fixes: 91652be5 ("[CRYPTO] pcbc: Add Propagated CBC template")
      Cc: <stable@vger.kernel.org> # v2.6.21+
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb1ae0aa
    • E
      crypto: morus - fix handling chunked inputs · c0bfdac6
      Eric Biggers 提交于
      commit d644f1c8746ed24f81075480f9e9cb3777ae8d65 upstream.
      
      The generic MORUS implementations all fail the improved AEAD tests
      because they produce the wrong result with some data layouts.  The issue
      is that they assume that if the skcipher_walk API gives 'nbytes' not
      aligned to the walksize (a.k.a. walk.stride), then it is the end of the
      data.  In fact, this can happen before the end.  Fix them.
      
      Fixes: 396be41f ("crypto: morus - Add generic MORUS AEAD implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c0bfdac6
    • E
      crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · dc410d2d
      Eric Biggers 提交于
      commit ba7d7433a0e998c902132bd47330e355a1eaa894 upstream.
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
      key, to prevent the tfm from being used until a new key is set.
      
      Note: we can't set CRYPTO_TFM_NEED_KEY for OPTIONAL_KEY algorithms, so
      ->setkey() for those must nevertheless be atomic.  That's fine for now
      since only the crc32 and crc32c algorithms set OPTIONAL_KEY, and it's
      not intended that OPTIONAL_KEY be used much.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: 9fa68f62 ("crypto: hash - prevent using keyed hashes without setting key")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc410d2d
    • A
      crypto: arm64/crct10dif - revert to C code for short inputs · 76f21678
      Ard Biesheuvel 提交于
      commit d72b9d4acd548251f55b16843fc7a05dc5c80de8 upstream.
      
      The SIMD routine ported from x86 used to have a special code path
      for inputs < 16 bytes, which got lost somewhere along the way.
      Instead, the current glue code aligns the input pointer to 16 bytes,
      which is not really necessary on this architecture (although it
      could be beneficial to performance to expose aligned data to the
      the NEON routine), but this could result in inputs of less than
      16 bytes to be passed in. This not only fails the new extended
      tests that Eric has implemented, it also results in the code
      reading past the end of the input, which could potentially result
      in crashes when dealing with less than 16 bytes of input at the
      end of a page which is followed by an unmapped page.
      
      So update the glue code to only invoke the NEON routine if the
      input is at least 16 bytes.
      Reported-by: NEric Biggers <ebiggers@kernel.org>
      Reviewed-by: NEric Biggers <ebiggers@kernel.org>
      Fixes: 6ef5737f ("crypto: arm64/crct10dif - port x86 SSE implementation to arm64")
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      76f21678
    • E
      crypto: arm64/aes-neonbs - fix returning final keystream block · 4bca5a9a
      Eric Biggers 提交于
      commit 12455e320e19e9cc7ad97f4ab89c280fe297387c upstream.
      
      The arm64 NEON bit-sliced implementation of AES-CTR fails the improved
      skcipher tests because it sometimes produces the wrong ciphertext.  The
      bug is that the final keystream block isn't returned from the assembly
      code when the number of non-final blocks is zero.  This can happen if
      the input data ends a few bytes after a page boundary.  In this case the
      last bytes get "encrypted" by XOR'ing them with uninitialized memory.
      
      Fix the assembly code to return the final keystream block when needed.
      
      Fixes: 88a3f582 ("crypto: arm64/aes - don't use IV buffer to return final keystream block")
      Cc: <stable@vger.kernel.org> # v4.11+
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bca5a9a
    • A
      crypto: arm/crct10dif - revert to C code for short inputs · 0beb34b8
      Ard Biesheuvel 提交于
      commit 62fecf295e3c48be1b5f17c440b93875b9adb4d6 upstream.
      
      The SIMD routine ported from x86 used to have a special code path
      for inputs < 16 bytes, which got lost somewhere along the way.
      Instead, the current glue code aligns the input pointer to permit
      the NEON routine to use special versions of the vld1 instructions
      that assume 16 byte alignment, but this could result in inputs of
      less than 16 bytes to be passed in. This not only fails the new
      extended tests that Eric has implemented, it also results in the
      code reading past the end of the input, which could potentially
      result in crashes when dealing with less than 16 bytes of input
      at the end of a page which is followed by an unmapped page.
      
      So update the glue code to only invoke the NEON routine if the
      input is at least 16 bytes.
      Reported-by: NEric Biggers <ebiggers@kernel.org>
      Reviewed-by: NEric Biggers <ebiggers@kernel.org>
      Fixes: 1d481f1c ("crypto: arm/crct10dif - port x86 SSE implementation to ARM")
      Cc: <stable@vger.kernel.org> # v4.10+
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0beb34b8
    • E
      crypto: aegis - fix handling chunked inputs · 4c152af9
      Eric Biggers 提交于
      commit 0f533e67d26f228ea5dfdacc8a4bdeb487af5208 upstream.
      
      The generic AEGIS implementations all fail the improved AEAD tests
      because they produce the wrong result with some data layouts.  The issue
      is that they assume that if the skcipher_walk API gives 'nbytes' not
      aligned to the walksize (a.k.a. walk.stride), then it is the end of the
      data.  In fact, this can happen before the end.  Fix them.
      
      Fixes: f606a88e ("crypto: aegis - Add generic AEGIS AEAD implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c152af9
    • E
      crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · 736807d6
      Eric Biggers 提交于
      commit 6ebc97006b196aafa9df0497fdfa866cf26f259b upstream.
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      For example, in gcm.c, if the kzalloc() fails due to lack of memory,
      then the CTR part of GCM will have the new key but GHASH will not.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails, to prevent the tfm from being
      used until a new key is set.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: dc26c17f ("crypto: aead - prevent using AEADs without setting key")
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      736807d6
    • A
      fix cgroup_do_mount() handling of failure exits · 7a8b0484
      Al Viro 提交于
      commit 399504e21a10be16dd1408ba0147367d9d82a10c upstream.
      
      same story as with last May fixes in sysfs (7b745a4e
      "unfuck sysfs_mount()"); new_sb is left uninitialized
      in case of early errors in kernfs_mount_ns() and papering
      over it by treating any error from kernfs_mount_ns() as
      equivalent to !new_ns ends up conflating the cases when
      objects had never been transferred to a superblock with
      ones when that has happened and resulting new superblock
      had been dropped.  Easily fixed (same way as in sysfs
      case).  Additionally, there's a superblock leak on
      kernfs_node_dentry() failure *and* a dentry leak inside
      kernfs_node_dentry() itself - the latter on probably
      impossible errors, but the former not impossible to trigger
      (as the matter of fact, injecting allocation failures
      at that point *does* trigger it).
      
      Cc: stable@kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a8b0484
    • O
      libnvdimm: Fix altmap reservation size calculation · 3b8da135
      Oliver O'Halloran 提交于
      commit 07464e88365e9236febaca9ed1a2e2006d8bc952 upstream.
      
      Libnvdimm reserves the first 8K of pfn and devicedax namespaces to
      store a superblock describing the namespace. This 8K reservation
      is contained within the altmap area which the kernel uses for the
      vmemmap backing for the pages within the namespace. The altmap
      allows for some pages at the start of the altmap area to be reserved
      and that mechanism is used to protect the superblock from being
      re-used as vmemmap backing.
      
      The number of PFNs to reserve is calculated using:
      
      	PHYS_PFN(SZ_8K)
      
      Which is implemented as:
      
       #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
      
      So on systems where PAGE_SIZE is greater than 8K the reservation
      size is truncated to zero and the superblock area is re-used as
      vmemmap backing. As a result all the namespace information stored
      in the superblock (i.e. if it's a PFN or DAX namespace) is lost
      and the namespace needs to be re-created to get access to the
      contents.
      
      This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure
      that at least one page is reserved. On systems with a 4K pages size this
      patch should have no effect.
      
      Cc: stable@vger.kernel.org
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: ac515c08 ("libnvdimm, pmem, pfn: move pfn setup to the core")
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b8da135
    • D
      libnvdimm/pmem: Honor force_raw for legacy pmem regions · 696c3752
      Dan Williams 提交于
      commit fa7d2e639cd90442d868dfc6ca1d4cc9d8bf206e upstream.
      
      For recovery, where non-dax access is needed to a given physical address
      range, and testing, allow the 'force_raw' attribute to override the
      default establishment of a dev_pagemap.
      
      Otherwise without this capability it is possible to end up with a
      namespace that can not be activated due to corrupted info-block, and one
      that can not be repaired due to a section collision.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 004f1afb ("libnvdimm, pmem: direct map legacy pmem by default")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      696c3752
    • W
      libnvdimm, pfn: Fix over-trim in trim_pfn_device() · 6a89ed7a
      Wei Yang 提交于
      commit f101ada7da6551127d192c2f1742c1e9e0f62799 upstream.
      
      When trying to see whether current nd_region intersects with others,
      trim_pfn_device() has already calculated the *size* to be expanded to
      SECTION size.
      
      Do not double append 'adjust' to 'size' when calculating whether the end
      of a region collides with the next pmem region.
      
      Fixes: ae86cbfef381 "libnvdimm, pfn: Pad pfn namespaces relative to other regions"
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a89ed7a
    • D
      libnvdimm/label: Clear 'updating' flag after label-set update · 2b88d92e
      Dan Williams 提交于
      commit 966d23a006ca7b44ac8cf4d0c96b19785e0c3da0 upstream.
      
      The UEFI 2.7 specification sets expectations that the 'updating' flag is
      eventually cleared. To date, the libnvdimm core has never adhered to
      that protocol. The policy of the core matches the policy of other
      multi-device info-block formats like MD-Software-RAID that expect
      administrator intervention on inconsistent info-blocks, not automatic
      invalidation.
      
      However, some pre-boot environments may unfortunately attempt to "clean
      up" the labels and invalidate a set when it fails to find at least one
      "non-updating" label in the set. Clear the updating flag after set
      updates to minimize the window of vulnerability to aggressive pre-boot
      environments.
      
      Ideally implementations would not write to the label area outside of
      creating namespaces.
      
      Note that this only minimizes the window, it does not close it as the
      system can still crash while clearing the flag and the set can be
      subsequently deleted / invalidated by the pre-boot environment.
      
      Fixes: f524bf27 ("libnvdimm: write pmem label set")
      Cc: <stable@vger.kernel.org>
      Cc: Kelly Couch <kelly.j.couch@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b88d92e
    • D
      nfit/ars: Attempt short-ARS even in the no_init_ars case · f4dfb94a
      Dan Williams 提交于
      commit fa3ed4d981b1fc19acdd07fcb152a4bd3706892b upstream.
      
      The no_init_ars option is meant to prevent long-ARS, but short-ARS
      should be allowed to grab any immediate results.
      
      Fixes: bc6ba808 ("nfit, address-range-scrub: rework and simplify ARS...")
      Cc: <stable@vger.kernel.org>
      Reported-by: NErwin Tsaur <erwin.tsaur@oracle.com>
      Reviewed-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4dfb94a