1. 07 8月, 2019 40 次提交
    • T
      x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · b88241ae
      Thomas Gleixner 提交于
      commit f36cf386e3fec258a341d446915862eded3e13d8 upstream
      
      Intel provided the following information:
      
       On all current Atom processors, instructions that use a segment register
       value (e.g. a load or store) will not speculatively execute before the
       last writer of that segment retires. Thus they will not use a
       speculatively written segment value.
      
      That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
      entry paths can be excluded from the extra LFENCE if PTI is disabled.
      
      Create a separate bug flag for the through SWAPGS speculation and mark all
      out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
      are excluded from the whole mitigation mess anyway.
      Reported-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b88241ae
    • J
      x86/entry/64: Use JMP instead of JMPQ · 931b6bfe
      Josh Poimboeuf 提交于
      commit 64dbc122b20f75183d8822618c24f85144a5a94d upstream
      
      Somehow the swapgs mitigation entry code patch ended up with a JMPQ
      instruction instead of JMP, where only the short jump is needed.  Some
      assembler versions apparently fail to optimize JMPQ into a two-byte JMP
      when possible, instead always using a 7-byte JMP with relocation.  For
      some reason that makes the entry code explode with a #GP during boot.
      
      Change it back to "JMP" as originally intended.
      
      Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      931b6bfe
    • J
      x86/speculation: Enable Spectre v1 swapgs mitigations · 23e7a7b3
      Josh Poimboeuf 提交于
      commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg
      	// for example: mov %(reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov %(reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested
        	by Dave Hansen ]
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23e7a7b3
    • J
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations · befb822c
      Josh Poimboeuf 提交于
      commit 18ec54fdd6d18d92025af097cd042a75cf0ea24c upstream
      
      Spectre v1 isn't only about array bounds checks.  It can affect any
      conditional checks.  The kernel entry code interrupt, exception, and NMI
      handlers all have conditional swapgs checks.  Those may be problematic in
      the context of Spectre v1, as kernel code can speculatively run with a user
      GS.
      
      For example:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg
      	mov (%reg), %reg1
      
      When coming from user space, the CPU can speculatively skip the swapgs, and
      then do a speculative percpu load using the user GS value.  So the user can
      speculatively force a read of any kernel value.  If a gadget exists which
      uses the percpu value as an address in another load/store, then the
      contents of the kernel value may become visible via an L1 side channel
      attack.
      
      A similar attack exists when coming from kernel space.  The CPU can
      speculatively do the swapgs, causing the user GS to get used for the rest
      of the speculative window.
      
      The mitigation is similar to a traditional Spectre v1 mitigation, except:
      
        a) index masking isn't possible; because the index (percpu offset)
           isn't user-controlled; and
      
        b) an lfence is needed in both the "from user" swapgs path and the
           "from kernel" non-swapgs path (because of the two attacks described
           above).
      
      The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
      CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
      lfences can be skipped in those cases.
      
      On the other hand, the kernel entry swapgs paths don't depend on PTI.
      
      To avoid unnecessary lfences for the user entry case, create two separate
      features for alternative patching:
      
        X86_FEATURE_FENCE_SWAPGS_USER
        X86_FEATURE_FENCE_SWAPGS_KERNEL
      
      Use these features in entry code to patch in lfences where needed.
      
      The features aren't enabled yet, so there's no functional change.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      befb822c
    • F
      x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · b5dd7f61
      Fenghua Yu 提交于
      commit acec0ce081de0c36459eea91647faf99296445a3 upstream
      
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature bits words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies world 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5dd7f61
    • B
      x86/cpufeatures: Carve out CQM features retrieval · 16ad0b63
      Borislav Petkov 提交于
      commit 45fc56e629caa451467e7664fbd4c797c434a6c4 upstream
      
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16ad0b63
    • S
      scsi: mpt3sas: Use 63-bit DMA addressing on SAS35 HBA · 9e034c61
      Suganath Prabu 提交于
      commit df9a606184bfdb5ae3ca9d226184e9489f5c24f7 upstream.
      
      Although SAS3 & SAS3.5 IT HBA controllers support 64-bit DMA addressing, as
      per hardware design, if DMA-able range contains all 64-bits
      set (0xFFFFFFFF-FFFFFFFF) then it results in a firmware fault.
      
      E.g. SGE's start address is 0xFFFFFFFF-FFFF000 and data length is 0x1000
      bytes. when HBA tries to DMA the data at 0xFFFFFFFF-FFFFFFFF location then
      HBA will fault the firmware.
      
      Driver will set 63-bit DMA mask to ensure the above address will not be
      used.
      
      Cc: <stable@vger.kernel.org> # 4.19.63
      Signed-off-by: NSuganath Prabu <suganath-prabu.subramani@broadcom.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e034c61
    • A
      x86/vdso: Prevent segfaults due to hoisted vclock reads · 3732a473
      Andy Lutomirski 提交于
      commit ff17bbe0bb405ad8b36e55815d381841f9fdeebc upstream.
      
      GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
      pages before the vclock mode checks.  This creates a path through
      vclock_gettime() in which no vclock is enabled at all (due to disabled
      TSC on old CPUs, for example) but the pvclock or hvclock page
      nevertheless read.  This will segfault on bare metal.
      
      This fixes commit 459e3a21535a ("gcc-9: properly declare the
      {pv,hv}clock_page storage") in the sense that, before that commit, GCC
      didn't seem to generate the offending code.  There was nothing wrong
      with that commit per se, and -stable maintainers should backport this to
      all supported kernels regardless of whether the offending commit was
      present, since the same crash could just as easily be triggered by the
      phase of the moon.
      
      On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
      I'm not too concerned about performance regressions from this fix.
      
      Cc: stable@vger.kernel.org
      Cc: x86@kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Reported-by: NDuncan Roe <duncan_roe@optusnet.com.au>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3732a473
    • L
      gcc-9: properly declare the {pv,hv}clock_page storage · 8320768d
      Linus Torvalds 提交于
      commit 459e3a21535ae3c7a9a123650e54f5c882b8fcbf upstream.
      
      The pvlock_page and hvclock_page variables are (as the name implies)
      addresses to pages, created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8320768d
    • J
      objtool: Support GCC 9 cold subfunction naming scheme · 354887ae
      Josh Poimboeuf 提交于
      commit bcb6fb5da77c2a228adf07cc9cb1a0c2aa2001c6 upstream.
      
      Starting with GCC 8, a lot of unlikely code was moved out of line to
      "cold" subfunctions in .text.unlikely.
      
      For example, the unlikely bits of:
      
        irq_do_set_affinity()
      
      are moved out to the following subfunction:
      
        irq_do_set_affinity.cold.49()
      
      Starting with GCC 9, the numbered suffix has been removed.  So in the
      above example, the cold subfunction is instead:
      
        irq_do_set_affinity.cold()
      
      Tweak the objtool subfunction detection logic so that it detects both
      GCC 8 and GCC 9 naming schemes.
      Reported-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/015e9544b1f188d36a7f02fa31e9e95629aa5f50.1541040800.git.jpoimboe@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      354887ae
    • E
      ARC: enable uboot support unconditionally · 89f3896b
      Eugeniy Paltsev 提交于
      commit 493a2f812446e92bcb1e69a77381b4d39808d730 upstream.
      
      After reworking U-boot args handling code and adding paranoid
      arguments check we can eliminate CONFIG_ARC_UBOOT_SUPPORT and
      enable uboot support unconditionally.
      
      For JTAG case we can assume that core registers will come up
      reset value of 0 or in worst case we rely on user passing
      '-on=clear_regs' to Metaware debugger.
      
      Cc: stable@vger.kernel.org
      Tested-by: NCorentin LABBE <clabbe@baylibre.com>
      Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89f3896b
    • J
      eeprom: at24: make spd world-readable again · 8dd37627
      Jean Delvare 提交于
      commit 25e5ef302c24a6fead369c0cfe88c073d7b97ca8 upstream.
      
      The integration of the at24 driver into the nvmem framework broke the
      world-readability of spd EEPROMs. Fix it.
      Signed-off-by: NJean Delvare <jdelvare@suse.de>
      Cc: stable@vger.kernel.org
      Fixes: 57d15550 ("eeprom: at24: extend driver to plug into the NVMEM framework")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      [Bartosz: backported to v4.19.y]
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dd37627
    • X
      drm/i915/gvt: fix incorrect cache entry for guest page mapping · a7340d31
      Xiaolin Zhang 提交于
      commit 7366aeb77cd840f3edea02c65065d40affaa7f45 upstream.
      
      GPU hang observed during the guest OCL conformance test which is caused
      by THP GTT feature used durning the test.
      
      It was observed the same GFN with different size (4K and 2M) requested
      from the guest in GVT. So during the guest page dma map stage, it is
      required to unmap first with orginal size and then remap again with
      requested size.
      
      Fixes: b901b252 ("drm/i915/gvt: Add 2M huge gtt support")
      Cc: stable@vger.kernel.org
      Reviewed-by: NZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: NXiaolin Zhang <xiaolin.zhang@intel.com>
      Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7340d31
    • J
      IB/hfi1: Check for error on call to alloc_rsm_map_table · a1c020ce
      John Fleck 提交于
      commit cd48a82087231fdba0e77521102386c6ed0168d6 upstream.
      
      The call to alloc_rsm_map_table does not check if the kmalloc fails.
      Check for a NULL on alloc, and bail if it fails.
      
      Fixes: 372cc85a ("IB/hfi1: Extract RSM map table init from QOS")
      Link: https://lore.kernel.org/r/20190715164521.74174.27047.stgit@awfm-01.aw.intel.com
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NJohn Fleck <john.fleck@intel.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1c020ce
    • Y
      IB/mlx5: Fix RSS Toeplitz setup to be aligned with the HW specification · e9cd4962
      Yishai Hadas 提交于
      commit b7165bd0d6cbb93732559be6ea8774653b204480 upstream.
      
      The specification for the Toeplitz function doesn't require to set the key
      explicitly to be symmetric. In case a symmetric functionality is required
      a symmetric key can be simply used.
      
      Wrongly forcing the algorithm to symmetric causes the wrong packet
      distribution and a performance degradation.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-7-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 4.7
      Fixes: 28d61370 ("IB/mlx5: Add RSS QP support")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NAlex Vainman <alexv@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9cd4962
    • Y
      IB/mlx5: Fix clean_mr() to work in the expected order · 924308d2
      Yishai Hadas 提交于
      commit b9332dad987018745a0c0bb718d12dacfa760489 upstream.
      
      Any dma map underlying the MR should only be freed once the MR is fenced
      at the hardware.
      
      As of the above we first destroy the MKEY and just after that can safely
      call to dma_unmap_single().
      
      Link: https://lore.kernel.org/r/20190723065733.4899-6-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 4.3
      Fixes: 8a187ee5 ("IB/mlx5: Support the new memory registration API")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      924308d2
    • Y
      IB/mlx5: Move MRs to a kernel PD when freeing them to the MR cache · 7e5ce9f3
      Yishai Hadas 提交于
      commit 9ec4483a3f0f71a228a5933bc040441322bfb090 upstream.
      
      Fix unreg_umr to move the MR to a kernel owned PD (i.e. the UMR PD) which
      can't be accessed by userspace.
      
      This ensures that nothing can continue to access the MR once it has been
      placed in the kernels cache for reuse.
      
      MRs in the cache continue to have their HW state, including DMA tables,
      present. Even though the MR has been invalidated, changing the PD provides
      an additional layer of protection against use of the MR.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-5-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 3.10
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e5ce9f3
    • Y
      IB/mlx5: Use direct mkey destroy command upon UMR unreg failure · 3cfa1087
      Yishai Hadas 提交于
      commit afd1417404fba6dbfa6c0a8e5763bd348da682e4 upstream.
      
      Use a direct firmware command to destroy the mkey in case the unreg UMR
      operation has failed.
      
      This prevents a case that a mkey will leak out from the cache post a
      failure to be destroyed by a UMR WR.
      
      In case the MR cache limit didn't reach a call to add another entry to the
      cache instead of the destroyed one is issued.
      
      In addition, replaced a warn message to WARN_ON() as this flow is fatal
      and can't happen unless some bug around.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-4-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 4.10
      Fixes: 49780d42 ("IB/mlx5: Expose MR cache for mlx5_ib")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3cfa1087
    • Y
      IB/mlx5: Fix unreg_umr to ignore the mkey state · 41be1928
      Yishai Hadas 提交于
      commit 6a053953739d23694474a5f9c81d1a30093da81a upstream.
      
      Fix unreg_umr to ignore the mkey state and do not fail if was freed.  This
      prevents a case that a user space application already changed the mkey
      state to free and then the UMR operation will fail leaving the mkey in an
      inappropriate state.
      
      Link: https://lore.kernel.org/r/20190723065733.4899-3-leon@kernel.org
      Cc: <stable@vger.kernel.org> # 3.19
      Fixes: 968e78dd ("IB/mlx5: Enhance UMR support to allow partial page table update")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41be1928
    • J
      xen/swiotlb: fix condition for calling xen_destroy_contiguous_region() · 04fdca1f
      Juergen Gross 提交于
      commit 50f6393f9654c561df4cdcf8e6cfba7260143601 upstream.
      
      The condition in xen_swiotlb_free_coherent() for deciding whether to
      call xen_destroy_contiguous_region() is wrong: in case the region to
      be freed is not contiguous calling xen_destroy_contiguous_region() is
      the wrong thing to do: it would result in inconsistent mappings of
      multiple PFNs to the same MFN. This will lead to various strange
      crashes or data corruption.
      
      Instead of calling xen_destroy_contiguous_region() in that case a
      warning should be issued as that situation should never occur.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: NJan Beulich <jbeulich@suse.com>
      Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      04fdca1f
    • M
      nbd: replace kill_bdev() with __invalidate_device() again · eb828241
      Munehisa Kamata 提交于
      commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream.
      
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev() and it has been there since then. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on the top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NMunehisa Kamata <kamatam@amazon.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eb828241
    • W
      arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG} · 8dfef0f4
      Will Deacon 提交于
      commit 147b9635e6347104b91f48ca9dca61eb0fbf2a54 upstream.
      
      If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
      their architecturally maximum values, which defeats the use of
      FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
      machines.
      
      Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
      saturate at zero.
      
      Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers")
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dfef0f4
    • W
      arm64: compat: Allow single-byte watchpoints on all addresses · 2bddc985
      Will Deacon 提交于
      commit 849adec41203ac5837c40c2d7e08490ffdef3c2c upstream.
      
      Commit d968d2b8 ("ARM: 7497/1: hw_breakpoint: allow single-byte
      watchpoints on all addresses") changed the validation requirements for
      hardware watchpoints on arch/arm/. Update our compat layer to implement
      the same relaxation.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bddc985
    • W
      drivers/perf: arm_pmu: Fix failure path in PM notifier · c385cda0
      Will Deacon 提交于
      commit 0d7fd70f26039bd4b33444ca47f0e69ce3ae0354 upstream.
      
      Handling of the CPU_PM_ENTER_FAILED transition in the Arm PMU PM
      notifier code incorrectly skips restoration of the counters. Fix the
      logic so that CPU_PM_ENTER_FAILED follows the same path as CPU_PM_EXIT.
      
      Cc: <stable@vger.kernel.org>
      Fixes: da4e4f18 ("drivers/perf: arm_pmu: implement CPU_PM notifier")
      Reported-by: NAnders Roxell <anders.roxell@linaro.org>
      Acked-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c385cda0
    • H
      parisc: Fix build of compressed kernel even with debug enabled · 5f80ac50
      Helge Deller 提交于
      commit 3fe6c873af2f2247544debdbe51ec29f690a2ccf upstream.
      
      With debug info enabled (CONFIG_DEBUG_INFO=y) the resulting vmlinux may get
      that huge that we need to increase the start addresss for the decompression
      text section otherwise one will face a linker error.
      Reported-by: NSven Schnelle <svens@stackframe.org>
      Tested-by: NSven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f80ac50
    • C
      cgroup: kselftest: relax fs_spec checks · 001f93d9
      Chris Down 提交于
      commit b59b1baab789eacdde809135542e3d4f256f6878 upstream.
      
      On my laptop most memcg kselftests were being skipped because it claimed
      cgroup v2 hierarchy wasn't mounted, but this isn't correct.  Instead, it
      seems current systemd HEAD mounts it with the name "cgroup2" instead of
      "cgroup":
      
          % grep cgroup /proc/mounts
          cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0
      
      I can't think of a reason to need to check fs_spec explicitly
      since it's arbitrary, so we can just rely on fs_vfstype.
      
      After these changes, `make TARGETS=cgroup kselftest` actually runs the
      cgroup v2 tests in more cases.
      
      Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      001f93d9
    • S
      s390/dasd: fix endless loop after read unit address configuration · 6cb9e0d9
      Stefan Haberland 提交于
      commit 41995342b40c418a47603e1321256d2c4a2ed0fb upstream.
      
      After getting a storage server event that causes the DASD device driver
      to update its unit address configuration during a device shutdown there is
      the possibility of an endless loop in the device driver.
      
      In the system log there will be ongoing DASD error messages with RC: -19.
      
      The reason is that the loop starting the ruac request only terminates when
      the retry counter is decreased to 0. But in the sleep_on function there are
      early exit paths that do not decrease the retry counter.
      
      Prevent an endless loop by handling those cases separately.
      
      Remove the unnecessary do..while loop since the sleep_on function takes
      care of retries by itself.
      
      Fixes: 8e09f215 ("[S390] dasd: add hyper PAV support to DASD device driver, part 1")
      Cc: stable@vger.kernel.org # 2.6.25+
      Signed-off-by: NStefan Haberland <sth@linux.ibm.com>
      Reviewed-by: NJan Hoeppner <hoeppner@linux.ibm.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cb9e0d9
    • Y
      mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker · beb0cc78
      Yang Shi 提交于
      commit fa1e512fac717f34e7c12d7a384c46e90a647392 upstream.
      
      Shakeel Butt reported premature oom on kernel with
      "cgroup_disable=memory" since mem_cgroup_is_root() returns false even
      though memcg is actually NULL.  The drop_caches is also broken.
      
      It is because commit aeed1d32 ("mm/vmscan.c: generalize
      shrink_slab() calls in shrink_node()") removed the !memcg check before
      !mem_cgroup_is_root().  And, surprisingly root memcg is allocated even
      though memory cgroup is disabled by kernel boot parameter.
      
      Add mem_cgroup_disabled() check to make reclaimer work as expected.
      
      Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
      Fixes: aeed1d32 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reported-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Jan Hadrava <had@kam.mff.cuni.cz>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      beb0cc78
    • S
      ALSA: hda: Fix 1-minute detection delay when i915 module is not available · 72651bbd
      Samuel Thibault 提交于
      commit 74bf71ed792ab0f64631cc65ccdb54c356c36d45 upstream.
      
      Distribution installation images such as Debian include different sets
      of modules which can be downloaded dynamically.  Such images may notably
      include the hda sound modules but not the i915 DRM module, even if the
      latter was enabled at build time, as reported on
      https://bugs.debian.org/931507
      
      In such a case hdac_i915 would be linked in and try to load the i915
      module, fail since it is not there, but still wait for a whole minute
      before giving up binding with it.
      
      This fixes such as case by only waiting for the binding if the module
      was properly loaded (or module support is disabled, in which case i915
      is already compiled-in anyway).
      
      Fixes: f9b54e19 ("ALSA: hda/i915: Allow delayed i915 audio component binding")
      Signed-off-by: NSamuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72651bbd
    • O
      selinux: fix memory leak in policydb_init() · 46650ac2
      Ondrej Mosnacek 提交于
      commit 45385237f65aeee73641f1ef737d7273905a233f upstream.
      
      Since roles_init() adds some entries to the role hash table, we need to
      destroy also its keys/values on error, otherwise we get a memory leak in
      the error path.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: syzbot+fee3a14d4cdf92646287@syzkaller.appspotmail.com
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46650ac2
    • M
      mtd: rawnand: micron: handle on-die "ECC-off" devices correctly · e7bb4c81
      Marco Felsch 提交于
      commit 8493b2a06fc5b77ef5c579dc32b12761f7b7a84c upstream.
      
      Some devices are not supposed to support on-die ECC but experience
      shows that internal ECC machinery can actually be enabled through the
      "SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
      Tables" returns that it is not.
      
      Currently, the driver checks the "READ ID Parameter" field directly
      after having enabled the feature. If the check fails it returns
      immediately but leaves the ECC on. When using buggy chips like
      MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
      go through the on-die ECC, confusing the host controller which is
      supposed to be the one handling correction.
      
      To address this in a common way we need to turn off the on-die ECC
      directly after reading the "READ ID Parameter" and before checking the
      "ECC status".
      
      Cc: stable@vger.kernel.org
      Fixes: dbc44edb ("mtd: rawnand: micron: Fix on-die ECC detection logic")
      Signed-off-by: NMarco Felsch <m.felsch@pengutronix.de>
      Reviewed-by: NBoris Brezillon <boris.brezillon@collabora.com>
      Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e7bb4c81
    • G
      IB/hfi1: Fix Spectre v1 vulnerability · fafaeae4
      Gustavo A. R. Silva 提交于
      commit 6497d0a9c53df6e98b25e2b79f2295d7caa47b6e upstream.
      
      sl is controlled by user-space, hence leading to a potential
      exploitation of the Spectre variant 1 vulnerability.
      
      Fix this by sanitizing sl before using it to index ibp->sl_to_sc.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Link: https://lore.kernel.org/r/20190731175428.GA16736@embeddedorSigned-off-by: NDoug Ledford <dledford@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fafaeae4
    • M
      gpiolib: fix incorrect IRQ requesting of an active-low lineevent · fdb0fb56
      Michael Wu 提交于
      commit 223ecaf140b1dd1c1d2a1a1d96281efc5c906984 upstream.
      
      When a pin is active-low, logical trigger edge should be inverted to match
      the same interrupt opportunity.
      
      For example, a button pushed triggers falling edge in ACTIVE_HIGH case; in
      ACTIVE_LOW case, the button pushed triggers rising edge. For user space the
      IRQ requesting doesn't need to do any modification except to configuring
      GPIOHANDLE_REQUEST_ACTIVE_LOW.
      
      For example, we want to catch the event when the button is pushed. The
      button on the original board drives level to be low when it is pushed, and
      drives level to be high when it is released.
      
      In user space we can do:
      
      	req.handleflags = GPIOHANDLE_REQUEST_INPUT;
      	req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
      
      	while (1) {
      		read(fd, &dat, sizeof(dat));
      		if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
      			printf("button pushed\n");
      	}
      
      Run the same logic on another board which the polarity of the button is
      inverted; it drives level to be high when pushed, and level to be low when
      released. For this inversion we add flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
      
      	req.handleflags = GPIOHANDLE_REQUEST_INPUT |
      		GPIOHANDLE_REQUEST_ACTIVE_LOW;
      	req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
      
      At the result, there are no any events caught when the button is pushed.
      By the way, button releasing will emit a "falling" event. The timing of
      "falling" catching is not expected.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael Wu <michael.wu@vatics.com>
      Tested-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fdb0fb56
    • J
      mmc: meson-mx-sdio: Fix misuse of GENMASK macro · 7e3efb65
      Joe Perches 提交于
      commit 665e985c2f41bebc3e6cee7e04c36a44afbc58f7 upstream.
      
      Arguments are supposed to be ordered high then low.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Reviewed-by: NNeil Armstrong <narmstrong@baylibre.com>
      Fixes: ed80a13b ("mmc: meson-mx-sdio: Add a driver for the Amlogic
      Meson8 and Meson8b SoCs")
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e3efb65
    • D
      mmc: dw_mmc: Fix occasional hang after tuning on eMMC · 29841b5c
      Douglas Anderson 提交于
      commit ba2d139b02ba684c6c101de42fed782d6cd2b997 upstream.
      
      In commit 46d17952 ("mmc: dw_mmc: Wait for data transfer after
      response errors.") we fixed a tuning-induced hang that I saw when
      stress testing tuning on certain SD cards.  I won't re-hash that whole
      commit, but the summary is that as a normal part of tuning you need to
      deal with transfer errors and there were cases where these transfer
      errors was putting my system into a bad state causing all future
      transfers to fail.  That commit fixed handling of the transfer errors
      for me.
      
      In downstream Chrome OS my fix landed and had the same behavior for
      all SD/MMC commands.  However, it looks like when the commit landed
      upstream we limited it to only SD tuning commands.  Presumably this
      was to try to get around problems that Alim Akhtar reported on exynos
      [1].
      
      Unfortunately while stress testing reboots (and suspend/resume) on
      some rk3288-based Chromebooks I found the same problem on the eMMC on
      some of my Chromebooks (the ones with Hynix eMMC).  Since the eMMC
      tuning command is different (MMC_SEND_TUNING_BLOCK_HS200
      vs. MMC_SEND_TUNING_BLOCK) we were basically getting back into the
      same situation.
      
      I'm hoping that whatever problems exynos was having in the past are
      somehow magically fixed now and we can make the behavior the same for
      all commands.
      
      [1] https://lkml.kernel.org/r/CAGOxZ53WfNbaMe0_AM0qBqU47kAfgmPBVZC8K8Y-_J3mDMqW4A@mail.gmail.com
      
      Fixes: 46d17952 ("mmc: dw_mmc: Wait for data transfer after response errors.")
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Alim Akhtar <alim.akhtar@gmail.com>
      Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29841b5c
    • F
      Btrfs: fix race leading to fs corruption after transaction abort · 50d70040
      Filipe Manana 提交于
      commit cb2d3daddbfb6318d170e79aac1f7d5e4d49f0d7 upstream.
      
      When one transaction is finishing its commit, it is possible for another
      transaction to start and enter its initial commit phase as well. If the
      first ends up getting aborted, we have a small time window where the second
      transaction commit does not notice that the previous transaction aborted
      and ends up committing, writing a superblock that points to btrees that
      reference extent buffers (nodes and leafs) that were not persisted to disk.
      The consequence is that after mounting the filesystem again, we will be
      unable to load some btree nodes/leafs, either because the content on disk
      is either garbage (or just zeroes) or corresponds to the old content of a
      previouly COWed or deleted node/leaf, resulting in the well known error
      messages "parent transid verify failed on ...".
      The following sequence diagram illustrates how this can happen.
      
              CPU 1                                           CPU 2
      
       <at transaction N>
      
       btrfs_commit_transaction()
         (...)
         --> sets transaction state to
             TRANS_STATE_UNBLOCKED
         --> sets fs_info->running_transaction
             to NULL
      
                                                          (...)
                                                          btrfs_start_transaction()
                                                            start_transaction()
                                                              wait_current_trans()
                                                                --> returns immediately
                                                                    because
                                                                    fs_info->running_transaction
                                                                    is NULL
                                                              join_transaction()
                                                                --> creates transaction N + 1
                                                                --> sets
                                                                    fs_info->running_transaction
                                                                    to transaction N + 1
                                                                --> adds transaction N + 1 to
                                                                    the fs_info->trans_list list
                                                              --> returns transaction handle
                                                                  pointing to the new
                                                                  transaction N + 1
                                                          (...)
      
                                                          btrfs_sync_file()
                                                            btrfs_start_transaction()
                                                              --> returns handle to
                                                                  transaction N + 1
                                                            (...)
      
         btrfs_write_and_wait_transaction()
           --> writeback of some extent
               buffer fails, returns an
      	 error
         btrfs_handle_fs_error()
           --> sets BTRFS_FS_STATE_ERROR in
               fs_info->fs_state
         --> jumps to label "scrub_continue"
         cleanup_transaction()
           btrfs_abort_transaction(N)
             --> sets BTRFS_FS_STATE_TRANS_ABORTED
                 flag in fs_info->fs_state
             --> sets aborted field in the
                 transaction and transaction
      	   handle structures, for
                 transaction N only
           --> removes transaction from the
               list fs_info->trans_list
                                                            btrfs_commit_transaction(N + 1)
                                                              --> transaction N + 1 was not
      							    aborted, so it proceeds
                                                              (...)
                                                              --> sets the transaction's state
                                                                  to TRANS_STATE_COMMIT_START
                                                              --> does not find the previous
                                                                  transaction (N) in the
                                                                  fs_info->trans_list, so it
                                                                  doesn't know that transaction
                                                                  was aborted, and the commit
                                                                  of transaction N + 1 proceeds
                                                              (...)
                                                              --> sets transaction N + 1 state
                                                                  to TRANS_STATE_UNBLOCKED
                                                              btrfs_write_and_wait_transaction()
                                                                --> succeeds writing all extent
                                                                    buffers created in the
                                                                    transaction N + 1
                                                              write_all_supers()
                                                                 --> succeeds
                                                                 --> we now have a superblock on
                                                                     disk that points to trees
                                                                     that refer to at least one
                                                                     extent buffer that was
                                                                     never persisted
      
      So fix this by updating the transaction commit path to check if the flag
      BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting
      the transaction to the TRANS_STATE_COMMIT_START we do not find any previous
      transaction in the fs_info->trans_list. If the flag is set, just fail the
      transaction commit with -EROFS, as we do in other places. The exact error
      code for the previous transaction abort was already logged and reported.
      
      Fixes: 49b25e05 ("btrfs: enhance transaction abort infrastructure")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50d70040
    • F
      Btrfs: fix incremental send failure after deduplication · 009d7a4e
      Filipe Manana 提交于
      commit b4f9a1a87a48c255bb90d8a6c3d555a1abb88130 upstream.
      
      When doing an incremental send operation we can fail if we previously did
      deduplication operations against a file that exists in both snapshots. In
      that case we will fail the send operation with -EIO and print a message
      to dmesg/syslog like the following:
      
        BTRFS error (device sdc): Send: inconsistent snapshot, found updated \
        extent for inode 257 without updated inode item, send root is 258, \
        parent root is 257
      
      This requires that we deduplicate to the same file in both snapshots for
      the same amount of times on each snapshot. The issue happens because a
      deduplication only updates the iversion of an inode and does not update
      any other field of the inode, therefore if we deduplicate the file on
      each snapshot for the same amount of time, the inode will have the same
      iversion value (stored as the "sequence" field on the inode item) on both
      snapshots, therefore it will be seen as unchanged between in the send
      snapshot while there are new/updated/deleted extent items when comparing
      to the parent snapshot. This makes the send operation return -EIO and
      print an error message.
      
      Example reproducer:
      
        $ mkfs.btrfs -f /dev/sdb
        $ mount /dev/sdb /mnt
      
        # Create our first file. The first half of the file has several 64Kb
        # extents while the second half as a single 512Kb extent.
        $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo
        $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo
      
        # Create the base snapshot and the parent send stream from it.
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1
        $ btrfs send -f /tmp/1.snap /mnt/mysnap1
      
        # Create our second file, that has exactly the same data as the first
        # file.
        $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar
      
        # Create the second snapshot, used for the incremental send, before
        # doing the file deduplication.
        $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2
      
        # Now before creating the incremental send stream:
        #
        # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This
        #    will drop several extent items and add a new one, also updating
        #    the inode's iversion (sequence field in inode item) by 1, but not
        #    any other field of the inode;
        #
        # 2) Deduplicate into a different subrange of file foo in snapshot
        #    mysnap2. This will replace an extent item with a new one, also
        #    updating the inode's iversion by 1 but not any other field of the
        #    inode.
        #
        # After these two deduplication operations, the inode items, for file
        # foo, are identical in both snapshots, but we have different extent
        # items for this inode in both snapshots. We want to check this doesn't
        # cause send to fail with an error or produce an incorrect stream.
      
        $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo
        $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo
      
        # Create the incremental send stream.
        $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2
        ERROR: send ioctl failed with -5: Input/output error
      
      This issue started happening back in 2015 when deduplication was updated
      to not update the inode's ctime and mtime and update only the iversion.
      Back then we would hit a BUG_ON() in send, but later in 2016 send was
      updated to return -EIO and print the error message instead of doing the
      BUG_ON().
      
      A test case for fstests follows soon.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933
      Fixes: 1c919a5e ("btrfs: don't update mtime/ctime on deduped inodes")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      009d7a4e
    • M
      kbuild: initialize CLANG_FLAGS correctly in the top Makefile · 4c5a4425
      Masahiro Yamada 提交于
      commit 5241ab4cf42d3a93b933b55d3d53f43049081fa1 upstream.
      
      CLANG_FLAGS is initialized by the following line:
      
        CLANG_FLAGS     := --target=$(notdir $(CROSS_COMPILE:%-=%))
      
      ..., which is run only when CROSS_COMPILE is set.
      
      Some build targets (bindeb-pkg etc.) recurse to the top Makefile.
      
      When you build the kernel with Clang but without CROSS_COMPILE,
      the same compiler flags such as -no-integrated-as are accumulated
      into CLANG_FLAGS.
      
      If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
      Kbuild will recompile everything needlessly due to the build command
      change.
      
      Fix this by correctly initializing CLANG_FLAGS.
      
      Fixes: 238bcbc4e07f ("kbuild: consolidate Clang compiler flags")
      Cc: <stable@vger.kernel.org> # v5.0+
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
      Acked-by: NNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c5a4425
    • M
      kconfig: Clear "written" flag to avoid data loss · 3736612d
      M. Vefa Bicakci 提交于
      commit 0c5b6c28ed68becb692b43eae5e44d5aa7e160ce upstream.
      
      Prior to this commit, starting nconfig, xconfig or gconfig, and saving
      the .config file more than once caused data loss, where a .config file
      that contained only comments would be written to disk starting from the
      second save operation.
      
      This bug manifests itself because the SYMBOL_WRITTEN flag is never
      cleared after the first call to conf_write, and subsequent calls to
      conf_write then skip all of the configuration symbols due to the
      SYMBOL_WRITTEN flag being set.
      
      This commit resolves this issue by clearing the SYMBOL_WRITTEN flag
      from all symbols before conf_write returns.
      
      Fixes: 8e2442a5f86e ("kconfig: fix missing choice values in auto.conf")
      Cc: linux-stable <stable@vger.kernel.org> # 4.19+
      Signed-off-by: NM. Vefa Bicakci <m.v.b@runbox.com>
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3736612d
    • Y
      drm/nouveau: fix memory leak in nouveau_conn_reset() · 4c6500b5
      Yongxin Liu 提交于
      [ Upstream commit 09b90e2fe35faeace2488234e2a7728f2ea8ba26 ]
      
      In nouveau_conn_reset(), if connector->state is true,
      __drm_atomic_helper_connector_destroy_state() will be called,
      but the memory pointed by asyc isn't freed. Memory leak happens
      in the following function __drm_atomic_helper_connector_reset(),
      where newly allocated asyc->state will be assigned to connector->state.
      
      So using nouveau_conn_atomic_destroy_state() instead of
      __drm_atomic_helper_connector_destroy_state to free the "old" asyc.
      
      Here the is the log showing memory leak.
      
      unreferenced object 0xffff8c5480483c80 (size 192):
        comm "kworker/0:2", pid 188, jiffies 4294695279 (age 53.179s)
        hex dump (first 32 bytes):
          00 f0 ba 7b 54 8c ff ff 00 00 00 00 00 00 00 00  ...{T...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000005005c0d0>] kmem_cache_alloc_trace+0x195/0x2c0
          [<00000000a122baed>] nouveau_conn_reset+0x25/0xc0 [nouveau]
          [<000000004fd189a2>] nouveau_connector_create+0x3a7/0x610 [nouveau]
          [<00000000c73343a8>] nv50_display_create+0x343/0x980 [nouveau]
          [<000000002e2b03c3>] nouveau_display_create+0x51f/0x660 [nouveau]
          [<00000000c924699b>] nouveau_drm_device_init+0x182/0x7f0 [nouveau]
          [<00000000cc029436>] nouveau_drm_probe+0x20c/0x2c0 [nouveau]
          [<000000007e961c3e>] local_pci_probe+0x47/0xa0
          [<00000000da14d569>] work_for_cpu_fn+0x1a/0x30
          [<0000000028da4805>] process_one_work+0x27c/0x660
          [<000000001d415b04>] worker_thread+0x22b/0x3f0
          [<0000000003b69f1f>] kthread+0x12f/0x150
          [<00000000c94c29b7>] ret_from_fork+0x3a/0x50
      Signed-off-by: NYongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: NBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4c6500b5